SoundMapper4D©
Broadbent setting up audio recorders in 2015 in the U of Alabama arboretum for bird identification and localization work
Launch SoundMapper4D Web App
SoundMapper4D: Passive Acoustic 4D Bird Detection, Identification and Localization from GPS-Time-Synchronized Recorder Arrays
SoundMapper4D combines GPS-millisecond-time-stamped multi-position audio recordings with AI-based bird call identification to map species and movement through forest study areas in four dimensions (X, Y, Z, and time). Using arrays of Wildlife Acoustics SM3 recorders, the system estimates per-microphone arrival times via template-matched normalized cross-correlation, derives Time Differences of Arrival (TDOA) across all microphone pairs, multilaterates 3D bird positions, and identifies species using BirdNET AI. It achieves sub-meter 3D localization accuracy (CEP50 = 0.55 m) and 86% top-3 species identification accuracy without any prior knowledge of where birds might be or what species they are.
Current focus areas include the Ordway-Swisher Biological Station (a NEON site) and the San Felasco Big Plot. The project was initiated at the University of Alabama in 2015 (see news link here), received additional support from IFAS start-up funds in 2017, and continues to advance in both algorithmic development and field data collection as a long-term SPEC Lab project.
Citation: Broadbent, E.N., Almeyda Zambrano, A.M. SoundMapper4D: Passive Acoustic 4D Bird Detection, Identification and Localization from GPS-Time-Synchronized Recorder Arrays. Accessed [Version] at https://www.speclab.org/soundmapper4d on [Date].
Validation
Validated at OSBS on 54 GPS-surveyed playback events across 35 species. See figures below.
| Metric | Calibration | Validation |
|--------|-------------|------------|
| Detection rate | 97% (35/36) | 89% (16/18) |
| Localization CEP50 | 0.56 m | 0.49 m |
| Vertical RMSE | 1.26 m | 1.28 m |
| Within 1 m | 69% | 69% |
| Within 5 m | 83% | 81% |
| Species ID (top-1) | — | 76% |
| Species ID (top-3) | — | 86% |
Validation data is available for download in the app (3-minute selected segment) or in full here (5 GB, 60 minutes).
Beta Access Notice
All workflows are under active development and ongoing validation with additional field sites and recording configurations.
License
Beta preview use only. Commercial use is not permitted. Duplication, redistribution, or reverse engineering of the system is prohibited. Cite SoundMapper4D© for any use.
Beta Preview Notice: SoundMapper4D© is under active development (v3.2, April 2026). All outputs — including species identifications, localizations, and associated metrics — are provided as-is with no warranty of accuracy or completeness. The authors and affiliated institutions are not responsible for any errors, omissions, misidentifications, or any decisions, actions, or consequences arising from the use of this system or its outputs. Workflows, algorithms, and validation are subject to change. Commercial use is not permitted. Duplication, redistribution, or reverse engineering of the system is prohibited.
Versions
v3.2 - 04/01/26 - major updates to algorithms, workflows, and GUIs
v3.1 - 03/01/26 - initial public-facing release
## Abstract
SoundMapper4D is a computational framework for localizing bird vocalizations in three-dimensional space and time (4D) using GPS-synchronized arrays of acoustic recorders (e.g. Wildlife Acoustics SM3/Song Meter Micro, AudioMoth, or any recorder producing timestamped WAV files). The system combines template-matched normalized cross-correlation (NCC) for per-microphone arrival time estimation, pairwise Time Difference of Arrival (TDOA) multilateration for 3D position estimation, and BirdNET AI for automated species identification. Validated on 54 GPS-surveyed playback events across 35 species at the Ordway-Swisher Biological Station (OSBS), the system achieves sub-meter localization accuracy (CEP50 = 0.56 m calibration, 0.49 m validation) with 97.2% calibration detection rate (88.9% validation) and 76.2% top-1 species identification accuracy (85.7% top-3). The framework is deployed as both a local desktop application and a cloud-hosted web service.
## 1. Introduction
Passive acoustic monitoring (PAM) of avian biodiversity requires not only detecting and identifying bird calls but also determining the spatial position of vocalizing individuals. Traditional point-count surveys provide presence/absence data but lack precise spatial information. Acoustic localization arrays can resolve source positions by exploiting time differences of arrival (TDOA) across spatially distributed microphones.
SoundMapper4D addresses the full pipeline from raw multi-channel field recordings to georeferenced 4D bird maps: **detection** (finding when and where a bird is calling), **localization** (estimating XYZ position via multilateration), and **identification** (determining species via deep learning). The system is designed for GPS-synchronized acoustic recorder arrays deployed in field settings, and has been validated with Wildlife Acoustics SM3 recorders.
### 1.1 Study Sites
Current focus areas include:
- **Ordway-Swisher Biological Station (OSBS)**, a NEON site in north-central Florida
- **San Felasco Big Plot**, a long-term ecological research site
## 2. System Architecture
### 2.1 Hardware Configuration
The recording array consists of any number of acoustic recorders, each contributing one or more microphone channels at known 3D positions. In the OSBS validation experiment, 8 Wildlife Acoustics Song Meter 3 (SM3) units were used, each containing two independent microphone channels (Left and Right) at different vertical positions on the same mast, yielding 16 independent receivers with 120 unique microphone pairs (N(N-1)/2 where N=16).
Recorders must be GPS-synchronized to millisecond precision, ensuring that audio segments loaded from the same wall-clock window across different units are time-aligned for valid TDOA computation. For SM3 recorders, GPS synchronization is indicated by the `$` separator in the filename convention. For other recorder types, the synchronization method (GPS, NTP, hardware trigger, etc.) should be documented in the mic position CSV.
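For illustration, a mic position table can be loaded as follows. The column names (`mic_id`, `x`, `y`, `z`) are hypothetical; the app's actual CSV schema may differ:

```python
import csv
import io

def load_mic_positions(csv_text):
    """Parse a mic position table into {mic_id: (x, y, z)}.
    Column names here (mic_id, x, y, z) are illustrative only."""
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        out[row["mic_id"]] = (float(row["x"]), float(row["y"]), float(row["z"]))
    return out
```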
### 2.2 Software Pipeline
SoundMapper4D supports two pipeline architectures depending on the use case:
#### Production Pipeline (Detection-First — no ground truth)
The production pipeline (`production.py`) uses a **detection-first** architecture optimized for quality:
```
Phase 1: DETECT & IDENTIFY
Run BirdNET at maximum sensitivity (1.5) on every mic channel
→ Overlap 2.5s (0.5s step for 3s analysis windows)
→ Low confidence threshold (0.05) — false positives filtered by Phase 2
→ Per-channel detections with species, confidence, amplitude, timestamp
Phase 2: CLUSTER & VOTE
Group same-species detections across channels (DetectionClusterer)
→ Require ≥3 channels detecting same species within 0.6s
→ WeightedSpeciesVote consensus: weight = confidence × 10^(amplitude_dB/20)
→ Multi-channel events with voted species identity
Phase 3: LOCALIZE
For each event, extract audio from ALL available channels
→ Dual TDOA: run both NCC and GCC-PHAT, keep method with stronger correlations
→ Multi-start L-BFGS-B + Nelder-Mead (centroid + every mic as initial guesses)
→ 3D position with uncertainty estimate
```
The detection-first approach has several advantages over blind time-window scanning:
- Only localizes confirmed vocalizations (no wasted computation on silent windows)
- Species identity and confidence are known before localization
- Cross-channel agreement filters false positives naturally
- All channels contribute to TDOA regardless of whether BirdNET detected the bird on that channel
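The cluster step of Phase 2 can be sketched as follows. This is a simplified stand-in for the pipeline's `DetectionClusterer`; the `(channel, species, time)` tuple layout and the greedy seeding on the earliest detection are assumptions:

```python
from collections import defaultdict

def cluster_detections(detections, max_gap=0.6, min_channels=3):
    """Greedy temporal clustering of per-channel detections.
    detections: list of (channel, species, time_s) tuples.
    A cluster becomes an event only if detections from at least
    min_channels distinct channels fall within max_gap seconds
    of the cluster's first detection."""
    by_species = defaultdict(list)
    for ch, sp, t in detections:
        by_species[sp].append((t, ch))

    def emit(cluster, sp, events):
        if len({ch for _, ch in cluster}) >= min_channels:
            events.append((sp, sorted(cluster)))

    events = []
    for sp, hits in by_species.items():
        hits.sort()
        cluster = [hits[0]]
        for t, ch in hits[1:]:
            if t - cluster[0][0] <= max_gap:
                cluster.append((t, ch))
            else:
                emit(cluster, sp, events)
                cluster = [(t, ch)]
        emit(cluster, sp, events)
    return events
```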
#### Cal/Val Pipeline (Template Matching — with ground truth)
The cal/val pipeline (`calval.py`) uses template matching against known bird library clips for accuracy assessment:
```
Phase 1: Template Matching (per-mic arrival times)
Bird library clip (44.1 kHz) → resample to 24 kHz
→ Bandpass filter (800-9000 Hz, 4th-order Butterworth)
→ Per-window Normalized Cross-Correlation (NCC)
→ Top-K peak detection (K=8) per microphone
→ Cross-mic consistency clustering (0.5s spread)
→ Iterative outlier rejection (3ms threshold, 5 rounds)
Phase 2: Speed of Sound (SOS) Calibration
Grid search 330-355 m/s (0.5 m/s step)
→ Multilaterate each event at each SOS
→ Maximize count of events within 5m of ground truth
→ Optimal SOS: 343.0 m/s (Florida January, ~15-20°C)
Phase 3: 3D Multilateration + Species ID
TDOA matrix from per-mic arrival differences
→ L-BFGS-B + Nelder-Mead optimization
→ 15ms residual gate (reject grossly inconsistent solutions)
→ BirdNET species identification on 9s audio segment
```
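The calibrated 343.0 m/s is physically plausible: under the standard dry-air approximation c ≈ 331.3 + 0.606·T (T in °C), 343 m/s corresponds to roughly 19.3 °C, inside the stated 15-20 °C range. A minimal helper (illustrative only; the pipeline calibrates empirically by grid search, not from temperature):

```python
def speed_of_sound(temp_c):
    """Dry-air approximation: c ~ 331.3 + 0.606 * T (m/s, T in deg C)."""
    return 331.3 + 0.606 * temp_c

def temp_for_sos(sos_ms):
    """Invert the linear approximation to recover air temperature."""
    return (sos_ms - 331.3) / 0.606
```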
## 3. Methods
### 3.1 Template-Matched Arrival Time Estimation
For each playback event, the known bird library clip is cross-correlated against each microphone's recording using properly normalized cross-correlation (NCC). Unlike global normalization, which washes out signals in long search windows (140 seconds), per-window NCC divides by the local standard deviation at each lag position using running cumulative sums for O(n) computational complexity:
```
NCC[k] = Σ(rec[k+i] · src_zm[i]) / (M · σ_rec[k] · σ_src)
```
where `σ_rec[k]` is computed from running cumulative sums:
```
win_sum = cumsum[k+M] - cumsum[k]
win_sum_sq = cumsum_sq[k+M] - cumsum_sq[k]
σ_rec[k] = sqrt(win_sum_sq/M - (win_sum/M)²)
```
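The formula above translates directly to NumPy: `np.correlate` supplies the raw lag products and the cumulative sums give the local σ at each lag. A minimal sketch (not the pipeline's actual code):

```python
import numpy as np

def sliding_ncc(rec, src):
    """Per-window normalized cross-correlation of template `src`
    against recording `rec`, with O(n) local normalization via
    running cumulative sums."""
    M = len(src)
    src_zm = src - src.mean()                 # zero-mean template
    sigma_src = src.std()
    # raw lag products: corr[k] = sum_i rec[k+i] * src_zm[i]
    corr = np.correlate(rec, src_zm, mode="valid")
    # local window std of the recording at every lag, from cumsums
    cumsum = np.concatenate(([0.0], np.cumsum(rec)))
    cumsum_sq = np.concatenate(([0.0], np.cumsum(rec ** 2)))
    win_sum = cumsum[M:] - cumsum[:-M]
    win_sum_sq = cumsum_sq[M:] - cumsum_sq[:-M]
    # clamp variance to guard against silent (all-zero) windows
    var = np.maximum(win_sum_sq / M - (win_sum / M) ** 2, 1e-12)
    return corr / (M * np.sqrt(var) * sigma_src)
```

A strong template match produces an NCC peak near 1 at the arrival lag, regardless of the recording's overall level.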
A critical implementation detail: library clips recorded at 44.1 kHz must be resampled to match the recorder's sample rate (24 kHz for SM3; verify for other recorder types) before cross-correlation. Failure to resample produces meaningless NCC values and near-zero detection rates.
### 3.2 Bandpass Filtering
A 4th-order Butterworth bandpass filter (800-9000 Hz) is applied to both the template and recording before NCC computation. This was the single most impactful improvement, increasing the detection rate from 20/36 to 36/36 calibration events by rejecting low-frequency wind noise and high-frequency environmental interference that diluted NCC scores.
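With SciPy this step might look as follows. Zero-phase `sosfiltfilt` is one reasonable choice; the text does not specify the exact filtering call used:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(x, fs=24000, lo=800.0, hi=9000.0, order=4):
    """4th-order Butterworth bandpass (800-9000 Hz by default),
    applied forward-backward for zero phase distortion."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```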
### 3.3 Multi-Peak Cross-Mic Consistency Clustering
In bird-rich forests, the NCC may match both the playback speaker and real birds of the same species responding to the playback. To disambiguate:
1. For each microphone, the top K=8 NCC peaks above threshold (0.08) are retained
2. For events with insufficient matches, a retry at lower threshold (0.04) is performed
3. All peaks across all microphones are clustered by temporal proximity (maximum 0.5s spread)
4. The cluster with the most microphones is selected as the playback speaker
### 3.4 Iterative Outlier Rejection
After initial multilateration, per-microphone TDOA residuals are computed against the estimated source position. The microphone with the worst median residual is removed if it exceeds an adaptive threshold (3 ms for arrays with >5 receivers, 5 ms for smaller arrays). This process repeats for up to 5 rounds, provided at least 3 microphones remain. A final residual gate (15 ms for >5 receivers, 25 ms for smaller arrays) rejects grossly inconsistent solutions.
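The rejection loop can be sketched as below, with a fixed 3 ms threshold for simplicity (the pipeline adapts it to array size) and residuals taken against a single reference mic rather than the per-pair medians described above. `solve` is a stand-in for the multilateration step:

```python
import numpy as np

def reject_outliers(mics, arrivals, solve, sos=343.0,
                    thresh_ms=3.0, max_rounds=5, min_mics=3):
    """Iteratively drop the mic with the worst TDOA residual.
    solve(mic_positions, arrival_times) -> estimated source xyz.
    Returns indices of the mics retained."""
    keep = np.arange(len(mics))
    for _ in range(max_rounds):
        if len(keep) <= min_mics:
            break
        pos = solve(mics[keep], arrivals[keep])
        d = np.linalg.norm(mics[keep] - pos, axis=1)
        # residual vs. the first kept mic as reference
        res = np.abs((d - d[0]) / sos - (arrivals[keep] - arrivals[keep][0]))
        worst = int(np.argmax(res))
        if res[worst] * 1000.0 <= thresh_ms:
            break  # all remaining mics are consistent
        keep = np.delete(keep, worst)
    return keep
```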
### 3.5 Multilateration
Source position is estimated by minimizing the weighted TDOA residual squared error using bounded optimization (L-BFGS-B with ±200m XY and ±50m Z bounds, refined by Nelder-Mead). The production pipeline uses **multi-start optimization** — running from the array centroid and every mic position as initial guesses — to escape local minima in the non-convex TDOA cost surface. The speed of sound is calibrated via grid search over 330-355 m/s using the ground-truth calibration objective (maximize events within 5m of known positions).
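A multi-start multilateration sketch under stated assumptions (the ±200 m XY / ±50 m Z bounds are taken relative to the array centroid, an interpretation the text does not spell out; uniform pair weighting is used here):

```python
import numpy as np
from scipy.optimize import minimize

def multilaterate(mics, tdoa, sos=343.0):
    """Estimate source xyz from pairwise TDOA measurements.
    mics: (N, 3) receiver positions; tdoa[i, j] = t_i - t_j in
    seconds. Multi-start L-BFGS-B from the centroid and every mic
    position, each refined by Nelder-Mead."""
    mics = np.asarray(mics, float)
    i_idx, j_idx = np.triu_indices(len(mics), k=1)
    meas = tdoa[i_idx, j_idx]

    def cost(p):
        d = np.linalg.norm(mics - p, axis=1)      # distance to each mic
        r = (d[i_idx] - d[j_idx]) / sos - meas    # per-pair residuals
        return float(np.dot(r, r))

    c = mics.mean(axis=0)
    bounds = [(c[0] - 200, c[0] + 200), (c[1] - 200, c[1] + 200),
              (c[2] - 50, c[2] + 50)]
    best = None
    for x0 in [c, *mics]:                         # multi-start
        r1 = minimize(cost, x0, method="L-BFGS-B", bounds=bounds)
        r2 = minimize(cost, r1.x, method="Nelder-Mead")  # refinement
        if best is None or r2.fun < best.fun:
            best = r2
    return best.x, best.fun
```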
### 3.6 Species Identification
**Production pipeline (detection-first):** BirdNET runs on every mic channel at maximum sensitivity (1.5) and overlap (2.5s) before localization. Per-channel detections are clustered across channels, and the consensus species is determined by `WeightedSpeciesVote`:
```
weight = BirdNET_confidence × 10^(amplitude_dB / 20)
```
This weights louder, more-confident detections proportionally higher, making identification robust even when some channels produce low-confidence or incorrect identifications. Cross-channel agreement (≥3 channels) also filters false positives.
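The vote reduces to a few lines; the `(species, confidence, amplitude_dB)` tuple layout is illustrative, not the pipeline's actual interface:

```python
from collections import defaultdict

def weighted_species_vote(detections):
    """Consensus species across channels.
    detections: list of (species, birdnet_confidence, amplitude_dB).
    Weight = confidence * 10**(amplitude_dB / 20), i.e. confidence
    scaled by linear amplitude."""
    totals = defaultdict(float)
    for species, conf, amp_db in detections:
        totals[species] += conf * 10 ** (amp_db / 20)
    return max(totals, key=totals.get)
```

Note how a single very confident but faint detection can lose to multiple moderate detections on loud channels, which is the intended behavior.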
**Cal/Val pipeline (template matching):** For each localized event, a 9-second audio segment is extracted from the closest microphone at the template-matched arrival time and analyzed by BirdNET (via birdnetlib). For low-confidence detections (<0.15), up to 3 additional microphones are tried as fallback (top-4 closest mics by estimated position, using 12-second segments for increased context).
## 4. Validation
### 4.1 Dataset
The OSBS validation dataset consists of 54 GPS-surveyed playback events (36 calibration, 18 validation) across 35 bird species. Playback clips were broadcast from a portable speaker at differentially-corrected GPS positions (±10 cm horizontal precision), and recorded simultaneously by 8 Wildlife Acoustics SM3 recorders (16 microphone channels) spanning an approximately 50×80m array.
### 4.2 Results

*Figure 1: Estimated versus true bird positions for all 54 playback events. Open markers: ground truth. Filled markers: estimated positions. Lines connect each true-estimated pair. Red X: failed localizations (2 of 54). Calibration events in blue, validation in orange.*

*Figure 2: Distribution of horizontal and 3D localization errors for all successfully localized events. Median horizontal error: 0.55m. 90th percentile: 13.2m.*
**Table 1: Summary validation metrics (April 2026)**
| Metric | Calibration (N=36) | Validation (N=18) | Combined (N=54) |
|--------|-------------------|-------------------|-----------------|
| Detection rate | 97.2% (35/36) | 88.9% (16/18) | 94.4% (51/54) |
| CEP50 (horizontal) | 0.56 m | 0.49 m | 0.55 m |
| CEP90 (horizontal) | 12.77 m | 11.06 m | 13.22 m |
| RMSE horizontal | 7.24 m | 6.89 m | 7.13 m |
| RMSE vertical | 1.26 m | 1.28 m | 1.27 m |
| RMSE 3D | 7.35 m | 7.01 m | 7.24 m |
| Bias X (East) | +0.57 m | -0.72 m | +0.16 m |
| Bias Y (North) | -1.02 m | +0.49 m | -0.54 m |
| Bias Z (Up) | -0.36 m | -0.75 m | -0.48 m |
| Within 1 m | 24/35 (69%) | 11/16 (69%) | 35/51 (69%) |
| Within 5 m | 29/35 (83%) | 13/16 (81%) | 42/51 (82%) |
| Speed of sound | 343.0 m/s | (from calibration) | — |
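The accuracy metrics above (CEP50, CEP90, horizontal/vertical/3D RMSE) follow from per-event errors with a few lines of NumPy. A generic sketch, not the project's reporting code:

```python
import numpy as np

def localization_metrics(err_xy, err_z):
    """Summary metrics from per-event horizontal and vertical errors
    (meters). CEP50/CEP90 are the 50th/90th percentiles of the
    horizontal error; RMSE_3D combines both components."""
    err_xy = np.asarray(err_xy, float)
    err_z = np.asarray(err_z, float)
    err_3d = np.sqrt(err_xy ** 2 + err_z ** 2)
    return {
        "CEP50": float(np.percentile(err_xy, 50)),
        "CEP90": float(np.percentile(err_xy, 90)),
        "RMSE_h": float(np.sqrt(np.mean(err_xy ** 2))),
        "RMSE_v": float(np.sqrt(np.mean(err_z ** 2))),
        "RMSE_3D": float(np.sqrt(np.mean(err_3d ** 2))),
    }
```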
**Table 2: Species identification accuracy**
| Metric | All Events (N=42) |
|--------|-------------------|
| BirdNET detection rate | 100% (42/42) |
| Top-1 species accuracy | 76.2% (32/42) |
| Top-3 species accuracy | 85.7% (36/42) |
High-confidence correct identifications include American Crow (0.99), Bachman's Sparrow (1.00), Mourning Dove (1.00), Red-shouldered Hawk (1.00), Red-headed Woodpecker (1.00), Northern Cardinal (0.98), Brown-headed Nuthatch (1.00), and Tufted Titmouse (0.84-0.98). Misidentifications primarily occur for species where the playback was too quiet at the microphone distance or where BirdNET confuses acoustically similar species (e.g., Northern Bobwhite misidentified as Mallard, Turkey Vulture as Acadian Flycatcher).
## 5. Implementation
### 5.1 Software Stack
- **Core**: Python 3.12, NumPy, SciPy (optimization, signal processing)
- **Species ID**: BirdNET via birdnetlib + TensorFlow
- **Web**: Flask + gunicorn, Leaflet.js for mapping
- **Deployment**: Docker container on Fly.io with persistent volume storage
### 5.2 Computational Performance
On a 112-core machine with 4.9 GB of recordings pre-loaded into RAM:
- Phase 1 (template matching): ~30 seconds for 36 events (parallel)
- Phase 2 (SOS grid search): ~2 minutes (51 SOS values × 36 events)
- Phase 3 (final localization): ~10 seconds
- Species identification: ~3 minutes (42 events, sequential BirdNET)
### 5.3 Availability
- **Web application**: https://soundmapper4d.fly.dev/
- **Project page**: https://www.speclab.org/soundmapper4d