Soundscape Information Retrieval

Soundscapes offer unique opportunities to study biodiversity and its interaction with human activities at a fine temporal resolution. For practical applications, information on biotic and abiotic sounds should be provided separately. However, effective retrieval of source-specific information remains a challenge: no comprehensive audio recognition database is available yet, and sound sources recorded simultaneously make acoustic analysis more difficult.

We developed the Soundscape Viewer, an open toolbox for soundscape information retrieval, to assist in the analysis of large amounts of audio data. The toolbox has three primary modules: (1) visualization of long-duration recordings, (2) source separation, and (3) clustering of audio events.

Visualization of Long-duration Recordings

We apply long-term spectrograms to assist in the investigation of long-duration recordings. The Soundscape Viewer divides long-duration recordings into short segments of a user-defined duration. For each audio segment, the median and mean power spectra are measured to retain the spectral variation.
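The segment-wise median and mean power spectra used for long-term spectrograms can be sketched as follows. This is a minimal NumPy illustration, not the Soundscape Viewer's own code: the function name `long_term_spectrogram`, the parameters `segment_sec` and `n_fft`, the Hann window with 50% overlap, and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np

def long_term_spectrogram(signal, fs, segment_sec, n_fft=1024):
    """Return (freqs, median_ltsa, mean_ltsa); each LTSA is (freq bins x segments) in dB.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    seg_len = int(segment_sec * fs)
    if seg_len < n_fft:
        raise ValueError("segment must be at least n_fft samples long")
    n_segments = len(signal) // seg_len
    window = np.hanning(n_fft)
    hop = n_fft // 2  # 50% overlap between frames (an assumption)
    starts = range(0, seg_len - n_fft + 1, hop)
    median_cols, mean_cols = [], []
    for i in range(n_segments):
        seg = signal[i * seg_len:(i + 1) * seg_len]
        # One power spectrum per windowed frame inside this segment.
        frames = np.stack([seg[s:s + n_fft] * window for s in starts])
        power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
        # Median is robust to short transients; mean retains them.
        median_cols.append(np.median(power, axis=0))
        mean_cols.append(np.mean(power, axis=0))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    to_db = lambda cols: 10.0 * np.log10(np.array(cols).T + 1e-12)
    return freqs, to_db(median_cols), to_db(mean_cols)

# Synthetic 10 s recording: a 1 kHz tone, weak noise, and a short loud burst at 3 s.
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(10 * fs) / fs
signal = np.sin(2 * np.pi * 1000 * t) + 0.01 * rng.standard_normal(t.size)
signal[3 * fs:3 * fs + 400] += 5.0 * rng.standard_normal(400)

freqs, median_ltsa, mean_ltsa = long_term_spectrogram(signal, fs, segment_sec=1.0)
```

In this toy signal the burst raises the mean spectrum of the fourth segment but leaves its median almost unchanged, which is why keeping both statistics preserves information about spectral variation within a segment.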

Source Separation

  • Unsupervised source separation. We apply periodicity-coded non-negative matrix factorization (PC-NMF) to separate sound sources with unique temporal patterns. The periodicity constraint helps distinguish sounds from biotic and abiotic sources, and it has been shown to work for both terrestrial and marine soundscapes. In one example, PC-NMF separates biotic sounds (marine mammals and soniferous fish) from abiotic sounds (electrical noise and shipping noise) in a long-term spectral average (LTSA).

  • Supervised source separation. A model generated by PC-NMF can be reused. Another way to perform supervised source separation is to use NMF to learn spectral features from a training set that contains a single sound source. Repeating this procedure for every sound source yields a dictionary of spectral features and the associated source indicators, from which a source separation model can be built in a supervised manner.
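The supervised route described above (learn spectral features per source with NMF, stack them into a labeled dictionary, then separate a mixture) can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions, not the toolbox's implementation and not PC-NMF: it uses standard multiplicative-update NMF with a Euclidean cost, a soft-mask reconstruction, and hypothetical names (`nmf`, `build_dictionary`, `separate`) and synthetic "fish" and "ship" spectra.

```python
import numpy as np

def nmf(V, n_basis, W_fixed=None, n_iter=300, seed=0):
    """Factorize V ~ W @ H by multiplicative updates; W stays fixed if W_fixed is given."""
    rng = np.random.default_rng(seed)
    eps = 1e-9
    W = rng.random((V.shape[0], n_basis)) + eps if W_fixed is None else W_fixed
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if W_fixed is None:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def build_dictionary(training_sets, n_basis):
    """training_sets: {source name: non-negative spectrogram (freq x time) of that source alone}."""
    bases, labels = [], []
    for name, V in training_sets.items():
        W, _ = nmf(V, n_basis)
        bases.append(W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-9))
        labels += [name] * n_basis  # source indicator for each basis
    return np.hstack(bases), np.array(labels)

def separate(V_mix, W_dict, labels):
    """Estimate activations with the dictionary fixed, then split V_mix by soft masks."""
    _, H = nmf(V_mix, W_dict.shape[1], W_fixed=W_dict)
    recon = W_dict @ H + 1e-9
    return {name: V_mix * (W_dict[:, labels == name] @ H[labels == name]) / recon
            for name in np.unique(labels)}

# Synthetic sources: each has two spectral peaks with random activations over time.
rng = np.random.default_rng(1)
bins = np.arange(64)
bump = lambda center: np.exp(-0.5 * ((bins - center) / 2.0) ** 2)

def make_source(centers, n_frames):
    spectra = np.stack([bump(c) for c in centers], axis=1)   # freq x 2
    return spectra @ rng.random((len(centers), n_frames))    # freq x time

training = {"fish": make_source([8, 16], 200), "ship": make_source([40, 52], 200)}
W_dict, labels = build_dictionary(training, n_basis=2)

true_fish, true_ship = make_source([8, 16], 100), make_source([40, 52], 100)
separated = separate(true_fish + true_ship, W_dict, labels)
rel_err = lambda est, ref: np.linalg.norm(est - ref) / np.linalg.norm(ref)
fish_err, ship_err = rel_err(separated["fish"], true_fish), rel_err(separated["ship"], true_ship)
```

The toy sources barely overlap in frequency, so the separation is nearly exact here; real recordings with overlapping spectra would be harder, and that is the case the periodicity constraint of PC-NMF is meant to address.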

Clustering of Audio Events

Clustering is a common approach to identify events with unique features. However, the performance of clustering may deteriorate when the audio data contains multiple sound sources. With the Soundscape Viewer, users can apply the source separation model as an acoustic filter. After source separation, the acoustic events in each separated sound source can be identified more easily by clustering.

In addition, the clustering module tries different numbers of clusters and measures the data dispersion explained by each clustering result. Users can therefore apply the Soundscape Viewer to compare acoustic diversity among periods and locations without having to determine the number of clusters in advance.
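The idea of trying several cluster counts and scoring each by the dispersion it explains can be sketched as follows. This is a minimal illustration, not the clustering module itself. It assumes k-means with a deterministic farthest-point initialization, defines explained dispersion as one minus the ratio of within-cluster to total sum of squares, and stops at the smallest count that reaches a hypothetical `threshold` of 0.9. The feature matrix is synthetic.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means with farthest-point initialization (deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def explained_dispersion(X, labels, centers):
    """Fraction of total sum of squares accounted for by the cluster centers."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = ((X - centers[labels]) ** 2).sum()
    return 1.0 - within / total

def choose_n_clusters(X, k_max=10, threshold=0.9):
    """Smallest k whose clustering explains at least `threshold` of the dispersion."""
    for k in range(1, k_max + 1):
        labels, centers = kmeans(X, k)
        score = explained_dispersion(X, labels, centers)
        if score >= threshold:
            break
    return k, labels, score

# Synthetic features: three well-separated groups standing in for three event types.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2))
               for loc in ([0, 0], [10, 0], [0, 10])])
k, labels, score = choose_n_clusters(X)
```

Under these assumptions, the number of clusters needed to reach the threshold could serve as a simple proxy for acoustic diversity when comparing periods or locations.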

The upper panel shows the clustering result of the original data; the lower panel shows the clustering result of the separated sources. Different colors represent different acoustic events with unique spectral characteristics.

Tzu-Hao Lin, Yu Tsao (2018) Listening to the deep: Exploring marine soundscape variability by information retrieval techniques. OCEANS ’18 Kobe. DOI: 10.1109/OCEANSKOBE.2018.8559307


  1. Tzu-Hao Lin, Yu Tsao (2020) Source separation in ecoacoustics: A roadmap toward versatile soundscape information retrieval. Remote Sensing in Ecology and Conservation, 6: 236-247.

  2. Tzu-Hao Lin, Shih-Hua Fang, Yu Tsao (2017) Improving biodiversity assessment via unsupervised separation of biological sounds from long-duration recordings. Scientific Reports, 7: 4547.

Funding Support

  • Ministry of Science and Technology (MOST105-2321-B-001-069-MY3)

  • JSPS KAKENHI (18H06491)