We evaluated the reliability of chorus size estimation by visual inspection of the spectrogram and spectrum. First, we measured the accuracy of the estimated chorus size. During a howling survey there is generally no visual access to the replying pack. We therefore bioacoustically analysed choruses of known size, either simulated by humans (i.e. “Human simulated howling” test, HSH; n = 20) or produced by real wolves (i.e. “Wolf downloaded howling” test, WDH; n = 9). Second, we estimated the precision of the technique by comparing the values estimated bioacoustically by two independent operators for 37 choral replies of free-ranging wolves. Before the tests, the operators were trained in bioacoustic analysis following the procedure described in this paper.
Finally, we tested whether aural estimations and estimations from visual inspection of the spectrogram and spectrum provided comparable results. Using the same 37 replies, we compared the aural chorus size estimations obtained in the field during the howling survey (not necessarily by the same field operators) with the bioacoustic estimations of the same choruses.
Human-simulated howls (HSH) were recorded in summer 2012 and 2013 by groups of 2 to 8 volunteers, who were asked to howl together after training in howling simulation. Breaking and flat howls were alternated, with at least 5 howls per trial and individuals entering the chorus one at a time on a staggered basis, following Harrington and Mech. Human howls were recorded in Fonte del Baregno (43°62’ N, 11°93’ E), within the protected area called Alpe di Catenaia in the Apennine Mountains, in the North-East of Tuscany, Italy. The distance between source and recorder was 100 m.
Choruses from the internet (Wolf Downloaded Howls, WDH) were downloaded from YouTube as video files (.flv format) with VSO Downloader 2.9.12 after a search with keywords such as “wolf”, “howls” or similar terms. We selected and downloaded 9 videos in which all the howling wolves were clearly visible and could be reliably counted. To ensure that the chorus size corresponded to the pack size, 8 of the 9 choruses used in the WDH test were recorded in captivity (mostly in zoos). The videos were then converted from the original video format (MP4, “.mov” or “.flv”) into audio format (2 channels, Wave format, 44,100 Hz, 16 bit) with the software 4Free Video Converter. All the links to the original files are in Additional file 3.
Free-ranging wolves’ replies were collected from 2008 to 2014 during a wolf howling monitoring programme (following the Habitats Directive on priority species [92/43/EEC]) carried out in the Province of Arezzo (3230 km²), Eastern Tuscany, Italy.
The wolf howling survey was performed in summer (from July to October), when pack activity was focused at the home sites because of the presence of pups and the response rate was consequently higher [14, 15, 54]. Sampling sites were chosen to cover the whole study area, following the method described by Harrington and Mech as “saturation census” and adapting it to local requirements and topography to maximise the range of audibility and minimise sound dispersion. Following the standard procedure suggested by Harrington and Mech: i) no session was conducted during rain or strong wind; ii) wolf howling was performed at night, to minimise anthropogenic noise; iii) two trials were conducted per site.
Wolves respond to the howling of unfamiliar individuals in six different basic ways, ranging from retreating silently to remaining and replying or approaching vocally, depending on their resources (e.g. fresh prey), their social context (e.g. presence of pups) and the stimulus; moreover, they respond to human simulated howling as well as to playbacks [13, 14]. For these reasons, our stimuli always consisted of a chorus howl emitted by two individuals: either a howling playback of a captive pair of wolves (duration: 1.20 min) or human simulated howling (duration: circa 1 min). Playback of recorded chorus howls was emitted through an exponential horn with high emission directionality (120° horizontal and 60° vertical coverage).
Three minutes after the first stimulus, if no answer had followed, a second trial (higher in volume, to cover a larger area) was attempted, after which the operators left the site. In case of response, the reply bearing, the times and an extemporaneous estimation of the chorus size by ear (made by the operator without headphones and microphone) were recorded for each answer. For a better localisation of the pack, we either repeated one or more trials from a place closer to the presumed site of response or had two groups of operators perform concurrent sessions. Real pack sizes were unknown for the free-ranging wolves.
Humans’ and free-ranging wolves’ howls were captured with a Sennheiser directional microphone fitted with a windshield (ME67 head with K6 power module – frequency response: 50–20,000 Hz) and saved on a hand-held M-Audio Microtrack 24/96 II digital recorder, in uncompressed Wave format with a 44,100 Hz sampling rate and 16 bits amplitude resolution.
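As an illustration only, the recording format stated above (Wave, 44,100 Hz, 16 bit) can be checked programmatically before analysis. The following is a minimal sketch using Python's standard `wave` module; the function names `describe_wav` and `matches_study_format` are hypothetical and are not part of the workflow described in this paper.

```python
import wave


def describe_wav(path):
    """Read basic format information from an uncompressed Wave file."""
    with wave.open(path, "rb") as wav:
        sampling_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sampling_rate_hz": sampling_rate,
            "bit_depth": 8 * wav.getsampwidth(),  # bytes per sample -> bits
            "duration_s": wav.getnframes() / sampling_rate,
        }


def matches_study_format(info, sampling_rate_hz=44_100, bit_depth=16):
    """True if the file has the sampling rate and bit depth reported in the text."""
    return (
        info["sampling_rate_hz"] == sampling_rate_hz
        and info["bit_depth"] == bit_depth
    )
```

A check of this kind would flag files that had been resampled or re-encoded at a different bit depth during conversion, which would change the frequency grid of the spectrogram.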
Acoustic signals were analysed with Raven Pro 1.4 (developed by the Cornell Lab of Ornithology) and with the open source Seewave package in R v. 2.9.0 to implement the spectral view.
To estimate chorus size by visual inspection, spectrograms and spectral envelopes were computed for each audio file (Figs. 2 and 3). The spectral envelope (or spectrum) represents the sound at a given instant, showing frequency on the horizontal axis and sound pressure (or amplitude) on the vertical axis [47, 55, 56]. The spectrogram of a sound instead represents a sequence of spectra, showing time on the horizontal axis, frequency on the vertical axis, and sound pressure as a greyscale or other colour scale (Fig. 3) [47, 55, 56]. Spectrogram and spectrum are based on the Fourier transform [47, 55, 56]; the version of this function used to represent digitised (discrete) signals is called the discrete Fourier transform (DFT). DFT size represents the length of the analysis window (the window size), and thus the number of samples used to compute each spectrum of the spectrogram, while the window function (e.g. Hanning, Gaussian) determines how the onset and offset of each segment are tapered. A narrow-band spectrogram (large window size) gives fine frequency resolution, so that closely spaced frequencies are clearly separated. To analyse wolf choruses, parameters were set as follows: DFT size: 2048 samples; Hanning window; frequency grid: 21.5 Hz; time step: 10 ms, where frequency grid = (sampling frequency)/(DFT size) and time step is the distance between the centres of successive analysis windows. When recordings contained noise (anthropogenic: cars, planes, loud music from villages, human voices, bells; natural: wind, rivers, other animals), a flat band-pass filter (100–2000 Hz) was applied to reduce noise outside this band and thus improve the audibility of the wolves’ replies.
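The relation between these settings can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `spectrogram_geometry` is hypothetical, and the window duration, hop size and overlap are derived here under the assumption that the window length equals the DFT size (no zero padding), which the text does not state explicitly.

```python
SAMPLING_RATE_HZ = 44_100  # sampling rate of the recordings
DFT_SIZE = 2048            # analysis window length, in samples
TIME_STEP_S = 0.010        # spacing between centres of successive windows


def spectrogram_geometry(sampling_rate_hz, dft_size, time_step_s):
    """Derive frequency and time resolution from the spectrogram settings."""
    hop_samples = round(time_step_s * sampling_rate_hz)
    return {
        # frequency grid = (sampling frequency) / (DFT size)
        "frequency_grid_hz": sampling_rate_hz / dft_size,
        # duration of one analysis window (assumes window length == DFT size)
        "window_duration_ms": 1000 * dft_size / sampling_rate_hz,
        # number of samples between the centres of successive windows
        "hop_samples": hop_samples,
        # share of each window that is reused by the next one
        "overlap_percent": 100 * (1 - hop_samples / dft_size),
    }


if __name__ == "__main__":
    for name, value in spectrogram_geometry(
        SAMPLING_RATE_HZ, DFT_SIZE, TIME_STEP_S
    ).items():
        print(f"{name}: {value:.2f}")
```

With the values reported above, the frequency grid evaluates to 44,100/2048 ≈ 21.5 Hz, in agreement with the stated setting.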
Every single howl emitted by a wolf appears as a fundamental frequency (F0) and its harmonic overtones, or harmonics (Fig. 2). The fundamental frequency is the glottal pulse rate and determines the pitch of the voice, while harmonic overtones are integer multiples of the fundamental frequency (2 × F0, 3 × F0, …, N × F0).
Chorus size was then estimated by counting the number of different howls (each viewed as a fundamental plus its harmonics) visible at the same time (Fig. 2), assuming that one wolf cannot produce multiple fundamental frequencies at a given time. Harmonic overtones were easily recognised by their shape (the same as F0) and their frequencies (integer multiples of F0). Because the frequency difference between two howls (inter-harmonic space) doubles from the fundamental frequencies to the first harmonics, the harmonics can also help to distinguish different howls (Fig. 2).
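The counting rule can be made concrete with a small sketch. In this study the counting was done by visual inspection, not by an algorithm; the code below is only a hypothetical illustration of the underlying logic, in which spectral peaks at one instant are grouped into harmonic series and each series is counted as one howling individual. The function name `count_harmonic_series`, the 2% tolerance, the limit of 10 harmonics and the example peak frequencies are all assumptions made for illustration.

```python
def count_harmonic_series(peaks_hz, tolerance=0.02, max_harmonic=10):
    """Greedily group spectral peaks into harmonic series.

    Returns the list of inferred fundamental frequencies; its length is the
    number of simultaneous howls suggested by the peaks.
    """
    remaining = sorted(peaks_hz)
    fundamentals = []
    while remaining:
        # The lowest peak not yet explained is taken as a new fundamental.
        f0 = remaining.pop(0)
        fundamentals.append(f0)
        unexplained = []
        for peak in remaining:
            n = round(peak / f0)  # nearest integer multiple of f0
            expected = n * f0
            is_harmonic = (
                2 <= n <= max_harmonic
                and abs(peak - expected) <= tolerance * expected
            )
            if not is_harmonic:
                unexplained.append(peak)
        remaining = unexplained
    return fundamentals


if __name__ == "__main__":
    # Made-up peaks: three howls, each with a fundamental and its harmonics.
    peaks = [350, 700, 1050, 430, 860, 1290, 560, 1120]
    fundamentals = count_harmonic_series(peaks)
    print(len(fundamentals), fundamentals)
```

A rule of this kind shares a limitation with visual counting: two wolves howling exactly one octave apart produce overlapping harmonic series and would be counted as a single individual.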
All statistical analyses were carried out using R v. 2.9.0. Spearman’s rank correlations were computed to compare i) the real and bioacoustically predicted chorus sizes, ii) the bioacoustic estimations of chorus size made by the two operators, and iii) the bioacoustic and aural estimations of chorus size. For the same three comparisons, the hypothesis of no mean difference was tested with the Wilcoxon matched-pairs signed-ranks test.
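For readers unfamiliar with these two statistics, the sketch below shows how they are computed on paired chorus size estimates. It is an illustration in plain Python, not the R code used in this study; the function names and the example data are invented, and only the test statistics are computed (no p-values).

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-), the rank sums of positive and negative differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


if __name__ == "__main__":
    # Invented example: known chorus sizes and estimated chorus sizes.
    real_size = [2, 3, 4, 5, 6, 7, 8]
    estimated_size = [2, 3, 4, 4, 6, 6, 7]
    print(spearman_rho(real_size, estimated_size))
    print(wilcoxon_signed_rank(real_size, estimated_size))
```

The two statistics answer different questions: a high Spearman correlation shows that the estimates preserve the ordering of chorus sizes, while the signed-rank sums show whether the estimates are systematically higher or lower than the reference values.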