Audio Systems Array Processing Toolbox
Kevin D. Donohue
Department of Electrical and Computer Engineering
Audio Systems Laboratory
University of Kentucky
(Last Update 10-27-2009)
A microphone array is a system of spatially distributed microphones that coherently collect acoustic data over a region of interest. Key applications include:
Sound source location: detecting and locating a sound source, such as a person speaking, a buzzing machine, or points of turbulence caused by wind obstruction.
Signal-to-noise enhancement: focusing all microphones on a specific point of interest in the room through a delay-and-sum operation on all channels, which results in constructive interference at the point of interest and destructive (or at least less constructive) interference for noise from other spatial positions.
Over the last half century, many interesting works and ideas have been developed for array processing. This toolbox (under development) is a collection of Matlab functions useful for simulating and processing data from audio array systems. The programs were developed in the Audio Systems Laboratory at the University of Kentucky. Contributors to the programs of the toolbox include Phil Townsend, ArulKumaran Muthukumarasamy, Satoru Tagawa, and Jens Hannemann. In array systems, signals are processed with respect to the spatial geometry of the microphones and sources. So, in addition to the typical time and frequency characterizations of audio sources and receivers, positions and spatial paths must be known and incorporated into the processing. The Matlab functions in this toolbox follow a standard convention for the vectors and matrices that provide position information. Functions developed around these conventions allow for efficient reuse, compatibility, and modification of toolbox functions.
The following matrix and vector conventions apply to all applicable toolbox functions:
Collections of signals associated with an array are stored column-wise in a matrix (row indices correspond to time samples and column indices correspond to signals from different microphones or sources). Larger row indices correspond to more recent time samples (row index 1 is the oldest sample).
Positions of array elements, sources, and other elements in space are denoted with column vectors or column-wise in matrices where each column corresponds to a position and the rows correspond to the x, y, and z coordinates of the position. If only 2 dimensions are given (2 rows) the algorithms will work in a plane (2D). If one dimension is given the algorithms will work along a line, which may be appropriate in applications such as calibration procedures for end-fire arrays or speed of sound measurements.
The field of view (FOV) defines the spatial limits for analysis or imaging. The FOV is limited to rectangular (2D) or box-shaped (3D) regions and is given as a 2-column matrix whose columns are the coordinates of opposite corner points. The coordinates of the array elements and the FOV must be with respect to the same spatial frame of reference.
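As an illustration of these conventions, the Python sketch below (not toolbox code) turns a FOV given as a 2-column corner matrix into a grid of candidate points stored column-wise. Matrices are represented as lists of rows; the function names and the grid step are illustrative assumptions.

```python
import itertools

def axis_values(a, b, step):
    """Evenly spaced values from min(a, b) to max(a, b); assumes step divides the span."""
    lo, hi = min(a, b), max(a, b)
    n_steps = int(round((hi - lo) / step))
    return [lo + k * step for k in range(n_steps + 1)]

def fov_grid(fov, step):
    """fov: one row per coordinate (x, y[, z]); its 2 columns are opposite corners.
    Returns the grid column-wise: row i holds coordinate i of every grid point."""
    axes = [axis_values(row[0], row[1], step) for row in fov]
    points = list(itertools.product(*axes))
    return [[p[i] for p in points] for i in range(len(fov))]

# 2D FOV with corners (0, 1) and (2, 0): 5 x-values times 3 y-values
grid = fov_grid([[0.0, 2.0], [1.0, 0.0]], 0.5)
```

Because each point is a column, the same grid can be passed wherever a position matrix is expected, and adding a third FOV row yields a 3D grid without code changes.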
The above conventions simplify the programming and use of toolbox functions with few limitations. Note there is no limitation on the number of array elements or the array element geometry.
The software was developed by the Audio Systems Laboratory at the University of Kentucky. It can be freely used for educational/research purposes. Direct use in a commercial product is not permitted. Questions on its use can be directed to Kevin D. Donohue (firstname.lastname@example.org). In addition, if you are using these functions, I would like to hear about your application or get feedback on how the programs are working and how they can be improved.
To download all files in the current toolbox, download and unzip the zip file at this link: ArrayToolbox. Otherwise, individual files can be downloaded from the links in their text descriptions.
Microphone Placement and Analysis Functions
Calibration and Measurement Functions
Speech Intelligibility Estimation Functions
At the heart of array signal processing is the delay operation. As sound travels from source to receiver it undergoes 2 fundamental changes: a time delay and attenuation (typically frequency dependent). The following functions synthetically delay the sounds received over the microphone elements to focus on a particular point in space (beamforming), or simulate the recording of a sound emanating from a particular point over an array of microphones. Although these functions are classified as delays, in reality they shift signals forward in time, which creates a delay relative to signals that are shifted less. The reason for this convention is that, as new frames of data become available, a shift back in time requires stuffing the signal channel with zeros at the front end, since those samples are not yet available. Shifting into the future instead pushes the most recent samples past the end of the frame, so they are lost from the processing frame; however, older samples (available through double buffering or a similar method) can be shifted into the segment region involved in the delay-and-sum operation. This eliminates artifacts from setting unavailable samples to zero in sliding-window implementations. For a single frame with no previous time frames, zeros will be shifted in at the beginning of the array.
The functions listed below delay (or shift) array signals passed to them as a matrix, with the signals from the array placed column-wise in the input matrix. For reference, the first sample in the input vector/matrix (index 1) is considered the oldest sample and corresponds to the same time instant as the first sample of the output (delayed) vectors. The output vectors are typically longer than the input vector as a result of the delay, with zeros inserted at the beginning of the shifted signal.
delayint - This is the simplest and fastest of the delay functions. It applies a delay (or set of delays) to the input signal matrix by substituting the signal samples into the output array with an offset corresponding to the delay. Sub-sample delays are NOT possible with this program. Delays are always integer multiples of the sampling interval.
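The integer-delay idea can be sketched in a few lines of Python (a sketch, not the toolbox's delayint): each column is copied into a longer output with a row offset equal to its delay, which stuffs zeros in front. Delays are assumed to be non-negative integers in samples; the name delay_int is hypothetical.

```python
def delay_int(signals, delays):
    """Integer-sample delay of each column of `signals` (rows = time, oldest first).

    delays: one non-negative integer (in samples) per column. The output has
    max(delays) extra rows; zeros are stuffed in front of each shifted column.
    """
    n_rows, n_ch = len(signals), len(delays)
    out = [[0.0] * n_ch for _ in range(n_rows + max(delays))]
    for c, d in enumerate(delays):
        for r in range(n_rows):
            out[r + d][c] = signals[r][c]   # sample r lands d rows later
    return out

x = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = delay_int(x, [0, 2])
```

Row 1 of the output still refers to the same time instant as row 1 of the input, matching the convention described above.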
delaytab - This delay function breaks the delay into an integer multiple of the sampling period and a fractional component. The integer part of the delay is handled as in the function delayint. The fractional difference is rounded off to the nearest subsample interval described by an input table of FIR interpolation coefficients. The number of rows in the table determines how densely the original sampling interval is divided. FIR filters must have an even number of coefficients. A table for this program can be generated with several options using the program subsamplefir.m.
delayt - This delay function breaks the delay into an integer multiple of the sampling period and a fractional component. The integer part of the delay is handled as in the function delayint. The fractional part is used to create an FIR subsample delay filter from a 4-coefficient cosine-squared-weighted sinc function (the filter order and type of filter can be changed by editing the appropriate comment lines in the mfile), and the subsample delay is realized by filtering the integer-delayed vector with this filter. This is more flexible than delaytab.m; however, it requires more computation, since the filter coefficients have to be computed for every arbitrary delay.
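A Python sketch of the integer-plus-fractional split for a single channel. The fractional part uses a 4-tap sinc windowed by a cosine-squared taper spanning the filter and normalized to unit DC gain; that exact window is our assumption about what a "cosine square-weighted sinc" looks like, and delayt's coefficients may differ. Function names are hypothetical.

```python
import math

def sinc(t):
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def frac_taps(f, half=2):
    """2*half taps at k = -half+1 .. half approximating a delay of f samples, 0 <= f < 1.
    Cosine-squared window spanning the filter (assumed form), normalized to unit DC gain."""
    ks = range(-half + 1, half + 1)
    h = [sinc(k - f) * math.cos(math.pi * (k - f) / (2 * half)) ** 2 for k in ks]
    total = sum(h)
    return [v / total for v in h]

def delay_frac(x, delay, half=2):
    """Delay a single channel by a non-negative, possibly fractional, number of samples."""
    i = int(math.floor(delay))          # integer part, handled by indexing
    f = delay - i                       # fractional part, handled by the FIR taps
    ks = list(range(-half + 1, half + 1))
    h = frac_taps(f, half)
    y = []
    for n in range(len(x) + i + half):  # room for the shift plus the filter tail
        acc = 0.0
        for k, hk in zip(ks, h):
            m = n - i - k               # y[n] = sum_k h[k] * x[n - i - k]
            if 0 <= m < len(x):
                acc += hk * x[m]
        y.append(acc)
    return y

impulse = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
whole = delay_frac(impulse, 2.0)      # peak moves from index 2 to index 4
half_step = delay_frac(impulse, 1.5)  # peak split evenly between indices 3 and 4
```

Recomputing the taps on every call is what makes this approach flexible but more expensive than a table lookup.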
delayf - This delay function implements the delay in the frequency domain by multiplying the zero-padded FFT of the input vector by exp(-j*w*delay). This has the potential to be the most accurate delay implementation, especially for a truly bandlimited signal; however, it requires more computation than any of the other delay functions.
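The frequency-domain delay can be sketched in Python with a direct DFT (an O(N^2) stand-in for the FFT, chosen only to keep the example dependency-free). Zero padding matters because multiplying by exp(-j*w*delay) produces a circular shift. The names and the pad argument are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def delay_freq(x, delay, pad):
    """Delay a real signal by `delay` samples (fractional allowed) as X(w) * exp(-j*w*delay).

    The DFT delay is circular, so `pad` zeros are appended to keep the shifted
    signal from wrapping around to the front.
    """
    xp = list(x) + [0.0] * pad
    n = len(xp)
    Y = []
    for k, Xk in enumerate(dft(xp)):
        kk = k if k <= n // 2 else k - n      # signed bin index, so w lies in (-pi, pi]
        w = 2 * math.pi * kk / n
        Y.append(Xk * cmath.exp(-1j * w * delay))
    # real part: discards the small imaginary residue (and, for a fractional
    # delay with even n, the asymmetric Nyquist-bin term)
    return [v.real for v in idft(Y)]

shifted = delay_freq([1.0, 2.0, 3.0], 2, pad=3)   # close to [0, 0, 1, 2, 3, 0]
```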
subsamplefir - This function creates tables of FIR filter coefficients for filters that implement subsample delays at equal subincrements over the original sample period. The filter order and type can be selected through the input arguments. The resulting matrix of coefficients can be directly used in the delaytab.m function to implement subsample delays.
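A Python sketch of a subsample coefficient table and of how a delay can be mapped onto it: row r holds taps for a fractional delay of r/n_sub samples, and a delay is rounded to the nearest table row, carrying into the integer part when the fraction rounds up to a whole sample. The windowed-sinc design is an assumption (subsamplefir offers its own filter options), and the names are hypothetical.

```python
import math

def sinc(t):
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def subsample_table(n_sub, n_taps=4):
    """Row r holds n_taps FIR coefficients for a fractional delay of r / n_sub samples.
    Cosine-squared-windowed sinc (assumed design), normalized to unit DC gain."""
    if n_taps % 2:
        raise ValueError("an even number of taps is required")
    half = n_taps // 2
    table = []
    for r in range(n_sub):
        f = r / n_sub
        h = [sinc(k - f) * math.cos(math.pi * (k - f) / n_taps) ** 2
             for k in range(-half + 1, half + 1)]
        total = sum(h)
        table.append([v / total for v in h])
    return table

def split_delay(delay, n_sub):
    """Round a delay to the nearest 1/n_sub sample; return (integer samples, table row).
    A fraction that rounds up to a whole sample carries into the integer part."""
    steps = int(round(delay * n_sub))
    return divmod(steps, n_sub)

table = subsample_table(4)        # quarter-sample resolution
print(split_delay(2.97, 4))       # (3, 0): rounds up to exactly 3 samples
print(split_delay(1.25, 4))       # (1, 1): 1 sample plus row 1 (a quarter sample)
```

The table is computed once, so each delay costs only a rounding step and a lookup, at the price of quantizing the delay to 1/n_sub of a sample.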
The following scripts are used to test and demonstrate the delay functions described above:
testdelayreal - This script generates a real signal and steps through a sequence of subsample delays using the methods in delayf, delayt, delaytab, and delayint. Examples are shown graphically as the signal shifts relative to the original. An analysis also compares the time required by each algorithm and the difference/error of each delayed signal relative to the frequency-domain delayed version. The script generates and plots real-valued signals.
testdelaycomplex - This script generates a complex signal using the Hilbert transform and steps through a sequence of subsample delays on the complex signal using the methods in delayf, delayt, delaytab, and delayint. Examples are shown graphically as the envelope of the signal shifts relative to the original. An analysis also compares the time required by each algorithm and the difference/error of each delayed signal relative to the frequency-domain delayed version. The script generates complex-valued signals and plots their envelopes/magnitudes.
In many sound source location and array calibration algorithms it is necessary to estimate the relative delay between signals received over different acoustic channels (air path and spatially separated microphones). The following functions estimate the delay between segments of channel signals.
delayesttm - This function estimates the relative delay between signals using the cross-correlation function. The input signal segments must be the same length and sampled with respect to the same time reference. Various signal processing options are built into this function through an optional data structure input, including detrending, center clipping, and interpolating to estimate delays on a finer grid resolution than the sampling rate. See script comparedelay for an example of its use.
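A Python sketch of cross-correlation delay estimation (not delayesttm itself): find the lag with the largest correlation, then refine it by fitting a parabola through the peak and its two neighbors. Parabolic fitting is one common interpolation choice and is our assumption; the toolbox's interpolation option may work differently. The function name is hypothetical.

```python
def xcorr_delay(x, y, max_lag):
    """Delay of y relative to x in samples (positive: y arrives later).

    Peak of the cross-correlation r[lag] = sum_m x[m] * y[m + lag], refined by
    fitting a parabola through the peak and its two neighbors.
    """
    n = len(x)
    r = {}
    for lag in range(-max_lag, max_lag + 1):
        r[lag] = sum(x[m] * y[m + lag] for m in range(n) if 0 <= m + lag < len(y))
    best = max(r, key=r.get)
    if -max_lag < best < max_lag:
        a, b, c = r[best - 1], r[best], r[best + 1]
        denom = a - 2 * b + c
        if denom != 0:
            return best + 0.5 * (a - c) / denom   # vertex of the fitted parabola
    return float(best)

x = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]    # x shifted 3 samples later
est = xcorr_delay(x, y, max_lag=5)
```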
delayestfr - This function estimates the relative delay between signals using the group delay of the cross-correlation function. The group delay is estimated as a weighted average of the unwrapped phase gradients, where the weights are the magnitudes of the cross-correlation spectrum. The signal segments must be the same length and sampled with respect to the same time reference. Various signal processing options are built into this function through an optional data structure input, including detrending and limiting the frequency range over which the group delay is estimated. See script comparedelay for an example of its use.
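A Python sketch of the group-delay idea: the cross-spectrum C(k) = Y(k)·conj(X(k)) of a delayed pair has a phase that falls linearly with frequency, and the delay is minus the magnitude-weighted average phase slope. The sketch takes the phase step between adjacent bins as the angle of C(k+1)·conj(C(k)), which is already wrapped, and uses circular (DFT) correlation. Both choices, and the exact weighting, are assumptions rather than delayestfr's code.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def group_delay(x, y, k_lo=1, k_hi=None):
    """Delay of y relative to x (samples) from the slope of the cross-spectrum phase.

    C(k) = Y(k) * conj(X(k)) has phase -2*pi*k*D/N for a delay of D samples.
    The phase step between adjacent bins is the angle of C(k+1) * conj(C(k)),
    which is already wrapped, so no explicit unwrapping is needed.
    """
    n = len(x)
    X, Y = dft(x), dft(y)
    C = [Yk * Xk.conjugate() for Xk, Yk in zip(X, Y)]
    if k_hi is None:
        k_hi = n // 2 - 1                 # stay below the Nyquist bin
    num = den = 0.0
    for k in range(k_lo, k_hi):           # k_lo, k_hi limit the frequency range used
        step = C[k + 1] * C[k].conjugate()
        wgt = abs(step)                   # magnitude weighting (assumed form)
        num += wgt * cmath.phase(step)
        den += wgt
    slope = num / den                     # radians per bin
    return -slope * n / (2 * math.pi)

x = [1.0, 3.0, -2.0, 5.0, 0.0, 4.0, -1.0, 2.0, 6.0, -3.0, 1.0, 0.0, 2.0, -4.0, 3.0, 1.0]
y = [x[(m - 3) % len(x)] for m in range(len(x))]   # circular delay of 3 samples
est = group_delay(x, y)
```

Because the slope is a continuous quantity, this estimate is naturally subsample without any interpolation step.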
comparedelay.m - This script runs a Monte Carlo simulation for estimating delays between 2 bandlimited pulse signals in noise using the functions delayesttm and delayestfr. The performance for each run is presented in the titles of 2 figures showing the test signals (the loop is paused after each run so the figures can be observed). After all runs have been completed, a trimmed mean and standard deviation are computed over all Monte Carlo runs for both estimation procedures and displayed in the workspace.
The imaging and simulation routines in this toolbox allow for arbitrary microphone placements. However, for simulations where regular arrays are being studied, it is convenient to generate microphone positions automatically. Therefore, several functions in this section compute the coordinates of microphone arrays with regular placement geometry (based on the parameters of that geometry). For irregular (e.g., random) placements, random number generators can be used. Since random or irregular geometries cannot be described deterministically, a function is included to statistically describe the microphone geometry.
regmicsline - This function generates coordinates for a line array of equally spaced microphones.
regmicsplane - This function generates coordinates for a plane array of either rectilinear or hexagonally spaced microphones.
regmicsperim - This function generates coordinates for a plane array of microphones around a perimeter area.
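A Python sketch of regular placements in the toolbox's column-wise position convention (rows are x, y, z; columns are microphones). Only a centered line array and a rectilinear plane grid are shown; the hexagonal and perimeter options are not reproduced, and the function names and arguments are hypothetical, not the signatures of regmicsline or regmicsplane.

```python
def line_array(n_mics, spacing, center=(0.0, 0.0, 0.0)):
    """Equally spaced microphones along the x axis, centered on `center`.
    Returns positions column-wise: 3 rows (x, y, z) by n_mics columns."""
    offset = (n_mics - 1) * spacing / 2
    xs = [center[0] - offset + k * spacing for k in range(n_mics)]
    return [xs, [center[1]] * n_mics, [center[2]] * n_mics]

def rect_plane_array(nx, ny, spacing, z=0.0):
    """Rectilinear nx-by-ny grid in the plane at height z, stored column-wise."""
    pts = [(i * spacing, j * spacing) for j in range(ny) for i in range(nx)]
    return [[p[0] for p in pts], [p[1] for p in pts], [z] * len(pts)]

line = line_array(4, 0.5)            # x = -0.75, -0.25, 0.25, 0.75
plane = rect_plane_array(2, 3, 0.1)  # 6 microphones
```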
mposanaly - This function finds all possible subsets of microphones in an array taken K at a time. It outputs these subsets to a matrix along with their spacing statistics, such as the mean, minimum, maximum, and standard deviation. Since this function uses an N choose K operation to enumerate all possible subsets, care should be taken to ensure that an unwieldy number of combinations is not generated.
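A Python sketch of the subset analysis: enumerate every K-subset of microphones and summarize the pairwise distances inside it. The output layout (a list of tuples rather than a matrix) and the use of the population standard deviation are assumptions.

```python
import itertools
import math
import statistics

def subset_spacing_stats(positions, k):
    """positions: column-wise (rows = coordinates). For every k-subset (k >= 2) of
    microphones, return (subset, mean, min, max, std) of its pairwise distances."""
    n = len(positions[0])
    cols = [tuple(row[c] for row in positions) for c in range(n)]
    results = []
    for subset in itertools.combinations(range(n), k):
        d = [math.dist(cols[a], cols[b]) for a, b in itertools.combinations(subset, 2)]
        results.append((subset, statistics.mean(d), min(d), max(d), statistics.pstdev(d)))
    return results

stats = subset_spacing_stats([[0.0, 1.0, 3.0]], 2)   # 1D array of 3 mics, all pairs
closest_first = sorted(stats, key=lambda s: s[1])
```

Checking math.comb(n, k) before calling is a cheap way to avoid the combinatorial blow-up mentioned above.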
The following scripts test and demonstrate the microphone placement functions described above:
testmicgeom - This script generates a regular line array, regular plane array, hexagonal plane array, and random plane array with analyses using the above functions (regmicsplane, regmicsline, and mposanaly). Examples of microphone placements are shown graphically. For a random array placement, mposanaly is used to find all possible microphone pairs and rank them according to distance.
testmicperim - This script generates a regular perimeter array using function regmicsperim. Examples of microphone placements are shown graphically.
Simulation experiments allow the researcher to set the underlying parameters of sound generation, propagation, and microphone geometry. Therefore, relationships between these parameters and performance can be examined. The main disadvantages of simulation include:
Explicit relationships between parameters, such as those a closed-form solution would give, cannot be obtained. Therefore, with simulations, many parameter values must be tried over a broad range of variations.
Models in simulation do not include ALL effects associated with the physical phenomena in the experiment (nonlinearities, all noise sources, nonuniformities in sound speed, etc.).
However, since physical experiments are sometimes costly to set up, simulation is a convenient tool for building understanding and testing ideas, so that experiments or derivations can be performed more efficiently to test the ideas suggested by the simulations. Below are functions helpful for creating simulation experiments with array systems.
General Simulation Functions:
simarraysig - This function inputs a set of sampled sound signals (either segments of a single-source recording or a simulated signal) and simulates their recording over a system of distributed microphones. Microphone placements and source locations for the input signals must be provided. Optional inputs allow for placement of scatterers to generate multipath interference, and parameters to control the attenuation of sound in air. The attenuation parameter is in dB per Hz-meter, so the farther the source is from the microphone and the higher the frequency of the sound, the greater the attenuation of the received signal. If using a recorded signal, it is suggested to trim the noise or non-signal part from the input signal (pad with zeros if it is necessary to extend the signal). Any additional signal before the actual single-source sound will be synchronized to the signal and will not have the effect of true noise. This function requires the function roomimpres to run, which requires atmAtten.m.
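A heavily simplified Python sketch of array-recording simulation for one source: each microphone receives the source signal delayed by the propagation time (rounded to a whole sample) and scaled by 1/distance. Scatterers and air absorption are omitted, and the speed of sound c = 343 m/s is an assumed default; the function name is hypothetical.

```python
import math

def sim_array(src_sig, src_pos, mic_pos, fs, c=343.0):
    """Simulate one source recorded by each microphone: integer-sample propagation
    delay and 1/r spherical spreading only (no scatterers, no air absorption).
    mic_pos is column-wise; a microphone at the source position would divide by zero."""
    n_mics = len(mic_pos[0])
    mics = [tuple(row[m] for row in mic_pos) for m in range(n_mics)]
    dists = [math.dist(src_pos, p) for p in mics]
    delays = [int(round(d / c * fs)) for d in dists]
    out = [[0.0] * n_mics for _ in range(len(src_sig) + max(delays))]
    for m, (d, lag) in enumerate(zip(dists, delays)):
        for n, s in enumerate(src_sig):
            out[n + lag][m] = s / d      # rows = time samples, columns = microphones
    return out

# fs = 1000 Hz, so one sample is 0.343 m of travel; mics 2 and 4 samples away
rec = sim_array([1.0], (0.0, 0.0), [[0.686, 1.372], [0.0, 0.0]], fs=1000)
```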
simarraysigim- This function is similar to simarraysig above, except it simulates sound in a rectangular room using the image method described by J.B. Allen and D.A. Berkley in "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Am., Vol. 65, No. 4, Apr. 1979. In addition to the inputs described in simarraysig a pair of opposite vertices of the rectangular room is required along with the reflectivity of each surface of the room. Optional parameters allow for setting the number of image scatterers based on a threshold for the scattering coefficients, and the degree of frequency dependent attenuation of the air path. This function requires the functions imagesim and roomimpres to run.
simimp - This generates an impulse-like signal that is effectively the impulse response of a Butterworth filter. The frequency range of the impulse can be specified. This can be used to examine how the frequency content of different signals impacts array functions such as sound source location. The sounds can be played using soundsc in Matlab. For example, a narrowband signal sounds like a drip with tonal properties, and a broadband high-frequency signal sounds like a click or a snap. If the band covers a broad range, then both high- and low-frequency sounds can be heard: a low bass-drum-like sound with a snap.
simtone - This generates a tone burst signal, i.e., a tone modulated by a half-sine envelope. Several different envelopes can be selected.
simnoise - This generates a noise burst signal that is white noise modulated with an envelope. Several different envelopes can be selected.
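A Python sketch of the two burst types with a half-sine envelope (only this one envelope is shown; simtone and simnoise offer several). The function names, and the Gaussian choice for the white noise, are assumptions.

```python
import math
import random

def half_sine_env(n):
    """Half-sine envelope of n >= 2 samples, zero at both ends."""
    return [math.sin(math.pi * k / (n - 1)) for k in range(n)]

def tone_burst(freq, dur, fs):
    n = int(round(dur * fs))
    env = half_sine_env(n)
    return [e * math.sin(2 * math.pi * freq * k / fs) for k, e in enumerate(env)]

def noise_burst(dur, fs, seed=None):
    rng = random.Random(seed)            # seeded for repeatable simulations
    n = int(round(dur * fs))
    env = half_sine_env(n)
    return [e * rng.gauss(0.0, 1.0) for e in env]

tb = tone_burst(1000.0, 0.01, 8000)      # 80 samples of a 1 kHz burst
nb = noise_burst(0.01, 8000, seed=1)
```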
imagesim - This generates a series of image scatterers (delays and scattering coefficients) based on a rectangular room geometry and the scattering properties of the surfaces. The number of image scatterers is controlled by a threshold that stops generating scatterers once their scattering coefficients drop below a certain value.
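A Python sketch of image-source generation for a box-shaped room with one corner at the origin, following the image indexing of Allen and Berkley (cited above) as we understand it: along each axis an image sits at (1 - 2q)·s + 2nL with coefficient b0^|n-q| · bL^|n|, where b0 and bL are the reflection coefficients of the walls at 0 and L. The corner-at-origin placement, the per-axis order limit, and the coefficient layout are assumptions; imagesim takes opposite room vertices and may organize its output differently.

```python
import itertools

def image_sources(src, room, beta, max_order, thresh=1e-3):
    """src: (x, y, z) inside the box from (0, 0, 0) to room = (Lx, Ly, Lz).
    beta: per-axis wall reflection coefficients [(bx0, bxL), (by0, byL), (bz0, bzL)].
    Returns (position, coefficient) pairs; images with coefficient < thresh are dropped."""
    per_axis = []
    for axis in range(3):
        b0, bL = beta[axis]
        opts = []
        for n in range(-max_order, max_order + 1):
            for q in (0, 1):
                pos = (1 - 2 * q) * src[axis] + 2 * n * room[axis]
                coef = b0 ** abs(n - q) * bL ** abs(n)   # reflections off each wall
                opts.append((pos, coef))
        per_axis.append(opts)
    images = []
    for (px, cx), (py, cy), (pz, cz) in itertools.product(*per_axis):
        coef = cx * cy * cz
        if coef >= thresh:
            images.append(((px, py, pz), coef))
    return images

images = image_sources((1, 2, 1), (4, 5, 3), [(0.5, 0.5)] * 3, max_order=1, thresh=0.0)
```

The direct path appears as the image with q = 0 and n = 0 on every axis (coefficient 1); raising thresh prunes the weak, many-bounce images first.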
roomimpres - This function takes as inputs a sequence of scatterers (delays and scattering coefficients) associated with a sound source and receiver pair in a reverberant environment, and the frequency attenuation coefficient, and creates an impulse response for the source-receiver pair. This requires atmAtten.m to compute frequency-dependent attenuation based on humidity, temperature, and pressure.
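A minimal Python sketch of turning scatterer delays and coefficients into an impulse response: each arrival adds its coefficient at the nearest sample. Frequency-dependent air attenuation and fractional delays are deliberately left out, and spreading loss is assumed to be already folded into the coefficients; the name is hypothetical.

```python
def impulse_response(delays, coefs, fs):
    """delays in seconds, coefs as amplitudes; returns h with one tap per sample.
    Arrivals that round to the same sample are summed."""
    h = [0.0] * (int(round(max(delays) * fs)) + 1)
    for t, a in zip(delays, coefs):
        h[int(round(t * fs))] += a
    return h

h = impulse_response([0.001, 0.0025], [1.0, 0.5], fs=2000)
```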
atmAtten.m – This function computes attenuation of the propagated sound wave based on its inputs, which are temperature in deg C, static pressure in inches of Hg, relative humidity in %, sound propagation distance, and frequency of sound (may be a vector). It does not account for spherical spreading. This program was written by Nathan Burnside 10/5/04, AerospaceComputing Inc., email@example.com, and is also published on Matlab Central.
test_multipath_sim - This script reads in the audio file scare_crow.wav, which is a single-source recording of a male saying the words "there was a scarecrow standing in the middle of the street", and creates a set of signals corresponding to this signal recorded over a linear array of microphones. Graphic and sound output compares the closest and furthest microphones with and without multipath scatterers. This script requires the wave file scare_crow.wav and the functions simarraysig, roomimpres, atmAtten.m, and regmicsline.
test_roomrev_sim - This script reads in the audio file scare_crow.wav, which is a single-source recording of a male saying the words "there was a scarecrow standing in the middle of the street", and creates a set of signals corresponding to this signal recorded over a linear array of microphones in a rectangular room with reverberation. Graphic and sound output compares the closest and furthest microphones. This script requires the wave file scare_crow.wav and the functions simarraysigim, roomimpres, atmAtten.m, imagesim, and regmicsline.
Cocktail Party Simulation Functions
The cocktail party effect refers to the ability of the human auditory system to listen to and understand the speech of one voice in a room filled with many voices. It is generally not possible to separate these voices or improve the SNR of the voice of interest with signal processing algorithms operating on single-microphone recordings. However, since spatially distributed microphones can be used to electronically beamform on a point of interest, it is possible with many microphones to separate the voice of interest from the interfering voices in the room, which are at different spatial positions. The following mfiles use the general simulation functions to simulate cocktail party recordings from single-channel (one person) recordings. Microphone geometries, voice locations, and room conditions can be arbitrarily chosen. These simulations are useful for studying the cocktail party effect in humans or for studying sound source location and beamformer algorithms.
wav2sig - This function reads in wave files and stores all the sampled signals in a single matrix with an equal number of rows (samples) and the same sampling rate, where each column represents a different wave file.
cocktailp - This function simulates placing sound sources at specified positions in a room and recording the resulting sound field with spatially distributed microphones. Its input is a matrix with columns being the sample sounds of interest. This matrix can be created by wav2sig from stored wave files. This function calls the general simulation functions simarraysigim, roomimpres, and imagesim.
runCocktailp - This test script shows an example of simulating a cocktail party recording from individually recorded signals. The microphone positions and the source positions for each individual recording can be set in the simulation. The script reads in 6 wave files, generates a planar distribution of microphones in a 3.6 by 3.6 by 2.2 meter room with mild room reverberation, plots the microphone and source positions, and plays 4 microphone recordings (randomly selected) of the simulated cocktail party. It calls the following functions from the ArrayToolbox: wav2sig, cocktailp, simarraysig, simarraysigim, roomimpres, imagesim, and regmicsplane, and uses the following wave files: man1.wav, man2.wav, man3.wav, woman1.wav, woman2.wav, and woman3.wav.
flattap - This function creates a tapered window that is flat in the center and tapered on the ends with a Hann window. This is especially useful when processing with an FFT and correlating later, as in the case of whitening or the Phase Transform (PHAT) for sound source location. Tapers that peak at the center tend to bias the correlation toward the center; this window overcomes that problem.
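A Python sketch of a flat-top window with Hann tapers: the rising and falling halves of a Hann window of length 2·taper_len are placed on the ends and the middle is left at 1. The taper-length parameter is an assumption about how flattap is parameterized.

```python
import math

def flat_taper(n, taper_len):
    """Window of length n: 1.0 in the middle, Hann half-windows of length taper_len on each end."""
    if 2 * taper_len > n:
        raise ValueError("tapers are longer than the window")
    w = [1.0] * n
    for k in range(taper_len):
        v = 0.5 * (1 - math.cos(math.pi * k / taper_len))   # rising half of a Hann window
        w[k] = v
        w[n - 1 - k] = v                                     # mirrored falling half
    return w

w = flat_taper(10, 3)
```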
whiten - This function scales the modulus of the FFT coefficients of the input signals to 1, effectively whitening the signal. The phase is the same for input and output, and therefore this operation is sometimes referred to as the phase transform (PHAT). In addition to whitening the input, a special parameter can be selected to control the degree of whitening. For example, a value of 1 results in the PHAT and a value of 0 leaves the modulus unchanged. Any number between 0 and 1 represents a transition between the original modulus and the flat/white one, and is referred to as partial whitening.
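A Python sketch of partial whitening. Dividing each DFT coefficient by |X(k)|^beta keeps the phase, gives unit modulus at beta = 1 (the PHAT), and leaves the spectrum unchanged at beta = 0; this power-law interpolation for intermediate values is our assumption about how the transition is done. A direct DFT stands in for the FFT, and eps (an assumed guard) avoids division by zero.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def partial_whiten(x, beta, eps=1e-12):
    """Scale each DFT coefficient by 1 / |X(k)|**beta, keeping its phase."""
    Y = [Xk / (abs(Xk) + eps) ** beta for Xk in dft(x)]
    return [v.real for v in idft(Y)]   # real input gives a Hermitian Y, so output is real

x = [1.0, 2.0, 0.5, -1.0]
unchanged = partial_whiten(x, 0.0)
phat = partial_whiten(x, 1.0)
```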
srpframenn - This function creates an acoustic image from a set of microphone array signals using a steered response power (SRP) algorithm. It requires arweights and delayint to run. This particular version uses only integer shifts on the input signals for the delay-and-sum operation, and computes the coherent average power (rather than the total average power), which does not include the power terms for each individual microphone signal. So the coherent power is effectively the sum of cross-correlations between different microphone signals without the self-power terms, and can therefore be negative.
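A Python sketch of the coherent steered response power at one candidate point: align the channels with integer shifts derived from the point-to-microphone distances, then average (sum of channels)^2 minus the sum of squared channels, which keeps only the cross terms. Microphone weighting is omitted, c = 343 m/s is assumed, and the function name is hypothetical. Evaluating it over a FOV grid gives an acoustic image.

```python
import math

def srp_coherent(signals, mic_pos, point, fs, c=343.0):
    """Coherent (cross-term only) steered response power at one candidate point.
    signals: rows = samples, columns = mics; mic_pos is column-wise."""
    n_rows, n_mics = len(signals), len(signals[0])
    mics = [tuple(row[m] for row in mic_pos) for m in range(n_mics)]
    lags = [int(round(math.dist(point, p) / c * fs)) for p in mics]
    shift = [lag - min(lags) for lag in lags]   # extra travel time vs. the closest mic
    usable = n_rows - max(shift)
    power = 0.0
    for n in range(usable):
        vals = [signals[n + shift[m]][m] for m in range(n_mics)]
        total = sum(vals)
        power += total * total - sum(v * v for v in vals)   # drop self-power terms
    return power / usable

mics = [[0.343, 0.686], [0.0, 0.0]]          # 1 and 2 samples from the origin at fs = 1000
sig = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
on_target = srp_coherent(sig, mics, (0.0, 0.0), 1000)
off_target = srp_coherent(sig, mics, (0.0, 5.0), 1000)
```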
arweights - This function creates a set of real weights for the microphone array signals based on their distance from the point of interest in the field of view (FOV). For this function, the weights are inversely proportional to the distance from the point and normalized so that they sum to one.
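A Python sketch of inverse-distance weights normalized to sum to one (the function name is hypothetical; positions follow the column-wise convention).

```python
import math

def inverse_distance_weights(mic_pos, point):
    """Weights proportional to 1/distance from `point`, normalized to sum to one."""
    n = len(mic_pos[0])
    inv = [1.0 / math.dist(point, tuple(row[m] for row in mic_pos)) for m in range(n)]
    total = sum(inv)
    return [v / total for v in inv]

w = inverse_distance_weights([[1.0, 3.0]], (0.0,))   # mics 1 m and 3 m away on a line
```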
cclip - This function performs various types of center clipping on audio data, useful for removing low-level noise. The types of center clipping include:
Soft Center Clip: C[x(n)]=x(n)-Cl for (x(n)>=Cl), 0 for (|x(n)|<Cl), and x(n)+Cl for (x(n)<=-Cl)
Hard Center Clip: C[x(n)]=x(n) for (|x(n)|>=Cl), and 0 for (|x(n)|<Cl)
Center Hard Limiter Clip: C[x(n)]= 1 for (x(n)>=Cl), 0 for (|x(n)|<Cl), and -1 for (x(n)<=-Cl)
This function is used as an option by the time-domain delay estimator function delayesttm.
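The three clipping rules translate directly into Python. In this sketch the limiter's positive branch is read as x(n) >= Cl so that it is consistent with its -1 branch; the function names are hypothetical.

```python
def soft_clip(x, cl):
    """x - Cl above Cl, 0 inside (-Cl, Cl), x + Cl below -Cl."""
    return [v - cl if v >= cl else v + cl if v <= -cl else 0.0 for v in x]

def hard_clip(x, cl):
    """Keep samples with |x| >= Cl unchanged, zero the rest."""
    return [v if abs(v) >= cl else 0.0 for v in x]

def hard_limiter_clip(x, cl):
    """+1 at or above Cl, -1 at or below -Cl, 0 in between."""
    return [1.0 if v >= cl else -1.0 if v <= -cl else 0.0 for v in x]

x = [-2.0, -0.5, 0.2, 1.5]
```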
testsprimage - This script creates an acoustic image using the function srpframenn. It simulates a simple impulse like sound in an FOV of a perimeter array. It also uses the following functions: flattap, whiten, arweights, delayint, simarraysigim, regmicsperim, mposanaly, delayt, simimp, imagesim and roomimpres. These must be in your current directory for this to run properly. The output is a plot of the SRP image with the simulated target position, microphone positions, and coherent noise source positions. The script randomly generates the target and noise source positions, so the simulation from this script is different every time. See script for modifying simulation parameters. An additional figure is generated to show the original microphone inputs from which the image was created.
Beamforming is a signal processing technique that attempts to use spatial information to filter undesired interference from a target signal. Beamforming algorithms use the measured positions of a target speaker and an array of microphones to calculate optimal methods of filtering and combining several audio tracks into one with an enhanced SNR.
Traditionally, beamforming has been carried out using linear, equispaced microphone arrays. Our research aims not only to enhance current beamforming algorithms but also to successfully implement them with arrays of arbitrary three-dimensional geometry.
dsb - The Delay-Sum Beamformer (DSB), the simplest of the beamforming algorithms. This function aligns a group of microphone tracks according to the time difference of arrival of a sound source, computed from the measured positions of the source and microphones, and adds these delayed recordings together. These operations should increase the target signal relative to the interference, since the delay operation brings the target signal components into phase so that they add constructively, while the interference sources are likely to be shifted out of phase and thus add destructively.
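A Python sketch of a delay-and-sum beamformer with integer-sample alignment: the extra travel time of each microphone relative to the closest one is removed and the aligned channels are averaged. Unlike the toolbox convention, this sketch advances the later channels rather than shifting signals forward, and c = 343 m/s is assumed; the name is hypothetical.

```python
import math

def delay_and_sum(signals, mic_pos, src, fs, c=343.0):
    """Average of the channels after removing each mic's extra delay to `src`."""
    n_rows, n_mics = len(signals), len(signals[0])
    mics = [tuple(row[m] for row in mic_pos) for m in range(n_mics)]
    lags = [int(round(math.dist(src, p) / c * fs)) for p in mics]
    shift = [lag - min(lags) for lag in lags]
    usable = n_rows - max(shift)
    return [sum(signals[n + shift[m]][m] for m in range(n_mics)) / n_mics
            for n in range(usable)]

mics = [[0.343, 0.686], [0.0, 0.0]]   # 1 and 2 samples from the source at fs = 1000
sig = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
out = delay_and_sum(sig, mics, (0.0, 0.0), 1000)   # the two arrivals line up
```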
gjbf - The Griffiths-Jim Beamformer (also known as the Generalised Sidelobe Canceller, or GSC), a common improvement that builds on the Delay-Sum Beamformer. The GJBF comprises two signal flow paths: in one, a DSB approximates the target signal, while in the other, pairs of microphone signals are subtracted and adaptively filtered using LMS filters to approximate the interference. The paths converge by simply subtracting the approximated noise signals from the target. Several enhancements proposed by Hoshuyama et al. ("Robust Adaptive Beamforming." Microphone Arrays: Signal Processing Techniques and Applications. Ed. Michael Brandstein and Darren Ward. New York: Springer, 2001) are also available in this implementation, including adding LMS adaptive filters to the Blocking Matrix (the stage where the target signal is cancelled, abbreviated BM), constraints on the LMS taps, and SNR estimation for selective adaptation. For more information on these techniques, consult the papers listed in the References section. This function requires bmlms and mclms.
bmlms - Handles computations for the blocking matrix of a GSC beamformer when adaptive filtering in the BM is selected.
ccafbounds - This highly-experimental function generates the explicit coefficient constraints for bmlms.
mclms - Handles computation for the Multiple-Input Canceller (MC) of a GSC beamformer.
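As an illustration of the adaptive filtering these stages rely on, here is a single-reference normalized LMS (NLMS) canceller in Python. NLMS is a swapped-in variant of plain LMS, and a GSC's multiple-input canceller would run one such filter per blocking-matrix output and sum their estimates; the step size, tap count, and names are assumptions.

```python
import random

def nlms_cancel(primary, reference, n_taps, mu=0.5, eps=1e-8):
    """Normalized LMS canceller: adaptively filter `reference` to match the
    interference in `primary` and return the residual primary - estimate."""
    w = [0.0] * n_taps
    buf = [0.0] * n_taps
    residual = []
    for p, r in zip(primary, reference):
        buf = [r] + buf[:-1]                        # newest reference sample first
        estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = p - estimate
        norm = sum(b * b for b in buf) + eps        # step normalized by input power
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual

rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
primary = [0.8 * v for v in ref]     # pure interference, no target, for the demo
res = nlms_cancel(primary, ref, n_taps=4)
```

In a real GSC the primary path also contains the target, and any target leakage into the reference causes the target cancellation that the constrained and adaptive blocking-matrix options are meant to limit.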
corrbf - A new variation on the traditional GSC beamformer. The greatest limitation on the GSC's performance is leakage in the Blocking Matrix, which can cause some target signal cancellation and hamper performance. This algorithm proposes two new ways to handle this problem:
Instead of taking simple differences between microphone tracks, use the Wiener coefficient (a scalar derived from expected values that minimizes the mean squared difference between two signals).
Instead of using alignments from the Delay-Sum beamformer, calculate the cross-correlation between tracks over a window, shift the tracks to where this correlation is greatest, and subsequently use this alignment in the Delay-Sum beamformer. This modification could make up for errors in microphone and speaker position measurements and make it possible to track a moving target speaker with automatic target position updates.
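Both ideas can be sketched in Python with sample averages standing in for expected values: the Wiener scalar w = E[xy]/E[x^2] minimizes the mean of (y - w·x)^2, and alignment picks the lag with the largest windowed cross-correlation. The function names and lag search range are hypothetical.

```python
def wiener_scalar(x, y):
    """Scalar w minimizing mean((y - w*x)**2), using sample averages for expectations."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def wiener_difference(x, y):
    """Blocking-style difference y - w*x with the Wiener scalar instead of w = 1."""
    w = wiener_scalar(x, y)
    return [b - w * a for a, b in zip(x, y)]

def best_alignment(x, y, max_lag):
    """Lag of y relative to x (samples) with the largest cross-correlation in the window."""
    def corr(lag):
        return sum(x[m] * y[m + lag] for m in range(len(x)) if 0 <= m + lag < len(y))
    return max(range(-max_lag, max_lag + 1), key=corr)

diff = wiener_difference([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # y = 2x, so cancels fully
lag = best_alignment([0, 1, 0, 0], [0, 0, 0, 1], max_lag=2)
```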
partyanal - A test script demonstrating how to use the above m-files to carry out several types of beamforming. This script requires all of the beamforming functions along with micpos.dat, srcpos.dat, and cocktail11k.wav.
rt60est - This function estimates the RT60 time of a room, which is the time required for a steady-state diffuse sound to drop 60 dB below its maximum level after the source stops. The function requires data recorded from a burst noise sound (preferably white) that has reached steady state in the room and then stopped. Instructions on how to record the sound in the room are given in the help comments of this function. This function requires that at least the first 20% of the signal is the noise burst at steady state in the room, and the last 20% (or more) is the noise floor of the room. The program estimates the envelope of the decaying sound (which takes a while) and fits a line to it. The slope is then used for the RT60 estimate. This way, if the room has a high noise floor and the decay cannot be measured down to 60 dB, the decaying trend can be picked up before hitting the noise floor and extrapolated from the estimated slope. This function calls a line-fit program and an order statistic filter, both of which are included in this mfile and do not need to be loaded separately. If data is recorded over an array of microphones, the RT60 value for each channel is computed and the values are averaged.
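The extrapolation step can be sketched in Python: fit a least-squares line to the decaying envelope in dB and report the time the fitted line needs to fall 60 dB. Envelope estimation and order-statistic smoothing are not shown; the input is assumed to be an already-smoothed decay segment above the noise floor, and the name is hypothetical.

```python
def rt60_from_decay(times, env_db):
    """Least-squares line through the decaying envelope (dB vs. seconds);
    RT60 is the time for the fitted line to fall 60 dB."""
    n = len(times)
    mt = sum(times) / n
    me = sum(env_db) / n
    sxy = sum((t - mt) * (e - me) for t, e in zip(times, env_db))
    sxx = sum((t - mt) ** 2 for t in times)
    slope = sxy / sxx                    # dB per second, negative for a decay
    if slope >= 0:
        raise ValueError("envelope is not decaying")
    return -60.0 / slope

times = [k * 0.01 for k in range(50)]
env = [-30.0 * t - 5.0 for t in times]   # 30 dB/s decay, only 15 dB observed
rt60 = rt60_from_decay(times, env)       # extrapolates to 60 dB
```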
velest - This function estimates the velocity of sound based on a measurement from a system with a collinear arrangement of the sound source and 2 microphones with a known distance between them. It uses the time-domain delay estimator delayesttm to estimate the time delay between the microphones. White noise bursts typically yield the best delay estimates and are recommended for this measurement. This function automatically estimates the window size and a high-pass filter cutoff for processing the sound (these operations can be changed from the defaults through optional input parameters). If the input signal is longer than the window size for the estimate, this function performs repeated estimates in a sliding-window fashion to obtain a final result by averaging all velocity estimates from windows corresponding to correlation values greater than 0.4.
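A Python sketch of the final averaging step: each window gives c = spacing / delay, and only windows whose correlation exceeds 0.4 contribute. The per-window delays and correlations are assumed to come from a delay estimator such as delayesttm; the name is hypothetical.

```python
def speed_of_sound(distance, delays, corrs, min_corr=0.4):
    """Average distance / delay over windows whose correlation exceeds min_corr."""
    est = [distance / d for d, r in zip(delays, corrs) if r > min_corr and d > 0]
    if not est:
        raise ValueError("no valid windows")
    return sum(est) / len(est)

# the third window is rejected because of its low correlation
c = speed_of_sound(0.45, [0.45 / 343, 0.45 / 345, 0.45 / 100], [0.9, 0.8, 0.1])
```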
testrt60p - This script applies the rt60est function to the recorded data in wave file rt60exdata.wav. The data starts with a recording of the steady-state noise burst in a room and ends after the noise source has been turned off. In this case the reverberant sound drops below the noise floor before it drops 60 dB from the loudest diffuse sound. This script uses the rt60est function to estimate the slope of the decaying sound, from which the RT60 time is estimated. The script may take a while to run on slow or memory-limited systems (on the order of a minute or 2). A significant amount of computation is devoted to smoothing out the envelope with order statistic filters.
testvelest - This script uses the velest function with recorded data in wav file sound_speed_p45m.wav. The data was recorded using a 1 second white noise burst in a collinear configuration with 2 microphones separated by 0.45 meters. The script calls the velest function and returns a sequence of velocity estimates, which are averaged, and the 95% confidence limits are computed. In addition, an original microphone signal is plotted along with the velocity estimates for each windowed valid data segment and the corresponding correlation coefficients. Note that in practice the hardest part of this measurement is obtaining accurate distances between the microphones. In this case, a 1 cm measurement error (about 2%) can make an 8 m/s error in the velocity estimate. The further apart the microphones are, the less impact the distance measurement error has on the velocity estimate error. On the other hand, the further apart the microphones are, the more susceptible the recording is to multipath interference. These trade-offs must be considered when making measurements.
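Because the velocity estimate is the microphone spacing divided by the measured delay, a relative error in the spacing carries over one-for-one to the velocity (delay held fixed). The worked numbers below assume c ≈ 343 m/s (roughly room temperature) and the 0.45 m spacing of this example; they show that an error of about 8 m/s corresponds to a spacing error of about 1 cm (about 2%), whereas 1 mm produces under 1 m/s.

```latex
c = \frac{d}{\tau} \quad\Rightarrow\quad \frac{\Delta c}{c} = \frac{\Delta d}{d}

\Delta d = 1\ \text{mm}: \quad \Delta c \approx 343 \cdot \frac{0.001}{0.45} \approx 0.76\ \text{m/s}

\Delta d = 1\ \text{cm}: \quad \Delta c \approx 343 \cdot \frac{0.01}{0.45} \approx 7.6\ \text{m/s}
```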
Speech Intelligibility is the measure of effectiveness of speech. It can be defined as the degree to which the speech can be understood correctly by the listener. There are various subjective and objective measures available to estimate the Intelligibility in an enclosure. The Speech Intelligibility Index (SII) is the most robust physical measure that is highly correlated with intelligibility of speech. This method is standardized by ANSI and is proposed in draft form as ANSI s3.5 -1997, American National Standards methods for Calculation of the Speech Intelligibility Index (http://www.sii.to/index.html) . The SII is the quantification of the proportion of speech information which is audible and usable for a listener. The SII ranges from 0 (completely unintelligible) to 1 (perfect intelligibility). The following m-files are useful for estimating the SII, especially for experiments and simulations, where the speech signal of interest and noise/interference can be separated for independent power computations.
sii - This function, developed by Hannes Muesch, calculates the SII based on the ANSI S3.5-1997 standard. It is taken from the website http://www.sii.to/index.html, which was created by members of the Acoustical Society of America (ASA) Working Group S3-79, the group in charge of reviewing American National Standard ANSI S3.5-1997.
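The core of the SII calculation can be sketched as an importance-weighted sum over frequency bands, SII = Σ_i I_i·A_i. Here the band importances I_i sum to 1 and each band audibility A_i lies in [0, 1]. The Python sketch below keeps only a simplified audibility term, A_i = clip((SNR_i + 15)/30, 0, 1). Under that term, a band SNR of -15 dB or less contributes nothing and +15 dB or more contributes fully. The sketch omits the spread of masking, hearing thresholds, and level distortion that the full ANSI S3.5-1997 procedure in sii accounts for. The equal band importances in the usage lines are a hypothetical placeholder, not the standard's tabulated values.

```python
def simplified_sii(speech_levels_db, noise_levels_db, importance):
    """Simplified SII: importance-weighted sum of clipped band audibilities.

    speech_levels_db, noise_levels_db: per-band levels in dB for the same bands.
    importance: per-band weights that should sum to 1.
    """
    if not (len(speech_levels_db) == len(noise_levels_db) == len(importance)):
        raise ValueError("all inputs must have one entry per band")
    total = 0.0
    for s_db, n_db, w in zip(speech_levels_db, noise_levels_db, importance):
        snr_db = s_db - n_db
        audibility = min(1.0, max(0.0, (snr_db + 15.0) / 30.0))
        total += w * audibility
    return total


if __name__ == "__main__":
    n_bands = 18
    equal_importance = [1.0 / n_bands] * n_bands  # hypothetical placeholder weights
    # 0 dB SNR in every band gives an SII of about 0.5 under this simplification.
    print(simplified_sii([60.0] * n_bands, [60.0] * n_bands, equal_importance))
```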
intel - This function takes the signal and noise vectors as separate inputs and breaks them into shorter segments. It then dynamically computes their spectrum levels, which are passed to the function sii to estimate the Speech Intelligibility Index.
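The framing step in intel can be sketched as follows. The target and noise are cut into identical, optionally overlapping windows, so that each speech window is paired with the noise present at the same time. A per-window quantity is then computed from each pair. To stay self-contained, the sketch computes a broadband per-window SNR in dB. This is a hypothetical stand-in for the per-band spectrum levels that intel passes to sii. windowed_snr_db and its parameters are assumptions.

```python
import math


def frame_starts(n_samples, win, hop):
    """Start indices of full windows of `win` samples spaced `hop` samples apart."""
    return list(range(0, n_samples - win + 1, hop))


def power(segment):
    """Mean-square power of a sequence of samples."""
    return sum(x * x for x in segment) / len(segment)


def windowed_snr_db(signal, noise, win, hop, floor=1e-12):
    """Per-window SNR in dB, with signal and noise framed identically.

    `floor` guards against log of zero in all-silent windows.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must be the same length")
    snrs = []
    for start in frame_starts(len(signal), win, hop):
        ps = power(signal[start:start + win])
        pn = power(noise[start:start + win])
        snrs.append(10.0 * math.log10(max(ps, floor) / max(pn, floor)))
    return snrs
```

For example, at an assumed 8 kHz sampling rate, win = 800 and hop = 400 give 100 ms windows with 50% overlap.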
spectrumlevel - This function divides the input signal into eighteen bands using the one-third octave band method and estimates the spectrum power level in each band.
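The band layout can be illustrated with the 18 one-third octave bands of the SII one-third octave procedure. These are centered at nominal frequencies from 160 Hz to 8000 Hz, with edges at f_c·2^(-1/6) and f_c·2^(1/6). The Python sketch below sums DFT power over each band and converts the result to a spectrum level: the band level minus 10·log10(bandwidth), i.e., power per hertz in dB. It is only an illustration under assumptions. It uses a direct O(N²) DFT for self-containment, applies no window or calibration reference, and may differ from how spectrumlevel actually filters the signal.

```python
import cmath
import math

# Nominal one-third octave center frequencies (Hz) of the SII one-third octave procedure.
CENTERS_HZ = [160, 200, 250, 315, 400, 500, 630, 800, 1000,
              1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000]


def band_edges(fc):
    """Lower and upper edges (Hz) of a one-third octave band centered at fc."""
    return fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)


def one_sided_power_spectrum(x):
    """Power per DFT bin (bins 0..N//2), scaled so the bins sum to the mean-square power."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(acc) ** 2 / n ** 2
        if 0 < k < n / 2:  # fold the matching negative-frequency bin onto this one
            p *= 2
        spectrum.append(p)
    return spectrum


def spectrum_levels_db(x, fs, floor=1e-20):
    """Spectrum level in dB (re 1 unit^2/Hz) for each of the 18 bands."""
    n = len(x)
    psd = one_sided_power_spectrum(x)
    levels = []
    for fc in CENTERS_HZ:
        lo, hi = band_edges(fc)
        band_power = sum(p for k, p in enumerate(psd) if lo <= k * fs / n < hi)
        levels.append(10 * math.log10(max(band_power, floor) / (hi - lo)))
    return levels
```

A unit-amplitude 1 kHz sinusoid, for instance, places its 0.5 mean-square power entirely in the 1000 Hz band.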
rmsilence - This function removes intervals of silence from a speech signal and filters the result to reduce the distortion (clicking) caused by concatenating the active speech segments. Because the pauses are removed, the resulting speech plays back faster. This is useful when computing mean quantities related to speech, because it removes the silent pauses between words and syllables. These pauses can vary considerably between speakers and would otherwise affect the performance computations. If the input has multiple columns, the first column is taken as the reference signal. The intervals detected as silence in that column are removed from all columns, so the signals remain time-synchronized.
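A rough sketch of envelope-threshold silence removal in the spirit of rmsilence follows. A smoothed magnitude envelope of the reference (first) channel marks samples as active. The same active index ranges are kept from every channel, so the channels stay synchronized, and a short linear fade at each cut softens clicks. The one-pole envelope follower, the threshold relative to the envelope peak, and the linear fades are all assumptions. rmsilence may detect silence and filter the joints differently.

```python
def envelope(x, alpha=0.99):
    """One-pole smoothed magnitude envelope (alpha closer to 1 means slower)."""
    env, state = [], 0.0
    for v in x:
        state = alpha * state + (1 - alpha) * abs(v)
        env.append(state)
    return env


def active_runs(env, rel_threshold):
    """(start, stop) index pairs where env exceeds rel_threshold * max(env)."""
    thr = rel_threshold * max(env)
    runs, start = [], None
    for i, e in enumerate(env):
        if e > thr and start is None:
            start = i
        elif e <= thr and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(env)))
    return runs


def remove_silence(channels, rel_threshold=0.1, fade=32, alpha=0.99):
    """Drop intervals detected as silent in channels[0] from every channel.

    channels: list of equal-length sample lists; channels[0] is the reference.
    Each kept run is tapered with linear fades of up to `fade` samples at both ends.
    """
    runs = active_runs(envelope(channels[0], alpha), rel_threshold)
    out = [[] for _ in channels]
    for start, stop in runs:
        length = stop - start
        f = min(fade, length // 2)  # keep the two fades from overlapping
        for c, ch in enumerate(channels):
            seg = ch[start:stop]  # slicing copies, so the input is not modified
            for i in range(f):
                gain = (i + 1) / (f + 1)
                seg[i] *= gain
                seg[length - 1 - i] *= gain
            out[c].extend(seg)
    return out
```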
testscript_intel - This script shows an example of estimating the SII of a single-channel speech recording under speech-in-noise conditions, using an interfering speaker and white noise at various signal and noise levels. The script reads two wave files: the first (a man's voice) is the target signal, and the second (a woman's voice) is the interfering speaker. To contrast the effects of an interfering speaker (non-stationary noise) with white noise interference (stationary noise with constant power), a white noise signal is also used. The conditions are presented sequentially (press any key to hear the next sample). The signals are scaled (varying the SNR) to illustrate values for good intelligibility (~0.6), barely intelligible speech (~0.2), and unintelligible speech (0.1). The script plots the SII over 100 ms windows slid over the speech signal. It also displays the mean and standard deviation of the SII computed over active speech only (silence intervals are excluded). Users can change the envelope threshold used to remove the silence intervals, so that the mean SII is not overly dependent on pauses between words (when speech is not active, intelligibility is low). They can also change the length of the overlapping time windows over which the SII is estimated. The scaled target signal together with the interfering signal (noise) is played for reference. The script uses the following functions from the Array Toolbox: intel, sii, spectrumlevel, rmsilence, and wav2sig, and the wave files woman1.wav and man1.wav.
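Two steps in testscript_intel can be sketched independently of the Matlab code. The first scales the interference so that the mixture has a chosen overall SNR. The second summarizes per-window SII values by their mean and standard deviation over active-speech windows only. The Python sketch below assumes the per-window SII values and a per-window activity flag (for example, from an envelope threshold) are already available. The function names are hypothetical.

```python
import math
import statistics


def mean_power(x):
    """Mean-square power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def scale_to_snr(target, interference, snr_db):
    """Return interference scaled so that 10*log10(P_target / P_scaled) == snr_db."""
    gain = math.sqrt(mean_power(target) / (mean_power(interference) * 10 ** (snr_db / 10)))
    return [gain * v for v in interference]


def active_sii_stats(sii_per_window, active_flags):
    """Mean and sample standard deviation of SII over windows flagged as active."""
    active = [s for s, a in zip(sii_per_window, active_flags) if a]
    if len(active) < 2:
        raise ValueError("need at least two active windows")
    return statistics.mean(active), statistics.stdev(active)
```

Excluding inactive windows before averaging mirrors the reason the script removes silence: windows without speech would otherwise pull the mean SII down regardless of the noise condition.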