TEMPORAL STRUCTURE OF THE SPECTRAL LEVELS WITHIN PRE-RECORDED SPEECH, MUSIC AND CINEMA MATERIAL

Authors
G Leembruggen

When sound system designers select loudspeakers and amplifiers to produce a given sound pressure level, they must determine the required power capacity of the amplifiers, and the sensitivity and power handling capacity of the loudspeakers. Integral to determining these parameters is knowledge of the long-term Leq (or RMS level) and the crest factor of the programme material to be fed to the system, both broadband and in specific frequency bands. Given that cost may limit the power capacities of the amplifiers and loudspeakers, an understanding of the histogram of levels in each frequency range allows the designer to select the specific dynamic range that is to be reproduced. Currently, little temporal histogram data for audio is available to designers, particularly in multiple frequency ranges. Boren et al. [1] present peak and averaged levels of vocal sound pressure, without histogram information, while Chapman [2] presents one-third octave RMS levels for a range of programme material types based on numerous recordings, along with averaged peak levels based on the RMS level in each 0.37 second band. Another use for this temporal information is in environmental noise measurement, where knowledge of the likely temporal structure of the sound emitted from a music venue can assist in determining compliance with codes in situations where straightforward measurements are not possible. This paper presents temporal level data in dB for a number of pre-recorded speech, music and cinema segments in broadband, octave and one-third octave frequency bands, with 0 and 125 ms time constants. Segments include action movies, thrash metal, hip-hop, pop, symphonic, rock and speech.
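As a rough illustration of the quantities discussed above, the following Python sketch computes a long-term RMS level, a crest factor, and a histogram of time-weighted levels for a block of samples. It is not the method used in this paper. All function names are hypothetical, levels are referenced to a full-scale value of 1.0 rather than to sound pressure, the time weighting is assumed to be a simple exponential average of the squared signal, a 0 ms time constant is assumed to mean the instantaneous sample level, and no octave or one-third octave filtering is included.

```python
import math


def rms_level_db(samples):
    """Long-term RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def crest_factor_db(samples):
    """Peak level minus long-term RMS level, in dB."""
    peak = max(abs(x) for x in samples)
    return 20.0 * math.log10(peak) - rms_level_db(samples)


def time_weighted_levels_db(samples, sample_rate, time_constant_s):
    """Level of the exponentially averaged squared signal, per sample.

    Assumption: a time constant of 0 means no averaging, so the result
    is the instantaneous level of each sample.
    """
    if time_constant_s <= 0:
        alpha = 1.0
    else:
        alpha = 1.0 - math.exp(-1.0 / (sample_rate * time_constant_s))
    floor = 1e-12  # avoids log10(0) on silent samples (-120 dB)
    state = 0.0
    levels = []
    for x in samples:
        state += alpha * (x * x - state)
        levels.append(10.0 * math.log10(max(state, floor)))
    return levels


def level_histogram(levels_db, bin_width_db=1.0):
    """Fraction of time spent in each level bin, keyed by bin lower edge."""
    counts = {}
    for level in levels_db:
        bin_edge = math.floor(level / bin_width_db) * bin_width_db
        counts[bin_edge] = counts.get(bin_edge, 0) + 1
    total = len(levels_db)
    return {edge: n / total for edge, n in sorted(counts.items())}


# Illustrative input only: one second of a full-scale 1 kHz sine at 48 kHz.
SAMPLE_RATE = 48000
sine = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]

crest = crest_factor_db(sine)  # about 3.01 dB for a sine wave
fast_levels = time_weighted_levels_db(sine, SAMPLE_RATE, 0.125)
histogram = level_histogram(fast_levels)
```

For real programme material the same functions would be applied to the output of each band filter, and the histogram would show how much of the time the signal occupies each level range, which is the information a designer would need to choose how much of the dynamic range to reproduce.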