
At a time when researchers were just starting to solve the problem of creating a speech interface for computers, they often had to make their own equipment that allows you to enter sound information into a computer, as well as output it from a computer. Today, such devices may only be of historical interest, as modern computers can be easily equipped with sound input and output devices such as sound adapters, microphones, headphones, and speakers.

We will not go into the details of the internal structure of these devices, but we will talk about how they work, and give some recommendations for choosing sound computer devices for working with speech recognition and synthesis systems.

As we said in the previous chapter, sound is nothing more than air vibrations whose frequency lies in the range perceived by humans. The exact limits of the audible range vary from person to person, but it is generally accepted that audible sound vibrations lie in the range of 16-20,000 Hz.

The task of the microphone is to convert sound vibrations into electrical vibrations, which can then be amplified, filtered to remove interference and digitized to enter sound information into a computer.

According to the principle of operation, the most common microphones are divided into carbon, electrodynamic, condenser and electret. Some of these microphones require an external current source for their operation (for example, carbon and condenser microphones), while others, under the influence of sound vibrations, are able to independently generate an alternating electrical voltage (these are electrodynamic and electret microphones).

You can also separate microphones by purpose. There are studio microphones that can be held in the hand or mounted on a stand, there are radio microphones that can be clipped to clothing, and so on.

There are also microphones designed specifically for computers. These microphones are usually mounted on a stand placed on the table surface. Computer microphones can be combined with headphones, as shown in fig. 2-1.

Fig. 2-1. Headphones with microphone

How to choose from the whole variety of microphones the one that is best suited for speech recognition systems?

In principle, you can experiment with any microphone you have, as long as it can be connected to your computer's audio adapter. However, developers of speech recognition systems recommend purchasing a microphone that will be at a constant distance from the speaker's mouth during operation.

If the distance between the microphone and the mouth does not change, then the average level of the electrical signal coming from the microphone will also not change too much. This will have a positive impact on the quality of modern speech recognition systems.

What is the problem here?

A person is able to successfully recognize speech whose volume varies over a very wide range. The human brain can filter quiet speech out of noise such as the rumble of cars on the street, extraneous conversations, and music.

As for modern speech recognition systems, their abilities in this area leave much to be desired. If the microphone is on a table, then when you turn your head or change the position of your body, the distance between your mouth and the microphone will change. This will change the microphone output level, which in turn will degrade the reliability of speech recognition.

Therefore, when working with speech recognition systems, the best results will be achieved if you use a microphone attached to headphones, as shown in Fig. 2-1. When using such a microphone, the distance between the mouth and the microphone will be constant.

We also draw your attention to the fact that all experiments with speech recognition systems are best done in seclusion in a quiet room. In this case, the influence of interference will be minimal. Of course, if you need to choose a speech recognition system that can work in conditions of strong interference, then the tests need to be done differently. However, as far as the authors of the book know, the noise immunity of speech recognition systems is still very, very low.

The microphone performs for us the conversion of sound vibrations into electrical current vibrations. These fluctuations can be seen on the oscilloscope screen, but do not rush to the store to purchase this expensive device. We can carry out all oscillographic studies using a conventional computer equipped with a sound adapter, for example, a Sound Blaster adapter. Later we will tell you how to do it.

In Fig. 2-2 we show the oscillogram of the sound signal obtained when pronouncing the long sound "a". This waveform was acquired using the GoldWave program, which we will discuss later in this chapter, together with a Sound Blaster audio adapter and a microphone similar to the one shown in Fig. 2-1.

Fig. 2-2. Oscillogram of the audio signal

The GoldWave program can stretch the waveform along the time axis, letting you see the smallest details. In Fig. 2-3 we show a stretched fragment of the oscillogram of the sound "a" mentioned above.

Fig. 2-3. Fragment of an oscillogram of an audio signal

Note that the magnitude of the input signal from the microphone changes periodically and takes on both positive and negative values.

If only one frequency were present in the input signal (that is, if the sound were "clean"), the waveform received from the microphone would be sinusoidal. However, as we have already said, the spectrum of human speech sounds consists of a set of frequencies, as a result of which the shape of the speech signal oscillogram is far from sinusoidal.

A signal whose magnitude changes continuously with time we will call an analog signal. This is the signal coming from the microphone. Unlike an analog signal, a digital signal is a set of numerical values that change discretely over time.

In order for a computer to process an audio signal, it must be converted from analog to digital form, that is, represented as a set of numerical values. This process is called digitization.

The digitization of an audio (or any analog) signal is performed by a special device called an analog-to-digital converter (ADC). This device is located on the sound adapter board and looks like an ordinary microcircuit.

How does an analog-to-digital converter work?

It periodically measures the level of the input signal and outputs a numerical value of the measurement result. This process is illustrated in Fig. 2-4. Here the gray rectangles mark the values of the input signal, measured at a constant time interval. The set of such values is the digitized representation of the input analog signal.

Fig. 2-4. Measurements of the dependence of the signal amplitude on time
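The measurement process of Fig. 2-4 can be sketched in a few lines of Python. This is a hypothetical illustration (the function names are our own, and no real sound adapter works this way in software): a continuous signal is sampled at a fixed interval, and each measurement is rounded to an integer code.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bits=16):
    """Sample a continuous signal (a Python function of time) at a fixed
    interval and quantize each measurement to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1          # 32767 for a 16-bit converter
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz              # constant time step, as in Fig. 2-4
        value = signal(t)                   # "measure" the analog level (-1..1)
        samples.append(round(value * max_code))
    return samples

# A 1 kHz tone "recorded" for 1 ms at 8000 Hz gives 8 samples.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
print(digitize(tone, 0.001, 8000)[:3])
```

The resulting list of integers is exactly the "set of numerical values" that the ADC hands to the computer.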

In Fig. 2-5 we show the connection of an analog-to-digital converter to a microphone. An analog signal is applied to input x1, and the digital signal is taken from outputs u1-un.

Fig. 2-5. Analog-to-digital converter

Analog-to-digital converters are characterized by two important parameters - the conversion frequency and the number of quantization levels of the input signal. Proper selection of these parameters is critical to achieving an adequate digitization of an analog signal.

How often do you need to measure the amplitude value of the input analog signal so that information about changes in the input analog signal is not lost as a result of digitization?

It would seem that the answer is simple - the input signal should be measured as often as possible. Indeed, the more often an analog-to-digital converter makes such measurements, the better it will track the slightest changes in the amplitude of the analog input signal.

However, excessively frequent measurements can lead to an unjustified increase in the digital data flow and a waste of computer resources in signal processing.

Fortunately, choosing the right conversion rate (sampling rate) is easy enough. It suffices to refer to the Kotelnikov theorem (known outside Russia as the Nyquist-Shannon sampling theorem), familiar to specialists in digital signal processing. The theorem states that the conversion frequency must be at least twice the maximum frequency in the spectrum of the converted signal. Therefore, to digitize an audio signal whose frequencies lie in the range of 16-20,000 Hz without loss of quality, you need to select a conversion frequency of at least 40,000 Hz.
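As a rough illustration of the theorem, the following Python sketch (the function names are our own) computes the minimum sampling rate and shows what happens to a tone sampled too slowly: it "folds back" to a false, lower frequency, an effect known as aliasing.

```python
def min_sample_rate(max_signal_freq_hz):
    # Kotelnikov/Nyquist: sample at least twice the highest frequency present.
    return 2 * max_signal_freq_hz

def alias_frequency(signal_freq_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling: frequencies above
    sample_rate / 2 fold back into the 0..sample_rate/2 band."""
    f = signal_freq_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

print(min_sample_rate(20_000))      # 40000 Hz for the full audible range
print(alias_frequency(3000, 4000))  # a 3 kHz tone sampled at 4 kHz appears as 1 kHz
```

A tone below half the sampling rate keeps its frequency; anything above it is distorted beyond recovery, which is exactly why the conversion frequency must be chosen in advance.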

Note, however, that in professional audio equipment the conversion frequency is chosen several times higher than this value, to achieve very high quality of the digitized audio. For speech recognition systems this quality is not relevant, so we will not dwell on this choice.

And what conversion frequency is needed to digitize the sound of human speech?

Since the sounds of human speech lie in the frequency range of 300-4000 Hz, the minimum required conversion frequency is 8000 Hz. However, many computer speech recognition programs use the conversion rate of 44,100 Hz that is standard for conventional audio adapters. On the one hand, such a conversion rate does not lead to an excessive increase in the digital data stream; on the other hand, it ensures speech digitization with sufficient quality.

Back in school, we were taught that with any measurements, errors arise that cannot be completely eliminated. Such errors arise due to the limited resolution of measuring instruments, and also due to the fact that the measurement process itself can introduce some changes in the measured value.

The analog-to-digital converter represents the input analog signal as a stream of numbers of limited bit width. Conventional audio adapters contain 16-bit ADC blocks capable of representing the amplitude of the input signal as 2^16 = 65,536 different values. ADC devices in high-end audio equipment can be 20-bit, providing greater accuracy in representing the amplitude of the audio signal.

Modern speech recognition systems and programs were created for ordinary computers equipped with ordinary sound adapters. Therefore, to conduct experiments with speech recognition, you do not need to purchase a professional audio adapter. An adapter such as Sound Blaster is quite suitable for digitizing speech for further recognition.

Along with the useful signal, various noises usually enter the microphone - noise from the street, wind noise, extraneous conversations, etc. Noise has a negative impact on the quality of speech recognition systems, so it has to be dealt with. One of the ways we have already mentioned is that today's speech recognition systems are best used in a quiet room, remaining alone with the computer.

However, ideal conditions cannot always be created, so special methods have to be used to get rid of interference. To reduce the noise level, special tricks are used in the design of microphones, along with special filters that remove from the analog signal spectrum the frequencies that carry no useful information. In addition, a technique known as dynamic range compression of the input signal is used.

Let's talk about all this in order.

A device that converts the frequency spectrum of an analog signal is called a frequency filter. In the process of this transformation, oscillations of certain frequencies are selected (or absorbed).

You can think of this device as a kind of black box with one input and one output. In relation to our situation, a microphone will be connected to the input of the frequency filter, and an analog-to-digital converter will be connected to the output.

Frequency filters come in several types:

low-pass filters;

high-pass filters;

band-pass filters;

band-stop filters.

High-pass filters (high-pass filter) remove from the spectrum of the input signal all frequencies below a certain cutoff frequency, which depends on the filter setting.

Since audio signals lie in the range of 16-20,000 Hz, all frequencies below 16 Hz can be cut off without degrading the sound quality. For speech recognition, the 300-4000 Hz range is what matters, so frequencies below 300 Hz can be cut out. All noise whose frequency spectrum lies below 300 Hz will then be removed from the input signal and will not interfere with the speech recognition process.

Likewise, low-pass filters (low-pass filter) cut out from the spectrum of the input signal all frequencies above a certain cutoff frequency.

Humans cannot hear sounds at frequencies of 20,000 Hz or higher, so they can be cut out of the spectrum without noticeable deterioration in sound quality. As for speech recognition, all frequencies above 4000 Hz can be cut out, which will lead to a significant reduction in the level of high-frequency interference.

A band-pass filter (band-pass filter) can be thought of as a combination of a low-pass filter and a high-pass filter. Such a filter blocks all frequencies below the so-called lower cutoff frequency, as well as those above the upper cutoff frequency.

Thus, for a speech recognition system, a band-pass filter that blocks all frequencies except those in the 300-4000 Hz range is convenient.

As for the band-stop filters (band-stop filter), they allow you to cut out from the spectrum of the input signal all frequencies that lie in a given range. Such a filter is convenient, for example, to suppress noise that occupies a certain continuous part of the signal spectrum.

In Fig. 2-6 we show the connection of a band-pass filter.

Fig. 2-6. Filtering the audio signal before digitizing

Note that the usual sound adapters installed in computers contain a band-pass filter through which the analog signal passes before digitization. The bandwidth of such a filter usually corresponds to the range of audio signals, namely 16-20,000 Hz (in different audio adapters the values of the upper and lower frequencies may vary slightly).

But how to achieve a narrower bandwidth of 300-4000 Hz, corresponding to the most informative part of the spectrum of human speech?

Of course, if you have a penchant for designing electronic equipment, you can make your own filter from an operational amplifier chip, resistors and capacitors. This is exactly what the first creators of speech recognition systems did.

However, industrial speech recognition systems must be able to work on standard computer equipment, so the way of manufacturing a special band-pass filter is not suitable here.

Instead, modern speech processing systems use so-called digital frequency filters implemented in software. This became possible once computer CPUs became powerful enough.

A digital frequency filter implemented in software converts an input digital signal into an output digital signal. During the conversion, the program processes in a special way the stream of numerical amplitude values coming from the analog-to-digital converter. The result is also a stream of numbers, but one corresponding to the already filtered signal.
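As a toy illustration of such software filtering (not the algorithm used by any particular speech system; real digital filters are higher-order designs), here is a first-order low-pass filter in Python, and a crude band-pass built from two of them:

```python
def low_pass(samples, alpha):
    """First-order IIR low-pass: each output is a weighted average of the
    new sample and the previous output (larger alpha = higher cutoff)."""
    out, y = [], 0.0
    for x in samples:
        y = y + alpha * (x - y)
        out.append(y)
    return out

def band_pass(samples, alpha_low, alpha_high):
    """Crude band-pass: remove highs with one low-pass, then remove lows
    by subtracting a second, slower low-pass (a first-order high-pass)."""
    smoothed = low_pass(samples, alpha_high)   # cuts high frequencies
    drift = low_pass(smoothed, alpha_low)      # tracks the slow, low-frequency trend
    return [s - d for s, d in zip(smoothed, drift)]

# A constant (0 Hz) input is eventually rejected by the band-pass, much as
# a 300 Hz lower cutoff would reject hum below the speech band.
steady = [1.0] * 500
print(round(band_pass(steady, 0.05, 0.5)[-1], 3))
```

The program consumes a stream of numbers and produces a stream of numbers: exactly the arrangement described above, only with far simpler mathematics than a production filter would use.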

When discussing the analog-to-digital converter, we noted an important characteristic: the number of quantization levels. If a 16-bit analog-to-digital converter is installed in the audio adapter, then after digitization the audio signal levels can be represented by 2^16 = 65,536 different values.

If there are few quantization levels, so-called quantization noise appears. To reduce this noise, high-quality audio digitization systems should use analog-to-digital converters with the maximum available number of quantization levels.
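The effect is easy to check numerically. The illustrative Python sketch below (function names are our own) quantizes a sine wave at different bit depths and compares the worst-case rounding error, which is the source of quantization noise:

```python
import math

def quantize(x, bits):
    """Round a sample in [-1, 1] to the nearest of 2**bits quantization levels."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

def max_quantization_error(bits, steps=1000):
    """Worst rounding error observed over one full cycle of a sine probe."""
    worst = 0.0
    for n in range(steps):
        x = math.sin(2 * math.pi * n / steps)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst

# Fewer levels -> larger rounding error -> more quantization noise.
print(max_quantization_error(8) > max_quantization_error(16))   # True
```

With 8 bits the error can reach about 1/256 of full scale; with 16 bits it cannot exceed 2^-16, which is why more quantization levels mean quieter quantization noise.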

However, there is another trick used in digital sound recording systems to reduce the effect of quantization noise on the quality of the audio signal. Here the signal is passed through a non-linear amplifier before digitization, one that emphasizes small-amplitude signals: such a device amplifies weak signals more than strong ones.

This is illustrated by the plot of output signal amplitude versus input signal amplitude shown in Fig. 2-7.

Fig. 2-7. Nonlinear amplification before digitization

In the step of converting the digitized audio back to analog (which we will discuss later in this chapter), the analog signal is again passed through a non-linear amplifier before being output to the speakers. This time, a different amplifier is used that emphasizes large amplitude signals and has a transfer characteristic (dependence of the output signal amplitude on the input signal amplitude) that is the opposite of that used during digitization.
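The text does not name a specific transfer characteristic, but a classic example of such a pair of mutually inverse non-linear characteristics is mu-law companding, used in digital telephony. A Python sketch under that assumption:

```python
import math

MU = 255  # value used in the North American mu-law telephony standard

def compress(x):
    """Non-linear 'amplifier' before digitization: boosts small amplitudes.
    Input and output are normalized to the range [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse transfer characteristic, applied before output to the speakers."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

quiet, loud = 0.01, 0.5
print(round(compress(quiet), 3))         # a weak 0.01 signal is lifted to ~0.23
print(round(expand(compress(loud)), 3))  # the round trip restores the level: 0.5
```

Because the weak signal is boosted well above the quantization step before the ADC, its rounding error after the inverse transformation is much smaller than it would be with linear amplification.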

How can all this help the creators of speech recognition systems?

A person, as you know, is quite good at recognizing speech uttered in a low whisper or in a fairly loud voice. It can be said that the dynamic range of volume levels of successfully recognized speech for a person is quite wide.

Today's computer speech recognition systems, unfortunately, cannot yet boast of this. However, in order to slightly expand the specified dynamic range before digitization, it is possible to pass the signal from the microphone through a nonlinear amplifier, the transfer characteristic of which is shown in Fig. 2-7. This will reduce the level of quantization noise when digitizing weak signals.

Developers of speech recognition systems, again, are forced to focus primarily on commercially available sound adapters. They do not provide for the non-linear signal conversion described above.

However, it is possible to create the software equivalent of a non-linear amplifier that converts the digitized signal before passing it to the speech recognition module. And although such a software amplifier will not be able to reduce quantization noise, it can be used to emphasize those signal levels that carry the most speech information. For example, you can reduce the amplitude of weak signals, thus ridding the signal of noise.

Dynamic range compression (DRC) is the narrowing (or, in the case of an expander, the expansion) of the dynamic range of a recording. The dynamic range is the difference between the quietest and the loudest sound. Ideally, the quietest sound in the recording should be slightly louder than the noise floor, and the loudest slightly quieter than the maximum undistorted level. Hardware devices and programs that perform dynamic compression are called compressors; four main groups are distinguished among them: compressors proper, limiters, expanders, and gates.

Tube analog compressor DBX 566

Down and up compression

Downward compression reduces the volume of a sound when it exceeds a certain threshold, leaving quieter sounds unchanged. An extreme variant of downward compression is the limiter. Upward compression, on the contrary, increases the volume of a sound if it is below the threshold, without affecting louder sounds. Both types of compression narrow the dynamic range of the audio signal.

Downward compression

Upward compression

Expander and Gate

If the compressor reduces the dynamic range, the expander increases it. When the signal level rises above the threshold, the expander raises it even further, increasing the difference between loud and soft sounds. Such devices are often used when recording a drum kit to separate the sounds of one drum from another.

The type of expander used not to amplify loud sounds but to attenuate soft sounds that do not exceed a threshold level (for example, background noise) is called a noise gate. In such a device, as soon as the sound level drops below the threshold, the signal stops passing. Typically, a gate is used to suppress noise in pauses. On some models you can make the sound fade out gradually rather than stopping abruptly when the threshold is reached; the decay rate is then set by the Decay control.
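A noise gate with a Decay-style fade-out can be sketched in a few lines of Python (an illustrative model with our own parameter names, not any real device's algorithm):

```python
def noise_gate(samples, threshold, decay=0.5):
    """Mute samples whose magnitude stays below the threshold; instead of
    cutting off abruptly, let the gain fade by `decay` per sample (Decay)."""
    out, gain = [], 1.0
    for x in samples:
        if abs(x) >= threshold:
            gain = 1.0           # gate open: pass the signal unchanged
        else:
            gain *= decay        # gate closing: gradual fade-out
        out.append(x * gain)
    return out

# Loud samples pass; the trailing quiet "noise" fades toward silence.
print(noise_gate([0.8, 0.9, 0.05, 0.05, 0.05], threshold=0.2))
```

Setting `decay` close to 1 gives a slow, smooth fade; setting it to 0 reproduces the abrupt cutoff of a basic gate.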

A gate, like other types of compressors, can be frequency dependent (i.e., treat certain frequency bands differently) and can operate in side-chain mode (see below).

The principle of operation of the compressor

The signal entering the compressor is split into two copies. One copy is sent to an amplifier whose gain is controlled by an external signal; the second copy forms this control signal. It enters a device called the side-chain, where the signal is measured, and from this data an envelope describing the change in its volume is created.
This is how most modern compressors are arranged; this is the so-called feed-forward type. In older devices (the feedback type), the signal level is measured after the amplifier.

There are various analog technologies for controlled (variable-gain) amplification, each with its own advantages and disadvantages: tube, optical (using photoresistors), and transistor. When working with digital audio (in a sound editor or DAW), proprietary mathematical algorithms can be used, or analog technologies can be emulated.

Main parameters of compressors

Threshold

The compressor reduces the level of the audio signal if its amplitude exceeds a certain threshold value (threshold). It is usually specified in decibels; a lower threshold (e.g., -60 dB) means more of the sound will be processed than with a higher threshold (e.g., -5 dB).

Ratio

The amount of level reduction is determined by the ratio parameter: a ratio of 4:1 means that if the input level is 4 dB above the threshold, the output level will be 1 dB above the threshold.
For example:
Threshold = -10dB
Input signal = -6 dB (4 dB above threshold)
Output signal = -9 dB (1 dB above threshold)
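This arithmetic is easy to encode. The sketch below (our own illustrative function, not part of any compressor's published API) reproduces the example from the text:

```python
def compressed_level(input_db, threshold_db, ratio):
    """Static downward-compression curve: levels above the threshold are
    reduced so that `ratio` dB of input above it yields 1 dB of output."""
    if input_db <= threshold_db:
        return input_db                      # below threshold: untouched
    return threshold_db + (input_db - threshold_db) / ratio

# The example from the text: threshold -10 dB, ratio 4:1, input -6 dB.
print(compressed_level(-6, -10, 4))   # -9.0
```

Note that as `ratio` grows toward infinity the output approaches the threshold itself, which is exactly the limiting case described below.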

It is important to keep in mind that the suppression of the signal level continues for some time after it falls below the threshold, and this time is determined by the release parameter.

Compression with a maximum ratio of ∞:1 is called limiting. This means that any signal above the threshold level is attenuated to the threshold level (except for a short period after a sudden increase in the input volume). See "Limiter" below for details.

Examples of different Ratio values

Attack and Release

The compressor provides some control over how quickly it responds to changing signal dynamics. The Attack parameter determines the time it takes for the compressor to reduce the gain to the level specified by the Ratio parameter. Release determines the amount of time it takes for the compressor to either ramp up the gain, or return to normal if the input level drops below the threshold.

Attack and Release phases

These parameters indicate the time (usually in milliseconds) it takes for the gain to change by a certain number of decibels, typically 10 dB. For example, if Attack is set to 1 ms, it will take 1 ms to decrease the gain by 10 dB and 2 ms to decrease it by 20 dB.
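In other words, the quoted time scales linearly with the size of the gain change. A tiny illustrative helper (our own naming) makes the rule explicit:

```python
def gain_change_time(ms_per_10db, change_db):
    """Attack/Release times are quoted per 10 dB of gain change and
    scale linearly with the size of the change."""
    return ms_per_10db * abs(change_db) / 10.0

print(gain_change_time(1, 10))   # 1.0 ms for a 10 dB reduction
print(gain_change_time(1, 20))   # 2.0 ms for a 20 dB reduction
```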

In many compressors the Attack and Release parameters are adjustable, but in some they are preset and cannot be changed. Sometimes they are referred to as "automatic" or "program dependent", i.e. they change depending on the input signal.

Knee

Another compressor option is the hard/soft knee. It determines whether compression is applied abruptly (hard) or gradually (soft). A soft knee reduces the noticeability of the transition from dry to compressed signal, especially at high ratios and sudden volume increases.

Hard Knee and Soft Knee Compression

Peak and RMS

The compressor can respond to peak (short-term maximum) values or to the average level of the input signal. Using peak values can lead to large fluctuations in the degree of compression, and even to distortion. Therefore, compressors usually apply an averaging function (typically RMS) to the input signal when comparing it to the threshold value. This gives a more comfortable compression that is closer to the human perception of loudness.

RMS is a parameter that reflects the average loudness of a recording. From a mathematical point of view, RMS (Root Mean Square) is the root-mean-square value of the amplitudes of a certain number of samples:

RMS = sqrt((x_1^2 + x_2^2 + ... + x_N^2) / N)
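Computed directly in Python (an illustrative sketch):

```python
import math

def rms(samples):
    """Root mean square: square each sample, average, take the square root."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

print(rms([3.0, 4.0]))          # sqrt((9 + 16) / 2) = 3.5355...
print(rms([1.0, -1.0, 1.0]))    # the sign does not matter: 1.0
```

Because each sample is squared, positive and negative excursions contribute equally, which is what makes RMS a better proxy for perceived loudness than the raw peak value.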

Stereo linking

A compressor in stereo linking mode applies the same gain to both stereo channels. This avoids shifting the stereo pan that can result from processing the left and right channels individually. Such an offset occurs if, for example, any loud element is panned off-center.

Makeup gain

As the compressor reduces the overall level of the signal, it is common to add a fixed gain option to the output to get the optimum level.

Look-ahead

The look-ahead function is intended to solve the problems associated with both too long and too short Attack and Release times. Too long an attack time does not allow effective interception of transients, while too short an attack time may not be comfortable for the listener. With look-ahead, the main signal is delayed relative to the control signal; this allows compression to begin in advance, even before the signal reaches the threshold value.
The only drawback of this method is the time delay of the signal, which is undesirable in some cases.

Using Dynamic Compression

Compression is used everywhere: not only in musical recordings, but wherever it is necessary to increase the overall volume without increasing peak levels, or where inexpensive sound-reproducing equipment or a limited transmission channel is used (public address and communication systems, amateur radio, etc.).

Compression is applied when playing background music (in shops, restaurants, etc.) where any noticeable volume changes are undesirable.

But the most important application of dynamic compression is music production and broadcasting. Compression is used to give the sound "thickness" and "drive", to better match instruments with each other, and especially when processing vocals.

Vocals in rock and pop music are usually compressed to make them stand out from the accompaniment and add clarity. A special kind of compressor tuned only to certain frequencies, the de-esser, is used to suppress sibilant phonemes.

In instrumental parts, compression is also used for effects that are not directly related to volume, for example, quickly fading drum sounds can become longer.

Electronic dance music (EDM) often uses side-chaining (see below) - for example, the bass line can be driven by a kick or similar to prevent bass/drum conflict and create dynamic pulsation.

Compression is widely used in broadcast (radio, TV, internet) to increase the perceived loudness while reducing the dynamic range of the original audio (usually a CD). Most countries have legal limits on the instantaneous maximum volume that can be broadcast. Usually these limitations are implemented by permanent hardware compressors in the on-air circuit. In addition, increasing the perceived loudness improves the "quality" of the sound from the point of view of most listeners.

see also Loudness war.

Sequential increase in the volume of the same song, remastered for CD from 1983 to 2000.

Side chaining

Another common compressor mode is the "side chain". In this mode, the sound is compressed not according to its own level, but according to the level of the signal arriving at a separate input, which is usually called the side chain.

There are several uses for this. For example, suppose a vocalist lisps and every letter "s" stands out from the overall picture. You pass the voice through the compressor, and feed the same sound into the side chain jack, but passed through an equalizer. On the equalizer you remove all frequencies except those the vocalist uses when pronouncing the letter "s": usually around 5 kHz, though it can be anywhere from 3 kHz to 8 kHz. If you then put the compressor in side chain mode, the voice will be compressed at exactly those moments when the letter "s" is pronounced. This is how the device known as the de-esser was obtained. This way of working is called frequency dependent.

Another application of this function is called "ducker". For example, at a radio station, the music goes through the compressor, and the words of the DJ go through the side chain. When the DJ starts chatting, the volume of the music will automatically decrease. This effect can also be successfully applied in recording, for example, to reduce the volume of keyboard parts while singing.

Brick wall limiting

The compressor and limiter work in much the same way; one can say that a limiter is a compressor with a high ratio (from 10:1) and usually a short attack time.

There is the concept of Brick wall limiting - limiting with a very high Ratio (from 20:1 and above) and a very fast attack. Ideally, it does not allow the signal to exceed the threshold level at all. The result will be unpleasant to the ear, but it will prevent damage to sound-reproducing equipment or exceeding the bandwidth of the channel. Many manufacturers integrate limiters into their devices for this very purpose.
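In the idealized limit (zero attack, effectively infinite ratio), a brick-wall limiter degenerates into simply clamping each sample at the ceiling, i.e. hard clipping. A minimal Python sketch of that idealization (not how real look-ahead limiters are implemented):

```python
def brick_wall_limit(samples, ceiling):
    """Idealized brick-wall limiter with zero attack: no sample may exceed
    the ceiling in magnitude, which amounts to hard clipping."""
    return [max(-ceiling, min(ceiling, x)) for x in samples]

print(brick_wall_limit([0.2, 1.4, -2.0], 1.0))   # [0.2, 1.0, -1.0]
```

The flattened peaks are what make the result unpleasant to the ear, but the signal is guaranteed never to exceed the ceiling.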

Clipper vs. Limiter, soft and hard clipping

DRC (Dynamic Range Compression)

An encoding technology used in DVD players with their own audio decoders and in receivers. Dynamic range compression (or reduction) is used to limit audio peaks when watching movies. If the viewer wishes to watch a film in which abrupt changes in volume level are possible (a war film, for example) but does not want to disturb other family members, DRC should be turned on. Subjectively, by ear, after turning on DRC the proportion of low frequencies in the sound decreases and high sounds lose transparency, so the DRC mode should not be turned on unless necessary.

DreamWeaver (See - front page)

A visual editor of hypertext documents developed by the software company Macromedia Inc. A powerful professional program, DreamWeaver can generate HTML pages of any complexity and scale, and has built-in tools to support large network projects. It is a visual design tool that supports advanced WYSIWYG concepts.

Driver (See Driver)

A software component that allows interaction with computer devices such as a network interface card (NIC), keyboard, printer, or monitor. Network equipment (such as a hub) connected to a PC requires drivers in order for the PC to communicate with the equipment.

DRM (Digital Rights Management - management of access to and copying of copyright-protected information)

1. A concept that involves the use of special technologies and methods for protecting digital materials to ensure that they are provided only to authorized users.

2. A client program for interacting with the Digital Rights Management Services package, which is designed to control access to and copying of copyright-protected information. DRM Services runs under Windows Server 2003. The client software runs on Windows 98, Me, 2000, and XP, allowing applications such as Office 2003 to access the appropriate services. In the future, Microsoft should release a digital rights management module for the Internet Explorer browser, and such a program is planned to be present on the computer for working with any content that uses DRM technologies to protect against illegal copying.

Droid (Robot) (See Agent)

DSA (Digital Signature Algorithm)

A public-key digital signature algorithm, developed by NIST (USA) in 1991.

DSL (Digital Subscriber Line)

A modern technology supported by city telephone exchanges for exchanging signals at higher frequencies than those used by conventional analog modems. A DSL modem can work simultaneously with a telephone (analog signal) and a digital line. Since the spectra of the voice signal from the phone and of the digital DSL signal do not "intersect", i.e. do not affect each other, DSL allows you to surf the Internet and talk on the phone over the same physical line. What is more, DSL technology typically uses multiple frequencies, and the DSL modems on both sides of the line try to pick the best ones for data transmission. The DSL modem not only transmits data but also acts as a router. Equipped with an Ethernet port, the DSL modem makes it possible to connect several computers to it.

DSOM (Distributed System Object Model, Distributed SOM)

IBM technology with appropriate software support.

DSR (Data Set Ready - data-ready signal, DSR signal)

A serial-interface signal indicating that a device (for example, a modem) is ready to send a bit of data to the PC.

DSR (Device Status Report)

DSR (Device Status Register)

DSS (Decision Support System) (See

The sound level is even throughout the composition; there are several pauses.

Narrowing the dynamic range

Narrowing the dynamic range, or simply compression, serves several purposes, the most common of which are:

1) Achieving a uniform volume level throughout a composition (or an individual instrument's part).

2) Achieving a uniform volume level across the compositions of an album or radio broadcast.

3) Increasing intelligibility, mainly when compressing an individual part (vocals, kick drum).

How does the narrowing of the dynamic range happen?

The compressor analyzes the input audio level by comparing it to a user-defined Threshold value.

If the signal level is below the Threshold value, the compressor keeps analyzing the sound without changing it. If the sound level exceeds the Threshold, the compressor starts to act. Since the compressor's role is to narrow the dynamic range, it is logical to assume that it limits the largest and smallest amplitude values (signal levels). At the first stage, the largest values are limited: they are reduced by a certain amount, which is set by the Ratio parameter. Let's look at an example:
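The Threshold and Ratio behaviour just described can be expressed in a few lines. A sketch in Python (an illustrative hard-knee model; the function name and default values are my own assumptions, not taken from any particular compressor):

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Return the output level in dB for a hard-knee compressor.

    Below the threshold the signal passes unchanged; above it, every
    extra dB of input yields only 1/ratio dB of output.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# A -8 dB peak through a 4:1 compressor with a -20 dB threshold:
# the 12 dB overshoot shrinks to 3 dB, giving -17 dB out.
print(compress_db(-8.0, threshold_db=-20.0, ratio=4.0))  # -17.0
```

A very high ratio makes the output stick near the threshold (limiting); a ratio of 1:1 changes nothing, which matches the "very small Ratio" case below.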

The green curves represent the sound level, the greater the amplitude of their oscillations from the X axis, the greater the signal level.

The yellow line is the threshold (Threshold) at which the compressor starts to operate. Raising the Threshold moves it away from the X axis; lowering it brings it closer. Clearly, the lower the threshold, the more often the compressor will act, and the higher, the less often. If the Ratio value is very high, then once the signal reaches the Threshold, the entire subsequent signal will be squashed to silence. If the Ratio value is very small, nothing will happen. The choice of Threshold and Ratio values will be discussed later. For now we should ask: what is the point of suppressing all subsequent sound? Indeed, there is none; we only need to get rid of the amplitude values (peaks) that exceed the Threshold (marked in red on the graph). It is for this that the Release parameter exists: it sets the duration of the compression.

The example shows that the first and second threshold crossings last for less time than the third. So if the Release parameter is tuned for the first two peaks, the third peak may be left partly unprocessed (since its threshold crossing lasts longer). If Release is tuned for the third peak, then processing the first two peaks leaves an undesirable dip in the signal level after them.

The same goes for the Ratio parameter. If Ratio is tuned for the first two peaks, the third will not be suppressed enough. If it is tuned for the third peak, the first two will be suppressed too strongly.

These problems can be solved in two ways:

1) Setting the Attack parameter - a partial solution.

2) Dynamic compression - a complete solution.

The Attack parameter sets the time after which the compressor starts working once the Threshold is exceeded. If the parameter is close to zero (it is exactly zero in parallel compression, see the corresponding article), the compressor starts suppressing the signal immediately and keeps working for the time set by Release. If the attack time is long, the compressor kicks in only after that delay (this helps preserve clarity). In our case, you can tune the Threshold, Release, and Ratio parameters for the first two peaks and set Attack close to zero. The compressor will then suppress the first two peaks, and on the third it will keep suppressing until the signal drops back below the Threshold. However, this does not guarantee high-quality processing and is close to limiting (a rough cut of all amplitude values; a compressor used this way is called a limiter).
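The interplay of Attack and Release described above amounts to smoothing the gain changes over time. A minimal sketch in Python, assuming the common one-pole smoothing scheme (the function name, defaults, and zero-time handling are my own):

```python
import math

def smooth_gain(target_gains, attack_ms, release_ms, sample_rate=44100):
    """Smooth a sequence of per-sample target gains with separate
    attack and release times (one-pole smoothing)."""
    atk = math.exp(-1.0 / (attack_ms / 1000.0 * sample_rate)) if attack_ms > 0 else 0.0
    rel = math.exp(-1.0 / (release_ms / 1000.0 * sample_rate)) if release_ms > 0 else 0.0
    g = 1.0          # current gain; 1.0 = no reduction
    out = []
    for target in target_gains:
        # Falling gain = the compressor clamping down = attack phase;
        # rising gain = letting go again = release phase.
        coeff = atk if target < g else rel
        g = coeff * g + (1.0 - coeff) * target
        out.append(g)
    return out

# With a zero attack the very first sample already gets the full reduction:
print(smooth_gain([0.5, 0.5], attack_ms=0, release_ms=100)[0])  # 0.5
```

A longer attack lets the initial transient (the click) through before the gain drops, which is exactly why attack time is used "to give clarity".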

Let's look at the result of sound processing by the compressor:

The peaks are gone. Note that the processing settings were quite gentle, and we suppressed only the most protruding amplitude values. In practice, the dynamic range is narrowed much more, and this trend keeps progressing. Many composers believe this makes the music louder, but in practice it completely deprives it of dynamics for listeners who will most likely hear it at home rather than on the radio.

It remains for us to consider the last compression parameter, Gain. It boosts the amplitude of the entire composition and is, in effect, equivalent to another sound-editor tool, normalize. Let's look at the end result:
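Since the text equates Gain with the normalize tool, here is a sketch of what both boil down to: one uniform gain applied to the whole signal (illustrative; the function name and target peak are my own):

```python
def normalize(samples, peak=1.0):
    """Scale the whole signal so its largest absolute sample hits `peak`.

    This is the same kind of uniform boost a compressor's Gain
    (make-up) stage applies after the peaks have been reduced.
    """
    current = max(abs(s) for s in samples)
    if current == 0:
        return list(samples)      # silence stays silence
    gain = peak / current
    return [s * gain for s in samples]

print(normalize([0.1, -0.25, 0.5]))  # [0.2, -0.5, 1.0]
```

Because compression first shaves the peaks, the same make-up gain now lifts the whole track further before clipping, which is where the extra perceived loudness comes from.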

In our case, the compression was justified and improved the sound quality, since the prominent peak is more an accident than an intentional result. In addition, you can see that the music is rhythmic, therefore it has a narrow dynamic range. In cases where high amplitude values ​​were made on purpose, compression can become a mistake.

Dynamic compression

The difference between dynamic and non-dynamic compression is that in dynamic compression the degree of suppression (Ratio) depends on the level of the incoming signal. Dynamic compressors are found in all modern programs; the Ratio and Threshold parameters are controlled in a single window (each parameter has its own axis):

There is no single standard for displaying the graph: in some programs the Y axis shows the level of the incoming signal, in others the level after compression; in some the point (0,0) is in the upper right corner, in others in the lower left. In any case, moving the mouse cursor over this field changes the numbers that correspond to the Ratio and Threshold parameters. That is, you set a compression level for each Threshold value, which makes the compression very flexible.
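The window described above effectively lets the user draw an input-to-output curve. A sketch of how such a user-drawn curve can be evaluated (the curve points are hypothetical; linear interpolation between points is an assumption):

```python
def transfer_curve(points):
    """Build a function mapping input dB to output dB by linear
    interpolation between user-set (in_db, out_db) curve points."""
    pts = sorted(points)
    def f(x):
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]          # clamp above the last point
    return f

# Unity below -20 dB, then 2:1 compression above it:
curve = transfer_curve([(-60, -60), (-20, -20), (0, -10)])
print(curve(-10))  # -15.0
```

Each extra breakpoint is, in effect, a separate Ratio for a separate Threshold, which is the flexibility the text describes.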

Side Chain

A side-chain compressor analyzes the signal of one channel and, when its level exceeds the Threshold, applies compression to another channel. The side chain shines when working with instruments that share the same frequency region (the bass and kick drum pair is the classic use), but sometimes instruments from different frequency regions are used, which produces an interesting side-chain effect.
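A side chain can be sketched as one signal's level controlling another signal's gain. A deliberately crude per-sample illustration without gain smoothing (the threshold and duck amount are made-up values):

```python
def sidechain_gain(trigger, target, threshold=0.5, duck=0.25):
    """Duck the `target` channel wherever the `trigger` channel
    exceeds `threshold` (real compressors smooth this gain)."""
    return [t * duck if abs(k) > threshold else t
            for k, t in zip(trigger, target)]

# A loud kick sample (0.9) ducks the bass sample next to it:
print(sidechain_gain([0.9, 0.1], [0.8, 0.8]))  # [0.2, 0.8]
```

With the kick as the trigger and the bass as the target, the bass dips on every kick hit, keeping the low end from stacking up.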

Part Two - Compression Steps

There are three stages of compression:

1) The first stage is the compression of individual sounds (one-shots).

The timbre of any instrument is described by the following envelope stages: Delay, Attack, Hold, Decay, Sustain, Release.

The stage of compression of individual sounds is divided into two parts:

1.1) Compression of individual sounds of rhythmic instruments

Often the components of a beat require separate compression to give them clarity. Many people process the kick drum separately from the other rhythmic instruments, both at the stage of compressing individual sounds and at the stage of compressing individual parts. This is because it sits in the low-frequency region, where usually only the bass keeps it company. The clarity of a kick drum means the presence of a characteristic click (the kick has very short attack and hold times). If there is no click, process it with a compressor, setting the threshold to zero and the attack time to 10-50 ms. The compressor's Release must end before the kick drum hits again. That timing can be found with the formula 60,000 / BPM, where BPM is the tempo of the composition. For example: 60,000 / 137 = 437.96, the time in milliseconds until the next beat of a composition in 4/4 time at 137 BPM.
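The 60,000 / BPM formula is easy to check in code (a trivial helper; the function name is my own):

```python
def beat_ms(bpm):
    """Milliseconds per beat: 60,000 ms in a minute divided by the tempo."""
    return 60000.0 / bpm

# At 137 BPM the Release has to finish within ~438 ms:
print(round(beat_ms(137), 2))  # 437.96
```

At 120 BPM this gives exactly 500 ms per beat, a handy sanity check.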

All of the above applies to other rhythmic instruments with a short attack time: they should have an accented click that must not be suppressed by the compressor at any stage of compression.

1.2) Compression of individual sounds of harmonic instruments

Unlike rhythmic instruments, the parts of harmonic instruments rarely consist of individual sounds. However, this does not mean they should not be processed at the individual-sound level. If you use a sample with a recorded part, that already belongs to the second level of compression; this level applies only to synthesized harmonic instruments. These can be samplers or synthesizers using various sound-synthesis methods (physical modeling, FM, additive, subtractive, etc.). As you have probably guessed, we are talking about programming the synthesizer's settings. Yes, that is compression too! Almost all synthesizers have a programmable envelope (ADSR), which sets the Attack, Decay, Sustain level, and Release times. And if you tell me that this is not compression of each individual sound - you are my enemy for life!

2) The second stage - Compression of individual parts.

By compression of individual parts I mean narrowing the dynamic range of a number of combined individual sounds. This stage also covers recorded parts, including vocals, which need compression to gain clarity and intelligibility. When processing parts with compression, keep in mind that adding individual sounds together may create unwanted peaks, which you need to remove at this stage; if you do not do it now, the picture may get worse at the stage of mixing the whole composition. The compression applied at the individual-sound stage must also be taken into account: if you have achieved a clear kick drum, careless re-processing at the second stage can ruin everything. Not every part has to be processed by the compressor, just as not every individual sound does. I advise putting an amplitude analyzer on the chain, just in case, to detect unwanted side effects of combining individual sounds. Besides compression, at this stage you should make sure that the parts occupy, as far as possible, different frequency ranges so that they do not muddy one another. It is also useful to remember that sound is subject to masking (psychoacoustics):

1) A quieter sound is masked by a louder sound that precedes it.

2) A quieter sound at a high frequency is masked by a louder sound at a low frequency.

For example, if you have a synth part, the notes often start playing before the previous notes have finished. Sometimes this is necessary (harmony, playing style, polyphony), but sometimes not at all: you can cut off their tails (Decay - Release) when they are audible in solo mode but inaudible when all parts play together. The same applies to effects such as reverb - it should not last until the sound source starts again. By cutting away the unwanted signal you make the sound cleaner, and this too can be seen as compression, because you remove unwanted waves.

3) The third stage - Compression of the composition.

When compressing the entire composition, remember that all parts are combinations of many individual sounds. So, when combining and then compressing them, take care that the final compression does not spoil what was achieved in the first two stages. You also need to distinguish compositions for which a wide range matters from those for which a narrow one does. When compressing a composition with a wide dynamic range, it is enough to insert a compressor that tames the short-term peaks formed when the parts are summed. When compressing a composition where a narrow dynamic range matters, everything is much more complicated. Here plain compressors have recently given way to maximizers. A maximizer is a plugin combining a compressor, limiter, graphic equalizer, enhancer, and other sound-shaping tools, and it must also include sound-analysis tools. Maximizing, the final compressor processing, is largely needed to fight the mistakes made at the previous stages - not so much compression mistakes (although doing at the last stage what could have been done at the first is already a mistake) as a poor initial choice of samples and instruments that interfere with each other in their frequency ranges; this is what the frequency-response correction is for. It often happens that heavy compression on the master forces you to change the compression and mixing parameters at earlier stages, because with a strong narrowing of the dynamic range, quiet sounds that were previously masked come forward and the sound of individual components of the composition changes.

In these parts, I deliberately did not talk about specific compression parameters. I considered it necessary to write about the fact that during compression it is necessary to pay attention to all sounds and all parts at all stages of creating a composition. Only in this way, in the end, you will get a harmonious result, not only from the point of view of music theory, but also from the point of view of sound engineering.

The table below gives practical tips for processing individual parts. However, in compression, numbers and presets can only suggest the area in which to search: the ideal settings depend on each individual case. The Gain and Threshold parameters assume a normal sound level (sensible use of the entire range).

Part Three - Compression Options

Quick reference:

Threshold - the input signal level at which the compressor starts working.

Attack - the time after which the compressor starts working once the threshold is exceeded.

Ratio - the degree of amplitude reduction (relative to the original amplitude value).

Release - the time after which the compressor stops working.

Gain - how much the signal is boosted after being processed by the compressor.
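Pulling the five parameters of the quick reference together, here is a self-contained sketch of a feed-forward hard-knee compressor (an illustration of the concepts above, not any particular plugin; the smoothing scheme, names, and defaults are assumptions):

```python
import math

def compress(samples, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=150.0, gain_db=0.0,
             sample_rate=44100):
    """Apply Threshold, Attack, Ratio, Release, and Gain to a
    list of samples in the range [-1.0, 1.0]."""
    atk = math.exp(-1.0 / (attack_ms / 1000.0 * sample_rate))
    rel = math.exp(-1.0 / (release_ms / 1000.0 * sample_rate))
    makeup = 10.0 ** (gain_db / 20.0)   # Gain (make-up) stage
    env_db = -120.0                     # smoothed level detector, in dB
    out = []
    for s in samples:
        level_db = 20.0 * math.log10(max(abs(s), 1e-6))
        # Attack smoothing while the level rises, Release while it falls.
        coeff = atk if level_db > env_db else rel
        env_db = coeff * env_db + (1.0 - coeff) * level_db
        over = env_db - threshold_db
        # Above the threshold, each dB of overshoot keeps only 1/ratio dB.
        reduction_db = over * (1.0 / ratio - 1.0) if over > 0 else 0.0
        out.append(s * 10.0 ** (reduction_db / 20.0) * makeup)
    return out
```

A signal below the threshold passes through untouched; a sustained loud signal settles at the reduced level dictated by the ratio, just as the table's per-instrument settings intend.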

Compression table:

(Where several values are listed for one instrument, they come from different sources.)

Vocals - Threshold: 0 dB; Attack: 1-2 ms / 2-5 ms / 10 ms / 0.1 ms; Ratio: under 4:1 / 2.5:1 / 4:1-12:1 / 2:1-8:1; Release: 150 ms / 50-100 ms / 0.5 s. Compression during recording should be minimal; the part then requires mandatory processing at the mixing stage to give it clarity and intelligibility.

Wind instruments - Attack: 1-5 ms; Ratio: 6:1-15:1; Release: 0.3 s.

Kick drum - Attack: 10-50 ms / 10-100 ms; Ratio: 4:1 and up / 10:1; Release: 50-100 ms / 1 ms. The lower the Threshold, the higher the Ratio and the longer the Attack, the more pronounced the click at the start of the kick.

Synthesizers - depends on the wave type (ADSR envelopes).

Snare drum - Attack: 10-40 ms / 1-5 ms; Ratio: 5:1 / 5:1-10:1; Release: 50 ms / 0.2 s.

Hi-hat - Attack: 20 ms; Ratio: 10:1; Release: 1 ms.

Overhead microphones - Attack: 2-5 ms; Ratio: 5:1; Release: 1-50 ms.

Drums - Attack: 5 ms; Ratio: 5:1-8:1; Release: 10 ms.

Bass guitar - Attack: 100-200 ms / 4-10 ms; Ratio: 5:1; Release: 1 ms / 10 ms.

Strings - Attack: 0-40 ms; Ratio: 3:1; Release: 500 ms.

Synth bass - Attack: 4-10 ms; Ratio: 4:1; Release: 10 ms. Depends on the envelopes.

Percussion - Attack: 0-20 ms; Ratio: 10:1; Release: 50 ms.

Acoustic guitar, piano - Attack: 10-30 ms / 5-10 ms; Ratio: 4:1 / 5:1-10:1; Release: 50-100 ms / 0.5 s.

Electric guitar - Attack: 2-5 ms; Ratio: 8:1; Release: 0.5 s.

Final compression - Attack: 0.1 ms; Ratio: 2:1 / 2:1-3:1; Release: 50 ms / 0.1 ms; Gain: 0 dB output. The attack time depends on the goal: whether to remove peaks or make the track smoother.

Limiter after final compression - Attack: 0 ms; Ratio: 10:1; Release: 10-50 ms; Gain: 0 dB output. For when you need a narrow dynamic range and a rough "cut" of the waves.

The information was compiled from various sources cited by popular resources on the Internet. The differences in compression parameters reflect different sound preferences and work with different material.
