FFmpeg Source Analysis: Introduction to Audio Filters

Posted by bijukon on Tue, 21 Dec 2021 00:32:07 +0100

FFmpeg provides audio and video filters in the libavfilter module. All audio filters are registered in libavfilter/allfilters.c. We can also run ffmpeg -filters on the command line to list all currently supported filters; in the listing, A denotes audio. This article introduces the audio filters, including delay, echo, mixing, equalizer, limiter, IIR and FIR filters, low-pass, band-pass and high-pass filters, tempo change, and silence detection.

For a detailed description of each audio filter, see the official documentation: Audio Filters.

1,acompressor

Compressor, mainly used to reduce the dynamic range of a signal. Modern music in particular is often heavily compressed to raise its overall loudness. Compression works by detecting the signal level above a set threshold and dividing it by the factor set with ratio. The parameter options are as follows:

level_in: input gain, default 1, range [0.015625, 64]

mode: compression mode, upward or downward, default is downward

threshold: when the signal rises above this level, gain reduction is applied. Default is 0.125, range [0.00097563, 1]

ratio: the ratio by which the signal is reduced, default 2, range [1, 20]

attack: the number of milliseconds the signal has to stay above the threshold before gain reduction starts, default 20, range [0.01, 2000]

release: the number of milliseconds the signal has to stay below the threshold before gain reduction is eased again, default 250, range [0.01, 9000]

makeup: how much the signal is amplified after processing, default 1, range [1, 64]

knee: softens the sharp knee around the threshold so that gain reduction sets in more gradually, default 2.82843, range [1, 8]

link: whether the average level of all channels or the loudest (maximum) channel drives the reduction, default is average

detection: whether the peak signal or the RMS (root mean square) signal is used for detection; the default is rms, which is usually smoother

mix: how much of the compressed signal to use in the output, default 1, range [0, 1]

2,acrossfade

Cross-fade effect, applied when switching from one audio stream to another. The parameter options are as follows:

nb_samples, ns: the number of samples over which the cross fade lasts, default 44100

duration, d: the duration of the cross fade; by default it is determined by nb_samples

overlap, o: whether the end of the first stream overlaps with the start of the second stream, enabled by default

curve1, c1: the transition curve for the first stream

curve2, c2: the transition curve for the second stream

A reference command is as follows:

ffmpeg -i first.flac -i second.flac -filter_complex acrossfade=d=10:c1=exp:c2=exp output.flac
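
What the overlap does to the samples can be sketched as follows. This is a simplified linear (tri) cross fade over two mono float buffers, written for illustration only; crossfade_tri is a hypothetical helper, not the acrossfade implementation, and the command above uses the exp curves rather than a linear one.

```c
#include <stddef.h>

/* Illustrative linear ("tri") cross fade of two mono float buffers.
 * `a` fades out while `b` fades in over n samples. Hypothetical helper,
 * not the acrossfade implementation. */
static void crossfade_tri(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float t = (float)i / (float)n;   /* 0 at the start, approaching 1 */
        out[i] = a[i] * (1.0f - t) + b[i] * t;
    }
}
```

Here a would hold the last n samples of the first stream and b the first n samples of the second stream.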

3,afade

Fade-in/fade-out effect, similar to acrossfade but applied to a single stream. The parameters are as follows:

type, t: the effect type, in or out, default is in

start_sample, ss: the sample number at which the fade starts, default is 0

nb_samples, ns: the number of samples the fade lasts, default 44100

start_time, st: the time at which the fade starts, default is 0

duration, d: the duration of the fade; by default it is determined by nb_samples

curve, c: the transition curve of the fade, with the following options:

  • tri: triangular, linear slope (default)
  • qsin: quarter of a sine wave
  • hsin: half of a sine wave
  • esin: exponential sine wave
  • log: logarithmic
  • ipar: inverted parabola
  • qua: quadratic
  • cub: cubic
  • squ: square root
  • cbr: cubic root
  • par: parabola
  • exp: exponential
  • iqsin: inverted quarter of a sine wave
  • ihsin: inverted half of a sine wave
  • dese: double-exponential seat
  • desi: double-exponential sigmoid
  • losi: logistic sigmoid
  • sinc: sine cardinal function
  • isinc: inverted sine cardinal function
  • nofade: no fade applied

The fade is implemented for the different sample formats through macros in af_afade.c, in two forms: FADE_PLANAR for planar sample formats and FADE for packed (interleaved) ones:

#define FADE_PLANAR(name, type)                                             \
static void fade_samples_## name ##p(uint8_t **dst, uint8_t * const *src,   \
                                     int nb_samples, int channels, int dir, \
                                     int64_t start, int64_t range, int curve) \
{                                                                           \
    int i, c;                                                               \
                                                                            \
    for (i = 0; i < nb_samples; i++) {                                      \
        double gain = fade_gain(curve, start + i * dir, range);             \
        for (c = 0; c < channels; c++) {                                    \
            type *d = (type *)dst[c];                                       \
            const type *s = (type *)src[c];                                 \
                                                                            \
            d[i] = s[i] * gain;                                             \
        }                                                                   \
    }                                                                       \
}

#define FADE(name, type)                                                    \
static void fade_samples_## name (uint8_t **dst, uint8_t * const *src,      \
                                  int nb_samples, int channels, int dir,    \
                                  int64_t start, int64_t range, int curve)  \
{                                                                           \
    type *d = (type *)dst[0];                                               \
    const type *s = (type *)src[0];                                         \
    int i, c, k = 0;                                                        \
                                                                            \
    for (i = 0; i < nb_samples; i++) {                                      \
        double gain = fade_gain(curve, start + i * dir, range);             \
        for (c = 0; c < channels; c++, k++)                                 \
            d[k] = s[k] * gain;                                             \
    }                                                                       \
}
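
The two macros differ only in how a sample is addressed. The following sketch, with hypothetical accessor names get_packed and get_planar, shows the two layouts side by side for float samples.

```c
#include <stddef.h>

/* Packed (interleaved) layout: L0 R0 L1 R1 ... in a single buffer,
 * which is why FADE walks one running index k over dst[0]. */
static float get_packed(const float *buf, size_t channels,
                        size_t frame, size_t ch)
{
    return buf[frame * channels + ch];
}

/* Planar layout: one separate buffer per channel,
 * which is why FADE_PLANAR indexes dst[c][i]. */
static float get_planar(float *const *planes, size_t frame, size_t ch)
{
    return planes[ch][frame];
}
```

For the same audio, get_packed(buf, channels, i, c) and get_planar(planes, i, c) return the same sample.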

4,adeclick

Removes impulsive noise from the input signal. Samples detected as impulsive noise are replaced by samples interpolated with an autoregressive model. The parameter options are as follows:

window, w: window size in ms, default is 55, range [10, 100]

overlap, o: window overlap as a percentage of the window size, default is 75, range [50, 95]

arorder, a: autoregression order as a percentage of the window size, default 2, range [0, 25]

threshold, t: detection threshold, default is 2, range [1, 100]

burst, b: burst fusion as a percentage of the window size, default is 2, range [0, 10]

method, m: overlap method, either add (a, overlap-add) or save (s, overlap-save)
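
Since window is given in milliseconds and overlap and arorder relate to the window size, it can help to convert them to sample counts. The helpers below are a sketch of that arithmetic only, assuming overlap and arorder are percentages of the window size; the function names are hypothetical, and any clamping the real filter applies is not reproduced.

```c
/* Illustrative unit conversions only; helper names are hypothetical and
 * any clamping done by the real filter is not reproduced here. */
static int window_samples(int sample_rate, double window_ms)
{
    return (int)(sample_rate * window_ms / 1000.0);
}

/* Number of samples the analysis window advances each step. */
static int hop_samples(int window_size, double overlap_percent)
{
    return (int)(window_size * (1.0 - overlap_percent / 100.0));
}

/* Autoregression order, given as a percentage of the window size. */
static int ar_order_samples(int window_size, double arorder_percent)
{
    return (int)(window_size * arorder_percent / 100.0);
}
```

At 48 kHz with the default values, this gives a window of 2640 samples, a hop of 660 samples and an autoregression order of 52.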

5,adelay

Delay effect: each channel is delayed, and the delayed portion is filled with silence. The code is in libavfilter/af_adelay.c. A macro generates the delay routine for each sample format, with the matching silence value: 0x80 for the u8 format (the midpoint of the unsigned 8-bit range) and 0x00 for the others. The core code is as follows:

#define DELAY(name, type, fill)                                           \
static void delay_channel_## name ##p(ChanDelay *d, int nb_samples,       \
                                      const uint8_t *ssrc, uint8_t *ddst) \
{                                                                         \
    const type *src = (type *)ssrc;                                       \
    type *dst = (type *)ddst;                                             \
    type *samples = (type *)d->samples;                                   \
                                                                          \
    while (nb_samples) {                                                  \
        if (d->delay_index < d->delay) {                                  \
            const int len = FFMIN(nb_samples, d->delay - d->delay_index); \
                                                                          \
            memcpy(&samples[d->delay_index], src, len * sizeof(type));    \
            memset(dst, fill, len * sizeof(type));                        \
            d->delay_index += len;                                        \
            src += len;                                                   \
            dst += len;                                                   \
            nb_samples -= len;                                            \
        } else {                                                          \
            *dst = samples[d->index];                                     \
            samples[d->index] = *src;                                     \
            nb_samples--;                                                 \
            d->index++;                                                   \
            src++, dst++;                                                 \
            d->index = d->index >= d->delay ? 0 : d->index;               \
        }                                                                 \
    }                                                                     \
}

DELAY(u8,  uint8_t, 0x80)
DELAY(s16, int16_t, 0)
DELAY(s32, int32_t, 0)
DELAY(flt, float,   0)
DELAY(dbl, double,  0)
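
The same two-phase logic (first fill the buffer while emitting silence, then use it as a ring buffer) can be written without the macro for a single format. The sketch below handles mono s16 only; DelayLine and its functions are hypothetical names for this example, not FFmpeg API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-alone delay line for mono s16 samples. */
typedef struct DelayLine {
    int16_t *samples; /* ring buffer holding the last `delay` inputs */
    size_t   delay;   /* delay length in samples */
    size_t   filled;  /* samples buffered so far during warm-up */
    size_t   index;   /* ring position once the buffer is full */
} DelayLine;

static int delay_init(DelayLine *d, size_t delay)
{
    d->samples = calloc(delay ? delay : 1, sizeof(*d->samples));
    if (!d->samples)
        return -1;
    d->delay  = delay;
    d->filled = 0;
    d->index  = 0;
    return 0;
}

static void delay_process(DelayLine *d, const int16_t *src,
                          int16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (d->delay == 0) {
            dst[i] = src[i];                  /* no delay requested */
        } else if (d->filled < d->delay) {
            d->samples[d->filled++] = src[i]; /* warm-up: store input */
            dst[i] = 0;                       /* and output s16 silence */
        } else {
            dst[i] = d->samples[d->index];    /* oldest stored sample */
            d->samples[d->index] = src[i];    /* overwrite with newest */
            d->index = (d->index + 1) % d->delay;
        }
    }
}

static void delay_free(DelayLine *d)
{
    free(d->samples);
    d->samples = NULL;
}
```

With a delay of 2 samples, the input 1, 2, 3, 4, 5 comes out as 0, 0, 1, 2, 3.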

6,aecho

Echo effect, which adds echoes to the audio stream. Echoes are reflections of sound, occurring naturally among mountains or in rooms. A digital echo can simulate this by adjusting the delay time and the attenuation factor between the original and the reflected sound. The original sound is called the dry signal and the reflected sound the wet signal. The parameter options are as follows:

in_gain: input gain of the reflected signal, default is 0.6

out_gain: output gain of the reflected signal, default is 0.3

delays: the delay of each reflection in milliseconds, separated by '|', default is 1000, range (0, 90000.0]

decays: the attenuation factor of each reflection, separated by '|', default is 0.5, range (0, 1.0]

For example, to simulate an echo between mountains, a reference command is as follows:

aecho=0.8:0.9:1000:0.3

The code is in af_aecho.c. A macro generates the echo routine for each sample format:

#define ECHO(name, type, min, max)                                          \
static void echo_samples_## name ##p(AudioEchoContext *ctx,                 \
                                     uint8_t **delayptrs,                   \
                                     uint8_t * const *src, uint8_t **dst,   \
                                     int nb_samples, int channels)          \
{                                                                           \
    const double out_gain = ctx->out_gain;                                  \
    const double in_gain = ctx->in_gain;                                    \
    const int nb_echoes = ctx->nb_echoes;                                   \
    const int max_samples = ctx->max_samples;                               \
    int i, j, chan, av_uninit(index);                                       \
                                                                            \
    av_assert1(channels > 0); /* would corrupt delay_index */               \
                                                                            \
    for (chan = 0; chan < channels; chan++) {                               \
        const type *s = (type *)src[chan];                                  \
        type *d = (type *)dst[chan];                                        \
        type *dbuf = (type *)delayptrs[chan];                               \
                                                                            \
        index = ctx->delay_index;                                           \
        for (i = 0; i < nb_samples; i++, s++, d++) {                        \
            double out, in;                                                 \
                                                                            \
            in = *s;                                                        \
            out = in * in_gain;                                             \
            for (j = 0; j < nb_echoes; j++) {                               \
                int ix = index + max_samples - ctx->samples[j];             \
                ix = MOD(ix, max_samples);                                  \
                out += dbuf[ix] * ctx->decay[j];                            \
            }                                                               \
            out *= out_gain;                                                \
                                                                            \
            *d = av_clipd(out, min, max);                                   \
            dbuf[index] = in;                                               \
                                                                            \
            index = MOD(index + 1, max_samples);                            \
        }                                                                   \
    }                                                                       \
    ctx->delay_index = index;                                               \
}

ECHO(dbl, double,  -1.0,      1.0      )
ECHO(flt, float,   -1.0,      1.0      )
ECHO(s16, int16_t, INT16_MIN, INT16_MAX)
ECHO(s32, int32_t, INT32_MIN, INT32_MAX)
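
Stripped of the ring buffer and the per-format macro, the inner loop above computes out = (in * in_gain + sum of delayed inputs * decay) * out_gain. A minimal sketch for a single echo on a mono float buffer is shown below; echo_mono and delay_ms_to_samples are hypothetical helpers written for this illustration.

```c
#include <stddef.h>

/* Convert a delay in milliseconds to a delay in samples. */
static size_t delay_ms_to_samples(double delay_ms, int sample_rate)
{
    return (size_t)(delay_ms * sample_rate / 1000.0);
}

/* Single-echo, mono, float sketch (hypothetical helper):
 * out[i] = (in[i] * in_gain + in[i - delay] * decay) * out_gain,
 * clipped to [-1, 1]. As in the macro above, the delayed signal is the
 * input, not the output, so each echo is heard once. */
static void echo_mono(const float *in, float *out, size_t n, size_t delay,
                      double in_gain, double out_gain, double decay)
{
    for (size_t i = 0; i < n; i++) {
        double wet = i >= delay ? in[i - delay] * decay : 0.0;
        double v   = (in[i] * in_gain + wet) * out_gain;

        if (v >  1.0) v =  1.0;
        if (v < -1.0) v = -1.0;
        out[i] = (float)v;
    }
}
```

With the parameters of the mountain example (0.8, 0.9, decay 0.3), a unit impulse produces 0.72 at its own position and 0.27 one delay later.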

7,agate

A noise gate, used to reduce the quiet parts of a signal and thereby suppress disturbing noise between useful signals. The level detected below the threshold is divided by the factor set with ratio. The parameter options are as follows:

level_in: input level, default 1, range [0.015625, 64]

mode: operation mode, upward or downward, default is downward

range: the amount of gain reduction applied when the signal is below the threshold, default 0.06125, range [0, 1]

threshold: when the signal rises above this level, gain reduction is released, default 0.125, range [0, 1]

ratio: the ratio of gain reduction, default is 2, range [1, 9000]

attack: the time the signal has to stay above the threshold before gain reduction stops, default 20 ms, range [0.01, 9000]

release: the time the signal has to stay below the threshold before gain reduction is applied again, default 250 ms, range [0.01, 9000]

makeup: amplification factor applied after processing, default 1, range [1, 64]

detection: detection mode, peak or rms, default is rms

link: whether the average or the maximum level across channels drives the reduction, default is average

8,alimiter

A limiter, used to prevent the input signal from rising above a set threshold. It uses lookahead to avoid distorting the signal, which means there is a slight delay in the processed signal. The parameter options are as follows:

level_in: input gain, default is 1

level_out: output gain, default is 1

limit: signals are not allowed to exceed this level, default is 1

attack: the time in which the limiter reaches its attenuation level, default is 5 ms

release: the time in which the limiter returns from limiting to an attenuation of 1.0, default is 50 ms

asc: when gain reduction is needed all the time, ASC releases toward an average reduction level instead of releasing fully within the release time

asc_level: how strongly ASC affects the release time; 0 means almost no change, 1 produces longer release times

level: automatically level the output signal, enabled by default

The code for the limiter is in af_alimiter.c. The core code is as follows:

static int filter_frame(AVFilterLink *inlink, AVFrame *in)
{
    ......
    // Loop over every sample frame and detect peaks
    for (n = 0; n < in->nb_samples; n++) {
        double peak = 0;
        
        for (c = 0; c < channels; c++) {
            double sample = src[c] * level_in;

            buffer[s->pos + c] = sample;
            peak = FFMAX(peak, fabs(sample));
        }

        if (s->auto_release && peak > limit) {
            s->asc += peak;
            s->asc_c++;
        }

        if (peak > limit) {
            double patt = FFMIN(limit / peak, 1.);
            double rdelta = get_rdelta(s, release, inlink->sample_rate,
                                       peak, limit, patt, 0);
            double delta = (limit / peak - s->att) / buffer_size * channels;
            int found = 0;

            if (delta < s->delta) {
                s->delta = delta;
                nextpos[0] = s->pos;
                nextpos[1] = -1;
                nextdelta[0] = rdelta;
                s->nextlen = 1;
                s->nextiter= 0;
            } else {
                for (i = s->nextiter; i < s->nextiter + s->nextlen; i++) {
                    int j = i % buffer_size;
                    double ppeak, pdelta;

                    ppeak = fabs(buffer[nextpos[j]]) > fabs(buffer[nextpos[j] + 1]) ?
                            fabs(buffer[nextpos[j]]) : fabs(buffer[nextpos[j] + 1]);
                    pdelta = (limit / peak - limit / ppeak) / (((buffer_size - nextpos[j] + s->pos) % buffer_size) / channels);
                    if (pdelta < nextdelta[j]) {
                        nextdelta[j] = pdelta;
                        found = 1;
                        break;
                    }
                }
                if (found) {
                    s->nextlen = i - s->nextiter + 1;
                    nextpos[(s->nextiter + s->nextlen) % buffer_size] = s->pos;
                    nextdelta[(s->nextiter + s->nextlen) % buffer_size] = rdelta;
                    nextpos[(s->nextiter + s->nextlen + 1) % buffer_size] = -1;
                    s->nextlen++;
                }
            }
        }

        buf = &s->buffer[(s->pos + channels) % buffer_size];
        peak = 0;
        for (c = 0; c < channels; c++) {
            double sample = buf[c];

            peak = FFMAX(peak, fabs(sample));
        }

        if (s->pos == s->asc_pos && !s->asc_changed)
            s->asc_pos = -1;

        if (s->auto_release && s->asc_pos == -1 && peak > limit) {
            s->asc -= peak;
            s->asc_c--;
        }

        s->att += s->delta;

        for (c = 0; c < channels; c++)
            dst[c] = buf[c] * s->att;

        if ((s->pos + channels) % buffer_size == nextpos[s->nextiter]) {
            if (s->auto_release) {
                s->delta = get_rdelta(s, release, inlink->sample_rate,
                                      peak, limit, s->att, 1);
                if (s->nextlen > 1) {
                    int pnextpos = nextpos[(s->nextiter + 1) % buffer_size];
                    double ppeak = fabs(buffer[pnextpos]) > fabs(buffer[pnextpos + 1]) ?
                                                            fabs(buffer[pnextpos]) :
                                                            fabs(buffer[pnextpos + 1]);
                    double pdelta = (limit / ppeak - s->att) /
                                    (((buffer_size + pnextpos -
                                    ((s->pos + channels) % buffer_size)) %
                                    buffer_size) / channels);
                    if (pdelta < s->delta)
                        s->delta = pdelta;
                }
            } else {
                s->delta = nextdelta[s->nextiter];
                s->att = limit / peak;
            }

            s->nextlen -= 1;
            nextpos[s->nextiter] = -1;
            s->nextiter = (s->nextiter + 1) % buffer_size;
        }

        if (s->att > 1.) {
            s->att = 1.;
            s->delta = 0.;
            s->nextiter = 0;
            s->nextlen = 0;
            nextpos[0] = -1;
        }

        if (s->att <= 0.) {
            s->att = 0.0000000000001;
            s->delta = (1.0 - s->att) / (inlink->sample_rate * release);
        }

        if (s->att != 1. && (1. - s->att) < 0.0000000000001)
            s->att = 1.;

        if (s->delta != 0. && fabs(s->delta) < 0.00000000000001)
            s->delta = 0.;

        for (c = 0; c < channels; c++)
            dst[c] = av_clipd(dst[c], -limit, limit) * level * level_out;

        s->pos = (s->pos + channels) % buffer_size;
        src += channels;
        dst += channels;
    }

    if (in != out)
        av_frame_free(&in);

    return ff_filter_frame(outlink, out);
}
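
The loop above is dense, so the lookahead idea on its own may be easier to see in a much reduced form. The sketch below is not the alimiter algorithm, which ramps the attenuation gradually through s->delta and handles release; it only shows why looking ahead costs latency: the output is delayed by la samples so that the gain can already be lowered before a peak reaches the output. The function name limiter_lookahead and the parameter la are hypothetical.

```c
#include <math.h>

/* Illustrative lookahead limiter for a mono double buffer.
 * The output is the input delayed by `la` samples; the gain for each
 * output sample is derived from the loudest sample in the window that
 * spans the delayed sample and the `la` samples that follow it. */
static void limiter_lookahead(const double *in, double *out, int n,
                              int la, double limit)
{
    for (int i = 0; i < n; i++) {
        int start = i - la < 0 ? 0 : i - la;
        double peak = 0.0;

        for (int k = start; k <= i; k++) {
            double a = fabs(in[k]);
            if (a > peak)
                peak = a;
        }

        /* attenuate only when the window contains a peak above limit */
        double gain = peak > limit ? limit / peak : 1.0;

        /* the first `la` outputs are silence: that is the latency */
        out[i] = i >= la ? in[i - la] * gain : 0.0;
    }
}
```

For the input 0.1, 0.1, 1.0, 0.1, 0.1 with la = 2 and limit = 0.5, the gain is already 0.5 when the first 0.1 sample is output, and the 1.0 peak leaves the limiter as exactly 0.5.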