AAC coding for Android audio and video processing

Posted by raydar2000 on Sat, 15 Jan 2022 23:59:22 +0100

Original address: https://www.jianshu.com/p/839b11e0638b

AAC (Advanced Audio Coding) is a lossy audio coding format. Its compression ratio is usually quoted as about 18:1, and some sources cite 20:1, which is considerably better than MP3.

AAC audio formats include ADIF and ADTS:

ADIF: Audio Data Interchange Format. The header appears only once, at the start of the data, so the beginning of the audio is unambiguous, but decoding cannot start in the middle of the stream; it must begin at that clearly defined start. This format is therefore typically used for files on disk.

ADTS: Audio Data Transport Stream. This format is a bitstream with syncwords, so decoding can start at any frame boundary in the stream. In this respect it resembles the MP3 stream format.

In short, ADTS carries header information in every frame, so decoding can begin at any frame. ADIF has a single header for the whole stream, so decoding must start from the beginning. The two header formats also differ. Today, encoded or extracted AAC audio streams are generally in ADTS format.

Because ADTS is a sequence of self-describing frames, it is better suited to streaming transmission and processing.
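To illustrate how a decoder can join an ADTS stream at an arbitrary position, here is a minimal sketch that scans a byte array for the 12-bit syncword. The class and method names (AdtsSync, findFrameStart) are hypothetical and do not come from the original article; the sketch also assumes the Layer field is always 0, as the header description states.

```java
final class AdtsSync {
    private AdtsSync() {}

    /**
     * Returns the offset of the first plausible ADTS frame start at or after
     * {@code from}, or -1 if no candidate is found.
     */
    static int findFrameStart(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + 1 < data.length; i++) {
            boolean syncHigh = (data[i] & 0xFF) == 0xFF;     // upper 8 syncword bits
            boolean syncLow = (data[i + 1] & 0xF0) == 0xF0;  // lower 4 syncword bits
            boolean layerZero = (data[i + 1] & 0x06) == 0;   // Layer field is always 0
            if (syncHigh && syncLow && layerZero) {
                return i;
            }
        }
        return -1;
    }
}
```

The same bit pattern can occur by chance inside AAC payload data, so a more careful parser would probably also check that the frame length read from the candidate header leads to another syncword.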

Let's analyze ADTS:

ADTS AAC stream layout:
| ADTS_header | AAC ES | ADTS_header | AAC ES | ... | ADTS_header | AAC ES |

As the layout shows, every ADTS frame begins with its own header, ADTS_header. The most useful fields in the header are the sampling rate, the channel count, and the frame length. The header is normally 7 bytes long, or 9 bytes when a CRC is present.

ADTS frame header structure:

No. | Field | Length (bits) | Description
1 | Syncword | 12 | all bits must be 1
2 | MPEG version | 1 | 0 for MPEG-4, 1 for MPEG-2
3 | Layer | 2 | always 0
4 | Protection Absent | 1 | set to 1 if there is no CRC and 0 if there is CRC
5 | Profile | 2 | the MPEG-4 Audio Object Type minus 1
6 | MPEG-4 Sampling Frequency Index | 4 | MPEG-4 Sampling Frequency Index (15 is forbidden)
7 | Private Stream | 1 | set to 0 when encoding, ignore when decoding
8 | MPEG-4 Channel Configuration | 3 | MPEG-4 Channel Configuration (in the case of 0, the channel configuration is sent via an inband PCE)
9 | Originality | 1 | set to 0 when encoding, ignore when decoding
10 | Home | 1 | set to 0 when encoding, ignore when decoding
11 | Copyrighted Stream | 1 | set to 0 when encoding, ignore when decoding
12 | Copyrighted Start | 1 | set to 0 when encoding, ignore when decoding
13 | Frame Length | 13 | this value must include 7 or 9 bytes of header length: FrameLength = (ProtectionAbsent == 1 ? 7 : 9) + size(AACFrame)
14 | Buffer Fullness | 11 | buffer fullness
15 | Number of AAC Frames | 2 | number of AAC frames (RDBs) in ADTS frame minus 1; for maximum compatibility always use 1 AAC frame per ADTS frame
16 | CRC | 16 | CRC if protection absent is 0
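The field widths in the table map directly onto shifts and masks over the first 7 header bytes. The following is a minimal sketch of that mapping for the fields most callers need; the class AdtsFields and its member names are hypothetical and chosen only for illustration.

```java
final class AdtsFields {
    final int protectionAbsent; // 1 = no CRC, 0 = CRC present
    final int profile;          // header value, i.e. Audio Object Type minus 1
    final int freqIdx;          // sampling frequency index
    final int chanCfg;          // channel configuration
    final int frameLength;      // header plus payload, in bytes

    AdtsFields(byte[] h) {
        if (h.length < 7) {
            throw new IllegalArgumentException("ADTS header needs at least 7 bytes");
        }
        if ((h[0] & 0xFF) != 0xFF || (h[1] & 0xF0) != 0xF0) {
            throw new IllegalArgumentException("missing ADTS syncword");
        }
        protectionAbsent = h[1] & 0x01;
        profile = (h[2] & 0xC0) >> 6;
        freqIdx = (h[2] & 0x3C) >> 2;
        // The 3-bit channel configuration straddles bytes 2 and 3.
        chanCfg = ((h[2] & 0x01) << 2) | ((h[3] & 0xC0) >> 6);
        // The 13-bit frame length straddles bytes 3, 4 and 5.
        frameLength = ((h[3] & 0x03) << 11) | ((h[4] & 0xFF) << 3) | ((h[5] & 0xE0) >> 5);
    }

    /** Header size in bytes: 7 without CRC, 9 with CRC. */
    int headerSize() {
        return protectionAbsent == 1 ? 7 : 9;
    }
}
```

For the header bytes FF F9 4C 80 0D 7F FC, this yields profile 1 (AAC LC), sampling frequency index 3, channel configuration 2, and a frame length of 107 bytes.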

Generation of ADTS header:

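/**
 * Writes a 7-byte ADTS header (no CRC) into the start of a packet.
 *
 * @param packet    buffer whose first 7 bytes receive the ADTS header
 * @param packetLen total frame length in bytes, including the 7-byte header
 */
private void addADTStoPacket(byte[] packet, int packetLen) {
    int profile = 2; // MPEG-4 Audio Object Type 2 = AAC LC
    int freqIdx = 3; // sampling frequency index 3 = 48000 Hz
    int chanCfg = 2; // channel configuration 2 = 2 channels

    packet[0] = (byte) 0xFF; // syncword, upper 8 bits
    packet[1] = (byte) 0xF9; // syncword lower 4 bits, MPEG version = 1, layer = 0, protection absent = 1
    packet[2] = (byte) (((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2)); // profile, frequency index, private bit = 0, channel config high bit
    packet[3] = (byte) (((chanCfg & 3) << 6) + (packetLen >> 11)); // channel config low 2 bits, four flag bits = 0, frame length high 2 bits
    packet[4] = (byte) ((packetLen & 0x7FF) >> 3); // frame length middle 8 bits
    packet[5] = (byte) (((packetLen & 7) << 5) + 0x1F); // frame length low 3 bits, buffer fullness high 5 bits (all 1)
    packet[6] = (byte) 0xFC; // buffer fullness low 6 bits (all 1), number of AAC frames minus 1 = 0
}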
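The listing above hard-codes the profile, sampling rate, and channel configuration. As a sketch of how the same byte layout could be parameterized, the hypothetical helper below (AdtsWriter.wrap is not part of the original article) prepends a header to one raw AAC frame and returns the complete ADTS frame. It keeps the 0xF9 second byte from the listing, which signals MPEG-2 according to the table above; a writer that wants to signal MPEG-4 would use 0xF1 instead.

```java
final class AdtsWriter {
    private AdtsWriter() {}

    /** Returns a new array holding a 7-byte ADTS header followed by the raw AAC payload. */
    static byte[] wrap(byte[] payload, int audioObjectType, int freqIdx, int chanCfg) {
        int frameLen = payload.length + 7;
        if (frameLen > 0x1FFF) {
            throw new IllegalArgumentException("frame does not fit the 13-bit Frame Length field");
        }
        byte[] frame = new byte[frameLen];
        frame[0] = (byte) 0xFF;
        frame[1] = (byte) 0xF9; // same value as the listing: MPEG version bit 1, layer 0, no CRC
        frame[2] = (byte) (((audioObjectType - 1) << 6) | (freqIdx << 2) | (chanCfg >> 2));
        frame[3] = (byte) (((chanCfg & 3) << 6) | (frameLen >> 11));
        frame[4] = (byte) ((frameLen & 0x7FF) >> 3);
        frame[5] = (byte) (((frameLen & 7) << 5) | 0x1F);
        frame[6] = (byte) 0xFC;
        // Copy the raw AAC frame directly after the header.
        System.arraycopy(payload, 0, frame, 7, payload.length);
        return frame;
    }
}
```

Calling AdtsWriter.wrap(payload, 2, 3, 2) with a 100-byte payload produces a 107-byte frame whose header is FF F9 4C 80 0D 7F FC, the same bytes addADTStoPacket would write for packetLen = 107.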

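The profile indicates which AAC profile is used. MPEG-2 AAC defines three:

  • 0: Main profile
  • 1: Low Complexity (LC) profile
  • 2: Scalable Sampling Rate (SSR) profile

In addADTStoPacket, the profile variable holds the MPEG-4 Audio Object Type (2 = AAC LC); the header stores that value minus 1.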

freqIdx is the index of the sampling rate in use. Use this index to look up the sampling rate in the sampling frequencies table:

  • 0: 96000 Hz
  • 1: 88200 Hz
  • 2: 64000 Hz
  • 3: 48000 Hz
  • 4: 44100 Hz
  • 5: 32000 Hz
  • 6: 24000 Hz
  • 7: 22050 Hz
  • 8: 16000 Hz
  • 9: 12000 Hz
  • 10: 11025 Hz
  • 11: 8000 Hz
  • 12: 7350 Hz
  • 13: Reserved
  • 14: Reserved
  • 15: frequency is written explicitly (forbidden in ADTS headers)
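The table above can be held in a plain array, so that a lookup in either direction is a few lines. This is a minimal sketch; the class and method names are hypothetical, and returning -1 for reserved or unknown values is an assumption made for the example.

```java
final class SamplingFrequencies {
    private SamplingFrequencies() {}

    // Index 0..12 as listed above; 13 and 14 are reserved, 15 is not valid in ADTS.
    private static final int[] RATES_HZ = {
        96000, 88200, 64000, 48000, 44100, 32000, 24000,
        22050, 16000, 12000, 11025, 8000, 7350
    };

    /** Returns the sampling rate in Hz for an index, or -1 if the index has no rate. */
    static int rateForIndex(int index) {
        if (index < 0 || index >= RATES_HZ.length) {
            return -1;
        }
        return RATES_HZ[index];
    }

    /** Returns the index for a sampling rate in Hz, or -1 if the rate is not in the table. */
    static int indexForRate(int rateHz) {
        for (int i = 0; i < RATES_HZ.length; i++) {
            if (RATES_HZ[i] == rateHz) {
                return i;
            }
        }
        return -1;
    }
}
```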

chanCfg is the channel configuration, which determines the number and layout of the channels:

  • 0: Defined in AOT Specific Config
  • 1: 1 channel: front-center
  • 2: 2 channels: front-left, front-right
  • 3: 3 channels: front-center, front-left, front-right
  • 4: 4 channels: front-center, front-left, front-right, back-center
  • 5: 5 channels: front-center, front-left, front-right, back-left, back-right
  • 6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
  • 7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
  • 8-15: Reserved
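Note that the channel configuration is not always equal to the channel count: value 7 means 8 channels. A small sketch of the mapping follows; the class name is hypothetical, and returning -1 for value 0 and the reserved values is an assumption made for the example.

```java
final class ChannelConfig {
    private ChannelConfig() {}

    /** Returns the number of channels for a channel configuration, or -1 if it is not fixed by the value. */
    static int channelCount(int chanCfg) {
        switch (chanCfg) {
            case 1:
            case 2:
            case 3:
            case 4:
            case 5:
            case 6:
                return chanCfg; // configurations 1..6 equal their channel count
            case 7:
                return 8;       // 7.1 layout
            default:
                return -1;      // 0 is signalled elsewhere, 8-15 are reserved
        }
    }
}
```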

Parsing AAC:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class AACHelper {
    // Sampling frequency lookup table. It is bidirectional: sample rate (Hz) -> index
    // and index -> sample rate (Hz) share one map. This works because the two key
    // ranges (7350..96000 and 0..12) never overlap.
    private static final Map<Integer, Integer> samplingFrequencyIndexMap = new HashMap<>();

    static {
        samplingFrequencyIndexMap.put(96000, 0);
        samplingFrequencyIndexMap.put(88200, 1);
        samplingFrequencyIndexMap.put(64000, 2);
        samplingFrequencyIndexMap.put(48000, 3);
        samplingFrequencyIndexMap.put(44100, 4);
        samplingFrequencyIndexMap.put(32000, 5);
        samplingFrequencyIndexMap.put(24000, 6);
        samplingFrequencyIndexMap.put(22050, 7);
        samplingFrequencyIndexMap.put(16000, 8);
        samplingFrequencyIndexMap.put(12000, 9);
        samplingFrequencyIndexMap.put(11025, 10);
        samplingFrequencyIndexMap.put(8000, 11);
        samplingFrequencyIndexMap.put(7350, 12);
        samplingFrequencyIndexMap.put(0x0, 96000);
        samplingFrequencyIndexMap.put(0x1, 88200);
        samplingFrequencyIndexMap.put(0x2, 64000);
        samplingFrequencyIndexMap.put(0x3, 48000);
        samplingFrequencyIndexMap.put(0x4, 44100);
        samplingFrequencyIndexMap.put(0x5, 32000);
        samplingFrequencyIndexMap.put(0x6, 24000);
        samplingFrequencyIndexMap.put(0x7, 22050);
        samplingFrequencyIndexMap.put(0x8, 16000);
        samplingFrequencyIndexMap.put(0x9, 12000);
        samplingFrequencyIndexMap.put(0xa, 11025);
        samplingFrequencyIndexMap.put(0xb, 8000);
        samplingFrequencyIndexMap.put(0xc, 7350);
    }

    private AdtsHeader mAdtsHeader = new AdtsHeader();
    private BitReader mHeaderBitReader = new BitReader(new byte[7]);
    private byte[] mSkipTwoBytes = new byte[2];
    private FileInputStream mFileInputStream;
    // Frame Length is a 13-bit field, so a frame never exceeds 8191 bytes
    private byte[] mBytes = new byte[8192];

    /**
     * Constructor to create an input stream by passing in the file path
     *
     * @param aacFilePath AAC File path
     * @throws FileNotFoundException
     */
    public AACHelper(String aacFilePath) throws FileNotFoundException {
        mFileInputStream = new FileInputStream(aacFilePath);
    }

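    /**
     * Reads the next sample (one raw AAC frame, without its ADTS header)
     *
     * @param byteBuffer ByteBuffer that receives the sample data
     * @return size of the current sample in bytes, or -1 if no more data is available
     * @throws IOException if the stream is malformed or cannot be read
     */
    public int getSample(ByteBuffer byteBuffer) throws IOException {
        if (!readADTSHeader(mAdtsHeader, mFileInputStream)) {
            return -1;
        }
        int payloadLength = mAdtsHeader.frameLength - mAdtsHeader.getSize();
        if (payloadLength < 0 || payloadLength > mBytes.length) {
            throw new IOException("Unexpected AAC payload length: " + payloadLength);
        }
        // read() may return fewer bytes than requested, so loop until the payload is complete
        int length = 0;
        while (length < payloadLength) {
            int n = mFileInputStream.read(mBytes, length, payloadLength - length);
            if (n < 0) {
                return -1; // file ends in the middle of a frame
            }
            length += n;
        }
        byteBuffer.clear();
        byteBuffer.put(mBytes, 0, length);
        byteBuffer.flip(); // position = 0, limit = length
        return length;
    }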

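    /**
     * Reads one ADTS header from the AAC file stream
     *
     * @param adtsHeader      ADTS header object to fill in
     * @param fileInputStream AAC file stream
     * @return true if a header was read, false if the end of the stream was reached
     * @throws IOException if the data does not start with a valid ADTS header
     */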
    private boolean readADTSHeader(AdtsHeader adtsHeader, FileInputStream fileInputStream) throws IOException {
        if (fileInputStream.read(mHeaderBitReader.buffer) < 7) {
            return false;
        }

        mHeaderBitReader.position = 0;

        int syncWord = mHeaderBitReader.readBits(12); // A
        if (syncWord != 0xfff) {
            throw new IOException("Expected Start Word 0xfff");
        }
        adtsHeader.mpegVersion = mHeaderBitReader.readBits(1); // B
        adtsHeader.layer = mHeaderBitReader.readBits(2); // C
        adtsHeader.protectionAbsent = mHeaderBitReader.readBits(1); // D
        adtsHeader.profile = mHeaderBitReader.readBits(2) + 1;  // E
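        adtsHeader.sampleFrequencyIndex = mHeaderBitReader.readBits(4); // F
        // Indexes 13-15 have no table entry; fail with a clear error instead of a NullPointerException
        Integer sampleRate = samplingFrequencyIndexMap.get(adtsHeader.sampleFrequencyIndex);
        if (sampleRate == null) {
            throw new IOException("Unsupported sampling frequency index: " + adtsHeader.sampleFrequencyIndex);
        }
        adtsHeader.sampleRate = sampleRate;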
        mHeaderBitReader.readBits(1); // G
        adtsHeader.channelconfig = mHeaderBitReader.readBits(3); // H
        adtsHeader.original = mHeaderBitReader.readBits(1); // I
        adtsHeader.home = mHeaderBitReader.readBits(1); // J
        adtsHeader.copyrightedStream = mHeaderBitReader.readBits(1); // K
        adtsHeader.copyrightStart = mHeaderBitReader.readBits(1); // L
        adtsHeader.frameLength = mHeaderBitReader.readBits(13); // M
        adtsHeader.bufferFullness = mHeaderBitReader.readBits(11); // buffer fullness, ends at bit 54
        adtsHeader.numAacFramesPerAdtsFrame = mHeaderBitReader.readBits(2) + 1; // frame count minus 1, ends at bit 56
        if (adtsHeader.numAacFramesPerAdtsFrame != 1) {
            throw new IOException("This muxer can only work with 1 AAC frame per ADTS frame");
        }
        if (adtsHeader.protectionAbsent == 0) {
            fileInputStream.read(mSkipTwoBytes);
        }
        return true;
    }

    /**
     * Release resources
     *
     * @throws IOException
     */
    public void release() throws IOException {
        mFileInputStream.close();
    }

    /**
     * ADTS header fields
     */
    private static class AdtsHeader {
        int getSize() {
            return 7 + (protectionAbsent == 0 ? 2 : 0);
        }

        int sampleFrequencyIndex;

        int mpegVersion;
        int layer;
        int protectionAbsent;
        int profile;
        int sampleRate;

        int channelconfig;
        int original;
        int home;
        int copyrightedStream;
        int copyrightStart;
        int frameLength;
        int bufferFullness;
        int numAacFramesPerAdtsFrame;
    }
}
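The AACHelper listing relies on a BitReader class that the article does not show. Below is a minimal sketch of what such a class might look like, inferred only from how the listing uses it: a public buffer field, a position field that can be reset to 0, and a readBits(int) method that reads MSB first. Treating position as a bit offset is an assumption.

```java
final class BitReader {
    final byte[] buffer; // backing bytes, filled by the caller before reading
    int position;        // current offset in bits from the start of the buffer

    BitReader(byte[] buffer) {
        this.buffer = buffer;
    }

    /** Reads {@code count} bits (0..32), most significant bit first, and advances the position. */
    int readBits(int count) {
        if (count < 0 || count > 32) {
            throw new IllegalArgumentException("count must be between 0 and 32");
        }
        if (position + count > buffer.length * 8) {
            throw new IllegalStateException("not enough bits left in the buffer");
        }
        int value = 0;
        for (int i = 0; i < count; i++) {
            int currentByte = buffer[position >> 3] & 0xFF;
            int bit = (currentByte >> (7 - (position & 7))) & 1;
            value = (value << 1) | bit;
            position++;
        }
        return value;
    }
}
```

With a class of this shape on the classpath, the helper would typically be driven by constructing AACHelper with a file path, calling getSample(byteBuffer) in a loop until it returns -1, and then calling release().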

 
