Audio recording and playback in WebRTC

Posted by snascendi on Sat, 27 Nov 2021 05:08:39 +0100


This article is based on the PineAppRtc project github.com/thfhongfeng...

In WebRTC, audio recording and playback are encapsulated internally. Normally we don't need to care about them and can simply use them as they are.

However, a recent requirement called for transmitting our own audio data, which means exposing these interfaces. That required studying the source code, and this article is the result.

Audio engine

WebRTC actually contains more than one audio engine: a native-layer implementation based on OpenSL ES and a Java-layer implementation based on the Android API.

Note that the Java layer lives in audio_device_java.jar, under the package name org.webrtc.voiceengine. The latest WebRTC code on the official website also has a package named org.webrtc.audio, which appears to replace the older one.

The PineAppRtc project, however, uses only org.webrtc.voiceengine.

OpenSL ES is used by default, but you can call

WebRtcAudioManager.setBlacklistDeviceForOpenSLESUsage(true /* enable */);

to disable it, so that the Java-layer engine is used instead.

So how can we expose these interfaces? We can copy the source code of the package into the project and delete the jar, which lets us modify the code directly.

Send data (recording)

The WebRtcAudioRecord class in audio_device_java.jar is responsible for recording.

This class and the functions below are called automatically by the lower layers of WebRTC, so we don't need to worry about where the parameters come from; we only need to know how to use them.

First, the constructor:

WebRtcAudioRecord(long nativeAudioRecord) {
    this.nativeAudioRecord = nativeAudioRecord;
}

nativeAudioRecord is important: it must be passed in every subsequent call to the native interface.

Next, the initRecording function:

private int initRecording(int sampleRate, int channels) {
    if (this.audioRecord != null) {
        this.reportWebRtcAudioRecordInitError("InitRecording called twice without StopRecording.");
        return -1;
    } else {
        int bytesPerFrame = channels * 2;        // 2 bytes per sample per channel
        int framesPerBuffer = sampleRate / 100;  // 10 ms of audio
        this.byteBuffer = ByteBuffer.allocateDirect(bytesPerFrame * framesPerBuffer);
        this.emptyBytes = new byte[this.byteBuffer.capacity()];
        this.nativeCacheDirectBufferAddress(this.byteBuffer, this.nativeAudioRecord);
        // ... (AudioRecord initialization omitted in this excerpt)
        return framesPerBuffer;
    }
}

The two parameters are the sampling rate and the number of channels (1 for mono, 2 for stereo). They matter too: WebRTC chooses them through negotiation over the socket beforehand. We can also modify them, as described later.

Note that the capacity of byteBuffer must not be changed, because the native layer verifies it. The size must be exactly (sampling rate / 100 * number of channels * 2) bytes, that is, 10 ms of 16-bit audio, so data is sent 100 times per second.

If the size is changed, the native layer crashes with the error: Check failed: frames_per_buffer_ == audio_parameters_.frames_per_10ms_buffer() (xxx vs. xxx)
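
The sizing rule can be written down as a small helper. This is only a sketch: TenMsBuffer and its method names are hypothetical and not part of WebRTC; they simply mirror the arithmetic in initRecording.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not part of WebRTC) mirroring the sizing rule in initRecording. */
final class TenMsBuffer {
    static final int BYTES_PER_SAMPLE = 2;      // 16-bit PCM
    static final int BUFFERS_PER_SECOND = 100;  // one buffer per 10 ms

    private TenMsBuffer() {}

    /** Number of frames (samples per channel) in 10 ms of audio. */
    static int framesPer10Ms(int sampleRate) {
        return sampleRate / BUFFERS_PER_SECOND;
    }

    /** Required buffer capacity in bytes: sampleRate / 100 * channels * 2. */
    static int sizeInBytes(int sampleRate, int channels) {
        return framesPer10Ms(sampleRate) * channels * BYTES_PER_SAMPLE;
    }

    /** Allocates a direct buffer of exactly the required capacity. */
    static ByteBuffer allocate(int sampleRate, int channels) {
        return ByteBuffer.allocateDirect(sizeInBytes(sampleRate, channels));
    }
}
```

For example, 16 kHz mono gives 160 frames and 320 bytes per buffer, while 48 kHz stereo gives 480 frames and 1920 bytes.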

The most important call is nativeCacheDirectBufferAddress. It receives the byteBuffer and nativeAudioRecord, both of which are used later.

After nativeCacheDirectBufferAddress, the AudioRecord is initialized.

Next, startRecording:

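private boolean startRecording() {
    // 3 == AudioRecord.RECORDSTATE_RECORDING
    if (this.audioRecord.getRecordingState() != 3) {
        // ... (error handling omitted in this excerpt)
    } else {
        this.audioThread = new WebRtcAudioRecord.AudioRecordThread("AudioRecordJavaThread");
        // ... (thread start omitted in this excerpt)
        return true;
    }
}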

You can see that a thread is started. Here is what the thread does:

public void run() {
    while (this.keepAlive) {
        // Read exactly one 10 ms buffer from the microphone.
        int bytesRead = WebRtcAudioRecord.this.audioRecord.read(WebRtcAudioRecord.this.byteBuffer, WebRtcAudioRecord.this.byteBuffer.capacity());
        if (bytesRead == WebRtcAudioRecord.this.byteBuffer.capacity()) {
            if (this.keepAlive) {
                // Tell the native layer that byteBuffer now holds bytesRead bytes.
                WebRtcAudioRecord.this.nativeDataIsRecorded(bytesRead, WebRtcAudioRecord.this.nativeAudioRecord);
            }
        } else {
            // ... (error handling omitted in this excerpt)
        }
    }
}

After the data is read from the AudioRecord, nativeDataIsRecorded is called.

Notice that the read fills the byteBuffer that was registered earlier, while the call to nativeDataIsRecorded passes only the length and nativeAudioRecord.

So to send our own data instead of recording, we need a nativeAudioRecord (obtained through the constructor), then call nativeCacheDirectBufferAddress to register the buffer, and then loop: write data into byteBuffer and call nativeDataIsRecorded once per buffer to send it.
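
The steps above can be sketched roughly as follows. This is an assumption-laden sketch, not the project's code: RecordBridge is a hypothetical interface standing in for the two private native methods of WebRtcAudioRecord (so the sketch can run without the native library), and CustomAudioSender is a made-up name.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for nativeCacheDirectBufferAddress / nativeDataIsRecorded. */
interface RecordBridge {
    void cacheDirectBufferAddress(ByteBuffer buffer, long nativeAudioRecord);
    void dataIsRecorded(int bytes, long nativeAudioRecord);
}

/** Sketch: pushes caller-supplied 16-bit PCM to the native layer in 10 ms chunks. */
final class CustomAudioSender {
    private final RecordBridge bridge;
    private final long nativeAudioRecord;
    private final ByteBuffer byteBuffer;

    CustomAudioSender(RecordBridge bridge, long nativeAudioRecord, int sampleRate, int channels) {
        this.bridge = bridge;
        this.nativeAudioRecord = nativeAudioRecord;
        // The capacity must be exactly sampleRate / 100 * channels * 2 bytes.
        this.byteBuffer = ByteBuffer.allocateDirect(sampleRate / 100 * channels * 2);
        // Register the buffer once, as initRecording does.
        bridge.cacheDirectBufferAddress(byteBuffer, nativeAudioRecord);
    }

    /**
     * Sends every complete 10 ms chunk contained in pcm and returns the number
     * of chunks sent. A trailing partial chunk is dropped in this sketch.
     */
    int send(byte[] pcm) {
        int chunk = byteBuffer.capacity();
        int sent = 0;
        for (int offset = 0; offset + chunk <= pcm.length; offset += chunk) {
            byteBuffer.clear();                  // reset position before writing
            byteBuffer.put(pcm, offset, chunk);  // fill the shared buffer
            bridge.dataIsRecorded(chunk, nativeAudioRecord);
            sent++;
        }
        return sent;
    }
}
```

In a real integration the two bridge methods would forward to the native methods shown earlier, and the loop would presumably need to be paced at one chunk every 10 ms, matching the 100 buffers per second that the recording thread produces.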

Receive data (playback)

The WebRtcAudioTrack class in audio_device_java.jar is responsible for playback.

This class and the functions below are also called automatically by the lower layers of WebRTC, so we don't need to worry about where the parameters come from; we only need to know how to use them.

Again, the constructor comes first:

WebRtcAudioTrack(long nativeAudioTrack) {
    this.nativeAudioTrack = nativeAudioTrack;
}

nativeAudioTrack is just as important and plays the same role as nativeAudioRecord above.

Next, the initPlayout function:

private boolean initPlayout(int sampleRate, int channels) {
    int bytesPerFrame = channels * 2;  // 2 bytes per sample per channel
    // 10 ms of audio, the same size rule as in initRecording
    this.byteBuffer = ByteBuffer.allocateDirect(bytesPerFrame * (sampleRate / 100));
    this.emptyBytes = new byte[this.byteBuffer.capacity()];
    this.nativeCacheDirectBufferAddress(this.byteBuffer, this.nativeAudioTrack);
    // ...
    return true;
}

The sampling rate and channel count have the same meaning as above. A byteBuffer is created here as well and passed to nativeCacheDirectBufferAddress.

The byteBuffer capacity follows the same rule as for recording and cannot be changed arbitrarily, otherwise the native layer will crash.

Next, the startPlayout function:

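private boolean startPlayout() {
    // 3 == AudioTrack.PLAYSTATE_PLAYING
    if (this.audioTrack.getPlayState() != 3) {
        // ... (error handling omitted in this excerpt)
    } else {
        this.audioThread = new WebRtcAudioTrack.AudioTrackThread("AudioTrackJavaThread");
        // ... (thread start omitted in this excerpt)
        return true;
    }
}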

It also starts a thread. Inside the thread:

public void run() {
    for (int sizeInBytes = WebRtcAudioTrack.this.byteBuffer.capacity(); this.keepAlive; WebRtcAudioTrack.this.byteBuffer.rewind()) {
        // Ask the native layer to fill byteBuffer with sizeInBytes of received audio.
        WebRtcAudioTrack.this.nativeGetPlayoutData(sizeInBytes, WebRtcAudioTrack.this.nativeAudioTrack);
        int bytesWritten;
        if (WebRtcAudioUtils.runningOnLollipopOrHigher()) {
            bytesWritten = this.writeOnLollipop(WebRtcAudioTrack.this.audioTrack, WebRtcAudioTrack.this.byteBuffer, sizeInBytes);
        } else {
            bytesWritten = this.writePreLollipop(WebRtcAudioTrack.this.audioTrack, WebRtcAudioTrack.this.byteBuffer, sizeInBytes);
        }
        // ...
    }
}

The logic mirrors recording, except that nativeGetPlayoutData is called so that the native layer writes the received data into byteBuffer, which is then played through a write function (both write functions end up calling AudioTrack's write).

So if we want to process the received data ourselves, we only need to call nativeGetPlayoutData here and then read the data from byteBuffer. The code that follows can be deleted.

In summary, playback mirrors recording: take nativeAudioTrack from the constructor, create a byteBuffer and pass it to nativeCacheDirectBufferAddress, then call nativeGetPlayoutData in a loop to obtain the data for processing.
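
A rough sketch of that receive path, under the same assumptions as before: PlayoutBridge is a hypothetical interface standing in for the private native methods of WebRtcAudioTrack, and PcmConsumer and CustomAudioReceiver are made-up names for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for nativeCacheDirectBufferAddress / nativeGetPlayoutData. */
interface PlayoutBridge {
    void cacheDirectBufferAddress(ByteBuffer buffer, long nativeAudioTrack);
    void getPlayoutData(int bytes, long nativeAudioTrack);
}

/** Hypothetical callback that receives each 10 ms chunk of decoded PCM. */
interface PcmConsumer {
    void onPcm(byte[] data);
}

/** Sketch: pulls received audio from the native layer instead of playing it. */
final class CustomAudioReceiver {
    private final PlayoutBridge bridge;
    private final long nativeAudioTrack;
    private final ByteBuffer byteBuffer;

    CustomAudioReceiver(PlayoutBridge bridge, long nativeAudioTrack, int sampleRate, int channels) {
        this.bridge = bridge;
        this.nativeAudioTrack = nativeAudioTrack;
        // Same size rule as for recording: 10 ms of 16-bit audio.
        this.byteBuffer = ByteBuffer.allocateDirect(channels * 2 * (sampleRate / 100));
        bridge.cacheDirectBufferAddress(byteBuffer, nativeAudioTrack);
    }

    /** Pulls one 10 ms chunk and hands a copy of it to the consumer. */
    void pullOnce(PcmConsumer consumer) {
        // The native layer fills byteBuffer with the requested number of bytes.
        bridge.getPlayoutData(byteBuffer.capacity(), nativeAudioTrack);
        byte[] out = new byte[byteBuffer.capacity()];
        byteBuffer.clear();   // reset position and limit; the contents are untouched
        byteBuffer.get(out);  // copy the chunk out of the shared buffer
        byteBuffer.clear();   // leave the buffer ready for the next pull
        consumer.onPcm(out);
    }
}
```

Calling pullOnce in a loop while a keepAlive flag is set would play the role of the AudioTrackThread loop shown above, with the consumer replacing the AudioTrack write.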

Setting the sampling rate, channels, etc.

These parameters are negotiated by both parties: one side sends the parameters it supports, the other side picks a suitable set from those and returns it, and then both sides process data using the agreed parameters.

But can we intervene in this process? For example, both sides may support more than one option, and we may not want the one selected automatically. What should we do?

audio_device_java.jar contains two more classes: WebRtcAudioManager and WebRtcAudioUtils.

Some settings can be changed in these two classes, for example:

Sampling rate

In WebRtcAudioManager:

private int getNativeOutputSampleRate() {
//        if (WebRtcAudioUtils.runningOnEmulator()) {
//            Logging.d("WebRtcAudioManager", "Running emulator, overriding sample rate to 8 kHz.");
//            return 8000;
//        } else if (WebRtcAudioUtils.isDefaultSampleRateOverridden()) {
//            Logging.d("WebRtcAudioManager", "Default sample rate is overriden to " + WebRtcAudioUtils.getDefaultSampleRateHz() + " Hz");
//            return WebRtcAudioUtils.getDefaultSampleRateHz();
//        } else {
//            int sampleRateHz;
//            if (WebRtcAudioUtils.runningOnJellyBeanMR1OrHigher()) {
//                sampleRateHz = this.getSampleRateOnJellyBeanMR10OrHigher();
//            } else {
//                sampleRateHz = WebRtcAudioUtils.getDefaultSampleRateHz();
//            }
//            Logging.d("WebRtcAudioManager", "Sample rate is set to " + sampleRateHz + " Hz");
//            return sampleRateHz;
//        }
    return 16000;
}

Comment out the original code and directly return the sampling rate we want.

Channels

Also in WebRtcAudioManager:

public static synchronized boolean getStereoOutput() {
    return useStereoOutput;
}

public static synchronized boolean getStereoInput() {
    return useStereoInput;
}

The return values of these two functions directly determine the number of channels:

private void storeAudioParameters() {
    this.outputChannels = getStereoOutput() ? 2 : 1;
    this.inputChannels = getStereoInput() ? 2 : 1;
    this.sampleRate = this.getNativeOutputSampleRate();
    this.hardwareAEC = isAcousticEchoCancelerSupported();
    this.hardwareAGC = false;
    this.hardwareNS = isNoiseSuppressorSupported();
    this.lowLatencyOutput = this.isLowLatencyOutputSupported();
    this.lowLatencyInput = this.isLowLatencyInputSupported();
    this.proAudio = this.isProAudioSupported();
    this.outputBufferSize = this.lowLatencyOutput ? this.getLowLatencyOutputFramesPerBuffer() : getMinOutputFrameSize(this.sampleRate, this.outputChannels);
    this.inputBufferSize = this.lowLatencyInput ? this.getLowLatencyInputFramesPerBuffer() : getMinInputFrameSize(this.sampleRate, this.inputChannels);
}

The code above contains other settings as well, which can be modified if necessary.


Here we have only briefly analyzed the recording and playback flow, enough to know where to start, how to send our own audio, and how to obtain the other party's audio data. Any further transformation and processing is up to you.
