Visualizing Real-Time Audio with Android OpenGL ES

Posted by prashant0812 on Tue, 21 Dec 2021 04:33:41 +0100

1. Audio data collection

The overall approach to visualizing real-time audio with OpenGL is straightforward. The raw, unencoded audio (PCM data) can be captured either with the Java-layer AudioRecord API or with the OpenSL ES interface in the Native layer.

The captured audio data is then treated as a set of intensity values, a mesh is generated from those values, and the mesh is rendered in real time.

For ease of demonstration, this article uses Android's AudioRecord API to capture the raw audio data, passes it to the Native layer through JNI, and finally generates the mesh there for rendering.

When capturing audio with AudioRecord in the ENCODING_PCM_16BIT format, one thing needs to be understood: the captured samples are stored in memory in little-endian byte order, i.e. the low byte sits at the lower address and the high byte at the higher address. Take special care with this when combining two bytes into a short.

Big-endian order is the opposite: the high byte is stored at the lower address and the low byte at the higher address.

Figure: little-endian byte order

Figure: big-endian byte order

In Java, bytes stored in little-endian order can be converted to a short value as follows:

byte firstByte = 0x10, secondByte = 0x01; //0x0110
ByteBuffer bb = ByteBuffer.allocate(2);
bb.order(ByteOrder.LITTLE_ENDIAN);
bb.put(firstByte);
bb.put(secondByte);
short shortVal = bb.getShort(0);
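
For reference, the same assembly can be done in the Native layer with plain bit operations; below is a minimal sketch (the helper name is mine, not from the article):

#include <cstdint>

// Minimal sketch: combine two little-endian bytes into a signed 16-bit PCM sample.
// lowByte sits at the lower memory address, highByte at the higher one.
static inline int16_t LittleEndianBytesToSample(uint8_t lowByte, uint8_t highByte) {
    return (int16_t) ((highByte << 8) | lowByte);
}

// Example: 0x10 followed by 0x01 yields 0x0110 (272), matching the Java snippet above.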

To avoid this conversion altogether, Android's AudioRecord class also provides a read() overload that outputs the audio data directly as a short array, something I only discovered after falling into this pit:

public int read(short[] audioData, int offsetInShorts, int sizeInShorts, int readMode)

Audio capture with AudioRecord, wrapped in a simple Java-layer class:

public class AudioCollector implements AudioRecord.OnRecordPositionUpdateListener{
    private static final String TAG = "AudioRecorderWrapper";
    private static final int RECORDER_SAMPLE_RATE = 44100; //Sample rate in Hz
    private static final int RECORDER_CHANNELS = AudioFormat.CHANNEL_IN_MONO; //Mono channel configuration
    private static final int RECORDER_AUDIO_ENCODING = AudioFormat.ENCODING_PCM_16BIT; //16-bit PCM
    private static final int RECORDER_ENCODING_BIT = 16; //Bits per sample
    private AudioRecord mAudioRecord;
    private Thread mThread;
    private short[] mAudioBuffer;
    private Handler mHandler;
    private int mBufferSize;
    private Callback mCallback;

    public AudioCollector() {
        //Calculate buffer size
        mBufferSize = 2 * AudioRecord.getMinBufferSize(RECORDER_SAMPLE_RATE,
                RECORDER_CHANNELS, RECORDER_AUDIO_ENCODING);
    }

    public void init() {
        mAudioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, RECORDER_SAMPLE_RATE,
                RECORDER_CHANNELS, RECORDER_AUDIO_ENCODING, mBufferSize);
        mAudioRecord.startRecording();
        //Continuously collect audio data in a new working thread
        mThread = new Thread("Audio-Recorder") {
            @Override
            public void run() {
                super.run();
                mAudioBuffer = new short[mBufferSize];
                Looper.prepare();
                mHandler = new Handler(Looper.myLooper());
                //Keep receiving audio data via AudioRecord.OnRecordPositionUpdateListener
                mAudioRecord.setRecordPositionUpdateListener(AudioCollector.this, mHandler);
                int bytePerSample = RECORDER_ENCODING_BIT / 8;
                float samplesToDraw = mBufferSize / bytePerSample;
                mAudioRecord.setPositionNotificationPeriod((int) samplesToDraw);
                mAudioRecord.read(mAudioBuffer, 0, mBufferSize);
                Looper.loop();
            }
        };
        mThread.start();
    }

    public void unInit() {
        if(mAudioRecord != null) {
            mAudioRecord.stop();
            mAudioRecord.release();
            mHandler.getLooper().quitSafely();
            mHandler = null;
            mAudioRecord = null;
        }
    }

    public void addCallback(Callback callback) {
        mCallback = callback;
    }

    @Override
    public void onMarkerReached(AudioRecord recorder) {

    }

    @Override
    public void onPeriodicNotification(AudioRecord recorder) {
        if (mAudioRecord.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING
                && mAudioRecord.read(mAudioBuffer, 0, mAudioBuffer.length) != -1)
        {
            if(mCallback != null)
                //The audio data is transferred to the Native layer through the interface callback
                mCallback.onAudioBufferCallback(mAudioBuffer);
        }

    }

    public interface Callback {
        void onAudioBufferCallback(short[] buffer);
    }
}
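
Each buffer delivered through the Callback is handed to the Native layer via JNI. The article does not show that bridge, but a minimal sketch could look like the following (the Java class/method names and the GetInstance() accessor are assumptions for illustration):

#include <jni.h>

// Minimal sketch (assumed names): JNI entry point that forwards the short[] buffer
// received in AudioCollector.Callback to the native visualizer.
extern "C"
JNIEXPORT void JNICALL
Java_com_example_audiovisual_NativeRender_nativeOnAudioData(JNIEnv *env, jobject thiz,
                                                            jshortArray audioData) {
    jsize len = env->GetArrayLength(audioData);
    jshort *pSamples = env->GetShortArrayElements(audioData, nullptr);
    // Hand the PCM samples to the native sample object; LoadShortArrData copies them.
    VisualizeAudioSample::GetInstance()->LoadShortArrData(pSamples, len);
    // JNI_ABORT: the native side did not modify the Java array, so no copy-back is needed.
    env->ReleaseShortArrayElements(audioData, pSamples, JNI_ABORT);
}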

2. Audio visualization

In the Native layer we receive the PCM audio data (a short array) captured by AudioRecord. The S axis of the texture coordinate system is divided into equal segments according to the array length, and each value in the array (roughly the sound intensity) is used as the height of a bar, producing the texture coordinates and vertex coordinates of a bar chart.

Figure: building the bar chart from the audio samples

Because one "frame" of audio data is a fairly large array, drawing every sample turns the bar chart into a solid blob. To represent the audio intuitively in the time domain, the data also has to be downsampled before drawing.

float dy = 0.5f / MAX_AUDIO_LEVEL; //Scale factor from sample value to texture space
float dx = 1.0f / m_RenderDataSize;
for (int i = 0; i < m_RenderDataSize; ++i) {
    int index = i * RESAMPLE_LEVEL; //RESAMPLE_LEVEL indicates the sampling interval
    float y = m_pAudioData[index] * dy * -1;
    y = y < 0 ? y : -y; //Force y to be non-positive so every bar extends the same way

    //Construct 4 points of the strip rectangle
    vec2 p1(i * dx, 0 + 1.0f);
    vec2 p2(i * dx, y + 1.0f);
    vec2 p3((i + 1) * dx, y + 1.0f);
    vec2 p4((i + 1) * dx, 0 + 1.0f);

    //Texture coordinates of the two triangles that form the bar
    m_pTextureCoords[i * 6 + 0] = p1;
    m_pTextureCoords[i * 6 + 1] = p2;
    m_pTextureCoords[i * 6 + 2] = p4;
    m_pTextureCoords[i * 6 + 3] = p4;
    m_pTextureCoords[i * 6 + 4] = p2;
    m_pTextureCoords[i * 6 + 5] = p3;

    //Vertex coordinates: convert the texture coordinates to vertex coordinates
    m_pVerticesCoords[i * 6 + 0] = GLUtils::texCoordToVertexCoord(p1);
    m_pVerticesCoords[i * 6 + 1] = GLUtils::texCoordToVertexCoord(p2);
    m_pVerticesCoords[i * 6 + 2] = GLUtils::texCoordToVertexCoord(p4);
    m_pVerticesCoords[i * 6 + 3] = GLUtils::texCoordToVertexCoord(p4);
    m_pVerticesCoords[i * 6 + 4] = GLUtils::texCoordToVertexCoord(p2);
    m_pVerticesCoords[i * 6 + 5] = GLUtils::texCoordToVertexCoord(p3);
}
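
GLUtils::texCoordToVertexCoord is used above but not listed in the article. Conceptually it maps a texture coordinate in [0, 1] to a vertex coordinate in [-1, 1] with the y axis flipped; a minimal sketch of such a helper (an assumption about its exact form) is:

// Minimal sketch: map a texture coordinate (s, t) in [0, 1] to a vertex coordinate
// in normalized device coordinates [-1, 1]. The y axis is flipped because the
// texture t axis grows downward while NDC y grows upward.
vec3 GLUtils::texCoordToVertexCoord(const vec2 &texCoord) {
    return vec3(2.0f * texCoord.x - 1.0f, 1.0f - 2.0f * texCoord.y, 0.0f);
}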

Each time the Java layer feeds in one "frame" of audio data, the Native layer draws one frame:

void VisualizeAudioSample::Draw(int screenW, int screenH) {
    LOGCATE("VisualizeAudioSample::Draw()");
    if (m_ProgramObj == GL_NONE) return;

    //Add mutex lock to ensure the synchronization of audio data drawing and update
    std::unique_lock<std::mutex> lock(m_Mutex);

    //Update texture coordinates and vertex coordinates based on audio data
    UpdateMesh();

    UpdateMVPMatrix(m_MVPMatrix, m_AngleX, m_AngleY, (float) screenW / screenH);

    // Generate VBO Ids and load the VBOs with data
    if(m_VboIds[0] == 0)
    {
        glGenBuffers(2, m_VboIds);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[0]);
        glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * m_RenderDataSize * 6 * 3, m_pVerticesCoords, GL_DYNAMIC_DRAW);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[1]);
        glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * m_RenderDataSize * 6 * 2, m_pTextureCoords, GL_DYNAMIC_DRAW);
    }
    else
    {
        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[0]);
        glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(GLfloat) * m_RenderDataSize * 6 * 3, m_pVerticesCoords);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[1]);
        glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(GLfloat) * m_RenderDataSize * 6 * 2, m_pTextureCoords);
    }

    if(m_VaoId == GL_NONE)
    {
        glGenVertexArrays(1, &m_VaoId);
        glBindVertexArray(m_VaoId);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[0]);
        glEnableVertexAttribArray(0);
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), (const void *) 0);
        glBindBuffer(GL_ARRAY_BUFFER, GL_NONE);

        glBindBuffer(GL_ARRAY_BUFFER, m_VboIds[1]);
        glEnableVertexAttribArray(1);
        glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), (const void *) 0);
        glBindBuffer(GL_ARRAY_BUFFER, GL_NONE);

        glBindVertexArray(GL_NONE);
    }


    // Use the program object
    glUseProgram(m_ProgramObj);
    glBindVertexArray(m_VaoId);
    glUniformMatrix4fv(m_MVPMatLoc, 1, GL_FALSE, &m_MVPMatrix[0][0]);
    GLUtils::setFloat(m_ProgramObj, "drawType", 1.0f);
    glDrawArrays(GL_TRIANGLES, 0, m_RenderDataSize * 6);
    GLUtils::setFloat(m_ProgramObj, "drawType", 0.0f);
    glDrawArrays(GL_LINES, 0, m_RenderDataSize * 6);


}
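
The shader sources are not listed in this article. For reference, a fragment shader consistent with the drawType switch above might look like the following sketch (an assumption, not the actual shader): drawType 1.0 fills the bars with a gradient, anything else draws the white outline.

// Sketch (assumption): fragment shader source consistent with the "drawType" uniform.
static const char *fragmentShaderSketch = R"(#version 300 es
precision mediump float;
in vec2 v_texCoord;      // interpolated texture coordinate from the vertex shader
uniform float drawType;  // 1.0 = filled bars, otherwise = outline
out vec4 outColor;
void main() {
    if (drawType == 1.0) {
        // simple vertical gradient over the bar height
        outColor = mix(vec4(0.1, 0.6, 0.9, 1.0), vec4(0.9, 0.3, 0.2, 1.0), v_texCoord.y);
    } else {
        outColor = vec4(1.0); // white outline
    }
}
)";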

The rendering results of real-time audio are as follows:

Figure: real-time audio rendering result

However, the rendering above gives no sense of time passing: one batch of data is simply drawn and then replaced by the next, with no transition between them.

We are drawing the audio data in the time domain (it could also be transformed into the frequency domain with a Fourier transform). To produce an effect that conveys the passage of time, the Buffer has to be rendered with a sliding offset.

That is, old data is gradually discarded while new data is gradually added, so the drawn waveform appears to scroll with time.

Figure: the read pointer advances through the Buffer in fixed steps

First, the Buffer is made twice (or several times) the size of one frame. Two frames of audio data are collected to fill it; the audio capture thread is then blocked and the render thread is notified that the data is ready to draw. The pointer into the Buffer is then advanced by a fixed step, and one frame is drawn per offset.

When the pointer reaches the boundary shown in the figure above, all the data in the Buffer has been drawn and the render thread pauses drawing.

The audio capture thread is then unblocked: the data in Buffer2 is copied into Buffer1 and new data is received into Buffer2. The capture thread blocks again, and the render thread is notified that the data update is complete and drawing can resume.

void VisualizeAudioSample::UpdateMesh() {
    //Set an offset step
    int step = m_AudioDataSize / 64;

    //Check whether the pointer can still advance before reaching the boundary
    if(m_pAudioBuffer + m_AudioDataSize - m_pCurAudioData >= step)
    {

        float dy = 0.5f / MAX_AUDIO_LEVEL;
        float dx = 1.0f / m_RenderDataSize;
        for (int i = 0; i < m_RenderDataSize; ++i) {
            int index = i * RESAMPLE_LEVEL;
            float y = m_pCurAudioData[index] * dy * -1;
            y = y < 0 ? y : -y;
            vec2 p1(i * dx, 0 + 1.0f);
            vec2 p2(i * dx, y + 1.0f);
            vec2 p3((i + 1) * dx, y + 1.0f);
            vec2 p4((i + 1) * dx, 0 + 1.0f);

            m_pTextureCoords[i * 6 + 0] = p1;
            m_pTextureCoords[i * 6 + 1] = p2;
            m_pTextureCoords[i * 6 + 2] = p4;
            m_pTextureCoords[i * 6 + 3] = p4;
            m_pTextureCoords[i * 6 + 4] = p2;
            m_pTextureCoords[i * 6 + 5] = p3;

            m_pVerticesCoords[i * 6 + 0] = GLUtils::texCoordToVertexCoord(p1);
            m_pVerticesCoords[i * 6 + 1] = GLUtils::texCoordToVertexCoord(p2);
            m_pVerticesCoords[i * 6 + 2] = GLUtils::texCoordToVertexCoord(p4);
            m_pVerticesCoords[i * 6 + 3] = GLUtils::texCoordToVertexCoord(p4);
            m_pVerticesCoords[i * 6 + 4] = GLUtils::texCoordToVertexCoord(p2);
            m_pVerticesCoords[i * 6 + 5] = GLUtils::texCoordToVertexCoord(p3);
        }
        m_pCurAudioData += step;
    }
    else
    {   
        //Notify the audio collection thread to update the data when the offset reaches the boundary
        m_bAudioDataReady = false;
        m_Cond.notify_all();
        return;
    }
}

void VisualizeAudioSample::LoadShortArrData(short *const pShortArr, int arrSize) {
    if (pShortArr == nullptr || arrSize == 0)
        return;
    m_FrameIndex++;

    std::unique_lock<std::mutex> lock(m_Mutex);
    //The first frame fills the first half of the Buffer
    if(m_FrameIndex == 1)
    {
        m_pAudioBuffer = new short[arrSize * 2];
        memcpy(m_pAudioBuffer, pShortArr, sizeof(short) * arrSize);
        m_AudioDataSize = arrSize;
        return;
    }

    //The second frame fills the second half of the Buffer and allocates the mesh arrays
    if(m_FrameIndex == 2)
    {
        memcpy(m_pAudioBuffer + arrSize, pShortArr, sizeof(short) * arrSize);
        m_RenderDataSize = m_AudioDataSize / RESAMPLE_LEVEL;
        m_pVerticesCoords = new vec3[m_RenderDataSize * 6]; //(x,y,z) * 6 points
        m_pTextureCoords = new vec2[m_RenderDataSize * 6]; //(x,y) * 6 points
    }

    //From the third frame on: copy Buffer2 into Buffer1 and receive the new data into Buffer2
    if(m_FrameIndex > 2)
    {
        memcpy(m_pAudioBuffer, m_pAudioBuffer + arrSize, sizeof(short) * arrSize);
        memcpy(m_pAudioBuffer + arrSize, pShortArr, sizeof(short) * arrSize);
    }

    //Mark the data as ready for the render thread, then block this capture thread until the Buffer has been consumed
    m_bAudioDataReady = true;
    m_pCurAudioData = m_pAudioBuffer;
    m_Cond.wait(lock);

}
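
The article does not show where m_bAudioDataReady is consumed. Presumably the render side skips the mesh update until the capture thread has refilled the Buffer; a minimal sketch of such a guard at the top of UpdateMesh() (an assumption) would be:

// Sketch (assumption): guard at the start of UpdateMesh() so the render thread keeps
// drawing the previous mesh until LoadShortArrData() has refilled the Buffer.
void VisualizeAudioSample::UpdateMesh() {
    if (!m_bAudioDataReady || m_pCurAudioData == nullptr)
        return; // new data not ready yet, reuse the existing mesh
    // ... the offset-and-rebuild logic shown above ...
}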

The final result is shown in the first figure of this article.