Practice! Implementing Audio Clipping Purely on the Front End

Posted by ericburnard on Tue, 08 Oct 2019 09:26:44 +0200

Preface

Recently, a project of mine required processing audio recorded with WebRTC, including clipping, merging multiple audio files, and even replacing part of one audio with another.

Originally I intended to leave this job to the server side, but whether it is done on the front end or the back end, the work is about the same. What's more, handing it to the server requires an extra round trip of uploading and downloading the audio, which not only adds pressure on the server but also wastes network traffic. So an idea came up: why ship the audio off for processing at all? Can't the front end do this itself?

So, after some exploration and practice on my part, this article came into being. Without further ado, here is the repository address: an out-of-the-box front-end audio clipping SDK.

ffmpeg

FFmpeg is the core module for implementing front-end audio processing. Of course, it is not limited to the front end: as a mature, complete solution for recording, converting, and streaming audio and video, FFmpeg is also used on servers, in native apps, and in other scenarios. For an introduction to FFmpeg, please Google it yourself; I won't go into detail here.

Because FFmpeg involves a lot of computation, we can't simply run it on the page's main thread. Instead, we open a separate web worker and let it run there, so page interaction is not blocked.

Fortunately, developers on the almighty GitHub have already done this work: ffmpeg.js also provides a worker version, which can be used directly.

So we have a general idea: when we get the audio file, we decode it and send it to the worker, let the worker do the computation and processing, and receive the result back via events. Then we can do whatever we want with the audio :)

Necessary Work Before A Wonderful Journey

One thing to state in advance: because my project only needs to handle the .mp3 format, the code examples below (and the code in the repository linked above) mainly target mp3. Of course, whichever format you deal with, the idea is much the same.

Create worker

Creating a worker is very simple: just call new Worker with the worker script's URL. Note that due to the same-origin policy, the worker script must be served from the same origin as the parent page for the worker to work properly. Since this is not the focus here, I'll skip the details.

function createWorker(workerPath: string) {
  const worker = new Worker(workerPath);
  return worker;
}
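For example, assuming the worker build of ffmpeg.js is hosted on the same origin (the exact file name depends on which build you host; the path below is hypothetical):

// Hypothetical path to the ffmpeg.js worker build served from the same origin
const worker = createWorker('/static/ffmpeg-worker.js');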

postMessage to promise

Anyone who has looked closely at the ffmpeg.js documentation will have noticed that the worker posts events back to the parent page at different stages of processing, such as stdout, start and done. If we attach callbacks directly to these events, it becomes hard to tell which audio job a given result belongs to inside those callbacks. Personally, I prefer to convert each request into a promise:

function pmToPromise(worker, postInfo) {
  return new Promise((resolve, reject) => {
    // Successful callback
    const successHandler = function(event) {
      switch (event.data.type) {
        case "stdout":
          console.log("worker stdout: ", event.data.data);
          break;

        case "start":
          console.log("worker received your command and started working :)");
          break;

        case "done":
          worker.removeEventListener("message", successHandler);
          worker.removeEventListener("error", failHandler);
          resolve(event);
          break;

        default:
          break;
      }
    };
    
    // Error handler: reject and clean up both listeners
    const failHandler = function(error) {
      worker.removeEventListener("message", successHandler);
      worker.removeEventListener("error", failHandler);
      reject(error);
    };

    worker.addEventListener("message", successHandler);
    worker.addEventListener("error", failHandler);
    postInfo && worker.postMessage(postInfo);
  });
}

With this transformation, a postMessage request becomes a promise, which gives us much more room to compose and extend the logic.
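As a rough sketch of how this gets used in the rest of the article (inside an async function; command is a { type: 'run', ... } object like the ones built in the sections below):

// Send a command to the worker and wait for its "done" event
const result = await pmToPromise(worker, command);
// The files ffmpeg produced are available under MEMFS on the "done" event
const outputArrayBuffer = result.data.data.MEMFS[0].data;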

Interconversion of audio, blob and arrayBuffer

The data format required by ffmpeg-worker is an arrayBuffer, but what we actually have on hand may be an audio file object (a blob), an audio element object (audio), or perhaps just a link (a url). Converting between these formats is therefore essential:

audio to arrayBuffer

function audioToArrayBuffer(audio) {
  const url = audio.src;
  if (url) {
    return axios({
      url,
      method: 'get',
      responseType: 'arraybuffer',
    }).then(res => res.data);
  } else {
    return Promise.resolve(null);
  }
}
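The snippet above assumes axios is available. If you would rather avoid the dependency, the native fetch API can do the same thing (a sketch):

// Equivalent using the native fetch API instead of axios
function audioToArrayBuffer(audio) {
  const url = audio.src;
  if (!url) return Promise.resolve(null);
  return fetch(url).then(res => res.arrayBuffer());
}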

The way to turn audio into an arrayBuffer is to initiate an ajax request for the audio's src, set the response type to arraybuffer, and take the arrayBuffer from the response.

blob to arrayBuffer

This is also very simple: just use a FileReader to read out the blob's content.

function blobToArrayBuffer(blob) {
  return new Promise(resolve => {
    const fileReader = new FileReader();
    fileReader.onload = function() {
      resolve(fileReader.result);
    };
    fileReader.readAsArrayBuffer(blob);
  });
}

arrayBuffer to blob

Create a blob using the File constructor:

function audioBufferToBlob(arrayBuffer) {
  const file = new File([arrayBuffer], 'test.mp3', {
    type: 'audio/mp3',
  });
  return file;
}

blob to audio

Blob to audio is also very simple: JS provides a native API, URL.createObjectURL, which turns a blob into an accessible URL that can be used for playback.

function blobToAudio(blob) {
  const url = URL.createObjectURL(blob);
  return new Audio(url);
}
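One caveat not shown above: object URLs created this way are not released automatically, so it is worth revoking them once the audio element is no longer needed:

// Optional cleanup once the audio is no longer needed
URL.revokeObjectURL(audio.src);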

Next, let's get to the point.

Audio clipping - clip

So-called clipping means extracting the content of a given audio between a given start time and end time to form a new audio. Code first:

class Sdk {
  end = "end";

  // other code...

  /**
   * The incoming audio blob is clipped according to the specified time position
   * @param originBlob Audio to be processed
   * @param startSecond Start cutting time point (seconds)
   * @param endSecond End clipping time point (seconds)
   */
  clip = async (originBlob, startSecond, endSecond) => {
    const ss = startSecond;
    // Get the clip duration; if endSecond is not passed, clip to the end of the audio by default
    const d = isNumber(endSecond) ? endSecond - startSecond : this.end;
    // Converting blob to processable arrayBuffer
    const originAb = await blobToArrayBuffer(originBlob);
    let resultArrBuf;

    // Build the command for ffmpeg-worker, send it, and wait for the clipping to finish
    if (d === this.end) {
      resultArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, ss)
      )).data.data.MEMFS[0].data;
    } else {
      resultArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, ss, d)
      )).data.data.MEMFS[0].data;
    }

    // Wrap the arrayBuffer after the worker has been processed into a blob and return
    return audioBufferToBlob(resultArrBuf);
  };
}

The interface takes three parameters: the audio blob to be clipped, and the start and end points of the clip. Note that the getClipCommand function is responsible for wrapping the incoming arrayBuffer and time values into the data format agreed upon with ffmpeg-worker.

/**
 * Build the clip command in the data format required by the ffmpeg.js documentation
 * @param arrayBuffer Audio buffer to be processed
 * @param st Start cutting time point (seconds)
 * @param duration Duration of the clip (seconds)
 */
function getClipCommand(arrayBuffer, st, duration) {
  return {
    type: "run",
    arguments: `-ss ${st} -i input.mp3 ${
      duration ? `-t ${duration} ` : ""
    }-acodec copy output.mp3`.split(" "),
    MEMFS: [
      {
        data: new Uint8Array(arrayBuffer),
        name: "input.mp3"
      }
    ]
  };
}
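For reference, in the command above -ss is the start offset, -t the duration to keep, and -acodec copy copies the audio stream without re-encoding, which is what keeps clipping fast. A usage sketch (hypothetical variable names; how the Sdk instance gets its worker is omitted, as in the snippets above):

// Hypothetical usage, inside an async function; mp3Blob is an MP3 blob,
// e.g. from a file input or MediaRecorder
const sdk = new Sdk();
// Keep only the segment between 10s and 20s
const clippedBlob = await sdk.clip(mp3Blob, 10, 20);
document.body.appendChild(blobToAudio(clippedBlob));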

Multi-audio merging - concat

Merging multiple audio files is easy to understand: it joins several audio files into one, in the order they appear in the array.

class Sdk {
  // other code...

  /**
   * Merge the incoming audio blobs into a single audio, in array order
   * @param blobs Audio blob array to be processed
   */
  concat = async blobs => {
    const arrBufs = [];
  
    for (let i = 0; i < blobs.length; i++) {
      arrBufs.push(await blobToArrayBuffer(blobs[i]));
    }
  
    const result = await pmToPromise(
      this.worker,
      await getCombineCommand(arrBufs),
    );
    return audioBufferToBlob(result.data.data.MEMFS[0].data);
  };
}

In the code above, we use a for loop to decode each blob in the array into an arrayBuffer. Some readers may wonder: why not just use the array's forEach method instead of the more tedious for loop? There is a reason: we use await in the loop body because we want the blobs to be decoded one by one before the following code runs. A for loop executes synchronously, so each await pauses the surrounding function; forEach, on the other hand, ignores the promises returned by its callback, so we cannot await their completion, and using it would not meet our expectations.
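To make the difference concrete, here is a minimal sketch (same helper as above):

// Sequential: each await pauses the surrounding async function until the blob is decoded
for (let i = 0; i < blobs.length; i++) {
  arrBufs.push(await blobToArrayBuffer(blobs[i]));
}

// Not sequential: forEach ignores the promises returned by its async callback,
// so code after this loop can run before any conversion has finished
blobs.forEach(async blob => {
  arrBufs.push(await blobToArrayBuffer(blob));
});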

Similarly, the getCombineCommand function plays a role analogous to getClipCommand above:

async function getCombineCommand(arrayBuffers) {
  // Convert arrayBuffers to the data format specified by ffmpeg-worker, respectively
  const files = arrayBuffers.map((arrayBuffer, index) => ({
    data: new Uint8Array(arrayBuffer),
    name: `input${index}.mp3`,
  }));
  
  // Create a txt file that tells ffmpeg which audio files we need to merge (a file list for the concat demuxer)
  const txtContent = [files.map(f => `file '${f.name}'`).join('\n')];
  const txtBlob = new Blob(txtContent, { type: 'text/plain' });
  const fileArrayBuffer = await blobToArrayBuffer(txtBlob);

  // Push the txt file into the list of files to be sent to ffmpeg-worker
  files.push({
    data: new Uint8Array(fileArrayBuffer),
    name: 'filelist.txt',
  });

  return {
    type: 'run',
    arguments: `-f concat -i filelist.txt -c copy output.mp3`.split(' '),
    MEMFS: files,
  };
}

In the code above, unlike the clipping operation, there is more than one audio file to operate on, so we need to create a "mapping table" (the filelist.txt) that tells ffmpeg-worker which audio files to merge and in what order.
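For example, with two input buffers the generated filelist.txt contains just the lines below; ffmpeg's concat demuxer (the -f concat flag) reads it line by line and stitches the files together in that order:

file 'input0.mp3'
file 'input1.mp3'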

Audio clipping and replacement - splice

This is a bit like an upgraded version of clip: we delete the specified segment from audio A and insert audio B in its place:

class Sdk {
  end = "end";
  // other code...

  /**
   * Replace one audio blob with another audio at the specified location
   * @param originBlob Audio blob to be processed
   * @param startSecond Starting point (seconds)
   * @param endSecond End time point (seconds)
   * @param insertBlob Audio blob to insert as the replacement
   */
  splice = async (originBlob, startSecond, endSecond, insertBlob) => {
    const ss = startSecond;
    const es = isNumber(endSecond) ? endSecond : this.end;

    // If insertBlob is not passed but endSecond is not a number, treat endSecond as the insert blob (argument shifting); with no insert blob at all, the specified segment is simply deleted
    insertBlob = insertBlob
      ? insertBlob
      : endSecond && !isNumber(endSecond)
      ? endSecond
      : null;

    const originAb = await blobToArrayBuffer(originBlob);
    let leftSideArrBuf, rightSideArrBuf;

    // Cut and divide the audio first according to the specified position
    if (ss === 0 && es === this.end) {
      // The entire audio is cut away, nothing is left
      return null;
    } else if (ss === 0) {
      // Cut from the beginning: keep only the part after es
      rightSideArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, es)
      )).data.data.MEMFS[0].data;
    } else if (ss !== 0 && es === this.end) {
      // Cut to the end: keep only the part before ss
      leftSideArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, 0, ss)
      )).data.data.MEMFS[0].data;
    } else {
      // Cut out the middle: keep both sides
      leftSideArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, 0, ss)
      )).data.data.MEMFS[0].data;
      rightSideArrBuf = (await pmToPromise(
        this.worker,
        getClipCommand(originAb, es)
      )).data.data.MEMFS[0].data;
    }

    // Merge multiple audio
    const arrBufs = [];
    leftSideArrBuf && arrBufs.push(leftSideArrBuf);
    insertBlob && arrBufs.push(await blobToArrayBuffer(insertBlob));
    rightSideArrBuf && arrBufs.push(rightSideArrBuf);

    const combineResult = await pmToPromise(
      this.worker,
      await getCombineCommand(arrBufs)
    );

    return audioBufferToBlob(combineResult.data.data.MEMFS[0].data);
  };
}

The code above is essentially a combination of clip and concat.
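A usage sketch (hypothetical blob variables, using the sdk instance from the clip example):

// Replace the 5s-10s segment of recordingBlob with patchBlob
const patched = await sdk.splice(recordingBlob, 5, 10, patchBlob);

// Or simply delete that segment by omitting the insert blob
const trimmed = await sdk.splice(recordingBlob, 5, 10);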

At this point we have basically met our requirements: with nothing more than a worker, the front end can process audio all by itself. Isn't that nice?

The code here has been simplified to make the explanation clearer. Interested readers can go straight to the source code. Feedback and criticism are welcome :)
