Background
While developing against a Hikvision camera, the raw audio (PCM) is obtained by using the SDK to separate the audio from the camera's combined audio/video stream. A raw PCM file, however, cannot be played directly, and the speech transcription system does not recognize it, so it must be converted to WAV.
Cause of problem:
PCM (pulse-code modulation) recording represents the analog sound signal as a digital signal of 0s and 1s, without any further encoding or compression, so PCM can be regarded as the raw, uncompressed audio format. A PCM file contains no header information. The player therefore has no way of knowing the sampling rate, number of channels, bits per sample, audio data size and so on, and cannot play the file.
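To illustrate why the missing parameters matter, here is a minimal sketch showing that the same number of PCM bytes corresponds to different playback durations depending on which parameters are assumed. The class name `PcmDurationSketch`, the method `durationSeconds`, and the 640,000-byte file size are illustrative assumptions, not part of the original code.

```java
class PcmDurationSketch {

    /** Duration in seconds of raw PCM data, given ASSUMED playback parameters. */
    static double durationSeconds(long pcmBytes, int sampleRate, int channels, int bitsPerSample) {
        // Bytes consumed per second of audio
        long bytesPerSecond = (long) sampleRate * channels * bitsPerSample / 8;
        return (double) pcmBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long pcmBytes = 640_000L; // hypothetical file size

        // Interpreted as 16 kHz, stereo, 16-bit: 64000 bytes per second
        System.out.println(durationSeconds(pcmBytes, 16000, 2, 16)); // prints 10.0

        // Interpreted as 8 kHz, mono, 16-bit: 16000 bytes per second
        System.out.println(durationSeconds(pcmBytes, 8000, 1, 16));  // prints 40.0
    }
}
```

Nothing in the PCM bytes themselves says which interpretation is right, which is the information the WAV header supplies.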
The full name of the WAV format is WAVE. Prepending a WAV file header to the PCM data is all that is needed to produce a WAV file. The header format is described below.
WAV conforms to the RIFF (Resource Interchange File Format) specification. A RIFF file can be viewed as a tree whose basic unit is the "chunk", and a WAVE file is made up of several chunks. A basic WAV file consists of three chunks: the RIFF chunk, which identifies the file as a WAV file; the format ("fmt ") chunk, which records parameters such as the sampling rate; and the data chunk, which holds the actual data (samples).
Every WAV file has a header that records the encoding parameters of the audio stream. Multi-byte values in the header and in the sample data are stored in little-endian byte order.
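To make the byte order concrete, the following sketch encodes a 32-bit value the way a WAV header stores it, least significant byte first. The class name `LittleEndianSketch` and the method `toLittleEndian` are illustrative names, not from the original code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianSketch {

    /** Encodes a 32-bit integer as 4 bytes, least significant byte first. */
    static byte[] toLittleEndian(int value) {
        return ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    public static void main(String[] args) {
        // 16000 is 0x00003E80, so the stored bytes are 80 3E 00 00
        for (byte b : toLittleEndian(16000)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```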
Problem solving:
The root cause is that a PCM file contains only the raw sample data of the sound. The sampling rate, number of channels, bits per sample, data size, duration and so on are not recorded anywhere, so the transcription system cannot recognize the file and an ordinary audio player cannot play it. The fix is to prepend a WAV file header to the PCM data.
The WAV file header used here is 44 bytes long (bytes, not kilobytes) and records the WAVE format identifier, sampling rate, number of channels, bits per sample, data size, etc. Writing these 44 bytes in front of the PCM data produces a WAV file. The header fields, in order, are:
ChunkID: 4 bytes, the ASCII string "RIFF", identifying the file as a Resource Interchange File Format file
ChunkSize: 4 bytes, an integer giving the number of bytes from the next field to the end of the file, i.e. the total file size minus 8 (36 + Subchunk2Size for a 44-byte header)
Format: 4 bytes, the ASCII string "WAVE", identifying the file as a WAV file
Subchunk1ID: 4 bytes, the ASCII string "fmt " (note the trailing space), marking the start of the format chunk
Subchunk1Size: 4 bytes, an integer giving the length of the rest of the format chunk (PCMWAVEFORMAT); 16 for PCM
AudioFormat: 2 bytes, a short integer giving the format type (1 means linear PCM encoding)
NumChannels: 2 bytes, a short integer giving the number of channels: 1 for mono, 2 for stereo
SampleRate: 4 bytes, an integer giving the sampling rate, such as 44100
ByteRate: 4 bytes, an integer giving the data rate of the waveform (average bytes per second), equal to SampleRate * NumChannels * BitsPerSample / 8
BlockAlign: 2 bytes, a short integer giving the size in bytes of one sample frame (one sample for every channel), equal to NumChannels * BitsPerSample / 8
BitsPerSample: 2 bytes, a short integer giving the bits per sample, that is, the PCM bit width, usually 8 or 16
Subchunk2ID: 4 bytes, the ASCII string "data", marking the start of the data chunk
Subchunk2Size: 4 bytes, an integer giving the size in bytes of the audio data that follows, i.e. the total file size minus the 44-byte header
data: the raw PCM sample data
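The field layout above can be sketched compactly with `java.nio.ByteBuffer` in little-endian mode, which avoids manual shifting and masking. This is a minimal illustration under stated assumptions, not the code used in this article: the class name `WavHeaderSketch` and the method `buildHeader` are hypothetical, and it assumes linear PCM with a data size that fits in 32 bits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class WavHeaderSketch {

    static final int HEADER_LENGTH = 44;

    /** Builds a 44-byte PCM WAV header for pcmDataLength bytes of sample data. */
    static byte[] buildHeader(long pcmDataLength, int sampleRate, int channels, int bitsPerSample) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second

        ByteBuffer header = ByteBuffer.allocate(HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        header.put("RIFF".getBytes(StandardCharsets.US_ASCII)); // ChunkID
        header.putInt((int) (pcmDataLength + 36));              // ChunkSize = file size - 8
        header.put("WAVE".getBytes(StandardCharsets.US_ASCII)); // Format
        header.put("fmt ".getBytes(StandardCharsets.US_ASCII)); // Subchunk1ID (trailing space)
        header.putInt(16);                                      // Subchunk1Size for PCM
        header.putShort((short) 1);                             // AudioFormat: 1 = linear PCM
        header.putShort((short) channels);                      // NumChannels
        header.putInt(sampleRate);                              // SampleRate
        header.putInt(byteRate);                                // ByteRate
        header.putShort((short) blockAlign);                    // BlockAlign
        header.putShort((short) bitsPerSample);                 // BitsPerSample
        header.put("data".getBytes(StandardCharsets.US_ASCII)); // Subchunk2ID
        header.putInt((int) pcmDataLength);                     // Subchunk2Size
        return header.array();
    }
}
```

For example, `WavHeaderSketch.buildHeader(1000, 16000, 2, 16)` yields a header whose ByteRate field is 64000 and whose BlockAlign field is 4.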
code:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/**
 * @ClassName AudioPcmToWaveUtil
 * @Description Utility for converting PCM to WAV
 * @Author mouse
 * @Date 2021/10/15 16:22
 * @Version 1.0
 */
public class AudioPcmToWaveUtil {

    private static final long SAMPLE_RATE = 16000L;
    private static final int CHANNELS = 2;
    private static final int BITS_PER_SAMPLE = 16;
    private static final int BYTE_SIZE = 1024;
    private static final int WAVE_HEADER_LENGTH = 44;

    /**
     * Conversion method
     * @param pcmFilePath  PCM file path
     * @param waveFilePath WAV file path
     * @return true if the WAV file was written, false if an I/O error occurred
     */
    public static Boolean audioPcmToWave(String pcmFilePath, String waveFilePath) {
        System.out.println(pcmFilePath + " to " + waveFilePath);
        long byteRate = BITS_PER_SAMPLE * SAMPLE_RATE * CHANNELS / 8;
        byte[] data = new byte[BYTE_SIZE];
        // try-with-resources closes both streams even if an exception is thrown
        try (FileInputStream inStream = new FileInputStream(pcmFilePath);
             FileOutputStream outStream = new FileOutputStream(waveFilePath)) {
            long totalAudioLen = inStream.getChannel().size();
            // RIFF ChunkSize: the 36 header bytes after the first 8, plus the audio data
            long totalDataLen = totalAudioLen + 36;
            byte[] waveFileHeader = getWaveFileHeader(totalAudioLen, totalDataLen, SAMPLE_RATE, CHANNELS, byteRate);
            outStream.write(waveFileHeader);
            int length;
            while ((length = inStream.read(data)) != -1) {
                // Write only the bytes actually read; the final read is usually a partial buffer
                outStream.write(data, 0, length);
            }
            return true;
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException, so it is handled here too
            e.printStackTrace();
            return false;
        }
    }

    /**
     * Generate header
     * @param totalAudioLen size of the audio data in bytes
     * @param totalDataLen  RIFF chunk size (total file size minus 8)
     * @param sampleRate    sampling rate
     * @param channels      number of channels
     * @param byteRate      bytes per second (sampleRate * channels * bitsPerSample / 8)
     * @return the 44-byte WAV header
     */
    private static byte[] getWaveFileHeader(long totalAudioLen, long totalDataLen, long sampleRate, int channels, long byteRate) {
        byte[] header = new byte[WAVE_HEADER_LENGTH];
        // RIFF/WAVE header
        header[0] = 'R';
        header[1] = 'I';
        header[2] = 'F';
        header[3] = 'F';
        header[4] = (byte) (totalDataLen & 0xff);
        header[5] = (byte) ((totalDataLen >> 8) & 0xff);
        header[6] = (byte) ((totalDataLen >> 16) & 0xff);
        header[7] = (byte) ((totalDataLen >> 24) & 0xff);
        header[8] = 'W';
        header[9] = 'A';
        header[10] = 'V';
        header[11] = 'E';
        // 'fmt ' chunk
        header[12] = 'f';
        header[13] = 'm';
        header[14] = 't';
        header[15] = ' ';
        // 4 bytes: size of 'fmt ' chunk
        header[16] = 16;
        header[17] = 0;
        header[18] = 0;
        header[19] = 0;
        // format = 1 (linear PCM)
        header[20] = 1;
        header[21] = 0;
        // number of channels
        header[22] = (byte) channels;
        header[23] = 0;
        // sample rate
        header[24] = (byte) (sampleRate & 0xff);
        header[25] = (byte) ((sampleRate >> 8) & 0xff);
        header[26] = (byte) ((sampleRate >> 16) & 0xff);
        header[27] = (byte) ((sampleRate >> 24) & 0xff);
        // byte rate
        header[28] = (byte) (byteRate & 0xff);
        header[29] = (byte) ((byteRate >> 8) & 0xff);
        header[30] = (byte) ((byteRate >> 16) & 0xff);
        header[31] = (byte) ((byteRate >> 24) & 0xff);
        // block align = channels * bits per sample / 8
        header[32] = (byte) (channels * BITS_PER_SAMPLE / 8);
        header[33] = 0;
        // bits per sample
        header[34] = BITS_PER_SAMPLE;
        header[35] = 0;
        // data chunk
        header[36] = 'd';
        header[37] = 'a';
        header[38] = 't';
        header[39] = 'a';
        header[40] = (byte) (totalAudioLen & 0xff);
        header[41] = (byte) ((totalAudioLen >> 8) & 0xff);
        header[42] = (byte) ((totalAudioLen >> 16) & 0xff);
        header[43] = (byte) ((totalAudioLen >> 24) & 0xff);
        return header;
    }
}
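A quick way to check a converted file is to read its first 44 bytes back and print the fields a player would rely on. The sketch below is one possible check, assuming the 44-byte layout described earlier; the class name `WavHeaderCheckSketch` and the method `describe` are hypothetical helpers, not part of the utility above.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class WavHeaderCheckSketch {

    /** Returns a one-line summary of a 44-byte PCM WAV header. */
    static String describe(byte[] header) {
        if (header.length < 44) {
            throw new IllegalArgumentException("need at least 44 bytes");
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE header");
        }
        // Header integers are little-endian; the offsets follow the field list
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int channels = buf.getShort(22);
        int sampleRate = buf.getInt(24);
        int bitsPerSample = buf.getShort(34);
        long dataSize = buf.getInt(40) & 0xFFFFFFFFL; // treat as unsigned 32-bit
        return channels + " ch, " + sampleRate + " Hz, " + bitsPerSample + " bit, "
                + dataSize + " data bytes";
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the WAV file to inspect
        byte[] header = new byte[44];
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            in.readFully(header); // throws EOFException if the file is shorter than 44 bytes
        }
        System.out.println(describe(header));
    }
}
```

If the reported data size does not equal the size of the original PCM file, the header was built from the wrong length.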