Cable Messenger: voice ripple curve generation strategy

Posted by melody on Tue, 01 Feb 2022 14:33:25 +0100


When a short voice message is sent in a Cable Messenger chat, the matching voice ripple curve can be generated by analyzing the voice data in real time.

This article mainly argues that we should think more and practice more during project development. Too often we cannot take a step without a third-party library, and when no library fits we even ask the product manager to change the requirements. In fact, many things can be written and implemented by ourselves. We should also make the effort to understand the technology and the problem itself. Doing a project is not simply piling up third-party libraries.

Generation and analysis of ripple data

PCM (Pulse Code Modulation) audio data is a raw stream of uncompressed audio samples. It is the standard digital audio data obtained from an analog signal through sampling, quantization and encoding.

In a mono audio file, the samples are stored in chronological order. In a two-channel (stereo) file, they are interleaved as left and right samples (LRLR...). How each sample is stored also depends on the byte order. Taking 16-bit quantization as an example, a stereo file produces 16 * 2 bits of digital audio data in each sampling interval, and these are stored sequentially.
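
The interleaved layout described above can be illustrated with a minimal Swift sketch. It assumes 16-bit little-endian stereo samples (the byte order used by WAV files), and `deinterleaveStereo16` is a hypothetical helper name that is not part of the original project:

```swift
/// Splits interleaved 16-bit little-endian stereo PCM bytes into two channels.
/// Hypothetical helper for illustration only.
func deinterleaveStereo16(_ pcm: [UInt8]) -> (left: [Int16], right: [Int16]) {
    var left: [Int16] = []
    var right: [Int16] = []
    let frameSize = 4 // 2 channels * 2 bytes per sample
    var i = 0
    // Any incomplete trailing frame is ignored
    while i + frameSize <= pcm.count {
        // Little-endian: the low byte comes first
        let l = UInt16(pcm[i]) | (UInt16(pcm[i + 1]) << 8)
        let r = UInt16(pcm[i + 2]) | (UInt16(pcm[i + 3]) << 8)
        left.append(Int16(bitPattern: l))
        right.append(Int16(bitPattern: r))
        i += frameSize
    }
    return (left, right)
}
```

For a mono file the same byte pairs would simply be read one after another, without splitting them into two arrays.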

PCM data is the most primitive data produced by the device. Compressing it with various algorithms and wrapping it in various container formats yields the well-known MP3, AMR and other formats. When we receive audio in one of these formats, we have to reverse the process before playback: unpack the container, decompress the data, and play it only once it has been restored to the original PCM data. Since this article is not a detailed treatment of audio format theory, this is mentioned only in passing.

Cable Messenger uses the AMR format to transmit short voice files. AMR files are small, which makes them well suited to short voice messages. Between the two variants, AMR-NB produces smaller files and AMR-WB offers higher quality.

On the Android platform, the native player fully supports playback of both AMR formats. On the iOS platform, AMR playback is said to have been supported in earlier versions, but at present the native player cannot directly unpack and decompress AMR data into PCM data for playback. Therefore, when the iOS side receives an AMR file, it converts the AMR data automatically before playback: it generates PCM data and wraps that PCM data according to the WAV container specification. The result is then played by the native audio player.
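
As a rough illustration of the "wrap PCM data in WAV" step, here is a minimal Swift sketch that prepends the canonical 44-byte WAV header to raw PCM bytes. It is not the conversion code used by Cable Messenger. The function names and the default parameters (8 kHz, mono, 16-bit, which is what decoded AMR-NB audio normally uses) are assumptions made for this example:

```swift
/// Returns the bytes of an integer in little-endian order.
func littleEndianBytes<T: FixedWidthInteger>(_ value: T) -> [UInt8] {
    return (0..<T.bitWidth / 8).map { UInt8(truncatingIfNeeded: value >> ($0 * 8)) }
}

/// Wraps raw PCM bytes in a canonical 44-byte WAV header.
/// Hypothetical helper; the defaults are assumptions for decoded AMR-NB audio.
func wrapPCMInWAV(_ pcm: [UInt8],
                  sampleRate: UInt32 = 8000,
                  channels: UInt16 = 1,
                  bitsPerSample: UInt16 = 16) -> [UInt8] {
    let blockAlign = channels * (bitsPerSample / 8)   // bytes per sample frame
    let byteRate = sampleRate * UInt32(blockAlign)    // bytes per second
    var wav: [UInt8] = []
    wav += Array("RIFF".utf8)
    wav += littleEndianBytes(UInt32(36 + pcm.count))  // size of everything after this field
    wav += Array("WAVE".utf8)
    wav += Array("fmt ".utf8)
    wav += littleEndianBytes(UInt32(16))              // fmt chunk size for plain PCM
    wav += littleEndianBytes(UInt16(1))               // audio format 1 = linear PCM
    wav += littleEndianBytes(channels)
    wav += littleEndianBytes(sampleRate)
    wav += littleEndianBytes(byteRate)
    wav += littleEndianBytes(blockAlign)
    wav += littleEndianBytes(bitsPerSample)
    wav += Array("data".utf8)
    wav += littleEndianBytes(UInt32(pcm.count))       // size of the PCM payload
    wav += pcm
    return wav
}
```

With this layout the PCM payload always starts at byte 44. Files produced by other tools may contain extra chunks, so a reader should still search for the data chunk instead of relying on a fixed offset.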

As the most primitive audio data, PCM is the basis for generating the ripple data. So the first step is to extract the PCM data.

Before that, we should first understand the RIFF (Resource Interchange File Format) file format. A RIFF file consists of one or more "chunks". Each chunk is made up of a "chunk ID" (4 bytes), a "length" (4 bytes) and the "data" (whose size is given by the preceding length).

A WAV file is wrapped in a single chunk whose chunk ID is RIFF. The data of this chunk begins with the fixed value "WAVE".

After the WAVE identifier there may be three sub-chunks, with the chunk IDs fmt (format information), data (PCM data) and fact (additional data). The PCM data needed for the ripple curve is stored in the "data" field of the sub-chunk whose chunk ID is data.

Well, I have digressed a little. Since this is not a theoretical article, we will not discuss the theory at length. If you are interested, this material is easy to find.

The following Objective-C code shows how to extract the PCM data. Because it focuses only on extracting the PCM data, parsing code for the other chunks is not given.
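
To make the chunk layout concrete, here is a minimal Swift sketch that walks the sub-chunks, reads the basic fields of the fmt chunk and locates the data chunk. The `WAVFormat` type and the `readLE` and `parseWAV` functions are hypothetical names introduced for this example. It assumes the usual PCM fmt layout (format tag, channel count, sample rate, byte rate, block align, bits per sample) and a fmt chunk that precedes the data chunk:

```swift
/// Format fields read from the "fmt " chunk (hypothetical type for this sketch).
struct WAVFormat: Equatable {
    var audioFormat: UInt16   // 1 = linear PCM
    var channels: UInt16
    var sampleRate: UInt32
    var bitsPerSample: UInt16
}

/// Reads a little-endian unsigned integer of `count` bytes starting at `offset`.
func readLE(_ bytes: [UInt8], _ offset: Int, _ count: Int) -> UInt32 {
    var value: UInt32 = 0
    for i in 0..<count {
        value |= UInt32(bytes[offset + i]) << UInt32(8 * i)
    }
    return value
}

/// Returns the format and the byte range of the PCM data, or nil if the input is not a usable WAV file.
func parseWAV(_ bytes: [UInt8]) -> (format: WAVFormat, pcm: Range<Int>)? {
    guard bytes.count >= 12,
          Array(bytes[0..<4]) == Array("RIFF".utf8),
          Array(bytes[8..<12]) == Array("WAVE".utf8) else { return nil }
    var format: WAVFormat?
    var index = 12
    while index + 8 <= bytes.count {
        let id = String(decoding: bytes[index..<index + 4], as: UTF8.self)
        let size = Int(readLE(bytes, index + 4, 4))
        let body = index + 8
        guard body + size <= bytes.count else { return nil }  // truncated chunk
        if id == "fmt ", size >= 16 {
            format = WAVFormat(audioFormat: UInt16(readLE(bytes, body, 2)),
                               channels: UInt16(readLE(bytes, body + 2, 2)),
                               sampleRate: readLE(bytes, body + 4, 4),
                               bitsPerSample: UInt16(readLE(bytes, body + 14, 2)))
        } else if id == "data", let format = format {
            return (format, body..<body + size)
        }
        index = body + size + (size & 1)  // chunks are padded to an even length
    }
    return nil
}
```

Unknown chunks such as fact are simply skipped by using their length field, which is the main benefit of the chunk structure.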

#pragma mark - Parse a WAV file and return the raw PCM data of its "data" chunk
+ (nullable NSData *)decodePCM:(nonnull NSString *)path {
    NSData *wavData = [NSData dataWithContentsOfFile:path];
    // A valid file holds at least "RIFF" + length + "WAVE" (12 bytes)
    if (wavData == nil || wavData.length < 12) {
        return nil;
    }

    NSUInteger index    = 0;
    NSUInteger dataSize = 0;
    BOOL found          = NO;

    // 1. First check whether the file is in standard RIFF format
    NSData *dType   = [wavData subdataWithRange:NSMakeRange(0, 4)];
    NSString *sType = [[NSString alloc] initWithData:dType encoding:NSUTF8StringEncoding];
    if (![@"RIFF" isEqualToString:sType]) {
        return nil;
    }

    // 2. Read the RIFF data length (stored little-endian)
    uint32_t riffSize = 0;
    [wavData getBytes:&riffSize range:NSMakeRange(4, 4)];
    riffSize = CFSwapInt32LittleToHost(riffSize);

    // 3. Check the WAVE identifier; the fmt and data chunks follow it
    NSData *dWave   = [wavData subdataWithRange:NSMakeRange(8, 4)];
    NSString *sWave = [[NSString alloc] initWithData:dWave encoding:NSUTF8StringEncoding];
    if (![@"WAVE" isEqualToString:sWave] || riffSize <= 4) {
        return nil;
    }

    // 4. Take the sub-chunks after "WAVE", clamped to the real file length
    NSUInteger bodySize = MIN((NSUInteger)riffSize - 4, wavData.length - 12);
    NSData *riffData    = [wavData subdataWithRange:NSMakeRange(12, bodySize)];

    // 5. Walk the sub-chunks until the "data" chunk is found
    while (index + 8 <= riffData.length) {
        NSData *dData       = [riffData subdataWithRange:NSMakeRange(index, 4)];
        NSString *chunkType = [[NSString alloc] initWithData:dData encoding:NSUTF8StringEncoding];

        uint32_t chunkSize = 0;  // Data length of this chunk
        [riffData getBytes:&chunkSize range:NSMakeRange(index + 4, 4)];
        chunkSize = CFSwapInt32LittleToHost(chunkSize);

        if ([chunkType isEqualToString:@"data"]) {
            dataSize = chunkSize;
            found    = YES;
            break;
        }
        // Chunks are word-aligned: an odd-sized chunk is followed by one pad byte
        index += 8 + chunkSize + (chunkSize & 1);
    }

    // 6. Cut out the PCM data and return it (never read past the end of the file)
    if (!found || dataSize == 0) {
        return nil;
    }
    NSUInteger available = riffData.length - (index + 8);
    return [riffData subdataWithRange:NSMakeRange(index + 8, MIN(dataSize, available))];
}

Ripple curve control core logic

The ripple curve control is displayed as shown in the figure below:


After the PCM data has been obtained, it must be downsampled (sampled a second time) so that it can be presented in a control of limited length.

Taking 16-bit quantization as an example, simple sampling code looks like this:

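#pragma mark - Downsample the PCM data (16-bit samples)
+ (nullable NSMutableArray<NSNumber *> *)encodeLineValue:(nonnull NSData *)data offset:(int)offset {
    // A non-positive interval would never advance the loop below
    if (offset <= 0) {
        offset = 1;
    }
    NSUInteger size = data.length / 2;  // Number of 16-bit samples
    NSMutableArray<NSNumber *> *yPoints = [[NSMutableArray alloc] init];
    // Keep one sample out of every `offset` samples
    for (NSUInteger i = 0; i < size; i += offset) {
        int16_t value = 0;
        [data getBytes:&value range:NSMakeRange(i * 2, 2)];
        // WAV samples are stored little-endian
        value = (int16_t)CFSwapInt16LittleToHost((uint16_t)value);
        [yPoints addObject:@(value)];
    }
    return yPoints;
}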

In the implementation of the control, the length of the control is proportional to the length of the audio, and the width of each ripple line is a fixed value. The number of lines in the control is therefore obtained by first calculating the control length and then dividing it by the width taken up by one line.

Calculating the line height is slightly more complicated. First scan the downsampled data and take the value with the largest absolute value as the reference. The maximum line height is a fixed value, so the ratio between these two values can be calculated. Then traverse the downsampled data and compute the real height of each line using that ratio. The relevant code is as follows:
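
The line count calculation, and the sampling interval that follows from it, can be sketched in Swift as two small pure functions. The function names are hypothetical. The sketch assumes that one line occupies `lineWidth + linePadding` points, and it keeps every n-th sample in the same way as the Objective-C code above:

```swift
/// Number of lines that fit into a control of the given width (at least 1).
/// Hypothetical helper; expects a finite, non-negative width.
func lineCount(width: Double, lineWidth: Double, linePadding: Double) -> Int {
    let slot = lineWidth + linePadding   // horizontal space taken by one line
    guard slot > 0 else { return 1 }
    return max(1, Int(width / slot))
}

/// Keeps every `interval`-th sample so that roughly `lineCount` values remain.
func downsample(_ samples: [Int16], lineCount: Int) -> [Int16] {
    // Clamp to 1 so that short clips do not produce a zero step
    let interval = max(1, samples.count / max(1, lineCount))
    return stride(from: 0, to: samples.count, by: interval).map { samples[$0] }
}
```

Because the interval is rounded down, the result can contain a few more values than there are lines. A real control would have to truncate the result or tolerate the extra values.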

    ///
    /// Generate voiceprint coordinates
    ///
    public func createPCMLineData(path: String, width: CGFloat) -> Data? {
        /// Maximum line height
        let maxHeight: CGFloat = self.Max
        let height: CGFloat    = self.frame.size.height - self.paddingBotton
        var lineCount: Int     = Int(width / (self.lineWidth + self.linePadding))
        lineCount              = lineCount > 0 ? lineCount : 1
        // decodePCM and encodeLineValue are the Objective-C helpers shown earlier;
        // they are assumed to be reachable from this class
        guard let dPCM: Data = self.decodePCM(path) else {
            return nil
        }
        self.points.removeAll()
        /// With 16-bit quantization, every two bytes form one sample
        let unitCount: Int = dPCM.count / 2
        /// Sampling interval (at least 1, so short clips do not produce a zero step)
        let offset: Int32  = Int32(max(1, unitCount / lineCount))
        // 1. ========== generate the downsampled data ==========
        guard let values = self.encodeLineValue(dPCM, offset: offset) as? [Int16] else {
            return nil
        }
        // 2. ========== find the largest absolute value ==========
        // Widen to Int32 first, because abs(Int16.min) would overflow
        var maxValue: Int32 = 0
        for item in values {
            maxValue = max(maxValue, abs(Int32(item)))
        }
        // 3. ========== compute the scale factor ==========
        // Use 0 for pure silence to avoid dividing by zero
        let scrol: CGFloat = maxValue > 0 ? maxHeight / CGFloat(maxValue) : 0
        // 4. ========== generate line coordinates ==========
        let totalWidth: CGFloat = self.lineWidth + self.linePadding
        for i in 0..<values.count {
            /// Horizontal coordinate: the center of the i-th slot
            let x: CGFloat     = totalWidth * CGFloat(i + 1) - totalWidth * 0.5
            var value: CGFloat = abs(CGFloat(values[i])) * scrol
            if value < self.Min {  // The minimum value is 1
                value = self.Min
            }
            if value >= height * 0.5 - 1 {
                value = height * 0.5 - 2
            }
            /// Vertical coordinate: the top end of the line, relative to the vertical center
            let y: CGFloat = abs(height * 0.5 - value)
            self.points.append(CGPoint(x: x, y: y))
        }
        /// After the data is generated, trigger a redraw
        self.setNeedsDisplay()
        /// Return voiceprint data
        return NSKeyedArchiver.archivedData(withRootObject: self.points)
    }
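
The height scaling on its own can also be sketched as a pure Swift function, which is easier to test than the view method above. `lineHeights` is a hypothetical name, and the sketch leaves out the clamp against the control height that the original code applies:

```swift
/// Scales samples so that the loudest one maps to `maxHeight`;
/// quieter ones never drop below `minHeight`.
/// Hypothetical helper, not part of the original control.
func lineHeights(_ samples: [Int16], maxHeight: Double, minHeight: Double) -> [Double] {
    // Widen to Int32 before abs(), because abs(Int16.min) does not fit in Int16
    let magnitudes = samples.map { abs(Int32($0)) }
    // Pure silence (or empty input): every line gets the minimum height
    guard let peak = magnitudes.max(), peak > 0 else {
        return samples.map { _ in minHeight }
    }
    let scale = maxHeight / Double(peak)
    return magnitudes.map { max(minHeight, Double($0) * scale) }
}
```

For example, with `maxHeight` 25 and `minHeight` 1, the samples 0, 50 and -100 give heights of 1, 12.5 and 25.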

Once the ripple curve coordinates have been generated, the drawing logic of the control can be written. The code is as follows:

	override public func draw(_ rect: CGRect) {
        objc_sync_enter(self)
        let width:CGFloat      			= rect.size.width;
        let height:CGFloat 				= rect.size.height;
        let progressBackCGColor:CGColor = self.progressBC.cgColor
        let progressFormCGColor:CGColor = self.progressFC.cgColor
        let defaultCGolor:CGColor 		= self.defaultColor.cgColor
        
        if let context:CGContext = UIGraphicsGetCurrentContext(){
            // 1. Draw the base layer: default color when idle, background color during playback
            if self.progress == 0 || self.isAnimationRunning == false{
                context.setStrokeColor(defaultCGolor)
            }else{
                context.setStrokeColor(progressBackCGColor)
            }
            context.setLineWidth(lineWidth)
            for i in 0..<points.count {
                //Point setting
                let point:CGPoint = points[i]
                context.move(to: CGPoint(x: point.x, y: point.y))
                context.addLine(to: CGPoint(x: point.x, y: height  - point.y))
                context.strokePath()
            }
            context.saveGState()
            // 2. Set clipping area
            context.beginPath()
            context.addRect(CGRect(x: 0, y: 0, width: width * self.progress, height: height))
            context.closePath()
            context.clip()
            
            // 3. Draw the light color layer of [upper progress]
            context.setStrokeColor(progressFormCGColor)
            context.setLineWidth(lineWidth)
            for i in 0..<points.count {
                //Point setting
                let point:CGPoint = points[i]
                context.move(to: CGPoint(x: point.x, y: point.y))
                context.addLine(to: CGPoint(x: point.x, y: height - point.y))
                context.strokePath()
            }
            // Restore the state saved before clipping so the clip does not leak out
            context.restoreGState()
        }
        objc_sync_exit(self)
    }

The drawing code is for reference only. The control involves many other operations, such as displaying progress during playback, responding to left and right drag gestures, and switching colors after playback, so the complete code is not listed here. The code above is the core drawing logic. If I have time, I will tidy it up into a module and put it into my own open source project for your reference.

Nothing is difficult if you put your heart into it. Many excellent third-party libraries are worth using, but it is more important to learn to think and to explore how other people's libraries are implemented. This is a basic skill of a qualified programmer.

Topics: iOS im