Large file upload: instant upload, resumable upload, and chunked upload

Posted by SaMike on Mon, 03 Jan 2022 07:43:56 +0100

Preface

File upload is an old topic. When a file is small, you can simply convert it to a byte stream and send it to the server in one go. When the file is large, though, uploading it the ordinary way is not a good idea; few people will tolerate an upload that gets interrupted halfway and then has to start again from the beginning. Is there a better upload experience? The answer is yes: the several upload techniques introduced below.

Detailed tutorial

Instant upload (second transfer)

1. What is instant upload

Generally speaking, when you upload a file, the server first performs an MD5 check. If the same file already exists on the server, it simply gives you the address of the existing file; in effect, you are "downloading" a file the server already has, and the upload finishes instantly, hence the name. If you do not want a file to be matched this way, just make its MD5 change by modifying the file itself (changing the name is not enough). For example, adding a few characters to a text file changes its MD5, and it will no longer be instant-uploaded.
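
For reference, a file's MD5 can be computed on the JVM with the standard MessageDigest API. This is a minimal, self-contained sketch that streams the file in small blocks, so it also works for very large files; it is independent of the FileMD5Util helper used later in this article:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public final class Md5Demo {

  //Streams the file through an MD5 digest in 8 KB blocks and returns the
  //hex-encoded result, so even a huge file never has to fit in memory
  public static String md5Of(Path file) throws Exception {
    MessageDigest digest = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}

Changing even a single byte of the file changes this digest, which is why renaming alone does not defeat the check.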

2. The core instant-upload logic implemented in this article

a. Use Redis's set method to store the file upload status, where the key is the MD5 of the uploaded file and the value is a flag indicating whether the upload has completed.

b. If the flag is true, the upload has already completed, so uploading the same file again takes the instant-upload path. If the flag is false, the upload has not finished yet; in that case, also call the set method to save the path of the chunk-record file, where the key is the upload file's MD5 plus a fixed prefix and the value is the path of that record file.
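
A minimal sketch of this check, assuming the RedisUtil, FileConstant and SpringContextHolder classes used in the server code later in this article, plus hget/get read methods that those snippets do not show:

public class InstantUploadChecker {

  //Returns true if the file identified by this MD5 has already been fully
  //uploaded, i.e. the instant-upload path can be taken
  public boolean isInstantUpload(String md5) {
    RedisUtil redisUtil = SpringContextHolder.getBean(RedisUtil.class);
    Object status = redisUtil.hget(FileConstant.FILE_UPLOAD_STATUS, md5); //assumed read method
    return status != null && Boolean.parseBoolean(status.toString());
  }

  //For an unfinished upload, returns the path of the conf file that records
  //which chunks have arrived (null if this MD5 has never been seen)
  public String getConfFilePath(String md5) {
    RedisUtil redisUtil = SpringContextHolder.getBean(RedisUtil.class);
    Object path = redisUtil.get(FileConstant.FILE_MD5_KEY + md5); //assumed read method
    return path == null ? null : path.toString();
  }
}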

Chunked upload

1. What is chunked upload

Chunked upload splits the file to be uploaded into multiple data blocks (which we call parts or chunks) of a certain size. After all chunks have been uploaded, the server collects them and reassembles the original file.
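
To make the idea concrete, here is a minimal, self-contained sketch (plain JDK, not part of this article's project) that splits a local file into fixed-size part files; a real client would send each part over HTTP instead of writing it to disk:

import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public final class ChunkSplitDemo {

  //Splits the source file into chunks of chunkSize bytes (the last chunk may be
  //shorter) and writes each one to <source>.part0, <source>.part1, ...
  public static int split(Path source, long chunkSize) throws Exception {
    try (RandomAccessFile raf = new RandomAccessFile(source.toFile(), "r")) {
      long total = raf.length();
      int chunks = (int) ((total + chunkSize - 1) / chunkSize); //ceiling division
      for (int i = 0; i < chunks; i++) {
        long offset = chunkSize * i; //start position of chunk i
        byte[] data = new byte[(int) Math.min(chunkSize, total - offset)];
        raf.seek(offset);
        raf.readFully(data);
        Files.write(source.resolveSibling(source.getFileName() + ".part" + i), data);
      }
      return chunks;
    }
  }
}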

2. Scenarios for chunked upload

1. Large file upload

2. Poor network environments, where there is a risk of retransmission

Resumable upload

1. What is resumable upload

Resumable upload (breakpoint continuation) means that an upload or download task (a file or an archive) is deliberately divided into several parts, each uploaded or downloaded separately. If a network failure occurs, you can continue from the parts that have already been transferred instead of starting from the beginning. This article mainly targets the resumable upload scenario.

2. Application scenario

Resumable upload can be regarded as a derivative of chunked upload, so it can be used in any scenario where chunked upload applies.

3. The core logic of resumable upload

During a chunked upload, if the transfer is interrupted by an abnormal event such as a system crash or a network failure, the client needs to record the upload progress so that, when the upload is retried later, it can continue from where it was interrupted.

To avoid the problem of the client's progress data being deleted after an interrupted upload (which would force the upload to start over), the server should also provide an interface the client can use to query which chunks have already been uploaded, so the client knows what has arrived and can continue from the next missing chunk.
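
The server side of such a query can be implemented against the same conf file that Scheme 2 below uses for bookkeeping, where every uploaded chunk is marked with Byte.MAX_VALUE (127) at its index. A minimal sketch (the method name is illustrative; FileUtils is the commons-io helper already used in the template class below):

//Returns the indices of the chunks that have already been uploaded, so the
//client can resume from the missing ones
public static List<Integer> listUploadedChunks(File confFile) throws IOException {
  byte[] flags = FileUtils.readFileToByteArray(confFile);
  List<Integer> uploaded = new ArrayList<>();
  for (int i = 0; i < flags.length; i++) {
    if (flags[i] == Byte.MAX_VALUE) { //127 means chunk i has been written
      uploaded.add(i);
    }
  }
  return uploaded;
}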

4. Implementation steps

a. Scheme 1: general steps

  • Divide the file to be uploaded into data blocks of the same size according to a chosen splitting rule;

  • Initialize a chunked-upload task and get back the unique ID of this upload;

  • Send each chunk according to a chosen strategy (serially or in parallel);

  • After all chunks are sent, the server checks whether the upload is complete; if so, it merges the data blocks back into the original file (a minimal merge sketch follows this list).
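
The merge in the last step can be as simple as concatenating the parts in upload order. A minimal, self-contained sketch, reusing the hypothetical .partN naming from the split sketch earlier (note that this article's own Scheme 2 avoids a separate merge by writing each chunk at its offset directly):

import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public final class ChunkMergeDemo {

  //Reassembles <target>.part0 .. <target>.part(chunks-1) into the target file
  public static void merge(Path target, int chunks) throws Exception {
    try (OutputStream out = Files.newOutputStream(target)) {
      for (int i = 0; i < chunks; i++) {
        Path part = target.resolveSibling(target.getFileName() + ".part" + i);
        out.write(Files.readAllBytes(part)); //append chunk i in order
      }
    }
  }
}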

b. Scheme 2: the implementation in this article

  • The front end (client) splits the file into chunks of a fixed size and sends the chunk sequence number and size along with each request to the back end (server);

  • The server creates a conf file to record which chunks have arrived. The conf file's length equals the total number of chunks; each time a chunk is uploaded, a 127 is written at its index, so positions still holding the default 0 have not been uploaded yet, while uploaded positions hold Byte.MAX_VALUE (127). This step is the core of both resumable upload and instant upload;

  • The server computes the start offset from the chunk sequence number in the request and the fixed chunk size (the same on both sides), and writes the received chunk data into the file at that offset.

5. Code for chunked upload / resumable upload

a. The front end uses the WebUploader plug-in provided by Baidu to split the file. Since this article mainly covers the server-side implementation, see the following link for how WebUploader does the chunking:

http://fex.baidu.com/webuploader/getting-started.html

b. The back end implements the file write in two ways. The first uses RandomAccessFile; if you are not familiar with RandomAccessFile, see the following link:

https://blog.csdn.net/dimudan2015/article/details/81910690

The second uses MappedByteBuffer; if you are not familiar with it, see the following link:

https://www.jianshu.com/p/f90866dcbffc

The core code for the write operation on the back end:

a. RandomAccessFile implementation

@UploadMode(mode = UploadModeEnum.RANDOM_ACCESS)  
@Slf4j  
public class RandomAccessUploadStrategy extends SliceUploadTemplate {  
  
  @Autowired  
  private FilePathUtil filePathUtil;  
  
  @Value("${upload.chunkSize}")  
  private long defaultChunkSize;  
  
  @Override  
  public boolean upload(FileUploadRequestDTO param) {  
    RandomAccessFile accessTmpFile = null;  
    try {  
      String uploadDirPath = filePathUtil.getPath(param);  
      File tmpFile = super.createTmpFile(param);  
      accessTmpFile = new RandomAccessFile(tmpFile, "rw");  
      //This must be consistent with the value set at the front end
      long chunkSize = Objects.isNull(param.getChunkSize()) ? defaultChunkSize * 1024 * 1024  
          : param.getChunkSize();  
      long offset = chunkSize * param.getChunk();  
      //Seek to this chunk's offset
      accessTmpFile.seek(offset);  
      //Write the chunk data
      accessTmpFile.write(param.getFile().getBytes());  
      boolean isOk = super.checkAndSetUploadProgress(param, uploadDirPath);  
      return isOk;  
    } catch (IOException e) {  
      log.error(e.getMessage(), e);  
    } finally {  
      FileUtil.close(accessTmpFile);  
    }  
    return false;  
  }  
  
}  

b. MappedByteBuffer implementation

@UploadMode(mode = UploadModeEnum.MAPPED_BYTEBUFFER)  
@Slf4j  
public class MappedByteBufferUploadStrategy extends SliceUploadTemplate {  
  
  @Autowired  
  private FilePathUtil filePathUtil;  
  
  @Value("${upload.chunkSize}")  
  private long defaultChunkSize;  
  
  @Override  
  public boolean upload(FileUploadRequestDTO param) {  
  
    RandomAccessFile tempRaf = null;  
    FileChannel fileChannel = null;  
    MappedByteBuffer mappedByteBuffer = null;  
    try {  
      String uploadDirPath = filePathUtil.getPath(param);  
      File tmpFile = super.createTmpFile(param);  
      tempRaf = new RandomAccessFile(tmpFile, "rw");  
      fileChannel = tempRaf.getChannel();  
  
      long chunkSize = Objects.isNull(param.getChunkSize()) ? defaultChunkSize * 1024 * 1024  
          : param.getChunkSize();  
      //Locate the start offset of this chunk
      long offset = chunkSize * param.getChunk();  
      byte[] fileData = param.getFile().getBytes();  
      //Map just this chunk's region of the file and write the data through it
      mappedByteBuffer = fileChannel  
          .map(FileChannel.MapMode.READ_WRITE, offset, fileData.length);  
      mappedByteBuffer.put(fileData);  
      boolean isOk = super.checkAndSetUploadProgress(param, uploadDirPath);  
      return isOk;  
  
    } catch (IOException e) {  
      log.error(e.getMessage(), e);  
    } finally {  
  
      FileUtil.freedMappedByteBuffer(mappedByteBuffer);  
      FileUtil.close(fileChannel);  
      FileUtil.close(tempRaf);  
  
    }  
  
    return false;  
  }  
  
}  

c. File operation core template class code

@Slf4j  
public abstract class SliceUploadTemplate implements SliceUploadStrategy {  
  
  public abstract boolean upload(FileUploadRequestDTO param);  
  
  protected File createTmpFile(FileUploadRequestDTO param) {  
  
    FilePathUtil filePathUtil = SpringContextHolder.getBean(FilePathUtil.class);  
    param.setPath(FileUtil.withoutHeadAndTailDiagonal(param.getPath()));  
    String fileName = param.getFile().getOriginalFilename();  
    String uploadDirPath = filePathUtil.getPath(param);  
    String tempFileName = fileName + "_tmp";  
    File tmpDir = new File(uploadDirPath);  
    File tmpFile = new File(uploadDirPath, tempFileName);  
    if (!tmpDir.exists()) {  
      tmpDir.mkdirs();  
    }  
    return tmpFile;  
  }  
  
  @Override  
  public FileUploadDTO sliceUpload(FileUploadRequestDTO param) {  
  
    boolean isOk = this.upload(param);  
    if (isOk) {  
      File tmpFile = this.createTmpFile(param);  
      FileUploadDTO fileUploadDTO = this.saveAndFileUploadDTO(param.getFile().getOriginalFilename(), tmpFile);  
      return fileUploadDTO;  
    }  
    String md5 = FileMD5Util.getFileMD5(param.getFile());  
  
    Map<Integer, String> map = new HashMap<>();  
    map.put(param.getChunk(), md5);  
    return FileUploadDTO.builder().chunkMd5Info(map).build();  
  }  
  
  /**  
   * Check and modify the file upload progress
   */  
  public boolean checkAndSetUploadProgress(FileUploadRequestDTO param, String uploadDirPath) {  
  
    String fileName = param.getFile().getOriginalFilename();  
    File confFile = new File(uploadDirPath, fileName + ".conf");  
    byte isComplete = 0;  
    RandomAccessFile accessConfFile = null;  
    try {  
      accessConfFile = new RandomAccessFile(confFile, "rw");  
      //Mark this chunk as complete
      System.out.println("set part " + param.getChunk() + " complete");  
      //The conf file's length equals the total number of chunks; each uploaded chunk
      //writes Byte.MAX_VALUE (127) at its index, while positions still at the default 0
      //have not been uploaded yet
      accessConfFile.setLength(param.getChunks());  
      accessConfFile.seek(param.getChunk());  
      accessConfFile.write(Byte.MAX_VALUE);  
  
      //completeList: check whether every chunk is complete, i.e. whether every byte in the array is 127 (all chunks uploaded successfully)
      byte[] completeList = FileUtils.readFileToByteArray(confFile);  
      isComplete = Byte.MAX_VALUE;  
      for (int i = 0; i < completeList.length && isComplete == Byte.MAX_VALUE; i++) {  
        //AND operation: if any chunk is missing, isComplete will no longer equal Byte.MAX_VALUE
        isComplete = (byte) (isComplete & completeList[i]);  
        System.out.println("check part " + i + " complete?:" + completeList[i]);  
      }  
  
    } catch (IOException e) {  
      log.error(e.getMessage(), e);  
    } finally {  
      FileUtil.close(accessConfFile);  
    }  
    boolean isOk = setUploadProgress2Redis(param, uploadDirPath, fileName, confFile, isComplete);  
    return isOk;  
  }  
  
  /**  
   * Save the upload progress information in redis
   */  
  private boolean setUploadProgress2Redis(FileUploadRequestDTO param, String uploadDirPath,  
      String fileName, File confFile, byte isComplete) {  
  
    RedisUtil redisUtil = SpringContextHolder.getBean(RedisUtil.class);  
    if (isComplete == Byte.MAX_VALUE) {  
      redisUtil.hset(FileConstant.FILE_UPLOAD_STATUS, param.getMd5(), "true");  
      redisUtil.del(FileConstant.FILE_MD5_KEY + param.getMd5());  
      confFile.delete();  
      return true;  
    } else {  
      if (!redisUtil.hHasKey(FileConstant.FILE_UPLOAD_STATUS, param.getMd5())) {  
        redisUtil.hset(FileConstant.FILE_UPLOAD_STATUS, param.getMd5(), "false");  
        redisUtil.set(FileConstant.FILE_MD5_KEY + param.getMd5(),  
            uploadDirPath + FileConstant.FILE_SEPARATORCHAR + fileName + ".conf");  
      }  
  
      return false;  
    }  
  }  
  /**  
   * Save file operation
   */  
  public FileUploadDTO saveAndFileUploadDTO(String fileName, File tmpFile) {  
  
    FileUploadDTO fileUploadDTO = null;  
  
    try {  
  
      fileUploadDTO = renameFile(tmpFile, fileName);  
      if (fileUploadDTO.isUploadComplete()) {  
        System.out  
            .println("upload complete !!" + fileUploadDTO.isUploadComplete() + " name=" + fileName);  
        //TODO save file information to database
  
      }  
  
    } catch (Exception e) {  
      log.error(e.getMessage(), e);  
    } finally {  
  
    }  
    return fileUploadDTO;  
  }  
  /**  
   * File rename
   *  
   * @param toBeRenamed The file whose name will be modified
   * @param toFileNewName New name
   */  
  private FileUploadDTO renameFile(File toBeRenamed, String toFileNewName) {  
    //Check whether the file to be renamed exists and whether it is a file
    FileUploadDTO fileUploadDTO = new FileUploadDTO();  
    if (!toBeRenamed.exists() || toBeRenamed.isDirectory()) {  
      log.info("File does not exist or is a directory: {}", toBeRenamed.getName());  
      fileUploadDTO.setUploadComplete(false);  
      return fileUploadDTO;  
    }  
    String ext = FileUtil.getExtension(toFileNewName);  
    String p = toBeRenamed.getParent();  
    String filePath = p + FileConstant.FILE_SEPARATORCHAR + toFileNewName;  
    File newFile = new File(filePath);  
    //Modify file name
    boolean uploadFlag = toBeRenamed.renameTo(newFile);  
  
    fileUploadDTO.setMtime(DateUtil.getCurrentTimeStamp());  
    fileUploadDTO.setUploadComplete(uploadFlag);  
    fileUploadDTO.setPath(filePath);  
    fileUploadDTO.setSize(newFile.length());  
    fileUploadDTO.setFileExt(ext);  
    fileUploadDTO.setFileId(toFileNewName);  
  
    return fileUploadDTO;  
  }  
}  

Summary

Chunked upload requires the front end and the back end to cooperate. In particular, the chunk size used on the two sides must be identical, otherwise the upload will go wrong. Also, for regular file operations you should set up a proper file server, for example FastDFS or HDFS.
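
For illustration, the strategies above read the chunk size from the Spring property upload.chunkSize and multiply it by 1024 * 1024, so the property is expressed in megabytes. The property file name below is an assumption:

# application.properties (assumed file name)
# Interpreted as megabytes by the upload strategies (defaultChunkSize * 1024 * 1024)
upload.chunkSize=5

On the WebUploader side, the chunkSize option is given in bytes, so the matching value would be 5 * 1024 * 1024 = 5242880.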

On a machine with a 4-core CPU and 8 GB of RAM, uploading a 24 GB file with the sample code took over 30 minutes; most of that time was spent computing the MD5 value on the front end, while the back-end write was comparatively fast. If the project team feels that building its own file server takes too much time, and the project only needs upload and download, then Alibaba's OSS is recommended; for an introduction, see the official site:

https://help.aliyun.com/product/31815.html

Alibaba's OSS is essentially an object storage service, not a file server, so if you need to delete or modify large numbers of files, OSS may not be a good choice.

A demo of OSS form upload is linked at the end of this article. With form upload, files can be sent directly from the front end to the OSS server, pushing the upload load onto OSS:

https://www.cnblogs.com/ossteam/p/4942227.html


Topics: Java Redis server