Spring Boot: developing a large file upload service with resumable (breakpoint continuation) uploads

Posted by NiGHTFiRE on Sat, 18 Sep 2021 11:22:42 +0200

I recently took on a commercial project that involves a large file upload server with resumable uploads, and I came across a good technical article that I would like to share.

The author's R&D group recently needed high-performance HTTP upload of large files, with HTTP resume support, for one of its products. Here is a brief summary for future reference:

  1. The server side is implemented in C rather than Java or PHP;

  2. The server writes incoming data to disk immediately, so it does not need buffering mechanisms such as move_uploaded_file or InputStreamReader; this keeps server memory usage low and avoids browser request timeouts;

  3. It supports HTML5 and IFRAME (for old browsers) uploads and can report file upload progress.

To suit today's mobile Internet, the upload service must support resuming an interrupted upload and reconnecting after a dropped connection. Mobile networks are not very stable, and the chance of an abnormal disconnection while uploading a large file is high, so resume support is essential to avoid uploading everything again.

The idea behind resumable upload is:

The client (usually a browser) uploads a file to the server while the upload progress is continuously recorded. If the connection drops or another error occurs, the client can ask the server how much of the file has already been uploaded and continue from that position.
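The exchange described above can be illustrated with a small in-memory simulation. This is only a sketch: FakeUploadServer, uploadWithResume, and the failAfter parameter are hypothetical names invented for illustration and are not part of the author's server or client code.

```javascript
// Hypothetical in-memory stand-in for the upload server: it only remembers
// the bytes stored so far for each file ID.
class FakeUploadServer {
  constructor() {
    this.received = new Map(); // fileid -> array of bytes stored so far
  }
  // Resume query: how many bytes of this file does the server already hold?
  uploadedSize(fileid) {
    const bytes = this.received.get(fileid);
    return bytes ? bytes.length : 0;
  }
  // Append data that the client sends starting at its current offset.
  append(fileid, chunk) {
    const bytes = this.received.get(fileid) || [];
    this.received.set(fileid, bytes.concat(chunk));
  }
}

// Client side: ask for the stored size, then send only the remaining bytes.
// failAfter (hypothetical) simulates a connection that drops after that many bytes.
function uploadWithResume(server, fileid, data, failAfter) {
  const offset = server.uploadedSize(fileid);
  const remaining = data.slice(offset);
  const sent = failAfter === undefined ? remaining : remaining.slice(0, failAfter);
  server.append(fileid, sent);
  return server.uploadedSize(fileid) === data.length; // true once the file is complete
}

const server = new FakeUploadServer();
const data = [1, 2, 3, 4, 5, 6, 7, 8];
const first = uploadWithResume(server, 'demo-id', data, 3); // connection drops after 3 bytes
const second = uploadWithResume(server, 'demo-id', data);   // resumes at offset 3
console.log(first, second, server.uploadedSize('demo-id')); // false true 8
```

The second call sends only the five bytes the server is missing, which is the whole point of querying the uploaded size first.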

Some developers on the Internet handle large files by uploading them in pieces: the file is cut into small fragments, for example 4 MB each, and the server receives one fragment at a time and saves it as a temporary file. After all fragments have arrived, the server merges them. The author believes this works when the original file is small enough, but once a file reaches hundreds of megabytes, several gigabytes, or tens of gigabytes, merging takes a very long time, which often causes the browser to time out waiting for a response or the server to block.

If you implement a standalone client (or a browser ActiveX plug-in) to upload files, supporting resume is very simple: you only need to record the upload state on the client. Supporting resume in the browser itself (without installing third-party plug-ins) is generally harder than in a standalone client, but it is still not difficult. My implementation idea is as follows:

1, When the browser uploads a file, it first generates a hash value for the file; this value must be generated on the browser side.

The upload record cannot be looked up by file name alone, because file names repeat far too often. A value built from file name + file size repeats less often; adding the file modification time reduces repetition further, and adding a browser ID reduces collisions further still. The most reliable hash would be an MD5 of the file contents, but that requires a huge amount of computation (and is in fact unnecessary), and spending too much time on it hurts the upload experience.
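For reference, the fragment boundaries used by that piecemeal approach can be computed as in the sketch below. The chunkRanges function is a hypothetical helper written for illustration; it is not taken from any particular upload library.

```javascript
// Split a file of `size` bytes into [start, end) byte ranges of at most `chunkSize` bytes.
function chunkRanges(size, chunkSize) {
  if (!(chunkSize > 0)) throw new RangeError('chunkSize must be positive');
  const ranges = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push({ start: start, end: Math.min(start + chunkSize, size) });
  }
  return ranges;
}

const FOUR_MB = 4 * 1024 * 1024;
const ranges = chunkRanges(10 * 1024 * 1024, FOUR_MB);
console.log(ranges.length); // 3: two full 4 MB fragments and one 2 MB remainder
```

Each range would become one temporary file on the server, which is why the final merge grows with the size of the original file.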

For these reasons, my hash value is calculated as follows:

  1. First, give the browser an ID, which is saved in a cookie;

  2. Concatenate the browser ID, file modification time, file name, and file size, and take the MD5 of the result as the file's hash value;

  3. The browser ID is assigned automatically when the browser first accesses the file upload site.

//Simple cookie helper functions
function setCookie(cname,cvalue,exdays)
{
  var d = new Date();
  d.setTime(d.getTime()+(exdays*24*60*60*1000));
  //toUTCString() replaces the deprecated toGMTString()
  var expires = "expires="+d.toUTCString();
  document.cookie = cname + "=" + cvalue + "; " + expires;
}
   
   
function getCookie(cname)  
{  
  var name = cname + "=";  
  var ca = document.cookie.split(';');  
  for(var i=0; i<ca.length; i++)   
  {  
    var c = ca[i].trim();  
    if (c.indexOf(name)==0) return c.substring(name.length,c.length);  
  }  
  return "";  
}  
//  
//A simple file hash calculation. Unless your requirements are very strict, it should be adequate for production use.
//Because several pieces of data go into the hash, collisions within the HYFileUploader system should be very unlikely, so it can be used safely.
//Any algorithm can be used to derive the file ID, as long as the same file always yields the same ID and the ID is no longer than 32 bytes
//  
function getFileId (file)   
{  
    //Give the browser a unique ID to distinguish browser instances (browsers on different machines, or browsers from different vendors on the same machine)
    var clientid = getCookie("HUAYIUPLOAD");
    if (clientid == "") {
        //Use a random value plus the current time as the browser ID; it becomes part of the file hash
        var rand = Math.floor(Math.random() * 1000);
        var t = (new Date()).getTime();  
        clientid =rand+'T'+t;  
          
        setCookie("HUAYIUPLOAD",clientid,365);  
    }  
      
    var info = clientid;  
    if (file.lastModified)  
        info += file.lastModified;  
    if (file.name)  
        info += file.name;  
    if (file.size)  
        info += file.size;  
    //md5() is provided by blueimp-md5: https://cdn.bootcss.com/blueimp-md5/2.10.0/js/md5.min.js
    var fileid = md5(info);  
    return fileid;  
}  

The author believes it is unnecessary to compute the hash from the file contents, since that would be very slow. If you really need HTTP "instant upload", however, you may have to do so: when files uploaded by different people have identical contents, the transfer can be skipped and the result returned directly.

The browser is given an ID to further avoid hash collisions with files of the same name and size on other computers.
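One detail worth checking when combining these fields is that plain string concatenation can make different inputs run together. The sketch below shows the effect and one possible remedy, a separator between fields. The fileKey function and the sample values are hypothetical and are not part of the author's getFileId.

```javascript
// Plain concatenation: a file named "a1" of size 23 and a file named "a12"
// of size 3 produce the same string, and therefore the same hash.
const naiveA = '7T1' + 10 + 'a1' + 23;
const naiveB = '7T1' + 10 + 'a12' + 3;

// Hypothetical variant: join the fields with a separator before hashing.
// The numeric fields cannot contain '|', so adjacent fields stay distinct.
function fileKey(clientId, file) {
  return [clientId, file.lastModified, file.name, file.size].join('|');
}

const keyA = fileKey('7T1', { lastModified: 10, name: 'a1', size: 23 });
const keyB = fileKey('7T1', { lastModified: 10, name: 'a12', size: 3 });
console.log(naiveA === naiveB, keyA === keyB); // true false
```

The resulting string would then be passed to md5() in the same way as before. In practice the modification time usually differs between such files, so this is a refinement rather than a required change.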

2, Query the HASH value of the file

To support resuming, first use the file's hash value to ask the upload server for the file's upload progress, then start uploading from that position. The code is as follows:

var fileObj = currentfile;  
var fileid = getFileId(fileObj);  
var t = (new Date()).getTime();  
//Obtain the file breakpoint continuation information through the following URL. The required parameter is fileid, and the t parameter is appended to avoid browser caching   
var url = resume_info_url + '?fileid='+fileid + '&t='+t;  
  
var ajax = new XMLHttpRequest();  
  
ajax.onreadystatechange = function () {   
    if(this.readyState == 4){  
        if (this.status == 200){  
            var response = this.responseText;

            var result;
            try {
                result = JSON.parse(response);
            } catch (e) {
                result = null; //The response is not valid JSON
            }
            if (!result || !result.file) {
                alert('The data returned by the server is incorrect. It may be an incompatible server');
                return;
            }
            //The file object in the resume information contains the number of bytes already uploaded
            var uploadedBytes = result.file.size || 0;
            if (!result.file.finished && uploadedBytes < fileObj.size) {
                upload_file(fileObj,uploadedBytes,fileid);
            }
            else {  
                //Once the file has been uploaded, don't upload it again. Just return the result directly   
                showUploadedFile(result.file);  
                //Simulation progress completed   
                //var progressBar = document.getElementById('progressbar');  
                //progressBar.value = 100;  
            }  
              
        }else {  
            alert('Failed to obtain the file resume information');
        }    
    }   
}  
  
ajax.open('get',url,true);  
ajax.send(null);  

The code above comes from the implementation based on the jQuery File Upload component. For an implementation in plain JavaScript, see the h4resume.html sample in the demos directory.
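The decision made inside the response handler can also be isolated in a small pure function, which makes it easier to test. This is a sketch only: resumeDecision is a hypothetical name, and the response shape { file: { size, finished } } is assumed from the fields the code above reads, not from any server documentation.

```javascript
// Decide what to do with the server's resume-info response.
// Assumed response shape: { file: { size: <bytes uploaded>, finished: <boolean> } }.
function resumeDecision(result, totalSize) {
  const file = result && result.file;
  if (!file) return { action: 'error' };            // unexpected response
  const uploaded = Number(file.size) || 0;
  if (file.finished || uploaded >= totalSize) {
    return { action: 'done' };                      // nothing left to send
  }
  return { action: 'upload', offset: uploaded };    // continue from this byte
}

console.log(resumeDecision({ file: { size: 100, finished: false } }, 500));
// { action: 'upload', offset: 100 }
```

A caller would invoke upload_file with the returned offset for 'upload' and show the stored file for 'done'.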

3, Execute upload

After the resume information has been queried, if part of the file was uploaded before, the server returns the number of bytes already uploaded, and the upload can continue from that offset.

The slice method of the HTML5 File object can be used to cut a fragment out of a file for uploading.

Definition and Usage

The slice() method, which File inherits from Blob, extracts part of a file and returns the extracted part as a new Blob. The original file is not modified.

Syntax

File.slice(start,end)

Parameter description

start: the byte offset at which the extracted fragment begins. If it is negative, it is counted from the end of the file; that is, -1 refers to the last byte of the file, -2 to the second-to-last byte, and so on. If omitted, it defaults to 0.

end: the byte offset at which the extracted fragment ends; the byte at this offset is not included. If this parameter is omitted, the fragment runs from start to the end of the file.

If end is negative, it is also counted from the end of the file.
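A short example may help to confirm how slice() behaves on binary data. The sketch below uses a Blob built from a ten-character string, since File inherits slice() from Blob; the variable names are illustrative only. Blob is available as a global in browsers and in recent Node.js versions.

```javascript
// A 10-byte Blob standing in for a file selected by the user.
const blob = new Blob(['0123456789']);

const tail = blob.slice(4);        // bytes 4..9: the "resume from offset 4" case
const middle = blob.slice(2, 5);   // bytes 2, 3 and 4 (end is exclusive)
const lastThree = blob.slice(-3);  // a negative start counts from the end

console.log(tail.size, middle.size, lastThree.size); // 6 3 3
console.log(tail instanceof Blob);                   // true: the result is a Blob, not a string
```

For resuming, only the first form is needed: slicing from the number of bytes the server already holds to the end of the file.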

The code for uploading the remaining part of a file is as follows:

/*
File upload processing code
fileObj : HTML5 File object
start_offset: the starting position of the data to upload, relative to the beginning of the file
fileid: the ID of the file, as returned by the getFileId function above
*/
function upload_file(fileObj,start_offset,fileid)  
{  
 var xhr = new XMLHttpRequest();  
 var formData = new FormData();  
   
 var blobfile;  
   
 if(start_offset >= fileObj.size){  
  return false;  
 }  
   
 var bitrateDiv = document.getElementById("bitrate");  
 var finishDiv = document.getElementById("finish");  
 var progressBar = document.getElementById('progressbar');  
 var progressDiv = document.getElementById('percent-label');  
   
 var oldTimestamp = (new Date()).valueOf(); //Start the clock now so that the first bit rate sample is meaningful
 var oldLoadsize = 0;  
 var totalFilesize = fileObj.size;  
 if (totalFilesize == 0) return;  
   
 var uploadProgress = function (evt) {  
  if (evt.lengthComputable) {  
   var uploadedSize = evt.loaded + start_offset;   
   var percentComplete = Math.round(uploadedSize * 100 / totalFilesize);  
   
   var timestamp = (new Date()).valueOf();  
   var isFinish = evt.loaded == evt.total;  
   
   if (timestamp > oldTimestamp || isFinish) {  
    var duration = timestamp - oldTimestamp;  
    if (duration > 500 || isFinish) {  
     var size = evt.loaded - oldLoadsize;  
   
     var bitrate = (size * 8 / duration /1024) * 1000; //kbps  
     if (bitrate > 1000)  
      bitrate = Math.round(bitrate / 1000) + 'Mbps';  
     else  
      bitrate = Math.round(bitrate) + 'Kbps';  
   
     var finish = evt.loaded + start_offset;  
   
     if (finish > 1048576)  
      finish = (Math.round(finish / (1048576/100)) / 100).toString() + 'MB';  
     else  
      finish = (Math.round(finish / (1024/100) ) / 100).toString() + 'KB';  
   
     progressBar.value = percentComplete;  
     progressDiv.innerHTML = percentComplete.toString() + '%';  
     bitrateDiv.innerHTML = bitrate;  
     finishDiv.innerHTML = finish;  
   
     oldTimestamp = timestamp;  
     oldLoadsize = evt.loaded;  
    }  
   }  
  }  
  else {  
   progressDiv.innerHTML = 'N/A';  
  }  
 }  
   
 xhr.onreadystatechange = function(){
  if (xhr.readyState != 4) return;
  if (xhr.status == 200) {
   console.log( xhr.responseText );
  }
  else if (xhr.status == 400) {
   //Handling for a rejected request can be added here
  }
 };
   
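 var uploadComplete = function (evt) {
  //The load event also fires for HTTP error responses, so check the status and guard the JSON parsing
  var result = null;
  try {
   result = JSON.parse(evt.target.responseText);
  } catch (e) {}
  if (evt.target.status == 200 && result && result.result == 'success') {
   progressDiv.innerHTML = '100%';
   showUploadedFile(result.files[0]);
  }
  else {
   alert(result && result.msg ? result.msg : 'Failed to upload file!');
  }
 }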
   
 var uploadFailed = function (evt) {  
  alert("Failed to upload file!");  
 }  
   
 var uploadCanceled = function (evt) {  
  alert("Upload cancelled or browser disconnected!");  
 }  
   
 //Timeout setting. Do not set a timeout when uploading large files
 //xhr.timeout = 20000;
 //xhr.ontimeout = function(event){
  //   alert('The file upload took too long; the server did not respond within the specified time!');
  //}
   
 xhr.overrideMimeType("application/octet-stream");   
   
 var filesize = fileObj.size;
 //Take only the part that has not been uploaded yet
 var blob = fileObj.slice(start_offset,filesize);
 var fileOfBlob = new File([blob], fileObj.name);
 //Additional form fields must be appended before the file data
 formData.append('filename', fileObj.name);
 //The fileid must be sent to the server; the server can resume the upload only after it has received the fileid
 formData.append('fileid', fileid);
 //Put the file data in the last field
 //formData.append("file",blob, fileObj.name);
 formData.append('file', fileOfBlob);
   
 xhr.upload.addEventListener("progress", uploadProgress, false);  
   
 xhr.addEventListener("load", uploadComplete, false);  
 xhr.addEventListener("error", uploadFailed, false);  
 xhr.addEventListener("abort", uploadCanceled, false);  
 xhr.open('POST', upload_file_url);  
 //  
 xhr.send(formData);  
}  

To verify resumable upload, the author built a simple interface that displays status information during an upload. The interface is as follows:

With HTML5, you can compute the upload progress, the amount of data uploaded so far, the upload bit rate, and other information. If an error occurs during the upload, you can simply upload the file again, and the part already uploaded does not need to be sent again.

To try out HTML5 resumable upload, you can download the project from GitHub and upload files to the server for testing.
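The figures mentioned above can be derived from the loaded byte count of each progress event plus the resume offset. The helpers below are a minimal sketch with hypothetical names. Unlike the upload code earlier, which mixes factors of 1024 and 1000 for the bit rate, this sketch uses decimal units (1 kbit = 1000 bits) throughout.

```javascript
// Percentage of the whole file completed, counting bytes sent in earlier sessions.
function percentComplete(loaded, startOffset, totalSize) {
  return Math.round((loaded + startOffset) * 100 / totalSize);
}

// Average rate over one sampling interval, in kilobits per second.
// Bits per millisecond is numerically equal to kilobits per second.
function bitrateKbps(bytes, durationMs) {
  return (bytes * 8) / durationMs;
}

function formatBitrate(kbps) {
  return kbps > 1000 ? Math.round(kbps / 1000) + 'Mbps' : Math.round(kbps) + 'Kbps';
}

// Human-readable size with at most two decimal places.
function formatSize(bytes) {
  const MB = 1048576;
  return bytes > MB
    ? (Math.round(bytes * 100 / MB) / 100) + 'MB'
    : (Math.round(bytes * 100 / 1024) / 100) + 'KB';
}

console.log(percentComplete(250, 250, 1000));        // 50
console.log(formatBitrate(bitrateKbps(300000, 1000))); // 2Mbps
console.log(formatSize(1536));                        // 1.5KB
```

Keeping these calculations separate from the DOM updates makes it possible to check them without a browser.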

https://github.com/wenshui2008/UploadServer
