Multithreaded File Download in Practice

Posted by moleculo on Thu, 24 Oct 2019 07:38:52 +0200

Catalog

1. Use scenario

2. Principle of multi thread Download

3. How to download the request in sections

3.1. How to segment the requested data.

3.2. How to assemble the data downloaded in sections into a complete data file.

4. Key code implementation

5. Achievements

6. Summary

7. Reference articles

1. Use scenario

Recently, we needed to migrate audio, video, and document files that had been stored on Baidu public cloud to alicloud. There was a small episode along the way: a colleague suggested mailing a portable hard disk to Baidu Cloud and having them copy the data straight onto it. That is certainly not possible through the regular process; a large enterprise has to operate in a standard way. In my personal opinion, it should be possible to reach a Baidu contact through the sales representative or through the company, and to negotiate the migration with Baidu as a paid project. That would also work. In the end, though, I could only use the API documentation that Baidu provides to obtain the network URL of each original audio and video file, and then write a multithreaded file downloader. With it, I downloaded 230 GB of audio and video files from the accumulation account and 850 GB from the other account.

2. Principle of multi thread Download

  • To download a file, the client first sends a request to the server; the server transfers the file to the client, and the client saves it locally. That completes one download.
  • The idea of multithreaded download is that the client starts several threads that download at the same time. Each thread is responsible for only one part of the file; when all threads have finished, the download is complete.
    • More threads do not necessarily mean faster downloads; the speed depends heavily on the network environment.
    • In the same network environment, a multithreaded download is usually faster than a single-threaded one.
    • A multithreaded download uses more resources than a single-threaded one, so it effectively trades resources for speed.

Multithreaded download is a very common download scheme that takes full advantage of multithreading. Several threads issue download requests during the same period, and the data to be downloaded is divided into multiple parts. Each thread downloads only one part, and the downloaded parts are then assembled into a complete data file, which greatly improves download efficiency. Common downloaders such as Xunlei (Thunder) and QQ Xuanfeng (QQ Tornado) all use this technique.
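The splitting step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class name RangeSplitter and its ByteRange helper are hypothetical and are not part of the downloader shown later in this article. It divides a file of known length into contiguous, inclusive byte ranges, with the last part absorbing the remainder.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical helper): split a file of known length into inclusive byte ranges. */
class RangeSplitter {

    /** One inclusive byte range [start, end]. */
    static final class ByteRange {
        final long start;
        final long end;

        ByteRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "bytes=" + start + "-" + end;
        }
    }

    static List<ByteRange> split(long length, int parts) {
        if (length <= 0 || parts <= 0) {
            throw new IllegalArgumentException("length and parts must be positive");
        }
        // Never create more parts than there are bytes.
        int count = (int) Math.min(parts, length);
        long partSize = length / count;
        List<ByteRange> ranges = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long start = i * partSize;
            // The last part absorbs the remainder; the final valid offset is length - 1.
            long end = (i == count - 1) ? length - 1 : start + partSize - 1;
            ranges.add(new ByteRange(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints one "bytes=start-end" line per part.
        for (ByteRange r : split(1003, 4)) {
            System.out.println(r);
        }
    }
}
```

Keeping every offset as a long matters here, because the files in this migration can be far larger than 2 GB.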

3. How to download the request in sections

3.1. How to segment the requested data.

Range, a header field introduced in HTTP/1.1, allows a client to request only part of a document, that is, a range.

Of course, there is a precondition: the object must not have changed between the client's previous request for the entity and the time the range request is issued. For example:

GET /bigfile.html HTTP/1.1
Host: www.joes-hardware.com
Range: bytes=4000-
User-Agent: Mozilla/4.61 [en] (WinNT; I)

In the request above, the client asks for the part of the document after the first 4000 bytes (the end byte does not have to be given, because the requester may not know the size of the document). This form of Range request can be used when a transfer fails after the client has received the first 4000 bytes. The Range header can also request multiple ranges (these ranges can be given in any order, and they may overlap).

The Range header field is used as follows. For example:

The first 500 bytes: bytes=0-499
The second 500 bytes: bytes=500-999
The last 500 bytes: bytes=-500
Everything from byte offset 500 onward: bytes=500-
The first and the last byte only: bytes=0-0,-1
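To make the three forms of a single range concrete, here is a small sketch that resolves one range spec against a known file length. The class RangeSpecResolver is a hypothetical helper written for this illustration; it handles one range at a time and does not parse comma-separated lists such as 0-0,-1.

```java
/** Sketch (hypothetical helper): resolve "0-499", "500-" or "-500" to absolute offsets. */
class RangeSpecResolver {

    /** Returns {first, last}, both inclusive, for a resource of the given length. */
    static long[] resolve(String spec, long length) {
        int dash = spec.indexOf('-');
        if (dash < 0 || length <= 0) {
            throw new IllegalArgumentException("bad range spec: " + spec);
        }
        String left = spec.substring(0, dash).trim();
        String right = spec.substring(dash + 1).trim();
        long first;
        long last;
        if (left.isEmpty()) {
            // Suffix form "-N": the last N bytes of the resource.
            long suffix = Long.parseLong(right);
            first = Math.max(0, length - suffix);
            last = length - 1;
        } else {
            first = Long.parseLong(left);
            // Open-ended form "N-" runs to the end; an oversized end is clamped.
            last = right.isEmpty() ? length - 1 : Math.min(Long.parseLong(right), length - 1);
        }
        if (first > last) {
            throw new IllegalArgumentException("unsatisfiable range: " + spec);
        }
        return new long[] {first, last};
    }

    public static void main(String[] args) {
        long length = 1000;
        for (String spec : new String[] {"0-499", "500-999", "-500", "500-", "0-0", "-1"}) {
            long[] r = resolve(spec, length);
            System.out.println(spec + " -> " + r[0] + ".." + r[1]);
        }
    }
}
```

For a 1000-byte file, both -500 and 500- resolve to offsets 500 through 999, which is why the two forms are easy to confuse.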

When the server receives a request from one of the download threads, it sees a GET request with a Range header.

If everything is normal, the server's response message starts with the following status line:

HTTP/1.1 206 Partial Content

This indicates that the range request was processed successfully. The response message also contains a line like this:

Content-Range: bytes 200-299/403

The 403 after the slash is the total size of the file in bytes.
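A client can read the returned range and the total file size out of that header. The following is a minimal sketch, assuming the common "bytes first-last/total" form; ContentRangeParser is a hypothetical class name, and the "bytes */total" form that accompanies a 416 response is deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (hypothetical helper): parse a Content-Range value such as "bytes 200-299/403". */
class ContentRangeParser {

    private static final Pattern CONTENT_RANGE =
            Pattern.compile("bytes\\s+(\\d+)-(\\d+)/(\\d+|\\*)");

    /** Returns {first, last, total}; total is -1 when the server sends "*" (size unknown). */
    static long[] parse(String headerValue) {
        Matcher m = CONTENT_RANGE.matcher(headerValue.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported Content-Range: " + headerValue);
        }
        long first = Long.parseLong(m.group(1));
        long last = Long.parseLong(m.group(2));
        long total = "*".equals(m.group(3)) ? -1 : Long.parseLong(m.group(3));
        return new long[] {first, last, total};
    }

    public static void main(String[] args) {
        long[] r = parse("bytes 200-299/403");
        // The range is inclusive, so it carries last - first + 1 bytes.
        System.out.println("first=" + r[0] + " last=" + r[1] + " total=" + r[2]
                + " bytesInPart=" + (r[1] - r[0] + 1));
    }
}
```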

The development of the HTTP protocol

The HTTP protocol has gone through several versions so far. It originated in March 1989.

Version     Period
HTTP/0.9    1991
HTTP/1.0    1992-1996
HTTP/1.1    1997-1999
HTTP/2.0    2012-2014

In other words, HTTP/1.1 has been in use since 1997-1999, so most servers today support range requests and therefore resumable (breakpoint) downloads.

3.2. How to assemble the data downloaded in sections into a complete data file.

RandomAccessFile class

RandomAccessFile is suited to files made up of records of known size, because seek() lets us move from one record to another and then read or modify it.

A random access file behaves like a large byte array stored in the file system. There is a cursor, or index, into the implied array, called the file pointer; input operations read bytes starting at the file pointer and advance it past the bytes read. If the random access file is created in read/write mode, output operations are also available; they write bytes starting at the file pointer and advance it past the bytes written. Output operations that write past the current end of the implied array cause the array to be extended. The file pointer can be read with the getFilePointer method and set with the seek method.
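The file-pointer behavior described above is what allows segments to be written in any order. Here is a minimal sketch, with a hypothetical class SegmentAssembler and a temporary file standing in for the download target: it pre-sizes the file, writes the second segment before the first, and reads the assembled result back.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Sketch (hypothetical helper): assemble out-of-order segments with RandomAccessFile. */
class SegmentAssembler {

    /** Writes one segment at its own offset in the target file. */
    static void writeSegment(File target, long offset, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(target, "rw")) {
            raf.seek(offset); // move the file pointer to the segment's start
            raf.write(data);  // the pointer advances as bytes are written
        }
    }

    static String demo() {
        try {
            File target = File.createTempFile("segments", ".bin");
            try {
                byte[] first = "Hello, ".getBytes(StandardCharsets.US_ASCII);
                byte[] second = "world!".getBytes(StandardCharsets.US_ASCII);
                // Pre-size the file to its final length, as the downloader does.
                try (RandomAccessFile raf = new RandomAccessFile(target, "rw")) {
                    raf.setLength(first.length + second.length);
                }
                // Deliberately write the second segment before the first.
                writeSegment(target, first.length, second);
                writeSegment(target, 0, first);
                return new String(Files.readAllBytes(target.toPath()), StandardCharsets.US_ASCII);
            } finally {
                target.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Hello, world!"
    }
}
```

Each writer opens its own RandomAccessFile, so every thread has an independent file pointer and the writes do not interfere as long as the ranges do not overlap.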

Download part of a resource through URLConnection.
Note:
1. A Range header is required. Key: Range, value: bytes=0-499
          urlConnection.setRequestProperty("Range", "bytes=0-499");
2. You need to set each thread's start position in the local file.
          RandomAccessFile randomFile = new RandomAccessFile(File file, String mode);
          randomFile.seek(long startPosition); // the position at which this thread starts saving its download

The constructor creates a random access file stream to read from, and optionally to write to, the file specified by the File argument. A new FileDescriptor object is created to represent the connection to this file.
The mode argument specifies the access mode in which the file is opened. The allowed values and their meanings are:

"r" - open for reading only. Invoking any write method of the resulting object causes an IOException to be thrown.
"rw" - open for reading and writing. If the file does not already exist, an attempt is made to create it.
"rws" - open for reading and writing, as with "rw", and also require that every update to the file's content or metadata be written synchronously to the underlying storage device.
"rwd" - open for reading and writing, as with "rw", and also require that every update to the file's content be written synchronously to the underlying storage device.

4. Key code implementation

DownloadConstans.java

package com.wdcloud.publiccloud.files.tool.download.filedownload;

import java.util.concurrent.*;

/**
 * @Description
 * @auther jianxiapc
 * @create 2019-08-20 11:20
 */
public class DownloadConstans {
    public static final int MAX_THREAD_COUNT = getSystemProcessCount();
    private static final int MAX_IMUMPOOLSIZE = MAX_THREAD_COUNT;

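    /**
     * Shared fixed-size thread pool, created on first use.
     * It is never shut down in this class; call shutdown() on it once all downloads
     * are finished, otherwise its worker threads keep the JVM alive.
     */
    private static ExecutorService MY_THREAD_POOL;

    /**
     * Returns the shared fixed-size thread pool.
     * synchronized so that concurrent callers cannot create two pools.
     */
    public static synchronized ExecutorService getMyThreadPool(){
        if(MY_THREAD_POOL == null){
            MY_THREAD_POOL = Executors.newFixedThreadPool(MAX_IMUMPOOLSIZE);
        }
        return MY_THREAD_POOL;
    }

    // Thread pool with a bounded task queue
    private static ThreadPoolExecutor threadPool;

    /**
     * Returns a singleton ThreadPoolExecutor with a bounded queue of 16 tasks.
     * When the queue is full, the submitting thread runs the task itself (CallerRunsPolicy).
     * @return the shared ThreadPoolExecutor
     */
    public static synchronized ThreadPoolExecutor getThreadPool(){
        if(threadPool == null){
            threadPool = new ThreadPoolExecutor(MAX_IMUMPOOLSIZE, MAX_IMUMPOOLSIZE, 3, TimeUnit.SECONDS,
                    new ArrayBlockingQueue<Runnable>(16),
                    new ThreadPoolExecutor.CallerRunsPolicy()
            );
        }
        return threadPool;
    }

    /**
     * Number of download threads.
     * Fixed at 4 here instead of using the number of CPU cores.
     * @return the thread count
     */
    private static int getSystemProcessCount(){
        //int count =Runtime.getRuntime().availableProcessors();
        //Only four threads are started for download
        int count=4;
        return count;
    }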
}

FileMultiPartDownLoad.java

package com.wdcloud.publiccloud.files.tool.download.filedownload;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.locks.ReentrantLock;

/**
 * @Description
 * @auther jianxiapc
 * @create 2019-08-20 11:02
 */
public class FileMultiPartDownLoad {
    private static Logger logger = LoggerFactory.getLogger(FileMultiPartDownLoad.class);

    /**
     * Download failure flag: 0 means every part succeeded, 1 means at least one part failed
     */
    private static int flag = 0;

    /**
     * URL of the file on the server
     */
    private String netWorkFileUrlPath;
    /**
     * Local path
     */
    private String localPath;
    /**
     * Latch used to wait until every download thread has finished
     */
    private CountDownLatch latch;

    // Fixed-size thread pool
    private static ExecutorService threadPool;

    public FileMultiPartDownLoad(String netWorkFileUrlPath, String localPath) {
        this.netWorkFileUrlPath = netWorkFileUrlPath;
        this.localPath = localPath;
    }

    public boolean executeDownLoad() {
        try {
            URL url = new URL(netWorkFileUrlPath);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);//Set timeout
            conn.setRequestMethod("GET");//Set request method
            conn.setRequestProperty("Connection", "Keep-Alive");
            int code = conn.getResponseCode();
            conn.disconnect(); // only the status code is needed here, not the body
            if (code != 200) {
                logger.error(String.format("Invalid network address:%s", netWorkFileUrlPath));
                return false;
            }
            // The content length returned by the server is the file length in bytes.
            // conn.getContentLength() returns an int and breaks for files over 2 GB, so a long is used instead.
            long length = getRemoteFileSize(netWorkFileUrlPath);
            if (length <= 0) {
                logger.error(String.format("Could not determine file size:%s", netWorkFileUrlPath));
                return false;
            }

            logger.info("Total file length:" + length + "byte(B)");
            RandomAccessFile raf = new RandomAccessFile(localPath, "rwd");
            // Pre-allocate the local file at its final length
            raf.setLength(length);
            raf.close();
            // Split the file into one part per thread
            int partCount = DownloadConstans.MAX_THREAD_COUNT;
            // Keep the part size as a long: casting it to int overflows for large files
            long partSize = length / partCount;
            latch = new CountDownLatch(partCount);
            threadPool = DownloadConstans.getMyThreadPool();
            for (int threadId = 1; threadId <= partCount; threadId++) {
                // Start offset of this thread's part
                long startIndex = (threadId - 1) * partSize;
                // End offset (inclusive) of this thread's part
                long endIndex = startIndex + partSize - 1;
                if (threadId == partCount) {
                    // The last thread also takes the remainder; the last valid offset is length - 1
                    endIndex = length - 1;
                }
                logger.info("thread" + threadId + "download:" + startIndex + "byte~" + endIndex + "byte");
                threadPool.execute(new DownLoadThread(threadId, startIndex, endIndex, latch));
            }
            latch.await();
            if(flag == 0){
                return true;
            }
        } catch (Exception e) {
            logger.error(String.format("File download failed, file address:%s,Failure reason:%s", netWorkFileUrlPath, e.getMessage()), e);
        }
        return false;
    }


    /**
     * Inner class for download
     */
    public class DownLoadThread implements Runnable {
        private Logger logger = LoggerFactory.getLogger(DownLoadThread.class);

        /**
         * Thread ID
         */
        private int threadId;
        /**
         * Download from
         */
        private long startIndex;
        /**
         * Download end location
         */
        private long endIndex;

        private CountDownLatch latch;

        public DownLoadThread(int threadId, long startIndex, long endIndex, CountDownLatch latch) {
            this.threadId = threadId;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
            this.latch = latch;
        }

        @Override
        public void run() {
            try {
                //logger.info("thread" + threadId + "downloading...");
                URL url = new URL(netWorkFileUrlPath);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestProperty("Connection", "Keep-Alive");
                conn.setRequestMethod("GET");
                // Ask the server for this thread's byte range only
                conn.setRequestProperty("Range", "bytes=" + startIndex + "-" + endIndex);
                conn.setConnectTimeout(5000);
                int code = conn.getResponseCode();
                // 206 Partial Content confirms the range was honored; a 200 would carry the whole file
                if (code != HttpURLConnection.HTTP_PARTIAL) {
                    throw new IOException("thread" + threadId + " expected 206 but got " + code);
                }
                // try-with-resources closes both streams even if a read or write fails
                try (InputStream is = conn.getInputStream();
                     RandomAccessFile raf = new RandomAccessFile(localPath, "rwd")) {
                    // Position the file pointer at the start of this thread's part
                    raf.seek(startIndex);
                    int len;
                    byte[] buffer = new byte[1024];
                    while ((len = is.read(buffer)) != -1) {
                        raf.write(buffer, 0, len);
                    }
                }
                logger.info("thread" + threadId + "Download completed");
            } catch (Exception e) {
                //Thread download error
                FileMultiPartDownLoad.flag = 1;
                logger.error(e.getMessage(),e);
            } finally {
                //Count minus one
                latch.countDown();
            }

        }
    }

    /**
     * Internal method: get the remote file size with a HEAD request.
     * @param remoteFileUrl URL of the remote file
     * @return the size in bytes, or 0 if the server reports an error or sends no Content-Length
     * @throws IOException if the connection cannot be opened
     */
    private long getRemoteFileSize(String remoteFileUrl) throws IOException {
        HttpURLConnection httpConnection = (HttpURLConnection) new URL(remoteFileUrl).openConnection();
        httpConnection.setRequestMethod("HEAD");
        int responseCode = 0;
        try {
            responseCode = httpConnection.getResponseCode();
        } catch (IOException e) {
            logger.error(e.getMessage(), e);
        }
        if (responseCode >= 400) {
            logger.debug("Web Server response error!");
            return 0;
        }
        // getContentLengthLong() reads Content-Length as a long (safe above 2 GB) and returns -1
        // when the header is absent. Scanning the header keys in a loop with no exit condition
        // would never terminate in that case.
        long fileSize = httpConnection.getContentLengthLong();
        httpConnection.disconnect();
        return Math.max(fileSize, 0);
    }

    /**
     * Download entry point
     * @param netWorkFileUrlPath URL of the file to download
     * @param localDirPath local directory to save into
     * @param fileName name of the local file
     * @return the local file path on success, or null on failure
     */
    public synchronized static String downLoad(String netWorkFileUrlPath,String localDirPath,String fileName) {
        // The method is static synchronized, so downloads already run one at a time;
        // a ReentrantLock created on every call would guard nothing.
        String localStorageDirPath =localDirPath+"/" +fileName;
        logger.info("localStorageDirPath: "+localStorageDirPath);
        FileMultiPartDownLoad m = new FileMultiPartDownLoad(netWorkFileUrlPath, localStorageDirPath);
        long startTime = System.currentTimeMillis();
        boolean flag = false;
        try{
            flag = m.executeDownLoad();
            long endTime = System.currentTimeMillis();
            if(flag){
                logger.info(fileName+" : End of file download,Total time consuming" + (endTime - startTime)+ "ms");
                return localStorageDirPath;
            }
            logger.warn("File download failed");
            return null;
        }catch (Exception ex){
            logger.error(ex.getMessage(),ex);
            return null;
        }finally {
            FileMultiPartDownLoad.flag = 0; // Reset download status
            if(!flag){
                // Remove the incomplete file
                File file = new File(localStorageDirPath);
                file.delete();
            }
        }
    }
}
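The listing above tracks failures with a shared static flag and waits with a CountDownLatch. As an aside, the same "wait for all parts, then check whether any failed" step can also be written with ExecutorService.invokeAll and Future. The sketch below is an alternative technique, not the code used in this migration; PartTaskRunner is a hypothetical class and the part tasks are stand-ins that simulate one failure.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch (hypothetical helper): run all part tasks and report whether every one succeeded. */
class PartTaskRunner {

    static boolean runAll(List<Callable<Void>> parts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every task has finished, much like latch.await().
            List<Future<Void>> results = pool.invokeAll(parts);
            for (Future<Void> f : results) {
                try {
                    f.get(); // rethrows a task's exception wrapped in ExecutionException
                } catch (ExecutionException e) {
                    return false; // at least one part failed
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Void>> parts = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            final int id = i;
            parts.add(() -> {
                if (id == 3) {
                    throw new IOException("simulated failure in part " + id);
                }
                return null;
            });
        }
        System.out.println("all parts ok: " + runAll(parts, 4)); // prints "all parts ok: false"
    }
}
```

Because the result travels through each Future, no state is shared between separate downloads, so nothing has to be reset afterwards.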

Calling code

    /**
     * First obtain the basic information through the Baidu SDK API, then download a single VOD video file using multiple threads.
     * @param bceClient
     * @param vodMediaId Video id
     * @param fileStorageDiskPath Path where the downloaded file is stored
     * @param excelFileName Excel file in which the file information is saved after downloading
     * @param expiredInSeconds Expiration time, 3600s by default
     */
    public void downloadSingleVodMediaFile (VodClient bceClient, String vodMediaId,String fileStorageDiskPath,String excelFileName,long expiredInSeconds) {
        logger.info("vodMediaId = " + vodMediaId);
        GetMediaSourceDownloadResponse response = bceClient.getMediaSourceDownload(vodMediaId,expiredInSeconds);
        String netWorkFileUrl = response.getSourceUrl();
        logger.info("netWorkFileUrl = " + netWorkFileUrl);
        //Compare single-thread download and multithreaded download
        Date startDate = new Date();
        long downLoadStartTime=System.currentTimeMillis();
        //System.out.println("downLoadStartTime: "+downLoadStartTime);
        logger.info("downLoadStartTime: "+sdf.format(startDate));
        //OkHttpDownloadUtil.downNetWorkFile(netWorkFileUrl,fileStorageDiskPath,"single.mp4");
        Map<String, Object> vodFileInfoMap = getVodFileInfoByVodId(bceClient, vodMediaId);
        String fileName =vodFileInfoMap.get("title").toString();
        //Function call for downloading the current file
        FileMultiPartDownLoad.downLoad(netWorkFileUrl,fileStorageDiskPath,fileName);
        Date endDate = new Date();
        long downLoadEndTime=System.currentTimeMillis();
        long customDownloadTime=downLoadEndTime-downLoadStartTime;
        String downloadTimeFormat=CommonConvertUtils.formatMillisTime(customDownloadTime);
        //System.out.println("downloadTimeFormat: "+downloadTimeFormat);
        logger.info("file "+vodMediaId+" Download start time"+sdf.format(startDate)+" Download completed:"+sdf.format(endDate));
        logger.info("file "+vodMediaId+" downloadTimeFormat: "+downloadTimeFormat);
    }

5. Achievements

6. Summary

By writing this file downloader, I gained a much deeper understanding of how multithreaded download works, studied the technique systematically, and put it into practice.

7. Reference articles

The principle and implementation of Java Multithread Download

The advantages of using multithreading in Java and the principle of breakpoint continuation

The principle of multi thread accelerating Download

Java -- multi thread breakpoint Download

The principle of multi thread download and multi thread breakpoint Download
