Contents
2. Principle of multi-threaded download
3. How to download the request in sections
3.1. How to segment the requested data
3.2. How to assemble the data downloaded in sections into a complete data file
1. Use scenario
Recently, we needed to migrate audio, video, and document files that were stored on Baidu public cloud to Alibaba Cloud (alicloud). There was a small episode along the way: a colleague suggested sending a portable hard disk to Baidu Cloud and having them copy the data directly onto it. That is certainly not possible through the regular process; a large enterprise has to operate in a standard way. In my personal opinion, it should be possible to reach a Baidu contact through the corresponding sales representative or through the company, and to negotiate the migration with Baidu as a paid project. In the end, though, I could only use the API documentation that Baidu provides to get the network URL of each original audio and video file, and write a multi-threaded file downloader myself. With it, we downloaded 230 GB of audio and video files from the accumulation account and 850 GB from the other account.
2. Principle of multi-threaded download
- To download a file, the client first sends a request to the server; the server transfers the file to the client, and the client saves it locally. That completes one download.
- The idea of multi-threaded download is that the client starts several threads that download at the same time. Each thread is responsible for only one part of the file, and the download is complete once all threads have finished.
- More threads do not always mean faster downloads; the speed depends heavily on the network environment.
- In the same network environment, a multi-threaded download is usually faster than a single-threaded one.
- A multi-threaded download uses more resources than a single-threaded one; in effect it trades resources for speed.
Multi-threaded download is a very common download scheme that makes full use of the advantages of multithreading. The data to be downloaded is divided into several parts, and several threads issue download requests during the same period of time. Each thread downloads only one part, and the downloaded parts are then assembled into a complete data file, which greatly improves download efficiency. Common download managers such as Thunder (Xunlei) and QQ Tornado (QQ Xuanfeng) all use this technique.
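The splitting step described above can be sketched as follows. This is only an illustration, not the code used in the migration: the class `RangeSplitter` and its nested `ByteRange` type are hypothetical names. It assumes the file size is already known and divides it into inclusive byte ranges, giving the remainder to the last part.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a file of known size into inclusive byte ranges. */
final class RangeSplitter {

    /** An inclusive byte range [start, end], counted from offset 0. */
    static final class ByteRange {
        final long start;
        final long end;

        ByteRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + "-" + end;
        }
    }

    static List<ByteRange> split(long fileSize, int parts) {
        if (fileSize <= 0 || parts <= 0) {
            throw new IllegalArgumentException("fileSize and parts must be positive");
        }
        // Never create more parts than there are bytes.
        int n = (int) Math.min(parts, fileSize);
        // Use long arithmetic so that files larger than 2 GB do not overflow.
        long partSize = fileSize / n;
        List<ByteRange> ranges = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            long start = i * partSize;
            // The last part also takes the remainder; the final offset is fileSize - 1.
            long end = (i == n - 1) ? fileSize - 1 : start + partSize - 1;
            ranges.add(new ByteRange(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints [0-1, 2-3, 4-5, 6-9]
        System.out.println(split(10, 4));
    }
}
```

Each range can then be handed to one download thread. Keeping the offsets as `long` matters for the large video files discussed here.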
3. How to download the request in sections
3.1. How to segment the requested data.
Range, a header field added in HTTP/1.1, allows a client to request only a part of a document, that is, a range.
Of course, there is a precondition: the object must not have changed between the time the client last requested the entity and the time the range request is issued. For example:
    GET /bigfile.html HTTP/1.1
    Host: www.joes-hardware.com
    Range: bytes=4000-
    User-Agent: Mozilla/4.61 [en] (WinNT; I)
In the request above, the client asks for the part of the document after the first 4000 bytes (the last byte does not have to be given, because the requester may not know the size of the document). This form of Range request can be used to resume a transfer that failed after the client had received the first 4000 bytes. The Range header can also request multiple ranges (these ranges can be given in any order, and they may overlap each other).
The Range header field is used as follows. For example:
- The first 500 bytes: bytes=0-499
- The second 500 bytes: bytes=500-999
- The last 500 bytes: bytes=-500
- Everything after the first 500 bytes: bytes=500-
- The first and last bytes: bytes=0-0,-1
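The forms listed above can be produced by a few small helpers. The following is a minimal sketch; the class `RangeHeaders` and its method names are hypothetical and only format the header value. `resumeFrom` assumes that the bytes already on disk are exactly the first bytes of the remote file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that formats values for the HTTP Range request header. */
final class RangeHeaders {

    private RangeHeaders() {
    }

    /** Bytes first..last, both inclusive, e.g. bytes=0-499. */
    static String between(long first, long last) {
        if (first < 0 || last < first) {
            throw new IllegalArgumentException("invalid range " + first + "-" + last);
        }
        return "bytes=" + first + "-" + last;
    }

    /** Everything from offset first to the end of the file, e.g. bytes=500-. */
    static String from(long first) {
        if (first < 0) {
            throw new IllegalArgumentException("negative offset " + first);
        }
        return "bytes=" + first + "-";
    }

    /** The last count bytes of the file, e.g. bytes=-500. */
    static String lastBytes(long count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        return "bytes=-" + count;
    }

    /** Header value for resuming a download whose first bytes are already in partialFile. */
    static String resumeFrom(Path partialFile) throws IOException {
        long alreadyDownloaded = Files.exists(partialFile) ? Files.size(partialFile) : 0L;
        return from(alreadyDownloaded);
    }
}
```

A caller would pass the result to the connection, for example `conn.setRequestProperty("Range", RangeHeaders.between(0, 499))`.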
When the server receives the request from a download thread, it finds that it is a GET request with a Range header.
If everything is normal, the server's response starts with the following status line:
    HTTP/1.1 206 Partial Content
This indicates that the range request was processed successfully. The response also contains this header:
    Content-Range: bytes 200-299/403
Here 200-299 is the byte range carried by this response, and the 403 after the slash is the total size of the file in bytes.
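To make the meaning of the three numbers concrete, here is a small sketch that parses a Content-Range value. The class `ContentRange` is a hypothetical name; it only handles the `bytes first-last/total` form shown above (plus `*` for an unknown total) and rejects anything else.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value class for a parsed Content-Range response header. */
final class ContentRange {

    // Matches e.g. "bytes 200-299/403" or "bytes 200-299/*" (total size unknown).
    private static final Pattern FORMAT =
            Pattern.compile("bytes\\s+(\\d+)-(\\d+)/(\\d+|\\*)");

    final long first;
    final long last;
    /** Total file size in bytes, or -1 if the server sent "*". */
    final long total;

    private ContentRange(long first, long last, long total) {
        this.first = first;
        this.last = last;
        this.total = total;
    }

    static ContentRange parse(String headerValue) {
        Matcher m = FORMAT.matcher(headerValue.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported Content-Range: " + headerValue);
        }
        long total = "*".equals(m.group(3)) ? -1L : Long.parseLong(m.group(3));
        return new ContentRange(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), total);
    }

    /** Number of bytes carried by this response (both offsets are inclusive). */
    long length() {
        return last - first + 1;
    }

    public static void main(String[] args) {
        ContentRange cr = parse("bytes 200-299/403");
        // Prints: 100 bytes of 403
        System.out.println(cr.length() + " bytes of " + cr.total);
    }
}
```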
The development of the HTTP protocol
The HTTP protocol has gone through several versions so far. The first HTTP protocol dates back to March 1989.
Version | Period |
---|---|
HTTP/0.9 | 1991 |
HTTP/1.0 | 1992-1996 |
HTTP/1.1 | 1997-1999 |
HTTP/2.0 | 2012-2014 |
In other words, HTTP/1.1 has been in use since 1997-1999, so servers today basically all support range requests and therefore resumable (breakpoint) downloads.
3.2. How to assemble the data downloaded in sections into a complete data file.
RandomAccessFile class
RandomAccessFile is suitable for files made up of records of known size, so that we can use seek() to move from one record to another and then read or modify the records.
A random access file behaves like a large array of bytes stored in the file system. There is a cursor, or index into this implied array, called the file pointer. Input operations read bytes starting at the file pointer and advance the file pointer past the bytes read. If the random access file is created in read/write mode, output operations are also available; they write bytes starting at the file pointer and advance the file pointer past the bytes written. Output operations that write past the current end of the implied array cause the array to be extended. The file pointer can be read with the getFilePointer method and set with the seek method.
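The following sketch shows why this matters for assembling a file from sections: chunks can be written in any order, as long as each one is written at its own offset. The class `SeekWriteDemo`, the method `writeAt`, and the temporary file are hypothetical and serve only as illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo: writes chunks out of order and still gets a correctly assembled file. */
final class SeekWriteDemo {

    /** Writes chunk into target, starting at the given byte offset. */
    static void writeAt(Path target, long offset, byte[] chunk) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
            raf.seek(offset);   // move the file pointer to this chunk's position
            raf.write(chunk);   // write and advance the file pointer
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("assembled", ".txt");
        try {
            // Reserve the final size up front, as a downloader would after reading Content-Length.
            try (RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
                raf.setLength(11);
            }
            // The second chunk arrives first.
            writeAt(target, 6, "world".getBytes(StandardCharsets.US_ASCII));
            writeAt(target, 0, "hello ".getBytes(StandardCharsets.US_ASCII));
            // Prints: hello world
            System.out.println(new String(Files.readAllBytes(target), StandardCharsets.US_ASCII));
        } finally {
            Files.delete(target);
        }
    }
}
```

In a multi-threaded download each thread would open its own RandomAccessFile on the same path, so that the file pointers do not interfere with each other.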
To download part of a resource through URLConnection, note two things:
1. The Range request header is required. Key: Range, value: bytes=0-499
    urlconnection.setRequestProperty("Range", "bytes=0-499");
2. You need to set the position in the local file at which each thread starts writing.
    RandomAccessFile randomfile = new RandomAccessFile(File file, String mode);
    randomfile.seek(long startPosition); // the position at which this thread starts saving its downloaded data
The constructor creates a random access file stream to read from, and optionally to write to, the file specified by the File argument. A new FileDescriptor object is created to represent the connection to this file.
The mode parameter specifies the access mode used to open the file. The allowed values and their meanings are:
- "r" - open for reading only. Invoking any of the write methods of the resulting object causes an IOException to be thrown.
- "rw" - open for reading and writing. If the file does not already exist, an attempt is made to create it.
- "rws" - open for reading and writing, as with "rw", and also require that every update to the file's content or metadata be written synchronously to the underlying storage device.
- "rwd" - open for reading and writing, as with "rw", and also require that every update to the file's content be written synchronously to the underlying storage device.
4. Key code implementation
DownloadConstans.java
    package com.wdcloud.publiccloud.files.tool.download.filedownload;

    import java.util.concurrent.*;

    /**
     * @Description
     * @author jianxiapc
     * @create 2019-08-20 11:20
     */
    public class DownloadConstans {

        public static final int MAX_THREAD_COUNT = getSystemProcessCount();

        private static final int MAX_IMUMPOOLSIZE = MAX_THREAD_COUNT;

        /**
         * Custom thread pool
         */
        private static ExecutorService MY_THREAD_POOL;

        /**
         * Custom thread pool (synchronized so that only one pool is ever created)
         */
        public static synchronized ExecutorService getMyThreadPool() {
            if (MY_THREAD_POOL == null) {
                MY_THREAD_POOL = Executors.newFixedThreadPool(MAX_IMUMPOOLSIZE);
            }
            return MY_THREAD_POOL;
        }

        // Thread pool
        private static ThreadPoolExecutor threadPool;

        /**
         * Single instance, single task thread pool
         * @return
         */
        public static synchronized ThreadPoolExecutor getThreadPool() {
            if (threadPool == null) {
                threadPool = new ThreadPoolExecutor(MAX_IMUMPOOLSIZE, MAX_IMUMPOOLSIZE, 3, TimeUnit.SECONDS,
                        new ArrayBlockingQueue<Runnable>(16),
                        new ThreadPoolExecutor.CallerRunsPolicy());
            }
            return threadPool;
        }

        /**
         * Get the number of download threads (originally the cpu cores of the server)
         * @return
         */
        private static int getSystemProcessCount() {
            //int count = Runtime.getRuntime().availableProcessors();
            //Only four threads are started for download
            int count = 4;
            return count;
        }
    }
FileMultiPartDownLoad.java
    package com.wdcloud.publiccloud.files.tool.download.filedownload;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.RandomAccessFile;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;

    /**
     * @Description
     * @author jianxiapc
     * @create 2019-08-20 11:02
     */
    public class FileMultiPartDownLoad {

        private static Logger logger = LoggerFactory.getLogger(FileMultiPartDownLoad.class);

        /**
         * Thread download success flag: 0 = success, 1 = at least one thread failed.
         * volatile, because it is written by the pool threads and read by the caller.
         */
        private static volatile int flag = 0;

        /**
         * Server request path
         */
        private String netWorkFileUrlPath;

        /**
         * Local path
         */
        private String localPath;

        /**
         * Thread count synchronization auxiliary
         */
        private CountDownLatch latch;

        // Fixed-size thread pool
        private static ExecutorService threadPool;

        public FileMultiPartDownLoad(String netWorkFileUrlPath, String localPath) {
            this.netWorkFileUrlPath = netWorkFileUrlPath;
            this.localPath = localPath;
        }

        public boolean executeDownLoad() {
            try {
                URL url = new URL(netWorkFileUrlPath);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setConnectTimeout(5000); //Set timeout
                conn.setRequestMethod("GET"); //Set request method
                conn.setRequestProperty("Connection", "Keep-Alive");
                int code = conn.getResponseCode();
                conn.disconnect(); //Only the status code is needed here, not the body
                if (code != 200) {
                    logger.error(String.format("Invalid network address:%s", netWorkFileUrlPath));
                    return false;
                }
                //The length of the data returned by the server is actually the length of the file, in bytes.
                //conn.getContentLength() returns an int, so there will be problems if the file exceeds 2G.
                long length = getRemoteFileSize(netWorkFileUrlPath);
                if (length <= 0) {
                    logger.error(String.format("Cannot determine remote file size:%s", netWorkFileUrlPath));
                    return false;
                }
                logger.info("Total file length:" + length + "byte(B)");
                try (RandomAccessFile raf = new RandomAccessFile(localPath, "rwd")) {
                    //Specifies the length of the file created
                    raf.setLength(length);
                }
                //Split file
                int partCount = DownloadConstans.MAX_THREAD_COUNT;
                //long, so that large files do not overflow when offsets are computed
                long partSize = length / partCount;
                latch = new CountDownLatch(partCount);
                threadPool = DownloadConstans.getMyThreadPool();
                for (int threadId = 1; threadId <= partCount; threadId++) {
                    // Start location of each thread download
                    long startIndex = (threadId - 1) * partSize;
                    // End location of each thread download (inclusive)
                    long endIndex = startIndex + partSize - 1;
                    if (threadId == partCount) {
                        //The last thread also downloads the remainder; the last byte offset is length - 1
                        endIndex = length - 1;
                    }
                    logger.info("thread" + threadId + "download:" + startIndex + "byte~" + endIndex + "byte");
                    threadPool.execute(new DownLoadThread(threadId, startIndex, endIndex, latch));
                }
                latch.await();
                if (flag == 0) {
                    return true;
                }
            } catch (Exception e) {
                logger.error(String.format("File download failed, file address:%s,Failure reason:%s", netWorkFileUrlPath, e.getMessage()), e);
            }
            return false;
        }

        /**
         * Inner class for download
         */
        public class DownLoadThread implements Runnable {

            private Logger logger = LoggerFactory.getLogger(DownLoadThread.class);

            /**
             * Thread ID
             */
            private int threadId;

            /**
             * Download start location
             */
            private long startIndex;

            /**
             * Download end location
             */
            private long endIndex;

            private CountDownLatch latch;

            public DownLoadThread(int threadId, long startIndex, long endIndex, CountDownLatch latch) {
                this.threadId = threadId;
                this.startIndex = startIndex;
                this.endIndex = endIndex;
                this.latch = latch;
            }

            @Override
            public void run() {
                try {
                    //logger.info("thread" + threadId + "downloading...");
                    URL url = new URL(netWorkFileUrlPath);
                    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                    conn.setRequestProperty("Connection", "Keep-Alive");
                    conn.setRequestMethod("GET");
                    //Ask the server for only this thread's part of the file
                    conn.setRequestProperty("Range", "bytes=" + startIndex + "-" + endIndex);
                    conn.setConnectTimeout(5000);
                    int code = conn.getResponseCode();
                    //logger.info("thread" + threadId + "request return code=" + code);
                    //206 means the Range header was honoured; a 200 would carry the whole file
                    if (code != HttpURLConnection.HTTP_PARTIAL) {
                        throw new IOException("thread" + threadId + " expected 206 but got " + code);
                    }
                    try (InputStream is = conn.getInputStream(); //Return resources
                         RandomAccessFile raf = new RandomAccessFile(localPath, "rwd")) {
                        //Where to start when writing to the file at random
                        raf.seek(startIndex); //Locate position in file
                        int len;
                        byte[] buffer = new byte[1024];
                        while ((len = is.read(buffer)) != -1) {
                            raf.write(buffer, 0, len);
                        }
                    }
                    logger.info("thread" + threadId + "Download completed");
                } catch (Exception e) {
                    //Thread download error
                    FileMultiPartDownLoad.flag = 1;
                    logger.error(e.getMessage(), e);
                } finally {
                    //Count minus one
                    latch.countDown();
                }
            }
        }

        /**
         * Internal method, get remote file size
         * @param remoteFileUrl
         * @return file size in bytes, or 0 if it cannot be determined
         * @throws IOException
         */
        private long getRemoteFileSize(String remoteFileUrl) throws IOException {
            HttpURLConnection httpConnection = (HttpURLConnection) new URL(remoteFileUrl).openConnection();
            httpConnection.setRequestMethod("HEAD");
            int responseCode = 0;
            try {
                responseCode = httpConnection.getResponseCode();
            } catch (IOException e) {
                logger.error(e.getMessage(), e);
            }
            if (responseCode >= 400) {
                logger.debug("Web Server response error!");
                return 0;
            }
            //Reads the Content-Length header as a long; returns -1 if the header is missing
            long fileSize = httpConnection.getContentLengthLong();
            return Math.max(fileSize, 0);
        }

        /**
         * Download File actuator.
         * The method is synchronized, so downloads run one at a time and the shared static flag stays consistent.
         * @param netWorkFileUrlPath
         * @param localDirPath
         * @param fileName
         * @return local path of the downloaded file, or null if the download failed
         */
        public synchronized static String downLoad(String netWorkFileUrlPath, String localDirPath, String fileName) {
            String localStorageDirPath = localDirPath + "/" + fileName;
            System.out.println("localStorageDirPath: " + localStorageDirPath);
            FileMultiPartDownLoad m = new FileMultiPartDownLoad(netWorkFileUrlPath, localStorageDirPath);
            long startTime = System.currentTimeMillis();
            boolean flag = false;
            try {
                flag = m.executeDownLoad();
                long endTime = System.currentTimeMillis();
                if (flag) {
                    logger.info(fileName + " : End of file download,Total time consuming" + (endTime - startTime) + "ms");
                    return localStorageDirPath;
                }
                logger.warn("File download failed");
                return null;
            } catch (Exception ex) {
                logger.error(ex.getMessage(), ex);
                return null;
            } finally {
                FileMultiPartDownLoad.flag = 0; // Reset download status
                if (!flag) {
                    File file = new File(localStorageDirPath);
                    file.delete();
                }
            }
        }
    }
Calling code
    /**
     * First, the basic information is obtained by calling the API of the Baidu SDK,
     * and then a single vod video file is downloaded using multiple threads.
     * @param bceClient
     * @param vodMediaId Video id
     * @param fileStorageDiskPath Store download file path
     * @param excelFileName excel for saving file information after downloading
     * @param expiredInSeconds Expiration time 3600s by default
     */
    public void downloadSingleVodMediaFile(VodClient bceClient, String vodMediaId, String fileStorageDiskPath, String excelFileName, long expiredInSeconds) {
        logger.info("vodMediaId = " + vodMediaId);
        GetMediaSourceDownloadResponse response = bceClient.getMediaSourceDownload(vodMediaId, expiredInSeconds);
        String netWorkFileUrl = response.getSourceUrl();
        logger.info("netWorkFileUrl = " + netWorkFileUrl);
        //Test single thread download and multi thread download
        Date startDate = new Date();
        long downLoadStartTime = System.currentTimeMillis();
        //System.out.println("downLoadStartTime: "+downLoadStartTime);
        logger.info("downLoadStartTime: " + sdf.format(startDate));
        //OkHttpDownloadUtil.downNetWorkFile(netWorkFileUrl,fileStorageDiskPath,"single.mp4");
        Map<String, Object> vodFileInfoMap = getVodFileInfoByVodId(bceClient, vodMediaId);
        String fileName = vodFileInfoMap.get("title").toString();
        //Function call for downloading the current file
        FileMultiPartDownLoad.downLoad(netWorkFileUrl, fileStorageDiskPath, fileName);
        Date endDate = new Date();
        long downLoadEndTime = System.currentTimeMillis();
        long customDownloadTime = downLoadEndTime - downLoadStartTime;
        String downloadTimeFormat = CommonConvertUtils.formatMillisTime(customDownloadTime);
        //System.out.println("downloadTimeFormat: "+downloadTimeFormat);
        logger.info("file " + vodMediaId + " Download start time" + sdf.format(startDate) + " Download completed:" + sdf.format(endDate));
        logger.info("file " + vodMediaId + " downloadTimeFormat: " + downloadTimeFormat);
    }
5. Results
6. Summary
By writing this download code, I gained a deeper understanding of the key points of multi-threaded downloading. I learned systematically how to download files with multiple threads, and I also put it into practice.