For example, the java.net package and the URLConnection class can be combined to implement a simple download of network resources.
The rest of this article is organized as follows. First, it introduces the definition of the File class in Java programs and how file paths are represented. Second, it briefly describes the relationship between network resources and files, and between a URL (resource locator) and a File path. Third, it shows how to download a network resource to the local disk through URLConnection. Finally, it presents a case study that uses HttpURLConnection to simulate a browser searching for movie resources.
1. Analysis of the File class documentation comments
In the JDK source code, the documentation comment describes the File class as a (logical) abstract representation of (physical) file and directory pathnames. User interfaces and operating systems use system-dependent pathname strings to name files and directories. The File class instead presents an abstract, system-independent view of hierarchical pathnames. An abstract pathname has two components: [1] an optional system-dependent prefix string, such as a disk-drive specifier (F:), "/" for the UNIX root directory, or \\ for a Microsoft Windows UNC pathname; and [2] a sequence of zero or more string names. Take F:\Java-dependencies\apache-tomcat-9.0.43\apache-tomcat-9.0.43\conf\server.xml as an example: every name except the last denotes a directory (Java-dependencies, apache-tomcat-9.0.43, apache-tomcat-9.0.43, conf), and the last name denotes either a file (server.xml) or a directory. A network address such as https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json has the same hierarchical shape: its first name is a host name (geo.datav.aliyun.com), each following name narrows the location down by one level, and the last name is the file name (100000_full.json). Note that this is an analogy. In the Javadoc, a host name appears as the first name only in Windows UNC pathnames, and java.io.File does not handle http(s) addresses.
Experimenting with actual code shows that a File object can be created with the new keyword even if no such path or file exists on disk. This is precisely because, at the logical level, File is an abstract and system-independent representation.
At the physical level, however, when you actually move, copy, delete or rename the file behind a File object, the corresponding path string must exist on disk (otherwise methods such as delete() and renameTo() simply return false). This is because, in the operating system underneath the JVM, user interfaces and operating systems use system-dependent pathname strings to name files and directories.
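A minimal sketch of this logical/physical split, assuming a path under the system temp directory that does not exist (the directory name no-such-dir-42 is hypothetical). Constructing the File succeeds and getName() works from the path string alone, but exists() is false and length() reports 0 because nothing is on disk.

```java
import java.io.File;

public class AbstractFileDemo {
    // Reports what the logical File object knows (its name) versus
    // what the file system says (existence and size). Non-destructive.
    public static String describe(File f) {
        return f.getName() + " exists=" + f.exists() + " length=" + f.length();
    }

    public static void main(String[] args) {
        // Hypothetical path under the temp directory; nothing is created on disk
        File missing = new File(System.getProperty("java.io.tmpdir"), "no-such-dir-42/server.xml");
        System.out.println(describe(missing));
    }
}
```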
File paths can be further divided into relative and absolute paths, which are not covered here. The point to remember is that the location of a resource can be written as a string path in either of these forms:
- a local file-system path, such as F:\Java-dependencies\apache-tomcat-9.0.43\apache-tomcat-9.0.43\conf\server.xml, which Java represents with java.io.File;
- a network address, such as https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json, which Java represents with java.net.URL (a java.io.File cannot open it).
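As a sketch of how the two forms relate, a local path can itself be turned into a file: URL with File.toURI().toURL(), so both kinds of location can be handled through java.net.URL. The relative path conf/server.xml is a hypothetical example; toURI() resolves it against the current working directory.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class PathToUrlDemo {
    // Converts a local path into an absolute file: URL.
    public static URL toUrl(String localPath) throws MalformedURLException {
        return new File(localPath).toURI().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL local = toUrl("conf/server.xml");  // hypothetical relative path
        URL remote = new URL("https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json");
        System.out.println(local.getProtocol() + " -> " + local);
        System.out.println(remote.getProtocol() + " -> " + remote);
    }
}
```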
2 Relationship between network resources and files
Following the analysis of the File class in section 1, a file path can be abstractly written as https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json. A network resource, then, is essentially an ordinary file stored on a server, such as a pdf, xml, png or mp4 file.
Files come in different types, and so, naturally, do network resources. Local files can be told apart by encoding (GBK, UTF-8, ...) or by suffix (*.pdf, *.jpg, *.xml, *.png, *.mp3, ...). How, then, are the types of network resources distinguished?
In the HTTP (HyperText Transfer Protocol) protocol, the type of a network resource is identified by the Content-Type response header, which tells the client what kind of content the server is currently returning. On the server side, a Java servlet can set this header with the setContentType(String) method of ServletResponse, and the value determines how the browser interprets and decodes the resource (file). For example, a Content-Type of text/html indicates that the resource being accessed is an HTML file.
The value of the Content-Type header is called a MIME type (media type). Common values include text/html, text/plain, application/json, application/xml, image/png, image/jpeg, application/pdf, audio/mpeg and video/mp4. More MIME types are listed at https://www.runoob.com/http/http-content-type.html.
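On the client side, the JDK ships a small suffix-to-MIME table that URLConnection.guessContentTypeFromName consults. The sketch below only guesses from the file name; the authoritative value is still the Content-Type header sent by the server, and suffixes missing from the JDK's table (possibly .json, depending on the JDK version) come back as null.

```java
import java.net.URLConnection;

public class MimeGuessDemo {
    // Looks up the MIME type for a file name's suffix in the JDK's built-in table.
    // Returns null when the suffix is not listed.
    public static String guess(String fileName) {
        return URLConnection.guessContentTypeFromName(fileName);
    }

    public static void main(String[] args) {
        String[] names = {"index.html", "logo.png", "photo.jpg", "100000_full.json"};
        for (String name : names) {
            System.out.println(name + " -> " + guess(name));
        }
    }
}
```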
3 Network resources and the java.net.URL class
From the discussion above, a network resource is essentially an ordinary file stored on a server, and its type, the MIME media type, is conveyed through the Content-Type header defined by the HTTP standard. So how do we access such a resource?
3.1 Path representation of network resources
The path of the network resource is represented by a URL.
URL stands for Uniform Resource Locator. Baidu describes it as follows:
On the World Wide Web (WWW), every network resource has a unified and unique address on the Internet, called its URL (Uniform Resource Locator). It is the uniform resource location identifier of the WWW, that is, a network address.
Concretely, a URL is the string we usually see in the browser's address bar.
A URL consists mainly of four parts: protocol, host, port and path. Its general syntax is:
protocol://hostname[:port]/path/[;parameters][?query]#fragment
For example, https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json breaks down as:
protocol: https
host: geo.datav.aliyun.com (no port is written, so the https default 443 applies; a DNS server resolves the host name to an IP address)
path: /areas_v3/bound/100000_full.json
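The decomposition above can be checked with java.net.URI, which splits a URL string into its syntactic parts without contacting any server. This is a sketch; the second URL (example.com with port 8080, a query and a fragment) is a hypothetical address chosen to exercise every optional part.

```java
import java.net.URI;

public class UrlSyntaxDemo {
    // Splits a URL string into the parts named in the general syntax.
    // getPort() returns -1 when no port is written; absent parts are null.
    public static String describe(String url) {
        URI u = URI.create(url);
        return "protocol=" + u.getScheme()
                + " host=" + u.getHost()
                + " port=" + u.getPort()
                + " path=" + u.getPath()
                + " query=" + u.getQuery()
                + " fragment=" + u.getFragment();
    }

    public static void main(String[] args) {
        System.out.println(describe("https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json"));
        // Hypothetical URL that uses every optional part
        System.out.println(describe("http://example.com:8080/docs/index.html?lang=en#intro"));
    }
}
```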
You can also look up a host name with an online domain-name resolution (DNS lookup) tool.
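Host-name resolution can also be done from Java with InetAddress.getAllByName, which asks the system resolver (DNS, the hosts file and so on) for every address of a host. A minimal sketch; the result for geo.datav.aliyun.com depends on network access and may change over time, while a literal IP such as 127.0.0.1 is returned without any lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class DnsLookupDemo {
    // Resolves a host name to all of its IP addresses via the system resolver.
    public static List<String> resolve(String host) throws UnknownHostException {
        List<String> ips = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            ips.add(addr.getHostAddress());
        }
        return ips;
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(resolve("127.0.0.1"));  // literal IP, no lookup needed
        try {
            System.out.println(resolve("geo.datav.aliyun.com"));  // needs network access
        } catch (UnknownHostException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```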
3.2 java.net.URL class
In the Java programming language, a URL is abstracted as the java.net.URL class. Its documentation comment can be summarized as follows:
The URL class represents a Uniform Resource Locator, a pointer to a resource on the World Wide Web. A resource can be something as simple as a file or a directory (which Java abstracts as the File class), or it can be a reference to a more complicated object, such as a query to a database or to a search engine. The port of a URL is optional; if it is not specified, the default port of the protocol is used (for example, 80 for http and 443 for https). The URL class also touches on the host-name resolution mentioned in 3.1 (for example, URL.equals() and hashCode() resolve host names), but it does not implement this itself: protocol-specific work is delegated to an internally maintained URLStreamHandler abstract class.
The following program prints basic information about the URL https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json.
package com.xwd.demo;

import java.net.MalformedURLException;
import java.net.URL;

public class URLDemo {
    public static void main(String[] args) {
        try {
            URL url = new URL("https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json");
            System.out.println(url.toString());
            // Get the communication protocol
            String protocol = url.getProtocol();
            System.out.println("protocol=" + protocol);
            // Get the host name
            String host = url.getHost();
            System.out.println("host=" + host);
            // Get the port: getPort() is -1 when the URL names no port,
            // getDefaultPort() is the default port of the protocol
            int port = url.getPort();
            int defaultPort = url.getDefaultPort();
            System.out.println("port:" + port + ",defaultPort:" + defaultPort);
            // Get the query string (null when there is none)
            String query = url.getQuery();
            System.out.println("Query parameters=" + query);
            // Get the user information, i.e. the user:password@ part (null when there is none)
            String userInfo = url.getUserInfo();
            System.out.println("User information=" + userInfo);
            // Get the anchor after # (null when there is none)
            String ref = url.getRef();
            System.out.println("URL anchor (#)=" + ref);
            // Get the authority: host plus optional user info and port
            String authority = url.getAuthority();
            System.out.println("authority=" + authority);
            // Get the file name: path plus query string
            String file = url.getFile();
            System.out.println("file name=" + file);
            // Get the path only
            String path = url.getPath();
            System.out.println("File path=" + path);
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }
    }
}
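Two details of these getters are easy to miss: getFile() returns the path plus the query string while getPath() returns the path only, and getPort() returns -1 when the URL has no explicit port. The sketch below uses a hypothetical example.com search URL and a helper, effectivePort (our own name, not a JDK method), that falls back to getDefaultPort().

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlGettersDemo {
    // Port actually used: the explicit port if present, otherwise the protocol default.
    public static int effectivePort(URL url) {
        return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical search URL with a query string and an anchor
        URL url = new URL("http://example.com/search.php?searchword=java#top");
        System.out.println("file=" + url.getFile());  // path plus query, anchor excluded
        System.out.println("path=" + url.getPath());  // path only
        System.out.println("ref=" + url.getRef());    // text after #
        System.out.println("port=" + effectivePort(url));
        System.out.println("https port=" + effectivePort(new URL("https://geo.datav.aliyun.com/")));
    }
}
```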
4 java.net.URLConnection and simple reading and writing of network resources
4.1 Introduction to URLConnection
With the URL class, we can already obtain the file name and path of the resource a URL points to. How, then, do we read from and write to the network resource itself, for example to download it to the local disk through its URL?
As with file IO, IO on a network resource first requires a connection object between the client and the server that hosts the resource; all further IO operations are then performed through that connection object. The client reads the resource through the connection's input stream and, where the protocol allows it, writes to it through the connection's output stream.
The Java programming language provides the java.net.URLConnection class to represent the communication channel between a client and a network resource. Its documentation comment can be summarized briefly as follows:
The abstract class URLConnection is the superclass of all classes that represent a communications link between an application and a URL. An instance is obtained by calling the URL's openConnection() method and can be used both to read from and to write to the resource the URL refers to.
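Working with a URLConnection follows a fixed order: obtain the object with openConnection(), set parameters such as timeouts, call connect(), then read header fields and content. A minimal sketch of that order, assuming a local file: URL (to keep it offline) in place of an http(s) address, UTF-8 content, and arbitrary 3000 ms timeouts.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlConnectionSteps {
    // Reads a resource by following the URLConnection steps in order.
    public static String read(URL url) throws IOException {
        URLConnection conn = url.openConnection();  // 1. create the object (not yet connected)
        conn.setConnectTimeout(3000);               // 2. set parameters before connecting
        conn.setReadTimeout(3000);
        conn.connect();                             // 3. open the link to the resource
        // 4. header fields; -1 if the length is unknown
        System.out.println("Content-Length: " + conn.getContentLengthLong());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = conn.getInputStream()) {  // 5. read the content
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);  // assumes UTF-8
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("urlconnection-demo", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(read(tmp.toUri().toURL()));
        } finally {
            Files.delete(tmp);
        }
    }
}
```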
4.2 Simple downloading of network resources with URLConnection
The example code for downloading a network resource is as follows:
package com.xwd.demo;

import java.io.*;
import java.net.URL;
import java.net.URLConnection;

/**
 * @ClassName IODemo
 * @Description: com.xwd.demo
 * @Auther: xiwd
 * @Date: 2022/2/4 - 02 - 04 - 16:27
 * @version: 1.0
 */
public class IODemo {
    public static void main(String[] args) {
        URL url = null;
        URLConnection connection = null;
        InputStream inputStream = null;
        OutputStream outputStream = null;
        byte[] buffer = new byte[1024];
        int len = -1;
        try {
            // Provide the URL (network resource locator)
            url = new URL("https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json");
            // Use the last path segment as the local file name; getPath() is used
            // instead of getFile() so that a query string never ends up in the name
            String path = url.getPath();
            String filename = path.substring(path.lastIndexOf("/") + 1);
            // Get the connection object between the client and the URL
            connection = url.openConnection();
            // Get the input stream (this also opens the connection)
            inputStream = connection.getInputStream();
            // Get the output stream: the file is stored in the current working directory
            outputStream = new FileOutputStream(filename);
            // Copy the network resource to the local file
            while ((len = inputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, len);
            }
            System.out.println("SUCCESS");
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("FAILED");
        } finally {
            // Release stream resources
            if (outputStream != null) {
                try {
                    outputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (inputStream != null) {
                try {
                    inputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
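The same download can be written more compactly with try-with-resources and java.nio.file.Files.copy, which also closes the stream when an exception is thrown. This is a sketch rather than a drop-in replacement: the 5000 ms timeouts and the fallback file name index (for URLs whose path ends in /) are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class Downloader {
    // Saves the resource behind `url` into `dir`, named after the last path segment.
    public static Path download(URL url, Path dir) throws IOException {
        String path = url.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            name = "index";  // hypothetical fallback for paths ending in '/'
        }
        Path target = dir.resolve(name);
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        // The stream is closed automatically, even if the copy fails
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) {
        try {
            URL url = new URL("https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json");
            Path saved = download(url, Paths.get("."));
            System.out.println("SUCCESS: " + saved.toAbsolutePath());
        } catch (IOException e) {
            System.out.println("FAILED: " + e);
        }
    }
}
```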
4.3 HttpURLConnection
The HttpURLConnection abstract class is a subclass of URLConnection that adds support for HTTP-specific features; an instance can be used to send GET, POST and other requests to a given website. Each HttpURLConnection instance makes a single request, but the underlying network connection to the HTTP server may be transparently shared by other instances (HTTP persistent connections, also known as keep-alive). After a request, calling close() on the InputStream or OutputStream of the connection may free the network resources associated with that instance, but it has no effect on any shared persistent connection. Calling disconnect() may close the underlying socket if the persistent connection is otherwise idle at that time.
On top of URLConnection, it provides the following convenience methods:
int getResponseCode();                  // Get the HTTP status code returned by the server, e.g. 200
String getResponseMessage();            // Get the status message returned by the server, e.g. "OK"
String getRequestMethod();              // Get the method used to send the request
void setRequestMethod(String method);   // Set the method used to send the request, e.g. "GET" or "POST"
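To try these methods without depending on an external website, the sketch below starts a throwaway local server with com.sun.net.httpserver.HttpServer (from the jdk.httpserver module shipped with standard JDK builds) that answers every request with a small text/html page, then probes it with HttpURLConnection. The page content and the probe helper are illustrative assumptions; the output also shows the Content-Type header discussed in section 2.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HttpUrlConnectionProbe {
    // Starts a local server on a free port that answers every request with a tiny HTML page.
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "<h1>hello</h1>".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Sends one request and summarizes method, status code, status message and Content-Type.
    public static String probe(URL url, String method) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod(method);
            int code = conn.getResponseCode();  // sends the request if not already connected
            return conn.getRequestMethod() + " " + code + " "
                    + conn.getResponseMessage() + " " + conn.getContentType();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            System.out.println(probe(url, "GET"));
        } finally {
            server.stop(0);
        }
    }
}
```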
4.4 Simulating a browser to search for movie resources with HttpURLConnection
The example code is as follows:
Note that the response to the GET request is the HTML source of the search results page, which is too long to print. Later, page information could be extracted from this HTML source, for example with the Dom4j jar, to implement a simple crawler (Dom4j parses XML, so this only works if the page is well-formed).
package com.xwd.demo;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
import java.util.Map;

/**
 * @ClassName IODemo
 * @Description: com.xwd.demo
 * @Auther: xiwd
 * @Date: 2022/2/4 - 02 - 04 - 16:27
 * @version: 1.0
 */
public class IODemo {
    public static void main(String[] args) {
        httpURLConnectionTest();
    }

    // Search URL: http://www.sdpxgd.com/search.php?searchword=<keyword>,
    // where the keyword "画江湖" (Painting Jianghu) is URL-encoded as UTF-8
    private static void httpURLConnectionTest() {
        HttpURLConnection connection = null;
        BufferedReader reader = null;
        try {
            // Get the URL object
            URL url = new URL("http://www.sdpxgd.com/search.php?searchword=%E7%94%BB%E6%B1%9F%E6%B9%96");
            // Get the HttpURLConnection object
            connection = (HttpURLConnection) url.openConnection();
            // Set request parameters
            connection.setDoOutput(false);               // no request body is written
            connection.setDoInput(true);                 // the response body will be read
            connection.setRequestMethod("GET");          // request method
            connection.setUseCaches(true);               // allow cached responses
            connection.setInstanceFollowRedirects(true); // follow HTTP redirects automatically
            connection.setConnectTimeout(3000);          // connect timeout in milliseconds
            // Open the connection and send the request
            connection.connect();
            // Get the status code
            int responseCode = connection.getResponseCode();
            // Read the response body
            StringBuilder msg = new StringBuilder();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                InputStream inputStream = connection.getInputStream();
                // Uses the platform default charset; the page's real charset is in its Content-Type header
                reader = new BufferedReader(new InputStreamReader(inputStream));
                String line;
                while ((line = reader.readLine()) != null) {
                    msg.append(line).append('\n');
                }
            }
            // The HTML of the search results page is long, so it is not printed here
            // System.out.println(msg);
            // Print the response header fields
            for (Map.Entry<String, List<String>> entry : connection.getHeaderFields().entrySet()) {
                System.out.println(entry);
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            // Disconnect
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
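The search URL in the example hard-codes the keyword already percent-encoded. A sketch of building it from the plain keyword with URLEncoder, assuming the site expects UTF-8 (the encoded bytes in the original URL are the UTF-8 encoding of 画江湖). URLEncoder performs HTML form encoding, so a space would become + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrlBuilder {
    // Appends the keyword as the searchword query parameter, percent-encoded as UTF-8.
    public static String searchUrl(String base, String keyword) throws UnsupportedEncodingException {
        return base + "?searchword=" + URLEncoder.encode(keyword, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Unicode escapes for the three-character title used as the search keyword
        String keyword = "\u753B\u6C5F\u6E56";
        System.out.println(searchUrl("http://www.sdpxgd.com/search.php", keyword));
    }
}
```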