XML & regular expression

Posted by oc1000 on Sat, 25 Dec 2021 16:03:09 +0100

Learning objectives

1. Understand the development and history of XML
2. Understand the differences between XML and HTML
3. Master the syntax rules of XML
4. Understand XML constraints
5. Master XML parsing
6. Be proficient in regular expressions

Chapter 1 XML

1.1 Overview

XML (eXtensible Markup Language) is a markup language used mainly for data exchange. During the development of HTML, browser vendors competed fiercely and tolerated nonstandard markup in order to attract developers, which ran counter to the W3C's intentions. The W3C therefore developed the XML standard, intending it to replace HTML for displaying data. Developers did not take to it, and that effort failed. XML then found a niche in data exchange, where it has had some success. Today it is mainly used for configuration files and for exchanging data over the network.

1.2 Differences between XML and HTML

XML tags are user-defined; HTML tags are predefined
XML syntax is strict; HTML syntax is lenient
XML stores data; HTML displays data

1.3 Syntax rules

XML files conventionally use the .xml suffix
The document declaration is optional in XML 1.0, but if present it must be at the very start of the file
An XML document has exactly one root tag
Attribute values must be quoted, with either single or double quotation marks
Tags must be closed correctly, either as a pair or as a self-closing tag
XML is case sensitive
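
The rules above can be checked mechanically, because a conforming parser rejects any document that breaks them. The following is a minimal sketch using the JDK's built-in DOM parser (not dom4j); the class name WellFormedCheck and the sample strings are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a syntax rule was violated
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem, not a syntax error
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<users><user id='1'/></users>"));  // one root, quoted attribute
        System.out.println(isWellFormed("<users></users><users></users>")); // two root tags
        System.out.println(isWellFormed("<user id=1></user>"));             // unquoted attribute value
        System.out.println(isWellFormed("<user><name>Li Si</user>"));       // name tag never closed
        System.out.println(isWellFormed("<User></user>"));                  // case mismatch
    }
}
```

Only the first sample satisfies every rule; each of the others breaks exactly one of them.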

1.4 Components of an XML document

1.4.1 Document declaration

Format: <?xml attribute-list ?>

Attribute list

attribute	meaning
version	Version number; this attribute is required
encoding	Tells the parser which character encoding the document uses. If omitted, parsers assume UTF-8 (or UTF-16 when a byte order mark is present)
standalone	Whether the document is standalone: yes means it does not depend on external files, no means it does

Example
```
<?xml version="1.0" encoding="UTF-8"?>
```
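
As a rough illustration of how a program sees these attributes, the sketch below parses a declaration with the JDK's built-in DOM parser and reads the values back. The class name DeclarationDemo and the sample document are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DeclarationDemo {

    // Hypothetical sample: a declaration with all three attributes and an empty root tag.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><users/>";

    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("version    = " + doc.getXmlVersion());
        System.out.println("encoding   = " + doc.getXmlEncoding());
        System.out.println("standalone = " + doc.getXmlStandalone());
    }
}
```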

1.4.2 Processing instructions (for reference)

XML also supports processing instructions, for example to attach a CSS stylesheet. This is a leftover from XML's original purpose of displaying data; it is enough to know that it exists.

demo.xml

<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet  type="text/css" href="test.css"  ?>
<users>
	<user>
		<name>Zhang San</name>
		<age>18</age>
		<sex>male</sex>
	</user>

	<user>
		<name>Li Si</name>
		<age>28</age>
		<sex>female</sex>
	</user>

</users>

test.css
```
@CHARSET "UTF-8";
name{
	color:red;
}
```

1.4.3 Tags

Tags in XML are user-defined, subject to the following naming rules:

The name can contain letters, digits, and some other characters
The name cannot start with a digit or a punctuation mark (an underscore is allowed)
The name cannot start with the letters xml in any case (xml, XML, Xml, etc.)
The name cannot contain spaces

1.4.4 Attributes

Attributes can be defined on a tag as key-value pairs. Attribute values must be quoted, with either single or double quotation marks. The value of an id attribute should be unique.

1.4.5 Text

Text content can be placed between a pair of tags. Characters with special meaning must be escaped, for example &lt; (<), &amp; (&), and &gt; (>).
Text inside a CDATA section is taken as is, so special characters do not need to be escaped: <![CDATA[ ]]>
```
	<code>
		<![CDATA[
				if(a>1 && a<3){
					
				}
		]]>
	</code>
```
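
To see that escaping and CDATA are two spellings of the same text, the sketch below parses both forms with the JDK's built-in DOM parser and compares the resulting text content. The class name TextEscapingDemo is an assumption for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextEscapingDemo {

    static final String ESCAPED = "<code>if(a&gt;1 &amp;&amp; a&lt;3){}</code>";
    static final String CDATA = "<code><![CDATA[if(a>1 && a<3){}]]></code>";

    // Parses the XML and returns the text content of the root tag.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(ESCAPED)); // entity references are decoded by the parser
        System.out.println(rootText(CDATA));   // CDATA content is passed through unchanged
        System.out.println(rootText(ESCAPED).equals(rootText(CDATA)));
    }
}
```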

1.4.6 Comments

Comments in XML use the same syntax as in HTML: <!-- -->

1.5 Constraints

XML is mainly used for data exchange, so it is useful to restrict the data with rules.

1.5.1 Classification

XML constraints come in two main kinds: DTD and Schema. DTD is the older and simpler technology. Schema is more complex and more powerful.

1.5.2 Using DTD

External DTD

The DTD constraint rules are defined in a separate DTD file. Depending on where that file lives, it is either a local DTD or a network DTD.

Local DTD: <!DOCTYPE root-tag SYSTEM "DTD file location">

student.xml

<?xml version="1.0" encoding="UTF-8"    ?>
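<!DOCTYPE students SYSTEM "student.dtd">  <!-- Reference the external DTD -->
<students>
	<!-- number is declared as type ID, so its value must be an XML name and cannot start with a digit -->
	<student number="s1">
		<name>six</name>
		<age>16</age>
		<sex>girl</sex>
	</student>
	
	<student number="s2">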
		<name>seven</name>
		<age>23</age>
		<sex>boy</sex>
	</student>
 
</students>

student.dtd

<!-- Declares the students tag, which contains student child tags. * would mean 0 to N occurrences; + means 1 to N -->
<!ELEMENT students (student+) >
<!-- Declares the student tag, which contains name, age, and sex child tags in that order -->
<!ELEMENT student (name,age,sex) >
<!-- Declares the name tag; its content is text -->
<!ELEMENT name (#PCDATA) >
<!-- Declares the age tag; its content is text -->
<!ELEMENT age (#PCDATA) >
<!-- Declares the sex tag; its content is text -->
<!ELEMENT sex (#PCDATA) >
<!-- Declares the required number attribute of the student tag. Its type is ID, so each value must be unique and a valid XML name -->
<!ATTLIST student number ID #REQUIRED >

Network DTD: <!DOCTYPE root-tag PUBLIC "DTD name" "DTD file URL">

<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">

Internal DTD

DTD constraints can also be defined directly in the XML file: <!DOCTYPE students [ ]>

<?xml version="1.0" encoding="UTF-8" ?>

<!-- Internal DTD -->
<!DOCTYPE students [
<!ELEMENT students (student+) >
<!ELEMENT student (name,age,sex) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT age (#PCDATA) >
<!ELEMENT sex (#PCDATA) >
<!ATTLIST student number ID #REQUIRED >
]>

<students>
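	<!-- number is declared as type ID, so its value must be an XML name and cannot start with a digit -->
	<student number="s1">
		<name>six</name>
		<age>16</age>
		<sex>girl</sex>
	</student>
	
	<student number="s2">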
		<name>seven</name>
		<age>23</age>
		<sex>boy</sex>
	</student> 

</students>
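
A DTD only has an effect if the parser is asked to validate. The sketch below shows one way to do that with the JDK's built-in DOM parser, using the internal DTD from this section; the class name DtdValidationDemo and the helper student(...) are assumptions for illustration, and dom4j is not involved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationDemo {

    // The internal DTD from this section.
    static final String DTD = String.join("\n",
            "<!DOCTYPE students [",
            "<!ELEMENT students (student+)>",
            "<!ELEMENT student (name,age,sex)>",
            "<!ELEMENT name (#PCDATA)>",
            "<!ELEMENT age (#PCDATA)>",
            "<!ELEMENT sex (#PCDATA)>",
            "<!ATTLIST student number ID #REQUIRED>",
            "]>");

    // Hypothetical helper: wraps the given child tags in one student with the given number.
    static String student(String number, String children) {
        return "<students><student number=\"" + number + "\">" + children + "</student></students>";
    }

    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity errors are only reported by default, so rethrow them
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + "\n" + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<name>six</name><age>16</age><sex>girl</sex>";
        String wrongOrder = "<age>16</age><name>six</name><sex>girl</sex>";
        System.out.println(isValid(student("s1", ok)));         // follows the DTD
        System.out.println(isValid(student("s1", wrongOrder))); // child tags out of order
        System.out.println(isValid(student("1", ok)));          // ID value starts with a digit
    }
}
```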

1.5.3 Schema

A DTD cannot constrain the type or value of the data, so a more powerful technology can be used instead: Schema.

Importing a schema

Let's first look at a simple document that uses a schema (student.xml)

<?xml version="1.0" encoding="UTF-8"?>
 <a:students xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
			xmlns:a="http://www.blb.com/xml"
			xsi:schemaLocation="http://www.blb.com/xml student.xsd"

>
	<a:student number="blb_0001">
		<a:name>six</a:name>
		<a:age>111</a:age>
		<a:sex>boy</a:sex>
	</a:student>
	
</a:students>

The suffix of a schema file is xsd.
The schema import information is written on the root tag.
The root tag must declare the xsi prefix: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation pairs a namespace name (http://www.blb.com/xml) with the schema file to use for it (student.xsd)
xmlns:prefix binds a prefix to that namespace. Once a prefix is defined, the tags must carry it
The prefix can be omitted (xmlns="..."), which makes the namespace the default for unprefixed tags. If several schema files are imported, only one namespace can be the default.

Definition of schema

student.xsd

<?xml version="1.0" ?>
<xsd:schema xmlns="http://www.blb.com/xml" 
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
	targetNamespace="http://www.blb.com/xml" elementFormDefault="qualified">
<!-- Declare the students tag; its type is the custom type studentsType -->
	<xsd:element name="students" type="studentsType" />
	<xsd:complexType name="studentsType">
		<xsd:sequence>
		<!-- The student tag has the custom type studentType; it occurs at least once, with no upper limit -->
			<xsd:element name="student" type="studentType" minOccurs="1"
				maxOccurs="unbounded" />
		</xsd:sequence>
	</xsd:complexType>

<!-- Define custom types studentType -->
	<xsd:complexType name="studentType">
		<xsd:sequence>
		<!-- The name tag is a string -->
			<xsd:element name="name" type="xsd:string" />
				<!-- The age tag has the custom type ageType -->
			<xsd:element name="age" type="ageType" />
				<!-- The sex tag has the custom type sexType -->
			<xsd:element name="sex" type="sexType" />
		</xsd:sequence>
		<!-- The number attribute has the custom type numberType and is required -->
		<xsd:attribute name="number" type="numberType" use="required" />
	</xsd:complexType>

	<!-- sexType Custom type -->
	<xsd:simpleType name="sexType">
		<xsd:restriction base="xsd:string">
		<!-- Enumeration: the value can only be boy or girl -->
			<xsd:enumeration value="boy" />
			<xsd:enumeration value="girl" />
		</xsd:restriction>
	</xsd:simpleType>
	<!-- Custom type ageType: an integer from 0 to 300 inclusive -->
	<xsd:simpleType name="ageType">
		<xsd:restriction base="xsd:integer">
			<xsd:minInclusive value="0" />
			<xsd:maxInclusive value="300" />
		</xsd:restriction>
	</xsd:simpleType>
	<!-- Custom type numberType: must be blb_ followed by 4 digits -->
	<xsd:simpleType name="numberType">
		<xsd:restriction base="xsd:string">
			<xsd:pattern value="blb_\d{4}" />
		</xsd:restriction>
	</xsd:simpleType>
</xsd:schema>

You don't need to master the syntax of schema. Just know how to use it.
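
One way to put a schema to use from Java is the JDK's javax.xml.validation API. The sketch below validates a document against a cut-down version of student.xsd (the sex tag is left out to keep it short); the class name XsdValidationDemo and the helper student(...) are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdValidationDemo {

    // Cut-down student.xsd: name, age (0 to 300), and a required number attribute (blb_ plus 4 digits).
    static final String XSD = String.join("\n",
            "<xsd:schema xmlns='http://www.blb.com/xml'",
            "    xmlns:xsd='http://www.w3.org/2001/XMLSchema'",
            "    targetNamespace='http://www.blb.com/xml' elementFormDefault='qualified'>",
            "  <xsd:element name='students'>",
            "    <xsd:complexType><xsd:sequence>",
            "      <xsd:element name='student' type='studentType' maxOccurs='unbounded'/>",
            "    </xsd:sequence></xsd:complexType>",
            "  </xsd:element>",
            "  <xsd:complexType name='studentType'>",
            "    <xsd:sequence>",
            "      <xsd:element name='name' type='xsd:string'/>",
            "      <xsd:element name='age' type='ageType'/>",
            "    </xsd:sequence>",
            "    <xsd:attribute name='number' type='numberType' use='required'/>",
            "  </xsd:complexType>",
            "  <xsd:simpleType name='ageType'>",
            "    <xsd:restriction base='xsd:integer'>",
            "      <xsd:minInclusive value='0'/><xsd:maxInclusive value='300'/>",
            "    </xsd:restriction>",
            "  </xsd:simpleType>",
            "  <xsd:simpleType name='numberType'>",
            "    <xsd:restriction base='xsd:string'><xsd:pattern value='blb_\\d{4}'/></xsd:restriction>",
            "  </xsd:simpleType>",
            "</xsd:schema>");

    // Hypothetical helper: builds a one-student document in the schema's namespace.
    static String student(String number, String age) {
        return "<a:students xmlns:a='http://www.blb.com/xml'>"
                + "<a:student number='" + number + "'>"
                + "<a:name>six</a:name><a:age>" + age + "</a:age>"
                + "</a:student></a:students>";
    }

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is invalid", e);
        }
        try {
            // With no error handler set, the validator throws on the first violation.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(student("blb_0001", "111"))); // satisfies every constraint
        System.out.println(isValid(student("blb_0001", "301"))); // age above 300
        System.out.println(isValid(student("x1", "20")));        // number does not match the pattern
    }
}
```

Unlike the DTD, the schema rejects a value that is syntactically fine but out of range, which is the kind of constraint this section is about.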

Reference example (a Spring configuration file that imports several schemas)

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="http://www.springframework.org/schema/beans"
	xmlns:context="http://www.springframework.org/schema/context"
	xmlns:mvc="http://www.springframework.org/schema/mvc"
	xsi:schemaLocation="http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-4.3.xsd
		http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
		http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-4.3.xsd">

	<context:component-scan base-package="com.blb.ssm.control"/>
	<mvc:default-servlet-handler/>
	<mvc:annotation-driven/>
	
	<bean id="ResourceViewResolver" class="org.springframework.web.servlet.view.InternalResourceViewResolver">
		<property name="prefix" value="/WEB-INF/jsp/"/>
		<property name="suffix" value=".jsp"/>
	</bean>
		
	</beans>

1.6 XML parsing

Programs usually need to read and manipulate XML files, which requires parsing them. The two main parsing approaches are DOM and SAX. DOM is based on the document object model: it is easy to work with but uses more memory. SAX is event driven. In practice we usually parse with the dom4j framework.

DOM parsing: easy to use. The whole XML document is loaded into memory as a tree, so traversal is simple, and XPath is supported. The drawbacks are slower parsing and high memory use, which makes DOM impractical for very large files.

SAX parsing: SAX is an event-driven "push" model for processing XML. It is not a W3C standard, but it is a widely accepted API. Its biggest advantage is low memory consumption.
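
To make the "push" model concrete, here is a minimal sketch using the JDK's built-in SAX parser: the parser calls the handler as it meets each tag, and the handler keeps only what it needs instead of building a tree. The class name SaxNamesDemo and the sample document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxNamesDemo {

    static final String SAMPLE = "<students>"
            + "<student number=\"blb_0001\"><name>six</name><age>28</age></student>"
            + "<student number=\"blb_0002\"><name>seven</name><age>29</age></student>"
            + "</students>";

    // Collects the text of every name tag while the parser streams through the input.
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a name tag

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // may be called several times per tag
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(readNames(SAMPLE));
    }
}
```

The handler never holds more than one name in its buffer at a time, which is where the low memory use comes from.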

1.6.1 DOM4J

dom4j is a Java XML API for reading and writing XML files. It grew out of JDOM, is open source, and is known for good performance, a rich feature set, and ease of use. It is often described as outperforming the JDK's built-in DOM implementation. A great deal of Java software uses dom4j to read and write XML, reportedly including Sun's JAXM, so it is a jar you are very likely to need.

1.6.2 Obtaining the Document object

Suppose the document to be parsed is student.xml, as follows

<?xml version="1.0" encoding="UTF-8"?>
<students>
  <student number="blb_0001">
    <name>six</name>
    <age>28</age>
    <sex>boy</sex>
  </student>
  <student number="blb_0002">
    <name>seven</name>
    <age>29</age>
    <sex>girl</sex>
  </student>
</students>

SAX mode

InputStream in = Demo.class.getClassLoader().getResourceAsStream("student.xml");	
SAXReader reader = new SAXReader();
Document document = reader.read(in);

DOM mode

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream in = Demo3.class.getClassLoader().getResourceAsStream("student.xml");
org.w3c.dom.Document w3cdoc=db.parse(in);
DOMReader domReader=new DOMReader();
// Convert the org.w3c.dom.Document into an org.dom4j.Document
org.dom4j.Document document=domReader.read(w3cdoc);

1.6.3 Common methods

Getting the root element

Element root = document.getRootElement();

Getting information about all child tags of a tag

Iterator<Element> it =  root.elementIterator();
while(it.hasNext()){
    Element e = it.next();
  //   e.getName gets the tag name of the element
    System.out.print(e.getName() + "\t");
    // e.attributeValue("number") gets the value of the attribute specified by the tag
    System.out.print(e.attributeValue("number") + "\t");
    // Gets the child tag under the child tag
    Element c1 = e.element("age");
    // c1.getText() gets the text content in the tag.
    System.out.println(c1.getText());
    			
}
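
For comparison, the same traversal can be written with only the JDK's built-in org.w3c.dom API. This is a sketch, not dom4j code; the class name DomTraversalDemo and the sample document are assumptions. Note the extra work of skipping whitespace text nodes, which dom4j's elementIterator() does for you.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTraversalDemo {

    static final String SAMPLE = "<students>\n"
            + "  <student number=\"blb_0001\"><name>six</name><age>28</age></student>\n"
            + "  <student number=\"blb_0002\"><name>seven</name><age>29</age></student>\n"
            + "</students>";

    // Returns one "tag name, number attribute, age text" row per child tag of the root.
    static List<String> describeStudents(String xml) {
        List<String> rows = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip the whitespace text nodes between tags
                }
                Element student = (Element) child;
                String age = student.getElementsByTagName("age").item(0).getTextContent();
                rows.add(student.getTagName() + "\t" + student.getAttribute("number") + "\t" + age);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        describeStudents(SAMPLE).forEach(System.out::println);
    }
}
```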

1.6.4 XPath parsing

Parsing with the DOM-style API becomes tedious, especially with deeply nested documents. XPath expressions are a more concise alternative. This feature requires dom4j 2.0 or later; dom4j's XPath support also relies on the jaxen library being on the classpath.

// Find the student tag under the students tag whose number attribute is blb_0002
Node n = document.selectSingleNode("//students/student[@number='blb_0002']");
System.out.println(n.getName());


// selectSingleNode gets a node. If there are multiple nodes, the first one found is returned
Node selectSingleNode = n.selectSingleNode("name");
String name = selectSingleNode.getText();
System.out.println(name);


// Iterate over all student tags under the students tag. A leading "//" in the path matches at any depth; without it, the path must start from the root
List<Node> nodes = document.selectNodes("//students/student");
ListIterator<Node> listIterator = nodes.listIterator(); 
// Traversal through iterators
while(listIterator.hasNext()){
    Node n = listIterator.next();
    // n.valueOf("@number") gets the value of the number attribute of the Node
    String number =  n.valueOf("@number");
    System.out.println(number);
    //Relative path name   
    Node selectSingleNode = n.selectSingleNode("name");
    // Gets the text content of the node
    String name = selectSingleNode.getText();
    System.out.println(name);
}



// It can also be filtered by the matching conditions of the Node
List<Node> nodes = document.selectNodes("//students/student");
Iterator<Node> iterator = nodes.iterator();
while(iterator.hasNext()){
    Node n = iterator.next();
    //			Filter by criteria. The value of the number property is not blb_0001
    if(n.matches("@number!='blb_0001'")){
        System.out.println(n.getName());
    }
}
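
The XPath expressions themselves are not specific to dom4j. As a sketch, the same queries can be run with the JDK's built-in javax.xml.xpath API; the class name XPathDemo and the helper methods text and values are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {

    static final String XML = "<students>"
            + "<student number=\"blb_0001\"><name>six</name><age>28</age></student>"
            + "<student number=\"blb_0002\"><name>seven</name><age>29</age></student>"
            + "</students>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates the expression and returns the string value of the first matching node.
    static String text(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, parse());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates the expression and returns the text of every matching node.
    static List<String> values(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, parse(), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Name of the student whose number attribute is blb_0002
        System.out.println(text("//students/student[@number='blb_0002']/name"));
        // All number attributes
        System.out.println(values("//students/student/@number"));
        // Filtering inside the expression instead of in a loop
        System.out.println(values("//students/student[@number!='blb_0001']/name"));
    }
}
```

Putting the condition into the expression, as in the last query, often removes the need for a separate filtering loop.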

1.6.5 Document creation

So far we have used programs to parse XML. How do we create and modify XML documents from a program? To create a document, build an XML Document object in memory and write it to an XML file on disk through a stream.

// Create an XML Document object in memory.
Document document = DocumentHelper.createDocument();
// Add a root element in the document object labeled students
Element root = document.addElement("students");
// Continue to add the tag student to the tag, and add the attribute number with the value of blb_0001
Element student1 = root.addElement("student").addAttribute("number", "blb_0001");
// Add the name tag to the student tag, and the text content is six
student1.addElement("name").addText("six");
// Add the age tag to the student tag, and the text content is 18
student1.addElement("age").addText("18");
// Add a sex tag to the student tag, and the text content is boy
student1.addElement("sex").addText("boy");

// The function is the same as above.
Element student2 = root.addElement("student").addAttribute("number", "blb_0002");
student2.addElement("name").addText("seven");
student2.addElement("age").addText("29");
student2.addElement("sex").addText("girl");

// Create output format
OutputFormat prettyPrint = OutputFormat.createPrettyPrint();

// Open the output file and wrap it in a UTF-8 writer
FileOutputStream fs = new FileOutputStream("src/student.xml");
OutputStreamWriter osw = new OutputStreamWriter(fs, "utf-8");
// Write through the writer so that the chosen encoding is actually used
XMLWriter xmlWriter = new XMLWriter(osw, prettyPrint);
xmlWriter.write(document);
xmlWriter.flush();
xmlWriter.close();

How to modify XML documents?

First read it into the program through parsing, modify the content in memory, and then write it out to the specified file through stream. Try it yourself.
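
As a starting point for that exercise, here is a sketch of the read, modify, write cycle using only the JDK's DOM and Transformer APIs rather than dom4j. It writes to a string so that it runs without touching the disk; a StreamResult wrapping a File would write to a file instead. The class name ModifyXmlDemo and the method setAge are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ModifyXmlDemo {

    static final String SAMPLE = "<students>"
            + "<student number=\"blb_0001\"><name>six</name><age>28</age></student>"
            + "</students>";

    // Sets the age of the student with the given number and returns the serialized document.
    static String setAge(String xml, String number, String newAge) {
        try {
            // 1. Read: parse the XML into an in-memory tree
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // 2. Modify: change the text of the matching age tag
            NodeList students = doc.getElementsByTagName("student");
            for (int i = 0; i < students.getLength(); i++) {
                Element student = (Element) students.item(i);
                if (number.equals(student.getAttribute("number"))) {
                    student.getElementsByTagName("age").item(0).setTextContent(newAge);
                }
            }

            // 3. Write: serialize the tree back out
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setAge(SAMPLE, "blb_0001", "29"));
    }
}
```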

Chapter 2 Regular expressions

2.1 Overview

A regular expression (often abbreviated as regex) describes a pattern, or rule, for text. Regular expressions are typically used to find and replace text that matches a pattern. They are supported by PHP, Java, Python, JavaScript, and many other languages. They make code more concise: two or three lines are often enough to get the job done.

2.2 Rules

1. A literal character matches itself: a matches a, 7 matches 7, - matches -.
2. [] matches any one of the characters inside the brackets: [abc] matches a, b, or c.
3. - means different things inside and outside brackets. Outside, it matches a literal -. Inside, it denotes a range: [a-z] matches any of the 26 lowercase letters; [a-zA-Z] matches any of the 52 upper- and lowercase letters; [0-9] matches any of the ten digits.
4. ^ also means different things inside and outside brackets. Outside, it anchors the start: ^7[0-9] matches a string that starts with 7 followed by any digit. At the start of a bracket expression it negates the set: [^abc] matches any character other than a, b, or c (including digits and special characters).
5. . matches any single character (by default, any character except a line terminator).
6. \d matches a digit, the same as [0-9].
7. \D matches a non-digit.
8. \w matches a letter, digit, or underscore: [a-zA-Z0-9_].
9. \W matches any character that is not a letter, digit, or underscore.
10. [\u4e00-\u9fa5] matches Chinese characters.
11. ? means 0 or 1 occurrences.
12. + means 1 or more occurrences.
13. * means 0 or more occurrences.
14. {n} means exactly n occurrences.
15. {n,m} means n to m occurrences.
16. {n,} means n or more occurrences (>= n).
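
A few of these rules can be tried out directly with Pattern.matches, which requires the whole input to match. The class name RegexRulesDemo and the sample inputs are assumptions for illustration; the Chinese sample is written with Unicode escapes so that the source file's encoding does not matter.

```java
import java.util.regex.Pattern;

class RegexRulesDemo {

    // Shorthand: does the whole input match the regex?
    static boolean m(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(m("[abc]", "b"));          // rule 2: one character from the set
        System.out.println(m("[a-z]", "q"));          // rule 3: range inside brackets
        System.out.println(m("^7[0-9]", "75"));       // rule 4: anchored start, then a digit
        System.out.println(m("[^abc]", "7"));         // rule 4: negated set
        System.out.println(m(".", "#"));              // rule 5: any single character
        System.out.println(m("\\d{3}", "123"));       // rules 6 and 14: exactly three digits
        System.out.println(m("\\w+", "user_01"));     // rules 8 and 12: one or more word characters
        System.out.println(m("colou?r", "color"));    // rule 11: optional u
        System.out.println(m("\\d{2,4}", "12345"));   // rule 15: five digits is too many
        // rule 10: the input is two Chinese characters, U+4E2D and U+6587
        System.out.println(m("[\\u4e00-\\u9fa5]+", "\u4e2d\u6587"));
    }
}
```

In Java source the backslash itself must be escaped, which is why \d appears as "\\d" in the string literals.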

2.3 Usage

Once you know the basic rules, you can use java.util.regex.Pattern and java.util.regex.Matcher to check strings against them.

Code demonstration

Requirement: test whether the string aaaaaaaabbb matches the pattern a*b.

Mode 1

String s = "aaaaaaaabbb";
// Define the regular expression (rule)
Pattern p = Pattern.compile("a*b");
Matcher matcher = p.matcher(s);
// false: a*b allows any number of a's followed by exactly one b, but s ends in three b's
System.out.println(matcher.matches());

Mode 2

// false, for the same reason; the pattern a*b+ would return true
System.out.println(Pattern.matches("a*b", "aaaaaaaabbb"));

Mode 3

String s = "aaaaaaaabbb";
// String.matches is a shortcut for Pattern.matches; also false here
System.out.println(s.matches("a*b"));

Tips:

To check whether a string conforms to a rule, you can use either the matches or the find method of the Matcher class. They differ as follows:

matches requires the entire string to match the pattern.
find checks whether any substring matches the pattern.
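
The difference is easiest to see by running both methods against the same pattern and input. This is a small sketch; the class name MatchesVsFindDemo and its helper methods are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFindDemo {

    static final Pattern P = Pattern.compile("a*b");

    // True only if the entire string matches the pattern.
    static boolean wholeMatch(String s) {
        return P.matcher(s).matches();
    }

    // Returns the first substring that matches, or null if there is none.
    static String firstMatch(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("aab"));    // the whole string is a's followed by one b
        System.out.println(wholeMatch("aabbb"));  // two extra b's at the end
        System.out.println(firstMatch("aabbb"));  // find stops at the first match
        System.out.println(firstMatch("xyz"));    // no b anywhere, so nothing to find
    }
}
```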

Code demonstration

Requirement: try out the regex API and the remaining rules

    public static void main(String[] args) {
        System.out.println(Pattern.matches("a*b", "aaaaabb"));               // false: only one b is allowed
        System.out.println(Pattern.matches("[ab]", "ab"));                   // false: [ab] matches a single character
        System.out.println(Pattern.matches("[a-zA-H]", "Z"));                // false: Z is outside A-H
        System.out.println(Pattern.matches("[^abc]", "d"));                  // true
        System.out.println(Pattern.matches("[a-k&&c-z]", "z"));              // false: the intersection is c-k
        System.out.println(Pattern.matches("a[a-z]c", "abc"));               // true
        System.out.println(Pattern.matches("\\d", "10"));                    // false: \d matches a single digit
        System.out.println(Pattern.matches("\\D", "a"));                     // true
        System.out.println(Pattern.matches("\\s", "\t"));                    // true: \s matches whitespace
        System.out.println(Pattern.matches("\\S", "a"));                     // true: \S matches non-whitespace
        System.out.println(Pattern.matches("\\w", "a"));                     // true: [a-zA-Z_0-9]
        System.out.println(Pattern.matches("\\W", "a"));                     // false
        System.out.println(Pattern.matches("^a\\d{4}f$", "a1234f"));         // true
        System.out.println(Pattern.matches("ab?c", "abbc"));                 // false: at most one b
        System.out.println(Pattern.matches("ab*c", "abbbbbbbc"));            // true
        System.out.println(Pattern.matches("a\\d+c", "a12312312312c"));      // true
        System.out.println(Pattern.matches("a\\d{3}c", "a123c"));            // true
        System.out.println(Pattern.matches("a\\d{3,}c", "a12312312312c"));   // true
        System.out.println(Pattern.matches("a\\d{3,5}c", "a123456c"));       // false: six digits
    }

2.4 Grouping

Sometimes we need not only to check whether a string matches, but also to extract parts of the match. For example, the string 2021-01-06 matches the date pattern \d{4}-\d{2}-\d{2}, but we also want its year, month, and day. This is what groups are for.

To use groups, enclose each part to be extracted in parentheses, then call the Matcher's group method to get the corresponding data.

Code demonstration

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// GroupDemo is just an illustrative class name
public class GroupDemo {
    public static void main(String[] args) {
        String s = "2021-01-06";
        Pattern p = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
        Matcher matcher = p.matcher(s);
        // Use the find method to match
        if (matcher.find()) {
            // matcher.group() returns the whole matched text
            System.out.println(matcher.group());  // 2021-01-06
            // Groups are numbered by their opening parenthesis, left to right; group 0 is the whole match
            System.out.println(matcher.group(0)); // 2021-01-06
            System.out.println(matcher.group(1)); // 2021
            System.out.println(matcher.group(2)); // 01
            System.out.println(matcher.group(3)); // 06
        }
    }
}

Here the groups are obtained by index. Group 0 is the whole match, and the capturing groups are numbered from 1 in the order of their opening parentheses, left to right. You can also give each group a name and get the data by name.

Code demonstration

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// NamedGroupDemo is just an illustrative class name
public class NamedGroupDemo {
    public static void main(String[] args) {
        String s = "2021-01-06";
        Pattern p = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
        Matcher matcher = p.matcher(s);
        // Use the find method to match
        if (matcher.find()) {
            // matcher.group() returns the whole matched text
            System.out.println(matcher.group());        // 2021-01-06
            // Named groups: get each field by the name given in (?<name>...)
            System.out.println(matcher.group("year"));  // 2021
            System.out.println(matcher.group("month")); // 01
            System.out.println(matcher.group("day"));   // 06
        }
    }
}
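
Groups are also useful when a text contains several matches, or when the matched parts need to be rearranged. The sketch below loops with find() to collect every year and uses ${name} references in replaceAll to reorder the fields; the class name DateExtractDemo and its methods are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractDemo {

    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Collects the year of every date found in the text.
    static List<String> years(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) { // each call moves on to the next match
            result.add(m.group("year"));
        }
        return result;
    }

    // Rewrites every yyyy-MM-dd date in the text as dd/MM/yyyy.
    static String toDayFirst(String text) {
        return DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        String text = "Due 2021-01-06, shipped 2022-12-31";
        System.out.println(years(text));
        System.out.println(toDayFirst(text));
    }
}
```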
