[Advanced Java features] Java learning journey 38: XML parsing methods

Posted by Rakim on Wed, 09 Mar 2022 17:23:15 +0100

Definition: XML (Extensible Markup Language)

Characteristics:

  • XML is independent of any programming language
  • It can be used to exchange data between systems written in different programming languages

Purpose:

  1. Data exchange
  2. Project configuration files

Comparison with JSON:

  1. JSON is more lightweight than XML
  2. XML is more readable and structured than JSON

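Points to note:

  1. Every XML element must be closed: it needs a matching start and end tag (double tags), or it must be written as a self-closing tag such as <page/>
  2. XML tag names are case sensitive
  3. Use letters and digits for tag names; spaces are not allowed, and most special characters should be avoided
  4. Avoid the characters < > ' & in attribute values: < and & must be escaped as &lt; and &amp;, and the other predefined entities are &gt;, &apos; and &quot;
  5. It is recommended to indent sibling tags to the same level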
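The escaping rule above can be checked with the JDK's built-in DOM API. The sketch below uses the hypothetical class name `XmlEscapeDemo`. It stores an attribute value containing `&` and `<`, serializes the document, and parses it back. The serializer writes the predefined entities `&amp;` and `&lt;`, and the parser turns them back into the original characters. In other words, when you build XML through an API you pass raw text, and only hand-written XML needs manual escaping.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlEscapeDemo {

    // Builds <book title="..."/> and returns the serialized XML text
    static String serialize(String title) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.newDocument();
        Element book = doc.createElement("book");
        book.setAttribute("title", title); // raw value; escaping happens on output
        doc.appendChild(book);

        Transformer tf = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses the XML text again and reads the attribute back
    static String readTitle(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("title");
    }

    public static void main(String[] args) throws Exception {
        String xml = serialize("Tom & Jerry <1>");
        System.out.println(xml);            // the & and < appear as &amp; and &lt;
        System.out.println(readTitle(xml)); // the parser turns them back into & and <
    }
}
```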

Parsing methods:

  1. DOM parsing method: based on the DOM tree, every element in the document is parsed into a node object according to its place in the hierarchy
  • Advantages: builds the tree structure of the XML file in memory, so nodes can be traversed and modified
  • Disadvantages: for large files the memory pressure is high and parsing takes a long time
  • Applicable: modifying XML data
  2. SAX parsing method: event driven; the document is scanned sequentially and parsed while it is being scanned. Compared with DOM, SAX can stop parsing at any point, and it is faster and more efficient.
  • Advantages: parsing starts immediately, it is fast, and memory usage stays low
  • Disadvantages: nodes cannot be modified
  • Applicable: reading XML files
  3. Dom4j parsing method: Dom4j has a more complex API than JDOM, which gives it greater flexibility. Dom4j performs the best of these options; even Sun's JAXM used Dom4j. Dom4j is widely used in open source projects; for example, the well-known Hibernate uses Dom4j to read its XML configuration files. If portability is not a concern, use Dom4j.
  • Advantages: flexible, easy to use, powerful and fast
  • Disadvantages: complex API, poor portability
  • Applicable: general purpose
  4. JDOM parsing method: JDOM is a pure Java API for processing XML. It combines DOM-style tree traversal with SAX-style Java conventions. JDOM differs from DOM in two ways. First, JDOM uses only concrete classes instead of interfaces, which simplifies the API in some respects but also limits flexibility. Second, the API makes extensive use of the Java Collections classes, which makes it easy to use for Java developers who already know them. JDOM does not contain a parser of its own; it usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously built DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.
  • Advantages: tree based like DOM, but simpler and faster than DOM
  • Disadvantages: large files put pressure on memory, and the traversal package available in DOM is not supported
  • Applicable: general purpose
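The remark that SAX "can stop parsing at any point" is usually implemented by throwing a `SAXException` from a handler callback and catching it around `parse`. The following is a minimal sketch with the JDK's built-in SAX parser, assuming we only need the first book's name. `SaxEarlyStop`, `FirstNameHandler` and `StopParsing` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEarlyStop {

    static final String XML = "<books>"
            + "<book id=\"1\"><name>java From getting started to giving up</name></book>"
            + "<book id=\"2\"><name>mysql From deleting the library to running away</name></book>"
            + "</books>";

    // Thrown on purpose to abort parsing; it does not signal a malformed document
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop after first name");
        }
    }

    static class FirstNameHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inName = false;
        String result;        // text of the first <name>, or null
        int elementsSeen = 0; // how many start tags were reported before stopping

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elementsSeen++;
            if (qName.equals("name")) {
                inName = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inName) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("name")) {
                result = text.toString();
                throw new StopParsing(); // no further callbacks are made after this
            }
        }
    }

    static FirstNameHandler run(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        FirstNameHandler handler = new FirstNameHandler();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected when we stopped on purpose; rethrow if no result was found yet
            if (handler.result == null) {
                throw e;
            }
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        FirstNameHandler h = run(XML);
        System.out.println(h.result + " (start tags seen: " + h.elementsSeen + ")");
    }
}
```

Only the `books`, first `book` and first `name` start tags are reported before the handler aborts. The second book is never delivered to the handler.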

Usage:
The file to parse: books.xml

<?xml version="1.0" encoding="UTF-8"?> 
<!-- XML declaration: the document follows the XML 1.0 specification and its text is encoded in UTF-8 -->
<books> <!-- Root element: there can be only one -->
	<book id="1"> <!-- book is an element node, id is an attribute node -->
		<name>java From getting started to giving up</name> <!-- name is an element node, and its text content is a text node -->
		<author>Wang Ziyu</author>
		<page>10</page>
	</book>
	<book id="2">
		<name>mysql From deleting the library to running away</name>
		<author>Zhang San</author>
		<page>20</page>
	</book>
</books>
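The following sketch shows the three node kinds named in the comments. It parses a trimmed copy of the file from a string, so it stays self-contained, and asks each node for its `getNodeType()`. `NodeKinds` and `kind` are illustrative names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeKinds {

    static final String XML =
            "<books><book id=\"1\"><name>java From getting started to giving up</name></book></books>";

    // Maps the DOM node-type constant to a readable label
    static String kind(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:   return "element";
            case Node.ATTRIBUTE_NODE: return "attribute";
            case Node.TEXT_NODE:      return "text";
            case Node.COMMENT_NODE:   return "comment";
            default:                  return "other";
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element book = (Element) doc.getElementsByTagName("book").item(0);
        Node id = book.getAttributeNode("id");
        Node nameText = book.getElementsByTagName("name").item(0).getFirstChild();

        System.out.println("book -> " + kind(book));            // element
        System.out.println("id -> " + kind(id));                // attribute
        System.out.println("name's text -> " + kind(nameText)); // text
    }
}
```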

Entity class:

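public class Book {
	private Long id;
	private String name;
	private String author;
	private Integer page;

	public Long getId() { return id; }
	public void setId(Long id) { this.id = id; }
	public String getName() { return name; }
	public void setName(String name) { this.name = name; }
	public String getAuthor() { return author; }
	public void setAuthor(String author) { this.author = author; }
	public Integer getPage() { return page; }
	public void setPage(Integer page) { this.page = page; }

	@Override
	public String toString() {
		return "Book{id=" + id + ", name=" + name + ", author=" + author + ", page=" + page + "}";
	}
}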
  1. DOM parsing mode (read operation)
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomReadDemo {
    public static void main(String[] args) {
        List<Book> bookEmptyList = new ArrayList<>();
        Book bookEmpty = null;

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            //Create a DocumentBuilder object that parses the xml file
            DocumentBuilder db = dbf.newDocumentBuilder();
            //Parsing returns a (w3c) Document object
            Document document = db.parse("books.xml");
            //Get all book elements in the document as a node collection
            NodeList bookList = document.getElementsByTagName("book");
            System.out.println("There are " + bookList.getLength() + " books in total");
            //Traverse the book node collection
            for (int i = 0; i < bookList.getLength(); i++) {
                System.out.println("Start traversing book " + (i + 1));
                //Build a Book entity to store the parsed information
                bookEmpty = new Book();
                //Get the i-th node from the collection
                Node book = bookList.item(i);
                //Get all attributes of the current node as a NamedNodeMap
                NamedNodeMap attrs = book.getAttributes();
                System.out.println("Book " + (i + 1) + " has " + attrs.getLength() + " attribute(s)");
                for (int j = 0; j < attrs.getLength(); j++) {
                    //Get the j-th attribute node
                    Node attr = attrs.item(j);
                    System.out.println("Attribute name: " + attr.getNodeName());
                    System.out.println("Attribute value: " + attr.getNodeValue());
                    if ("id".equals(attr.getNodeName())) {
                        bookEmpty.setId(Long.valueOf(attr.getNodeValue()));
                    }
                }

                //Get all child nodes of the current book node
                NodeList childNode = book.getChildNodes();
                //The count is larger than the number of child elements: the whitespace (spaces and line feeds)
                //between tags is returned as #text nodes, and comments as #comment nodes
                System.out.println("Book " + (i + 1) + " has " + childNode.getLength() + " child nodes");
                for (int k = 0; k < childNode.getLength(); k++) {
                    Node child = childNode.item(k);
                    //Only output element nodes; skip blank #text and #comment nodes
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue;
                    }
                    //Get the text content of the child element
                    String value = child.getTextContent();
                    System.out.println("Node name: " + child.getNodeName() + ", value: " + value);
                    switch (child.getNodeName()) {
                        case "name": bookEmpty.setName(value); break;
                        case "author": bookEmpty.setAuthor(value); break;
                        case "page": bookEmpty.setPage(Integer.valueOf(value)); break;
                        default: break;
                    }
                }
                bookEmptyList.add(bookEmpty);
            }
            bookEmptyList.forEach(System.out::println);
        } catch (Exception e) {
            //ParserConfigurationException, SAXException or IOException
            e.printStackTrace();
        }
    }
}
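Why does `getChildNodes()` report more nodes than there are tags? The following is a small self-contained check. It assumes the second book is pasted in with the same line breaks and tabs as books.xml, with comments removed. The parser keeps each run of whitespace between tags as a `#text` node, so three child elements produce seven child nodes. `WhitespaceNodes` and its methods are hypothetical helpers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodes {

    // The second book from books.xml, with the original line breaks and tabs
    static final String BOOK = "<book id=\"2\">\n"
            + "\t\t<name>mysql From deleting the library to running away</name>\n"
            + "\t\t<author>Zhang San</author>\n"
            + "\t\t<page>20</page>\n"
            + "\t</book>";

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Names of the child elements only; #text (and #comment) nodes are skipped
    static List<String> elementChildNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Element book = parseRoot(BOOK);
        System.out.println(book.getChildNodes().getLength()); // 7: 3 elements + 4 whitespace #text nodes
        System.out.println(elementChildNames(book));          // [name, author, page]
    }
}
```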

(write operation)

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomWriteDemo {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

        DocumentBuilder db = dbf.newDocumentBuilder();
        //Create a new, empty document
        Document document = db.newDocument();

        //Create books root node
        Element books = document.createElement("books");
        //Create a book node
        Element book1 = document.createElement("book");
        //Set the attribute name and attribute value for the book node
        book1.setAttribute("id", "1");
        //Create name node
        Element name1 = document.createElement("name");
        //Create author node
        Element author1 = document.createElement("author");
        //Create page node
        Element page1 = document.createElement("page");

        //Make the name, author and page nodes children of the book node
        book1.appendChild(name1);
        book1.appendChild(author1);
        book1.appendChild(page1);

        //Set the text nodes of name, author and page
        name1.setTextContent("java From getting started to giving up");
        author1.setTextContent("Wang Ziyu");
        page1.setTextContent("10");

        //Make the book1 node a child of the books root node
        books.appendChild(book1);

        //Build the second book the same way
        Element book2 = document.createElement("book");
        book2.setAttribute("id", "2");
        Element name2 = document.createElement("name");
        Element author2 = document.createElement("author");
        Element page2 = document.createElement("page");
        book2.appendChild(name2);
        book2.appendChild(author2);
        book2.appendChild(page2);
        name2.setTextContent("mysql From deleting the library to running away");
        author2.setTextContent("Han Wenlong");
        page2.setTextContent("20");
        books.appendChild(book2);

        //Attach the books root node (and with it the whole tree) to the document
        document.appendChild(books);

        TransformerFactory tff = TransformerFactory.newInstance();
        Transformer tf = tff.newTransformer();

        //Put each element on its own line
        tf.setOutputProperty(OutputKeys.INDENT, "yes");
        //Indent by 2 spaces (an output property recognized by the JDK's built-in transformer)
        tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

        //Serialize the DOM tree to book_dom.xml
        tf.transform(new DOMSource(document), new StreamResult(new File("book_dom.xml")));
    }
}
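DOM keeps the whole tree in memory, which is why it is listed as the choice for modifying XML. Editing existing XML is then a parse, edit and serialize round trip. The sketch below assumes we want to change the page count of the book whose `id` is 2. It reads the XML from a string to stay self-contained, and `DomModify`/`setPage` are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomModify {

    static final String XML = "<books>"
            + "<book id=\"1\"><name>java From getting started to giving up</name><author>Wang Ziyu</author><page>10</page></book>"
            + "<book id=\"2\"><name>mysql From deleting the library to running away</name><author>Zhang San</author><page>20</page></book>"
            + "</books>";

    // Parses xml, sets <page> of the book with the given id, and returns the new XML text
    static String setPage(String xml, String id, int page) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (id.equals(book.getAttribute("id"))) {
                // the first (and only) <page> element inside this book
                Element pageEl = (Element) book.getElementsByTagName("page").item(0);
                pageEl.setTextContent(String.valueOf(page));
            }
        }
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(setPage(XML, "2", 25));
    }
}
```

To update a file instead, pass a file name to `parse` and a `StreamResult(new File(...))` to `transform`, as in the write example above.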
  2. SAX parsing:
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    public static void main(String[] args) throws Exception {
        //Get the SAX parser factory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        //Call the factory's newSAXParser method to get a SAXParser
        SAXParser parser = factory.newSAXParser();
        //Create the handler that receives the parse events and collects the books
        MyHandler dh = new MyHandler();
        //Parse the XML file; the parser calls back into the handler as it scans
        parser.parse("NewFile.xml", dh);
        //Get the List<Book> collected by the handler
        List<Book> bookList = dh.getBooks();
        //Print the result
        for (Book book : bookList) {
            System.out.println(book);
        }
    }
}

//MyHandler extends DefaultHandler, which implements the ContentHandler interface (and the other
//SAX handler interfaces) with empty methods, so we only override what we need.
//This class is the core of SAX parsing; the callbacks involved are:
//1. startDocument(): called once, when parsing of the document begins
//2. startElement(String uri, String localName, String qName, Attributes attributes)
//Called when the parser reaches the start tag of an element
//uri: namespace URI of the element
//localName: name of the element without a prefix (empty unless the factory is namespace aware)
//qName: qualified name of the element, including any prefix
//attributes: the attributes of the element

//3. characters(char[] ch, int start, int length): called with the text content inside a tag
//ch: character array holding the text that was read
//start: index in ch of the first character of this chunk
//length: number of characters in this chunk

//4. endElement(String uri, String localName, String qName): called when the end tag of an element has been parsed

//5. endDocument(): called once, when parsing of the document has finished

class MyHandler extends DefaultHandler {
    //List that collects the parsed books
    private final List<Book> books = new ArrayList<>();
    //The book currently being parsed
    private Book book = null;
    //Expose the collected books to the caller
    public List<Book> getBooks() {
        return books;
    }

    //Flags that record which child element we are inside
    boolean bName = false;
    boolean bAuthor = false;
    boolean bPage = false;

    //Called when an element starts
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        if (qName.equals("book")) {
            String id = attributes.getValue("id");
            book = new Book();
            book.setId(Long.parseLong(id));
        } else if (qName.equals("name")) {
            bName = true;
        } else if (qName.equals("author")) {
            bAuthor = true;
        } else if (qName.equals("page")) {
            bPage = true;
        }
    }

    //Called with the text content of an element; each flag is reset so that the
    //whitespace after the end tag does not overwrite the value
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);
        if (bName) {
            book.setName(new String(ch, start, length));
            bName = false;
        } else if (bAuthor) {
            book.setAuthor(new String(ch, start, length));
            bAuthor = false;
        } else if (bPage) {
            book.setPage(Integer.parseInt(new String(ch, start, length)));
            bPage = false;
        }
    }

    //Called when an element ends; a finished book is added to the list
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        if (qName.equals("book")) {
            books.add(book);
        }
    }
}
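One caveat about the flag-based handler: the `ContentHandler.characters` contract allows a parser to deliver the text of one element in several chunks. Reading a value from a single call can therefore truncate it, for example when the text contains an entity such as `&amp;`. A common, more defensive pattern appends every chunk to a `StringBuilder` and uses the text in `endElement`.

The sketch below applies that pattern to the same books data. It collects plain `id|name|author|page` strings instead of `Book` objects so it compiles on its own, and `SaxTextCollector` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTextCollector extends DefaultHandler {

    private final StringBuilder text = new StringBuilder(); // text of the element being read
    private final List<String> books = new ArrayList<>();   // one "id|name|author|page" line per book
    private String id, name, author, page;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting fresh text for this element
        if (qName.equals("book")) {
            id = attributes.getValue("id");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "name":   name = value; break;
            case "author": author = value; break;
            case "page":   page = value; break;
            case "book":   books.add(id + "|" + name + "|" + author + "|" + page); break;
            default:       break;
        }
        text.setLength(0);
    }

    public List<String> getBooks() {
        return books;
    }

    static List<String> parse(String xml) throws Exception {
        SaxTextCollector handler = new SaxTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getBooks();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>"
                + "<book id=\"1\"><name>java From getting started to giving up</name>"
                + "<author>Wang Ziyu</author><page>10</page></book>"
                + "<book id=\"2\"><name>mysql From deleting the library to running away</name>"
                + "<author>Zhang San</author><page>20</page></book>"
                + "</books>";
        parse(xml).forEach(System.out::println);
    }
}
```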
  3. Parsing XML using dom4j (read operation); this requires the dom4j jar
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class Dom4jReadDemo {
    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();

        SAXReader read = new SAXReader();
        //Read the document
        Document document;
        try {
            document = read.read(new File("books.xml"));
        } catch (DocumentException e) {
            System.out.println("Failed to read books.xml: " + e.getMessage());
            return;
        }
        //Get the root node
        Element root = document.getRootElement();
        //Iterate over the child elements of the root (the book elements)
        Iterator<Element> rootIt = root.elementIterator();
        while (rootIt.hasNext()) {
            Book book = new Book();
            Element element = rootIt.next();
            //Get the attributes of the book element
            List<Attribute> attributes = element.attributes();
            for (Attribute attribute : attributes) {
                if (attribute.getName().equals("id")) {
                    book.setId(Long.valueOf(attribute.getValue()));
                }
            }
            //Iterate over the child elements of the book element
            Iterator<Element> child = element.elementIterator();
            while (child.hasNext()) {
                Element childElement = child.next();
                //Get the element name
                String name1 = childElement.getName();
                //Get the text value of the element
                String value = childElement.getStringValue();
                switch (name1) {
                    case "name":
                        book.setName(value);
                        break;
                    case "author":
                        book.setAuthor(value);
                        break;
                    case "page":
                        book.setPage(Integer.valueOf(value));
                        break;
                    default:
                        break;
                }
            }
            books.add(book);
        }
        books.forEach(System.out::println);
    }
}

(write operation)

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public class Dom4jWriteDemo {
    public static void main(String[] args) throws IOException {
        String path = "books_Dom4j.xml";
        Document doc = DocumentHelper.createDocument();

        //Create the root element
        Element root = doc.addElement("books");

        //Create the first child element
        Element book1 = root.addElement("book");
        //Add an attribute to the first-level child element
        book1.addAttribute("id", "1");
        //Add second-level child elements to the first-level child element
        Element name1 = book1.addElement("name");
        name1.setText("Cover the sky");
        Element author1 = book1.addElement("author");
        author1.setText("Chen Dong");
        Element page1 = book1.addElement("page");
        page1.setText("123");

        //Create the second child element in the same way
        Element book2 = root.addElement("book");
        book2.addAttribute("id", "2");
        Element name2 = book2.addElement("name");
        name2.setText("Manghuang period");
        Element author2 = book2.addElement("author");
        author2.setText("I eat tomatoes");
        Element page2 = book2.addElement("page");
        page2.setText("567");

        //Set the output format: pretty-printed, UTF-8 encoded
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setEncoding("utf-8");

        //Open the output stream; try-with-resources closes it even if writing fails
        try (OutputStream os = new FileOutputStream(path)) {
            //XMLWriter takes two arguments: where to write, and the output format
            XMLWriter xw = new XMLWriter(os, format);
            //Write the assembled document to the file
            xw.write(doc);
            //Flush buffered output to the stream
            xw.flush();
        }
    }
}
  4. Parsing XML using JDOM (requires the JDOM jar; the code below uses the JDOM 2 packages, org.jdom2)
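import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class JdomReadDemo {
    public void readXML() throws JDOMException, IOException {
        //1. Create a SAXBuilder parser
        SAXBuilder builder = new SAXBuilder();
        //2. Create a file input stream (closed automatically by try-with-resources)
        try (InputStream fis = new FileInputStream("books_jdom.xml")) {
            //3. Load the stream into the parser
            Document doc = builder.build(fis);
            //4. Get the root node of the document
            Element books = doc.getRootElement();
            //5. Traverse recursively from the root node
            getAllElement(books);
        }
    }

    //Print the node name, its text and its attributes, then recurse into its child elements
    private void getAllElement(Element node) {
        List<Attribute> attributes = node.getAttributes();
        System.out.println("Node: " + node.getName() + ", content: " + node.getText().trim());
        if (node.hasAttributes()) {
            for (Attribute attr : attributes) {
                System.out.println("Attribute: " + attr.getName() + ", value: " + attr.getValue());
            }
        }

        List<Element> children = node.getChildren();
        for (Element element : children) {
            getAllElement(element);
        }
    }
}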

(write operation)

import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

public class JdomWriteDemo {
    public void writeXML() throws IOException {
        //Create the root node and the document that holds it
        Element books = new Element("books");
        Document doc = new Document(books);
        //Create the first child node under the root node
        Element book1 = new Element("book");
        book1.setAttribute(new Attribute("id", "001"));
        //Create the first child node under the first book
        Element name1 = new Element("name");
        name1.setText("java From giving up to getting started");
        //Create the second child node under the first book
        Element author1 = new Element("author");
        author1.setText("Wang Ziyu");
        //Create the third child node under the first book
        Element page1 = new Element("page");
        page1.setText("35");

        //Create the second child node under the root node
        Element book2 = new Element("book");
        book2.setAttribute(new Attribute("id", "002"));
        //Create the first child node under the second book
        Element name2 = new Element("name");
        name2.setText("mysql From deleting the library to running away");
        //Create the second child node under the second book
        Element author2 = new Element("author");
        author2.setText("Han Wenlong");
        //Create the third child node under the second book
        Element page2 = new Element("page");
        page2.setText("53");

        //Attach the books to the root node
        books.addContent(book1);
        books.addContent(book2);

        //Attach the child nodes to each book
        book1.addContent(name1);
        book1.addContent(author1);
        book1.addContent(page1);

        book2.addContent(name2);
        book2.addContent(author2);
        book2.addContent(page2);

        //Output format: compact, UTF-8
        Format format = Format.getCompactFormat();
        format.setEncoding("utf-8");
        //Indent nested elements by two spaces so the output stays readable
        format.setIndent("  ");
        XMLOutputter out = new XMLOutputter(format);
        //Write to memory first and print the XML text to the console
        ByteArrayOutputStream byteRsp = new ByteArrayOutputStream();
        out.output(doc, byteRsp);
        String str = byteRsp.toString("utf-8");
        System.out.println(str);

        //Then write the same document to a file; try-with-resources closes the stream
        try (OutputStream os = new FileOutputStream("books_jdom.xml")) {
            out.output(doc, os);
        }
    }
}
