Parsing of XML message in Java Development Notes

Posted by Randy on Sat, 19 Feb 2022 03:51:01 +0100

preface

A project task required parsing XML messages, so I started learning the relevant material. After checking many blog posts, I found a good one that is very practical.

Reprint source: Java Development Notes (109) definition and parsing of XML message

The reprinted text follows, slightly abridged.

text

Although JSON strings are short and concise and express hierarchy well, each element carries only a name and its value, so there is no natural place for richer style features. For example, besides its string text, an element may also need to carry the type, font size, font color and other characteristics of that text. These additional styles have nothing to do with business logic, so it is not appropriate to define separate parameter fields for them.
If we use JSON to define a text element that includes style features, we can either drop the style attributes or list each style as a separate field. Neither approach properly solves the problem of expressing additional attributes.
So the lightweight JSON format has its limits here. XML, a format with stronger expressive power that predates JSON, covers this case. XML stands for "Extensible Markup Language". It supports not only the description of structured data but also the definition of various additional attributes, which makes it well suited to transmitting information over a network.

Let's take a look at an example of a shopping order in XML message format:

<?xml version="1.0" encoding="gbk"?>
<order>
    <user_info>
        <name type="string">Thoughtless</name>
        <address type="string">123 shuilian cave, Taohua Island</address>
        <phone type="string">15960238696</phone>
    </user_info>
    <goods_list>
        <goods_item>
            <goods_name type="string">Mate30</goods_name>
            <goods_number type="int">1</goods_number>
            <goods_price type="double">8888</goods_price>
        </goods_item>
        <goods_item>
            <goods_name type="string">Gree central air conditioner</goods_name>
            <goods_number type="int">1</goods_number>
            <goods_price type="double">58000</goods_price>
        </goods_item>
        <goods_item>
            <goods_name type="string">Red Dragonfly leather shoes</goods_name>
            <goods_number type="int">3</goods_number>
            <goods_price type="double">999</goods_price>
        </goods_item>
    </goods_list>
</order>

Next, let's analyze the characteristics of the XML format based on the example above:

  1. Each element still consists of a parameter name and a parameter value. The parameter name is wrapped in angle brackets and appears twice, as a start tag and an end tag; the end tag has an extra slash inside the angle brackets. The complete form of a field is therefore <parameter name>parameter value</parameter name>.
  2. Because each element has its own start tag and end tag, it is easy to tell where it begins and ends, so no additional separators are needed between elements.
  3. Each structure also needs its own start tag and end tag, with several elements or other structures placed in between.
  4. For array data, an XML message lists multiple structure tags with the same name side by side. This indicates an array of structures with the same name, which can also be treated as a list.
  5. XML allows the character encoding of the message to be specified in the encoding attribute at the beginning of the message. Common values include GBK, a Chinese character encoding, and UTF-8, the general-purpose international encoding.
  6. Each structure or element node can also carry additional attributes in its start tag, to specify information other than the parameter value.
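
Points 4 and 6 are easy to see in code. The following is a minimal sketch and is not part of the original post: it uses the DOM parser bundled with the JDK (introduced in the next section) on a shortened version of the order message. The class name XmlShapeSketch and the method describeGoods are hypothetical, and the message declares UTF-8 instead of gbk so that the declared encoding matches the bytes the sketch produces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class, not from the original post.
class XmlShapeSketch {
    // Shortened order message with two same-name <goods_item> structures.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<order><goods_list>"
        + "<goods_item><goods_name type=\"string\">Mate30</goods_name></goods_item>"
        + "<goods_item><goods_name type=\"string\">Gree central air conditioner</goods_name></goods_item>"
        + "</goods_list></order>";

    // Returns one "value (type)" entry per <goods_name> node, in document order.
    static List<String> describeGoods(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Same-name tags come back as a list (point 4).
            NodeList names = doc.getElementsByTagName("goods_name");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < names.getLength(); i++) {
                Element name = (Element) names.item(i);
                // The attribute in the start tag is read separately from the value (point 6).
                result.add(name.getTextContent() + " (" + name.getAttribute("type") + ")");
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML message", e);
        }
    }

    public static void main(String[] args) {
        describeGoods(XML).forEach(System.out::println);
    }
}
```

The two goods_item structures are returned as a two-entry list, and each entry pairs the node value with the type attribute taken from the start tag.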

With a general understanding of the XML message format in place, the next step is to parse it in a program.

Traditional XML parsing methods include DOM and SAX:

  • DOM mode reads the whole XML message and loads all nodes into a tree structure; node values are then read from that tree.
  • SAX mode does not load the whole XML message in advance. It scans the message from the beginning and, once it reaches the start tag of the wanted node, reads on to that node's end tag; the data between the start tag and the end tag is the node value.

For reading a single node value, DOM is clearly more time-consuming because it loads every node first, while SAX, which simply scans from the start of the message, is more efficient. If the values of many nodes are needed, traversing the DOM tree usually performs better overall, whereas a SAX lookup that rescans the message from the start for each node does repeated work. In short, each method has its own advantages and disadvantages, and the choice depends on the actual scenario.
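
As a rough illustration of the two styles, here is a minimal sketch that reads one node value with each of the parsers bundled with the JDK. It is not from the original post: the class name DomVsSaxSketch and the methods readWithDom and readWithSax are hypothetical, the message is a shortened UTF-8 version of the order example, and the SAX variant assumes the node name occurs only once in the message.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not from the original post.
class DomVsSaxSketch {
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<order><user_info>"
        + "<name type=\"string\">Thoughtless</name>"
        + "<phone type=\"string\">15960238696</phone>"
        + "</user_info></order>";

    private static InputStream toStream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: load the whole message into a tree, then look the node up in the tree.
    static String readWithDom(String xml, String nodeName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(toStream(xml));
            return doc.getElementsByTagName(nodeName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    // SAX: no tree is built; text is collected only while inside the wanted node.
    static String readWithSax(String xml, String nodeName) {
        StringBuilder value = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(nodeName)) inside = true;   // start tag found
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) value.append(ch, start, length); // data between the tags
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(nodeName)) inside = false;  // end tag found
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(toStream(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return value.toString();
    }

    public static void main(String[] args) {
        System.out.println(readWithDom(XML, "phone"));
        System.out.println(readWithSax(XML, "name"));
    }
}
```

Calling readWithSax once per node name would scan the message once per call, which is the repeated work mentioned above; with DOM the tree is built once and can be queried many times.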

The JDK ships with both DOM and SAX parsing tools: the DOM tool is in the package org.w3c.dom and the SAX tool is in the package javax.xml.parsers. However, they are laborious to use, the parsing code is hard to read, and they are rarely used directly in actual development. The third-party library Dom4j is one of the most widely used XML parsing tools. Its parsing method follows the DOM rules, but it is much easier to use than the JDK's DOM tool, and its performance is also very good. It has almost become a must-have XML parsing tool for Java development. Parsing an XML message with Dom4j takes the following five steps:

  1. Create a SAXReader reader object;
  2. Convert the XML message from string form into an input stream object;
  3. Have the reader object read a Document object from the input stream;
  4. Obtain the root node Element of the Document object;
  5. Starting from the root node, parse the node values level by level.

During node parsing, the methods of Element are called frequently. The common ones are:

  • getText: get the string value of the current node.
  • element: get the child node object with the specified name under the current node.
  • elementText: get the value of the child node with the specified name under the current node.
  • elements: get the list of child nodes with the specified name under the current node.
  • attribute: get the attribute object with the specified name on the current node itself.
  • attributeValue: get the value of the attribute with the specified name on the current node.
  • attributes: get the list of all attributes of the current node.

Code example

Still taking the XML message mentioned above as an example, the following is a code example of parsing the XML string using Dom4j:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

// Parse an XML string with dom4j.
// GoodsOrder, GoodsItem and CHARSET are not shown in this post; CHARSET should
// match the encoding declared in the XML header (gbk in the example above).
private static GoodsOrder testParserByDom4j(String xml) {
    GoodsOrder order = new GoodsOrder(); // Create a shopping order object
    // Create a SAXReader reader object
    SAXReader reader = new SAXReader();
    // Build a byte array input stream from the string
    try (InputStream is = new ByteArrayInputStream(xml.getBytes(CHARSET))) {
        // Have the reader read the document object from the input stream
        Document document = reader.read(is);
        // Get the root node of the document object
        Element root = document.getRootElement();
        // Get the user_info node under the root node
        Element user_info = root.element("user_info");
        // Get the value of the name node under the user_info node
        order.user_info.name = user_info.element("name").getText();
        // Get the value of the address node under the user_info node
        order.user_info.address = user_info.element("address").getText();
        // Get the value of the phone node under the user_info node
        order.user_info.phone = user_info.element("phone").getText();
        System.out.println(String.format("User information is as follows: name=%s,address=%s,cell-phone number=%s",
                order.user_info.name, order.user_info.address, order.user_info.phone));
        // Get the list of child nodes under the goods_list node of the root node
        List<Element> goods_list = root.element("goods_list").elements();
        for (int i=0; i<goods_list.size(); i++) { // Traverse the list of goods nodes
            Element goods_item = goods_list.get(i);
            GoodsItem item = new GoodsItem(); // Create a goods item object
            // Get the value of the goods_name node under the current goods item node
            item.goods_name = goods_item.element("goods_name").getText();
            // Get the value of the goods_number node under the current goods item node
            item.goods_number = Integer.parseInt(goods_item.element("goods_number").getText());
            // Get the value of the goods_price node under the current goods item node
            item.goods_price = Double.parseDouble(goods_item.element("goods_price").getText());
            System.out.println(String.format("Item %d: name=%s,quantity=%d,price=%f",
                    i+1, item.goods_name, item.goods_number, item.goods_price));
            order.goods_list.add(item); // Add the goods item object to the goods list
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return order; // Return the parsed shopping order object
}
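
The method above fills GoodsOrder and GoodsItem objects, but the post does not show how these classes are defined. The following is a minimal sketch of what they could look like, inferred only from the fields the parsing code accesses. The UserInfo class name and the use of package-private fields are assumptions; the real project may define them differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one <goods_item>; field names mirror the XML tags.
class GoodsItem {
    String goods_name;
    int goods_number;
    double goods_price;
}

// Hypothetical holder for the <user_info> structure.
class UserInfo {
    String name;
    String address;
    String phone;
}

// Hypothetical holder for the whole <order> message.
class GoodsOrder {
    // Created up front so the parser can assign order.user_info.name directly.
    UserInfo user_info = new UserInfo();
    // Created up front so the parser can call order.goods_list.add(item).
    List<GoodsItem> goods_list = new ArrayList<>();
}
```

With package-private fields, these classes need to live in the same package as the class that contains testParserByDom4j.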

Running the parsing code above prints the following shopping order log, which shows that the XML string has been successfully parsed into an object:

User information is as follows: name=Thoughtless,address=123 shuilian cave, Taohua Island,cell-phone number=15960238696
Item 1: name=Mate30,quantity=1,price=8888.000000
Item 2: name=Gree central air conditioner,quantity=1,price=58000.000000
Item 3: name=Red Dragonfly leather shoes,quantity=3,price=999.000000

Besides the node value of each node, Dom4j can also parse the attribute values of each node. To read the value of an attribute with a given name, you need to know three things:

  • The parent node object of the node that carries the attribute
  • The node name of the node where the attribute is located
  • The attribute name of the attribute

With these three items, the attribute value can be read from the specified attribute of the specified node with the following method:

// Print the node value and the specified attribute value of the child node with the given name
private static void printValueAndAttr(Element parent, String node_name, String attr_name) {
    // Get the child node with the given name under the parent node
    Element element = parent.element(node_name);
    // Get the node value of the child node
    String node_value = element.getText();
    String attr_value = "";
    // Get the attribute object of the child node by attribute name (org.dom4j.Attribute)
    Attribute attr = element.attribute(attr_name);
    if (attr != null) {
        attr_value = attr.getText(); // Get the value of the attribute
    }
    // Print the details of the child node: node name, node value, attribute name and attribute value
    System.out.println(String.format("Node name=%s, Node value=%s, Attribute name=%s, Attribute value=%s",
            node_name, node_value, attr_name, attr_value));
}

Next, add the following attribute parsing call to the original XML parsing code:

// Print the type attribute value of the name child node under the user_info node
printValueAndAttr(user_info, "name", "type");

Run the XML parsing code again. The following line appears in the shopping order log, showing that the type attribute value of the name node has been parsed:

Node name=name, Node value=Thoughtless, Attribute name=type, Attribute value=string

epilogue

Dom4j is very easy to use, so I reprinted this article for reference.

Topics: Java xml