XML for File Processing--DOM Approach

Posted by thirdeye on Thu, 04 Jul 2019 18:18:14 +0200

There are several ways to parse XML; for an overview, see another blogger's post, "Four ways to generate and parse XML documents (introduction + pros and cons + examples)". Here I show how to generate and parse XML using DOM, SAX and DOM4J.

Generating XML with DOM:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public static void DomCreateXml() throws Exception {
    // 1. Create a DocumentBuilderFactory
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // 2. Create a DocumentBuilder
    DocumentBuilder db = dbf.newDocumentBuilder();
    // 3. Create a new, empty Document
    Document document = db.newDocument();
    // Create the root element
    Element songs = document.createElement("songs");
    // Create the child elements in a loop
    for (int i = 0; i < 5; i++) {
        Element song = document.createElement("song");
        Element name = document.createElement("name");
        // To vary the data, read it from three arrays or from a list of entity objects
        name.setTextContent("In Spring");
        Element time = document.createElement("time");
        time.setTextContent("5:20");
        Element size = document.createElement("size");
        size.setTextContent("30m");
        // Append each element to its parent
        song.appendChild(name);
        song.appendChild(time);
        song.appendChild(size);
        songs.appendChild(song);
    }
    // Append the root element to the document
    document.appendChild(songs);

    // Save to demo/dom.xml (the demo directory must already exist)
    // First create a TransformerFactory
    TransformerFactory tff = TransformerFactory.newInstance();
    // Create a Transformer from the factory
    Transformer tf = tff.newTransformer();
    // Indent the output with line breaks
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    // Write the document to the XML file
    tf.transform(new DOMSource(document), new StreamResult(new File("demo/dom.xml")));
}

When creating XML with DOM, pay attention to the hierarchy of the DOM tree: every element must be appended to its parent, and the root element must be appended to the document. Because DOM is a W3C standard, the API is largely universal; the code above is very similar to manipulating HTML in JavaScript, so I won't go into it further here. Next, let's see how to parse the file.
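To see the hierarchy rule in action, here is a minimal sketch (the class and method names are my own, hypothetical ones) that builds a small tree with the same appendChild calls, serializes it to a string, and shows that the DOM rejects a second root element with a DOMException whose code is HIERARCHY_REQUEST_ERR.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomHierarchyDemo {

    // Builds <songs><song><name>In Spring</name></song></songs>;
    // an element only becomes part of the tree once it is appended to a parent.
    static Document buildTree() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element songs = doc.createElement("songs");
        Element song = doc.createElement("song");
        Element name = doc.createElement("name");
        name.setTextContent("In Spring");
        song.appendChild(name);   // name becomes a child of song
        songs.appendChild(song);  // song becomes a child of songs
        doc.appendChild(songs);   // songs becomes the single root element
        return doc;
    }

    // Serializes the tree to a string, without indentation or XML declaration.
    static String toXml(Document doc) throws Exception {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Returns true if the document refuses a second root element.
    static boolean secondRootRejected(Document doc) {
        try {
            doc.appendChild(doc.createElement("another"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildTree();
        System.out.println(toXml(doc));
        System.out.println(secondRootRejected(doc));
    }
}
```

An XML document has exactly one root element, which is why the loop in DomCreateXml appends every song to songs and only appends songs to the document once.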

Parsing XML with DOM:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public static void DomParseXml() throws Exception {
    // 1. Create a DocumentBuilderFactory
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // 2. Create a DocumentBuilder
    DocumentBuilder db = dbf.newDocumentBuilder();
    // 3. Load the XML file by calling parse on the builder
    Document document = db.parse(new File("demo/dom.xml"));
    // With the Document in hand, find every song node
    NodeList songs = document.getElementsByTagName("song");
    for (int i = 0; i < songs.getLength(); i++) {
        // Get the i-th song node
        Node song = songs.item(i);
        // Get the attributes of the current song node
        NamedNodeMap attributes = song.getAttributes();
        // Print how many attributes it has
        System.out.println(attributes.getLength());
        // Print each attribute's name and value
        for (int j = 0; j < attributes.getLength(); j++) {
            Node item = attributes.item(j);
            System.out.println(item.getNodeName() + " = " + item.getNodeValue());
        }
        // Get the child nodes of the song
        NodeList childNodes = song.getChildNodes();
        // Print the number of child nodes (this count includes whitespace text nodes)
        System.out.println(childNodes.getLength());
        for (int k = 0; k < childNodes.getLength(); k++) {
            Node child = childNodes.item(k);
            // Skip whitespace text nodes; only print element nodes
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println(child.getNodeName() + " = " + child.getTextContent());
            }
        }
    }
}

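When reading the value of a child element, note that you must use getTextContent(), not getNodeValue(): for an element node, getNodeValue() returns null, because the text is stored in a separate child text node. Alternatively, you can read that text node directly with childNodes.item(k).getFirstChild().getNodeValue(), although this throws a NullPointerException if the element is empty and therefore has no first child.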
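The difference between the three calls can be checked with a small sketch that parses an in-memory string instead of a file (NodeValueDemo and its helpers are hypothetical names used only for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeValueDemo {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first <name> element in the document.
    static Element firstName(Document doc) {
        return (Element) doc.getElementsByTagName("name").item(0);
    }

    public static void main(String[] args) throws Exception {
        Element name = firstName(parse("<song><name>In Spring</name></song>"));
        // An element node has no value of its own
        System.out.println(name.getNodeValue());       // null
        // getTextContent() returns the text of all descendants
        System.out.println(name.getTextContent());     // In Spring
        // The text itself lives in a child Text node
        Node text = name.getFirstChild();
        System.out.println(text.getNodeType() == Node.TEXT_NODE); // true
        System.out.println(text.getNodeValue());       // In Spring

        // An empty element has no child node at all
        Element empty = firstName(parse("<song><name/></song>"));
        System.out.println(empty.getFirstChild() == null);     // true
        System.out.println(empty.getTextContent().isEmpty());  // true
    }
}
```

For empty elements getTextContent() simply returns an empty string, which makes it the safer of the two approaches.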

Generating XML with SAX

import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public static void SAXCreateXml() throws Exception {
    // Open the output file; try-with-resources closes it when we are done
    try (OutputStream out = new FileOutputStream("demo/sax.xml")) {
        StreamResult streamResult = new StreamResult(out);
        // Create a SAXTransformerFactory
        SAXTransformerFactory tff = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        // Create a TransformerHandler from the factory
        TransformerHandler handler = tff.newTransformerHandler();
        // Indent the output
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        // Connect the handler to the output file
        handler.setResult(streamResult);

        // Now emit the document content as SAX events, starting with the document
        handler.startDocument();
        // Reusable attribute list
        AttributesImpl attr = new AttributesImpl();
        // Open the root element
        handler.startElement("", "", "songs", attr);
        for (int i = 0; i < 5; i++) {
            // Hard-coded values; these could also come from arrays or a collection
            String s1 = "Once Upon a Time", s2 = "4:30 second", s3 = "4 M";
            // Clear the previous attributes, then give each song a unique id
            attr.clear();
            attr.addAttribute("", "", "id", "CDATA", String.valueOf(i + 1));
            handler.startElement("", "", "song", attr);

            attr.clear();
            handler.startElement("", "", "name", attr);
            handler.characters(s1.toCharArray(), 0, s1.length());
            handler.endElement("", "", "name");

            attr.clear();
            handler.startElement("", "", "time", attr);
            handler.characters(s2.toCharArray(), 0, s2.length());
            handler.endElement("", "", "time");

            attr.clear();
            handler.startElement("", "", "size", attr);
            handler.characters(s3.toCharArray(), 0, s3.length());
            handler.endElement("", "", "size");

            handler.endElement("", "", "song");
        }

        // Close the root element and the document
        handler.endElement("", "", "songs");
        handler.endDocument();
    }
}

Here I use hard-coded strings, but the values could just as well come from arrays or a collection. If you use a collection, check that it is not null (and decide what to do when it is empty) before iterating, to avoid a NullPointerException.
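A minimal sketch of the collection-driven variant, assuming a hypothetical SongData holder in place of the Song entity and writing to a StringWriter instead of a file so the result is easy to inspect. It uses the same SAXTransformerFactory/TransformerHandler calls as SAXCreateXml and guards against a null or empty list:

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class SaxListWriter {

    // Hypothetical stand-in for the Song entity: one song's field values
    static final class SongData {
        final String name;
        final String time;
        final String size;

        SongData(String name, String time, String size) {
            this.name = name;
            this.time = time;
            this.size = size;
        }
    }

    // Emits <songs> with one <song id="n"> per entry; a null or empty list yields an empty root
    static String write(List<SongData> songs) throws Exception {
        SAXTransformerFactory tff = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = tff.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        AttributesImpl attr = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "", "songs", attr);
        // Check the collection before iterating to avoid a NullPointerException
        if (songs != null && !songs.isEmpty()) {
            for (int i = 0; i < songs.size(); i++) {
                SongData s = songs.get(i);
                attr.clear();
                attr.addAttribute("", "", "id", "CDATA", String.valueOf(i + 1));
                handler.startElement("", "", "song", attr);
                writeText(handler, "name", s.name);
                writeText(handler, "time", s.time);
                writeText(handler, "size", s.size);
                handler.endElement("", "", "song");
            }
        }
        handler.endElement("", "", "songs");
        handler.endDocument();
        return out.toString();
    }

    // Writes <tag>value</tag>; a null value is written as an empty element
    private static void writeText(TransformerHandler handler, String tag, String value)
            throws SAXException {
        String text = (value == null) ? "" : value;
        handler.startElement("", "", tag, new AttributesImpl());
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement("", "", tag);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(write(Arrays.asList(
                new SongData("Once Upon a Time", "4:30", "4 M"),
                new SongData("In Spring", "5:20", "30m"))));
        System.out.println(write(null));
    }
}
```

Pulling the repeated start/characters/end sequence into a helper also removes most of the copy-paste from the original loop body.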

Parsing XML with SAX

Calling method:

import java.io.File;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import com.entity.Song;

public static List<Song> SAXParseXml() throws Exception {
    // 1. Get a SAXParserFactory
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // 2. Create a SAXParser from the factory
    SAXParser parser = factory.newSAXParser();
    // 3. Create the handler (a subclass of DefaultHandler, shown below)
    //    and call the parser's parse method to load the XML file
    SAXHandler saxHandler = new SAXHandler();
    parser.parse(new File("demo/songs.xml"), saxHandler);
    List<Song> list = saxHandler.getList();
    // Print the number of songs, then each song's fields
    System.out.println(list.size());
    for (Song s : list) {
        System.out.println(s.getId());
        System.out.println(s.getName());
        System.out.println(s.getTime());
        System.out.println(s.getSize());
    }
    return list;
}

SAXHandler class:

import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import com.entity.Song;

public class SAXHandler extends DefaultHandler {

    private final List<Song> list = new ArrayList<>();
    private Song song = null;
    // Text of the current element; characters() may deliver it in several chunks
    private final StringBuilder text = new StringBuilder();

    // Returns the parsed songs
    public List<Song> getList() {
        return list;
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("Start parsing");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        // Start collecting this element's text from scratch
        text.setLength(0);
        if (qName.equals("song")) {
            // Create a new Song entity
            song = new Song();
            // Iterate over the song's attributes
            for (int i = 0; i < attributes.getLength(); i++) {
                System.out.println(attributes.getQName(i) + " = " + attributes.getValue(i));
                System.out.println("\n==========================");
                // Save the id attribute into the entity
                if (attributes.getQName(i).equals("id")) {
                    song.setId(attributes.getValue(i));
                }
            }
        } else if (!qName.equals("songs")) {
            System.out.println("Node name:" + qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Store the text in a field so endElement can read it;
        // one text node may arrive in several calls, so append rather than overwrite
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String value = text.toString().trim();
        // Skip the whitespace-only text (line breaks) between elements
        if (!value.isEmpty()) {
            System.out.println("Node value:" + value + "\n");
        }
        if (qName.equals("song")) {
            System.out.println("=============End===============");
            // Add the finished song to the list, then clear the reference
            list.add(song);
            song = null;
        } else if (song != null) {
            if (qName.equals("name")) {
                song.setName(value);
            } else if (qName.equals("time")) {
                song.setTime(value);
            } else if (qName.equals("size")) {
                song.setSize(value);
            }
        }
        text.setLength(0);
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("End parsing");
    }

}

Entity class Song:

package com.entity;

public class Song {

    private String id;
    private String name;
    private String time;
    private String size;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getTime() {
        return time;
    }

    public void setTime(String time) {
        this.time = time;
    }

    public String getSize() {
        return size;
    }

    public void setSize(String size) {
        this.size = size;
    }
}

Flow of the code above: parse the XML document -> store each song in a collection -> print the values.
You could also drop the entity class and the collection and print the values directly as they are parsed. Usually, though, the values need to be stored before they are used, so I think saving them to a collection is the more generally useful example.
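For comparison, here is a minimal sketch of the print-directly alternative: no Song class and no list, just a DefaultHandler that prints attributes and non-blank element text as events arrive. The class and method names are hypothetical, and it parses a string rather than demo/songs.xml so it runs on its own:

```java
import java.io.PrintStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDirectOutputDemo {

    // Prints attributes and non-blank element text as SAX reports them,
    // without creating Song objects or storing anything in a list.
    static void printValues(String xml, PrintStream out) throws Exception {
        // Text of the element currently being read; may arrive in several chunks
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    out.println(atts.getQName(i) + " = " + atts.getValue(i));
                }
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                // Skip the whitespace-only text between elements
                if (!value.isEmpty()) {
                    out.println(qName + " = " + value);
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
    }

    public static void main(String[] args) throws Exception {
        // Prints "id = 1", "name = In Spring" and "time = 5:20" on separate lines
        printValues("<songs><song id=\"1\"><name>In Spring</name><time>5:20</time></song></songs>",
                System.out);
    }
}
```

The trade-off is that the values are gone once printed; as soon as you need them later, you are back to collecting them, as SAXHandler does.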
SAX is actually the fastest of the four parsing approaches, at least for small and medium-sized XML data, where it is faster than DOM4J; for large amounts of data, DOM4J performs best (Hibernate, for example, uses DOM4J). As the blog linked at the beginning explains, SAX also has drawbacks (for example, it reads the document in a single forward pass, so you must track state yourself and cannot modify the document), which I won't dwell on here.

Generating XML documents with DOM4J

import java.io.FileOutputStream;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public static void DOM4JCreateXML() throws Exception {
    // 1. Create a document object
    Document document = DocumentHelper.createDocument();
    // 2. Create the root node
    Element rootElement = document.addElement("songs");
    // 3. Create child element nodes
    Element song = rootElement.addElement("song");

    Element name = song.addElement("name");
    name.setText("Marry Me Today");
    Element time = song.addElement("time");
    time.setText("5:20");
    Element size = song.addElement("size");
    size.setText("10M");
    // 4. Write the XML file with pretty-print formatting
    XMLWriter xmlWriter = new XMLWriter(
            new FileOutputStream("demo/dom4j.xml"),
            OutputFormat.createPrettyPrint());
    try {
        // Write the document
        xmlWriter.write(document);
    } finally {
        // Close the writer (and with it the file), even if writing fails
        xmlWriter.close();
    }
}

If there are no special requirements, I think DOM4J is the better choice: it needs less code and produces nicely formatted XML documents. DOM4J also supports many more complex XML documents, which I haven't used yet. If you know about them, please tell me; it is very welcome!

Parsing XML documents with DOM4J

import java.io.File;
import java.util.Iterator;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

// Method name assumed, by analogy with DOM4JCreateXML
public static void DOM4JParseXML() throws Exception {
    // Create a SAXReader
    SAXReader saxReader = new SAXReader();
    // Load demo/dom4j.xml and get the Document (the encoding is read from the XML declaration)
    Document document = saxReader.read(new File("demo/dom4j.xml"));
    // Get the root node from the document
    Element rootElement = document.getRootElement();
    // Iterate over the root's child elements (the song nodes)
    Iterator elementIterator = rootElement.elementIterator();
    while (elementIterator.hasNext()) {
        Element song = (Element) elementIterator.next();
        // Print the name and value of each attribute of song
        List<Attribute> attributes = song.attributes();
        for (Attribute attr : attributes) {
            System.out.println(attr.getName() + " = " + attr.getValue());
        }

        // Print the name and value of each child node of song
        Iterator elementIterator2 = song.elementIterator();
        while (elementIterator2.hasNext()) {
            Element songChild = (Element) elementIterator2.next();
            System.out.println("Node name:" + songChild.getName() + "  " + songChild.getStringValue());
        }
    }
}

This article has covered three ways to generate and parse XML documents. I don't think JDOM is necessary; DOM4J can be used instead. DOM also copes poorly with large files, because it loads the entire document into memory. So I recommend SAX and DOM4J for parsing and generating XML documents. One point that is easy to overlook: DOM and SAX are built into Java and need no additional JAR files, whereas both JDOM and DOM4J require an extra JAR.
