Chapter 19: Using %XML.TextReader
The %XML.TextReader class provides a simple way to read arbitrary XML documents, whether or not they map directly to InterSystems IRIS objects. Specifically, this class lets you navigate a well-formed XML document and view the information in it (elements, attributes, comments, namespace URIs, and so on). It also provides complete document validation against either a DTD or an XML schema. Unlike %XML.Reader, however, %XML.TextReader does not provide a way to return a DOM. If you need a DOM, see the earlier chapter "Importing XML into Objects".
Note: The XML declaration of any XML document that you use should indicate the character encoding of that document, and the document should be encoded as declared. If no character encoding is declared, InterSystems IRIS uses the defaults described earlier in "Character Encoding for Input and Output". If these defaults are not correct, modify the XML declaration so that it specifies the character set actually used.
Creating a Text Reader Method
To read an arbitrary XML document that does not necessarily have any relationship to an InterSystems IRIS object class, you call methods of the %XML.TextReader class, which open the document and load it into temporary storage as a text reader object. The text reader object contains a navigable tree of nodes, each of which holds information about the source document. Your method can then navigate the document and find information about it. The properties of the object describe the document as seen from your current position within it. If there are validation errors, they are also available as nodes in the tree.
Overall structure
Your method should do some or all of the following:
- Specify the document source through the first parameter of one of the following methods:
Method | First Argument |
---|---|
ParseFile() | File name with full path. Note that file names and paths can only contain ASCII characters. |
ParseStream() | Stream |
ParseString() | String |
ParseURL() | URL |
In every case, the source document must be a well-formed XML document; that is, it must obey the basic rules of XML syntax. Each of these methods returns a status ($$$OK or a failure code) that indicates whether the call succeeded. You can test the status with the usual mechanisms; in particular, you can use $System.Status.DisplayError(status) to see the text of the error message.
For each of these methods, if the method returns $$$OK, it also returns by reference (in its second argument) a text reader object that contains the information in the XML document.
Additional arguments let you control entity resolution, validation, which items are found, and so on. These are described in "Argument Lists for the Parse Methods" later in this chapter.
- Check the status returned by the parsing method and exit if appropriate.
If the parse method returned $$$OK, you have a text reader object that corresponds to the source XML document. You can navigate this object.
The document may contain nodes such as "element", "endelement", "startprefixmapping", and so on.
Important: If there are any validation errors, the document contains "error" or "warning" nodes.
Your code should check for these nodes.
- Start reading the document using one of the following methods, for example:
- Use Read() to navigate to the first node of the document.
- Use ReadStartElement() to navigate to the first element of a specific type.
- Use MoveToContent() to navigate to the first node of type "chars".
- Get the values of the properties of interest for the current node, if any. Available properties include Name, Value, Depth, and so on.
- Continue to navigate through the document and get property values as needed.
If the current node is an element, you can use the MoveToAttributeIndex() or MoveToAttributeName() method to move the focus to an attribute of that element. To return to the element (if applicable), use MoveToElement().
- If necessary, use the Rewind() method to return to the start of the document (before the first node). This is the only way to move backward in the source document.
After your method runs, the text reader object is destroyed and all related temporary storage is cleaned up.
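The steps above can be sketched as a single method. This is a minimal sketch, not a definitive implementation: the method name CheckAndList is hypothetical, it assumes that "error" and "warning" nodes carry the parser's message in their Value property, and it relies on the default parse arguments (such nodes appear only when validation is active).

```objectscript
/// Hypothetical helper (not part of %XML.TextReader): parses an XML string,
/// reports error/warning nodes, then rewinds and lists the elements.
ClassMethod CheckAndList(xml As %String) As %Status
{
    // Parse the string; the text reader object is returned by reference
    set status = ##class(%XML.TextReader).ParseString(xml, .textreader)
    if $$$ISERR(status) {
        do $System.Status.DisplayError(status)
        quit status
    }

    // First pass: look for nodes reported by the parser.
    // Assumption: Value holds the message text for these node types.
    set problems = 0
    while textreader.Read() {
        if (textreader.NodeType = "error") || (textreader.NodeType = "warning") {
            set problems = problems + 1
            write textreader.NodeType, ": ", textreader.Value, !
        }
    }
    if problems > 0 {
        quit $$$ERROR($$$GeneralError, "XML problems found: "_problems)
    }

    // Second pass: return to the position before the first node and read again
    do textreader.Rewind()
    while textreader.Read() {
        if textreader.NodeType = "element" {
            write "depth ", textreader.Depth, ": ", textreader.Name, !
        }
    }
    quit $$$OK
}
```

Making two passes keeps the validation check separate from the processing logic; for large documents you may prefer to check the node type inside a single loop instead.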
Example 1
The following simple method reads any XML file and displays the sequence number, type, name, and value of each node:
```
/// w ##class(PHA.TEST.Xml).WriteNodes("E:\temp\textReader.txt")
ClassMethod WriteNodes(myfile As %String) As %String
{
    set status = ##class(%XML.TextReader).ParseFile(myfile, .textreader)
    // Check status; return "" so the method can still be called with WRITE
    if $$$ISERR(status) { do $System.Status.DisplayError(status) quit "" }
    // Traverse the document node by node
    while textreader.Read() {
        Write !, "Node ", textreader.seq, " is a(n) "
        Write textreader.NodeType, " "
        If textreader.Name '= "" {
            Write "named: ", textreader.Name
        } Else {
            Write "and has no name"
        }
        Write !, " path: ", textreader.Path
        If textreader.Value '= "" {
            Write !, " value: ", textreader.Value
        }
    }
    quit ""
}
```
This example does the following:
- It calls the ParseFile() class method. This reads the source file, creates a text reader object, and returns that object by reference in the variable textreader.
- If ParseFile() succeeds, the method then calls Read() repeatedly to visit each successive node in the document.
- For each node, the method writes output containing the node's sequence number, type, name (if any), path, and value (if any). The output is written to the current device.
Consider the following sample source document:
```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="mystyles.css"?>
<Root>
  <s01:Person xmlns:s01="http://www.root.org">
    <Name attr="yx">yaoxin</Name>
    <DOB>1990-04-25</DOB>
  </s01:Person>
</Root>
```
For this source document, the previous method produces the following output:
```
DHC-APP>w ##class(PHA.TEST.Xml).WriteNodes("E:\temp\textReader.txt")

Node 1 is a(n) processinginstruction named: xml-stylesheet
 path:
 value: type="text/css" href="mystyles.css"
Node 2 is a(n) element named: Root
 path: /Root
Node 3 is a(n) startprefixmapping named: s01
 path: /Root
 value: s01 http://www.root.org
Node 4 is a(n) element named: s01:Person
 path: /Root/s01:Person
Node 5 is a(n) element named: Name
 path: /Root/s01:Person/Name
Node 6 is a(n) chars and has no name
 path: /Root/s01:Person/Name
 value: yaoxin
Node 7 is a(n) endelement named: Name
 path: /Root/s01:Person/Name
Node 8 is a(n) element named: DOB
 path: /Root/s01:Person/DOB
Node 9 is a(n) chars and has no name
 path: /Root/s01:Person/DOB
 value: 1990-04-25
Node 10 is a(n) endelement named: DOB
 path: /Root/s01:Person/DOB
Node 11 is a(n) endelement named: s01:Person
 path: /Root/s01:Person
Node 12 is a(n) endprefixmapping named: s01
 path: /Root
 value: s01
Node 13 is a(n) endelement named: Root
 path: /Root
```
Note that by default %XML.TextReader ignores comments, so any comment in a source document would not appear in this output.
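The attr attribute of the <Name> element does not appear in the output either, because Read() moves from node to node and visits attributes only on request. The following is a minimal sketch of how the attributes might be listed, assuming the AttributeCount property and the MoveToAttributeIndex() and MoveToElement() methods described earlier; the method name ShowAttributes is hypothetical.

```objectscript
/// Hypothetical helper: lists the attributes of every element in a file
ClassMethod ShowAttributes(myfile As %String) As %String
{
    set status = ##class(%XML.TextReader).ParseFile(myfile, .textreader)
    if $$$ISERR(status) { do $System.Status.DisplayError(status) quit "" }

    while textreader.Read() {
        if (textreader.NodeType = "element") && (textreader.AttributeCount > 0) {
            write "Element ", textreader.Name, !
            // Save the count first, because moving to an attribute changes the current node
            set count = textreader.AttributeCount
            for i = 1:1:count {
                do textreader.MoveToAttributeIndex(i)
                write "  ", textreader.Name, " = ", textreader.Value, !
            }
            // Return the focus to the element that owns the attributes
            do textreader.MoveToElement()
        }
    }
    quit ""
}
```

Whether a namespace declaration such as xmlns:s01 is counted as an attribute is not shown by the earlier output, so run the sketch against your own document to check.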
Example 2
The following example reads an XML file and lists each element in it:
```
/// w ##class(PHA.TEST.Xml).ShowElements("E:\temp\textReader.txt")
ClassMethod ShowElements(myfile As %String) As %String
{
    set status = ##class(%XML.TextReader).ParseFile(myfile, .textreader)
    // Return "" on failure so the method can still be called with WRITE
    if $$$ISERR(status) { do $System.Status.DisplayError(status) quit "" }
    while textreader.Read() {
        // Print only the nodes that mark the start of an element
        if (textreader.NodeType = "element") {
            write textreader.Name, !
        }
    }
    quit ""
}
```
This method uses the NodeType property to check the type of each node. If the node is an element, the method prints its name to the current device. For the XML source document shown earlier, this method generates the following output:
```
DHC-APP>w ##class(PHA.TEST.Xml).ShowElements("E:\temp\textReader.txt")
Root
s01:Person
Name
DOB
```
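The same loop can report namespace information instead of the qualified name. The following is a sketch only: it assumes the LocalName and NamespaceUri properties of the text reader, and the method name ShowNamespaces is hypothetical.

```objectscript
/// Hypothetical helper: shows the local name and namespace of each element
ClassMethod ShowNamespaces(myfile As %String) As %String
{
    set status = ##class(%XML.TextReader).ParseFile(myfile, .textreader)
    if $$$ISERR(status) { do $System.Status.DisplayError(status) quit "" }

    while textreader.Read() {
        if textreader.NodeType = "element" {
            // LocalName is the element name without any namespace prefix
            write textreader.LocalName
            if textreader.NamespaceUri '= "" {
                write " is in namespace ", textreader.NamespaceUri
            } else {
                write " is in no namespace"
            }
            write !
        }
    }
    quit ""
}
```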
Node Types
Each node of the document is one of the following types:
Node types in text reader documents
Type | Description |
---|---|
"attribute" | XML attributes. |
"chars" | A set of characters, such as the content of an element. Note that %XML.TextReader recognizes other node types (CDATA, EntityReference, and EndEntity) but automatically converts them to chars. |
"comment" | XML comments. |
"element" | The beginning of the XML element. |
"endelement" | The end of the XML element. |
"endprefixmapping" | The end of the context in which a namespace is declared. |
"entity" | XML entity. |
"error" | Validation errors found by the parser. |
"ignorablewhitespace" | White space between tags in the mixed content model. |
"processinginstruction" | XML processing instructions. |
"startprefixmapping" | XML namespace declaration, which may or may not include namespaces. |
"warning" | Validation warnings found by the parser. |
Note that a single XML element consists of multiple nodes. For example, consider the following XML fragment:
```
<Person>
  <Name>Willeke,Clint B.</Name>
  <DOB>1925-10-01</DOB>
</Person>
```
The SAX parser treats this XML as the following set of nodes:
Document node example
Node Number | Type of Node | Name of Node, If Any | Value of Node, If Any |
---|---|---|---|
1 | element | Person | |
2 | element | Name | |
3 | chars | | Willeke,Clint B. |
4 | endelement | Name | |
5 | element | DOB | |
6 | chars | | 1925-10-01 |
7 | endelement | DOB | |
8 | endelement | Person | |
Notice, for example, that the <DOB> element is treated as three nodes: an element node, a chars node, and an endelement node. Also notice that the content of this element is available only as the value of the chars node.
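Because the text is available only on the chars node, code that wants an element's content has to read the chars node and work out which element it belongs to. The following minimal sketch uses the Path property for that, relying on the Example 1 output, where a chars node reports the path of its enclosing element. The method name ShowText is hypothetical.

```objectscript
/// Hypothetical helper: prints the path and text of every chars node
ClassMethod ShowText() As %String
{
    // Same fragment as above, written without whitespace between the tags
    set xml = "<Person><Name>Willeke,Clint B.</Name><DOB>1925-10-01</DOB></Person>"
    set status = ##class(%XML.TextReader).ParseString(xml, .textreader)
    if $$$ISERR(status) { do $System.Status.DisplayError(status) quit "" }

    while textreader.Read() {
        if textreader.NodeType = "chars" {
            // Path identifies the enclosing element; Value holds its text
            write textreader.Path, " = ", textreader.Value, !
        }
    }
    quit ""
}
```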