TechAlpine – The Technology world

How to Use XPath to Extract Information


XML documents often play an important role in the interchange of data over the Internet. They underpin widely used formats such as RSS, Atom, and XHTML, and office programs use them to store and modify information. Web developers often work with XML in software and app development projects and in database management tasks. In information technology, data is king, and XML documents provide a way to work with it.

So, how do you access all of that data so you can do useful stuff with it? XPath is one way. Defined by the World Wide Web Consortium (W3C), XPath is a query language for selecting nodes from an XML document. It can also compute values, such as numbers, strings, or Boolean values, from the data in an XML document.
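As a rough illustration of selecting nodes and computing values, here is a minimal sketch that uses the JDK's standard javax.xml.xpath package. The sample feed document, the class name XPathBasics, and the helper methods are all made up for this example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBasics {
    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<feed>"
      + "<item><title>First</title><views>10</views></item>"
      + "<item><title>Second</title><views>32</views></item>"
      + "</feed>";

    // Parses the XML text and evaluates one XPath 1.0 expression against it.
    static Object eval(String xml, String expression, QName returnType) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc, returnType);
    }

    // Node set: every <title> element under an <item>.
    static int titleCount(String xml) throws Exception {
        NodeList titles = (NodeList) eval(xml, "/feed/item/title", XPathConstants.NODESET);
        return titles.getLength();
    }

    // Number: the sum of all <views> values.
    static double totalViews(String xml) throws Exception {
        return (Double) eval(xml, "sum(/feed/item/views)", XPathConstants.NUMBER);
    }

    // String: the text of the first item's <title>.
    static String firstTitle(String xml) throws Exception {
        return (String) eval(xml, "string(/feed/item[1]/title)", XPathConstants.STRING);
    }

    // Boolean: true if at least one <views> value is greater than 20.
    static boolean hasPopularItem(String xml) throws Exception {
        return (Boolean) eval(xml, "/feed/item/views > 20", XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titleCount(XML));
        System.out.println(totalViews(XML));
        System.out.println(firstTitle(XML));
        System.out.println(hasPopularItem(XML));
    }
}
```

The same expression language applies whichever API evaluates it; only the surrounding Java calls differ.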

Nodes are the way information is organized in the document, and they are what you navigate when you extract information from it. When you use XPath, you describe a path to the nodes you need. In a message flow, XPath is also known as an alternative to ESQL for entering expressions into the property fields of specific built-in nodes.

The actual process of working with the data and extracting information is spread over several different commands. Here are some of those commands, and a brief summary of how they work.

The evaluateXPath() Method

First, we’ll start with the evaluateXPath() method. This method is found in the Java user-defined node API of IBM's message broker products, and it works with XPath 1.0. You can call evaluateXPath() on an MbMessage object if you are working with absolute paths, or you can call it on an MbElement object for any relative paths.

Keep in mind that the XPath expression is passed to the method as a string. There is a second form of the method that takes an MbXPath object, which holds an XPath expression together with variable bindings and namespace mappings, if required.

The evaluateXPath() method returns an object that can be one of four types, depending on the return type of the expression. You can get a number (java.lang.Double), a string (java.lang.String), a Boolean (java.lang.Boolean), or a list of elements representing the XPath node set (java.util.List).
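Because the return type is only known at run time, calling code usually has to branch on it. The sketch below shows one way that might look. The describe() helper and the class name are hypothetical, and the result objects are created by hand here rather than returned by evaluateXPath(), so that the example runs without a broker installation.

```java
import java.util.Arrays;
import java.util.List;

class ResultTypes {
    // Hypothetical helper: reports which of the four result types was returned.
    static String describe(Object result) {
        if (result instanceof Double) {
            return "number: " + result;
        }
        if (result instanceof String) {
            return "string: " + result;
        }
        if (result instanceof Boolean) {
            return "boolean: " + result;
        }
        if (result instanceof List) {
            // In a real flow the list entries would be the selected elements.
            return "node set with " + ((List<?>) result).size() + " element(s)";
        }
        return "unexpected type: " + (result == null ? "null" : result.getClass().getName());
    }

    public static void main(String[] args) {
        // Stand-in values; a real flow would pass the object returned by evaluateXPath().
        System.out.println(describe(3.0));
        System.out.println(describe("hello"));
        System.out.println(describe(Boolean.TRUE));
        System.out.println(describe(Arrays.asList("a", "b")));
    }
}
```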

Expressions can also refer to variables, as long as those variables are assigned before the expression is evaluated. The MbXPath class provides methods for assigning and removing variable bindings from user Java code, and a variable can hold a string, Boolean, number, or node set.
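To show what a variable binding does to an expression, here is a sketch that uses the JDK's javax.xml.xpath package as a stand-in, since the broker classes need a broker runtime. The JDK's XPathVariableResolver plays the role that variable bindings on MbXPath play in the broker API. The orders document, the $min variable, and the class name are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class VariableBindings {
    // Hypothetical sample document.
    static final String XML =
        "<orders>"
      + "<order><id>1</id><total>15</total></order>"
      + "<order><id>2</id><total>80</total></order>"
      + "</orders>";

    // Counts the orders whose total is greater than the value bound to $min.
    static double countAbove(double threshold) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));

        // The binding must exist before the expression is evaluated.
        Map<String, Object> variables = new HashMap<>();
        variables.put("min", threshold);

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver((QName name) -> variables.get(name.getLocalPart()));

        return (Double) xpath.evaluate(
            "count(/orders/order[total > $min])", doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countAbove(50));
        System.out.println(countAbove(10));
    }
}
```

Keeping the threshold in a variable means the expression text stays constant while the value changes from call to call.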



Working with XML Messages

Namespaces in XML messages can be referred to by mapping from an abbreviated namespace prefix to a full namespace URI. The namespace prefix is usually used for representing the namespace, but it only works in the document that defines the mapping. It is the namespace URI that defines the global meaning.

The namespace prefix is also not a meaningful concept when it comes to documents that are created in a message flow, as a namespace URI can be assigned to a syntax element without defining an XMLNS mapping.

The XMLNSC and MRM parsers only expose the namespace URI to the broker and to ESQL or user-defined code. When you use ESQL, you can have your own mappings to create abbreviations to these URIs, which can be pretty long and cumbersome. The mappings are not actually related to the prefixes that are defined in the XML document.

When you use the XPath processor, you can map namespace abbreviations onto the URIs, and the abbreviations are expanded when the expression is evaluated. The MbXPath class contains the methods you need to assign and remove these namespace mappings.
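The point that an abbreviation in the expression is independent of the prefix in the document can be seen in a small sketch. It uses the JDK's javax.xml.xpath package and its NamespaceContext interface as a stand-in for the broker's namespace mappings. The namespace URI, the prefixes "o" and "ord", and the class name are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceMapping {
    // Hypothetical namespace URI; the URI carries the global meaning.
    static final String NS = "http://example.com/orders";

    // The document itself uses the prefix "o".
    static final String XML =
        "<o:order xmlns:o='http://example.com/orders'><o:id>42</o:id></o:order>";

    static String orderId() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace matching
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // The expression uses its own abbreviation "ord", mapped to the same URI.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ord".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "ord" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
            }
        });

        return xpath.evaluate("/ord:order/ord:id", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(orderId());
    }
}
```

The match succeeds even though "ord" never appears in the document, because only the URI is compared.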

Modifying the Message with XPath Extensions

The broker's XPath processor adds several extension functions, beyond those in standard XPath 1.0, that can be used to modify the message tree. Let’s look at a few examples.

The first function is set-value(). This function sets the string value of the context node to the value specified in the argument. Within the parentheses, you can provide any valid expression, which is converted to a string as if by a call to the string() function.

Another useful function is set-local-name(). This function sets the local part of the expanded name of the context node to the value specified in the argument. Here too, you can provide any valid expression, which is converted to a string as if by a call to the string() function.

Finally, we have set-namespace-uri(). With this function, you can set the namespace URI part of the expanded name of the context node to the value specified in the argument. As with the other two functions, the argument can be any valid expression, which is converted to a string as if by a call to the string() function.
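These three functions are broker extensions, so they cannot be run with a standard XPath library. As a loose analogy only, the sketch below performs the same three kinds of change on a plain DOM tree using JDK calls: setTextContent() for the value, and Document.renameNode() for the local name and the namespace URI. The sample document, the target names, and the class name are assumptions made for this example, and none of this is the broker's own API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ModifyTree {
    // Hypothetical sample document.
    static final String XML = "<order><status>new</status></order>";

    // Returns "{namespace}localName=value" for the modified node.
    static String demo() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));

        // Select the context node with an ordinary XPath expression.
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node status = (Node) xpath.evaluate("/order/status", doc, XPathConstants.NODE);

        // Analogue of set-value('shipped'): replace the node's string value.
        status.setTextContent("shipped");

        // Analogue of set-local-name('state'): same namespace, new local name.
        Node renamed = doc.renameNode(status, status.getNamespaceURI(), "state");

        // Analogue of set-namespace-uri(...): same local name, new namespace URI.
        renamed = doc.renameNode(renamed, "http://example.com/orders", renamed.getLocalName());

        return "{" + renamed.getNamespaceURI() + "}"
            + renamed.getLocalName() + "=" + renamed.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```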

Building Syntax Element Trees

To allow syntax element trees to be built as well as modified, the processor provides one extra axis in addition to the 13 axes defined in the XPath 1.0 specification. The axis looks like this: select-or-create::name, which can be abbreviated to ?name.

This axis selects the child nodes that match the specified name, or creates new nodes if none exist. If name is written as @name, an attribute is selected or created instead.

New nodes are created according to a specific set of rules. ?name selects children called name, if they exist. If a child called name does not exist, ?name creates it as the last child, and then selects it.

?$name creates name as the last child, and then selects it. ?^name creates name as the first child, and then selects it. ?<name creates name as the previous sibling, and then selects it. ?>name creates name as the next sibling, and then selects it.
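The placement rules above can be sketched in plain Java. The sketch emulates each form on a DOM tree with hypothetical helper methods. It is not the broker's implementation and does not use its API; it only shows where each form puts the new node.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SelectOrCreate {
    // Emulates ?name: select matching children, or create one as the last child.
    static List<Element> selectOrCreate(Element parent, String name) {
        List<Element> matches = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && name.equals(c.getNodeName())) {
                matches.add((Element) c);
            }
        }
        if (matches.isEmpty()) {
            matches.add(createLast(parent, name));
        }
        return matches;
    }

    // Emulates ?$name: always create as the last child.
    static Element createLast(Element parent, String name) {
        Element created = parent.getOwnerDocument().createElement(name);
        parent.appendChild(created);
        return created;
    }

    // Emulates ?^name: always create as the first child.
    static Element createFirst(Element parent, String name) {
        Element created = parent.getOwnerDocument().createElement(name);
        parent.insertBefore(created, parent.getFirstChild());
        return created;
    }

    // Emulates ?<name: create as the previous sibling of the context node.
    static Element createPrevious(Element node, String name) {
        Element created = node.getOwnerDocument().createElement(name);
        node.getParentNode().insertBefore(created, node);
        return created;
    }

    // Emulates ?>name: create as the next sibling of the context node.
    static Element createNext(Element node, String name) {
        Element created = node.getOwnerDocument().createElement(name);
        // insertBefore with a null reference node appends at the end.
        node.getParentNode().insertBefore(created, node.getNextSibling());
        return created;
    }

    // Applies each rule once and returns the resulting child order.
    static String demo() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);

        Element b = selectOrCreate(root, "b").get(0); // created: no "b" exists yet
        selectOrCreate(root, "b");                    // selected: nothing new is created
        createLast(root, "c");
        createFirst(root, "a");
        createPrevious(b, "x");
        createNext(b, "y");

        StringBuilder names = new StringBuilder();
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (names.length() > 0) names.append(',');
            names.append(c.getNodeName());
        }
        return names.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```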

Jump In

So, now you’ve seen a variety of methods you can use to request and manipulate information, as well as some functions to modify your syntax element trees. These methods and functions belong to the XPath support in the Java user-defined node API, and can be used to extract information from your XML documents.

Keep in mind that XPath is designed to work with XML documents, but these commands can also be used with other tree structures in order to query contents. This is one way to work with data, and if it works for you, you’ll have yourself a new set of tools for working with information on various projects online.

The best thing to do is jump right in and start working with the commands. Programming works best as a hands-on activity, and the more you use these methods and functions, the more they will make sense in the given context. Good luck and happy programming!

Author Bio: Carolyn Clarke is an experienced SEO consultant and freelance writer from Amherst, NH. As an expert in Internet marketing, she helps clients all over the world achieve greater visibility in the market, including a prominent Los Angeles SEO firm. Carolyn loves spending time outdoors and enjoys hiking along the Appalachian Trail.
