Using XPath in Java

Technology, 2024-03-22

Prerequisites and examples

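In this article, I assume that you are familiar with the technical details described in Brett McLaughlin's "Evaluating XPaths from the Java™ platform". If you don't know how to run Java programs that use XPath, please refer to Brett's article (see Resources for a link). The same applies to the APIs needed to load an XML file and evaluate XPath expressions.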
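Because the article relies on the JAXP API for loading XML and evaluating XPath, here is a short refresher. This is a minimal sketch only. The class `XPathBasics` and its helper methods are hypothetical names and do not come from Brett's article or from the download.

One detail matters for everything that follows: `DocumentBuilderFactory` is not namespace-aware by default. The sketch therefore turns namespace awareness on explicitly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathBasics {

    /** Parses an XML string into a DOM tree with namespace support switched on. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is off by default; without it the DOM nodes
        // carry no namespace URIs.
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Parses the XML and evaluates an expression that yields a number, e.g. count(...). */
    public static double evaluateNumber(String xml, String expression) throws Exception {
        Document doc = parse(xml);
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (Double) xPath.evaluate(expression, doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        // A document without namespaces needs no NamespaceContext
        System.out.println(evaluateNumber("<list><item/><item/></list>", "count(/list/item)")); // 2.0
    }
}
```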

All examples use the following XML file:

Listing 1. Example XML

<?xml version="1.0" encoding="UTF-8"?>
<books:booklist xmlns:books="http://univNaSpResolver/booklist"
                xmlns="http://univNaSpResolver/book"
                xmlns:fiction="http://univNaSpResolver/fictionbook">
  <science:book xmlns:science="http://univNaSpResolver/sciencebook">
    <title>Learning XPath</title>
    <author>Michael Schmidt</author>
  </science:book>
  <fiction:book>
    <title>Faust I</title>
    <author>Johann Wolfgang von Goethe</author>
  </fiction:book>
  <fiction:book>
    <title>Faust II</title>
    <author>Johann Wolfgang von Goethe</author>
  </fiction:book>
</books:booklist>

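This XML example declares three namespaces on the root element and one more namespace on an element deeper in the structure. You will see the differences this setup causes.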

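Frequently used acronyms

API: Application Programming Interface
DOM: Document Object Model
URI: Uniform Resource Identifier
XHTML: Extensible Hypertext Markup Language
XML: Extensible Markup Language
XSD: XML Schema Definition
XSLT: Extensible Stylesheet Language Transformations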

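The second interesting point about this XML example is that the element booklist has three children, all named book. However, the first child has the namespace science, while the second and third have the namespace fiction. To XPath, these elements are therefore completely different. You will see the consequences in the examples below.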
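You can see the difference without any XPath at all. The following sketch parses Listing 1 and prints the local name and namespace URI of each child of booklist. The class `BookNamespaces` is a hypothetical name, and it inlines Listing 1 as a string. All three children share the local name book, and only the namespace URI tells them apart.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BookNamespaces {

    // Listing 1 as a string, without the whitespace between elements
    public static final String XML =
          "<books:booklist xmlns:books='http://univNaSpResolver/booklist'"
        + " xmlns='http://univNaSpResolver/book'"
        + " xmlns:fiction='http://univNaSpResolver/fictionbook'>"
        + "<science:book xmlns:science='http://univNaSpResolver/sciencebook'>"
        + "<title>Learning XPath</title><author>Michael Schmidt</author>"
        + "</science:book>"
        + "<fiction:book><title>Faust I</title>"
        + "<author>Johann Wolfgang von Goethe</author></fiction:book>"
        + "<fiction:book><title>Faust II</title>"
        + "<author>Johann Wolfgang von Goethe</author></fiction:book>"
        + "</books:booklist>";

    /** Returns "localName {namespaceURI}" for every element child of the root element. */
    public static List<String> describeChildren(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) { // skip text nodes
                result.add(child.getLocalName() + " {" + child.getNamespaceURI() + "}");
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        describeChildren(XML).forEach(System.out::println);
    }
}
```

Running it should print `book {http://univNaSpResolver/sciencebook}` once and `book {http://univNaSpResolver/fictionbook}` twice.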

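A note about the example source code: the code is optimized for readability, not maintainability, so it contains some redundancy. Output is produced in the simplest possible way, with System.out.println(). In this article, all lines of code related to output are abbreviated as "...". I also do not show the helper methods (such as sysout) here, but they are included in the download (see Download).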

Theoretical background

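What do namespaces mean, and why should you care about them? A namespace is part of the identifier of an element or attribute. You can have elements or attributes with the same local name but different namespaces, and they are completely different (see science:book and fiction:book in the example above).

You need namespaces to resolve naming conflicts when you combine XML files from different sources. Take an XSLT file as an example. It consists of elements in the XSLT namespace, elements in your own namespace and, usually, elements in the XHTML namespace. Namespaces remove any ambiguity between elements that share the same local name.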

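A namespace is defined by a URI (in this example, http://univNaSpResolver/booklist). To avoid writing this long string everywhere, you define a prefix associated with the URI (books in the example). Remember that a prefix is like a variable: its name does not matter. If two prefixes refer to the same URI, the prefixed elements are in the same namespace (for an example, see Example 1 in Listing 5).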
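You can check the "a prefix is like a variable" point with `javax.xml.namespace.QName`. Its `equals` method compares only the namespace URI and the local part, and it ignores the prefix. The sketch below is illustrative, and its class and constant names are hypothetical.

```java
import javax.xml.namespace.QName;

public class PrefixIsJustAnAlias {
    // The same element name, written with the document's prefix and with another one
    public static final QName AS_IN_DOCUMENT =
        new QName("http://univNaSpResolver/sciencebook", "book", "science");
    public static final QName AS_IN_XPATH =
        new QName("http://univNaSpResolver/sciencebook", "book", "technical");
    // Same local name, different namespace URI
    public static final QName FICTION_BOOK =
        new QName("http://univNaSpResolver/fictionbook", "book", "fiction");

    public static void main(String[] args) {
        // QName.equals compares namespace URI and local part; the prefix is ignored
        System.out.println(AS_IN_DOCUMENT.equals(AS_IN_XPATH));  // true
        System.out.println(AS_IN_DOCUMENT.equals(FICTION_BOOK)); // false
        // toString() shows the expanded name, again without the prefix
        System.out.println(AS_IN_XPATH); // {http://univNaSpResolver/sciencebook}book
    }
}
```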

XPath expressions use prefixes (for example, books:booklist/science:book), and you must supply the URI associated with each prefix. This is where NamespaceContext comes in: it does exactly that.
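To see what a NamespaceContext has to answer, the sketch below wraps a small prefix map in a class that records every prefix the XPath engine asks about. `RecordingNamespaceContext` is a hypothetical class. With the JDK's built-in XPath implementation, you can expect books and science to appear after evaluating books:booklist/science:book. However, the exact sequence of calls is an implementation detail, so don't rely on it.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RecordingNamespaceContext implements NamespaceContext {
    private final Map<String, String> bindings = new HashMap<>();
    /** Every prefix the XPath engine asked this context to resolve. */
    public final Set<String> requested = new HashSet<>();

    public RecordingNamespaceContext() {
        bindings.put("books", "http://univNaSpResolver/booklist");
        bindings.put("science", "http://univNaSpResolver/sciencebook");
    }

    @Override
    public String getNamespaceURI(String prefix) {
        requested.add(prefix); // remember the question before answering it
        return bindings.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        return null; // reverse lookup is not exercised in this sketch
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        return Collections.emptyIterator();
    }

    /** Evaluates a count(...) expression against a trimmed-down version of Listing 1. */
    public static double count(RecordingNamespaceContext context, String expression) throws Exception {
        String xml = "<books:booklist xmlns:books='http://univNaSpResolver/booklist'>"
            + "<science:book xmlns:science='http://univNaSpResolver/sciencebook'/>"
            + "</books:booklist>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(context);
        return (Double) xPath.evaluate(expression, doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        RecordingNamespaceContext context = new RecordingNamespaceContext();
        System.out.println(count(context, "count(books:booklist/science:book)"));
        System.out.println("Prefixes requested: " + context.requested);
    }
}
```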

This article presents different ways to provide the mapping between prefixes and URIs.

In the XML file, the mapping is provided by xmlns attributes, for example xmlns:books="http://univNaSpResolver/booklist" or xmlns="http://univNaSpResolver/book" (the default namespace).

Why namespace resolution is necessary

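If your XML uses namespaces, XPath expressions fail unless you provide a NamespaceContext. Example 0 in Listing 2 shows this case. An XPath object is constructed and evaluated on the loaded XML document. First, the code tries an expression without any namespace prefix (result1). In the second part, it writes the expression with namespace prefixes (result2).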

Listing 2. Example 0 without namespace resolution

private static void example0(Document example)
        throws XPathExpressionException, TransformerException {
    sysout("\n*** Zero example - no namespaces provided ***");

    XPath xPath = XPathFactory.newInstance().newXPath();

    ...
    NodeList result1 = (NodeList) xPath.evaluate("booklist/book", example,
            XPathConstants.NODESET);
    ...
    NodeList result2 = (NodeList) xPath.evaluate(
            "books:booklist/science:book", example, XPathConstants.NODESET);
    ...
}

This produces the following output.

Listing 3. Output of Example 0

*** Zero example - no namespaces provided ***
First try asking without namespace prefix:
--> booklist/book
Result is of length 0
Then try asking with namespace prefix:
--> books:booklist/science:book
Result is of length 0
The expression does not work in both cases.

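In both cases, the XPath evaluation returns no nodes, and no exception is thrown. For the prefixed expression, XPath cannot find the nodes because the prefix-to-URI mapping is missing. The unprefixed expression fails for a different reason. In XPath 1.0, a name without a prefix always means "no namespace", so booklist does not match the booklist element, which is in the http://univNaSpResolver/booklist namespace.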
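If adding a NamespaceContext is not an option, XPath 1.0 itself offers a workaround: match on `local-name()` and, where the distinction matters, on `namespace-uri()`. This technique is not part of the original article. Below is a minimal sketch of it, with a hypothetical class `NoNamespaceContext` that still sets no NamespaceContext at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoNamespaceContext {

    // Listing 1 as a string
    public static final String XML =
          "<books:booklist xmlns:books='http://univNaSpResolver/booklist'"
        + " xmlns='http://univNaSpResolver/book'"
        + " xmlns:fiction='http://univNaSpResolver/fictionbook'>"
        + "<science:book xmlns:science='http://univNaSpResolver/sciencebook'>"
        + "<title>Learning XPath</title><author>Michael Schmidt</author>"
        + "</science:book>"
        + "<fiction:book><title>Faust I</title>"
        + "<author>Johann Wolfgang von Goethe</author></fiction:book>"
        + "<fiction:book><title>Faust II</title>"
        + "<author>Johann Wolfgang von Goethe</author></fiction:book>"
        + "</books:booklist>";

    /** Evaluates a count(...) expression with a plain XPath object: no NamespaceContext is set. */
    public static double count(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (Double) xPath.evaluate(expression, doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed names mean "no namespace", so nothing matches
        System.out.println(count("count(booklist/book)")); // 0.0
        // Matching on the local name alone ignores namespaces entirely
        System.out.println(count("count(/*[local-name()='booklist']/*[local-name()='book'])")); // 3.0
        // Adding namespace-uri() restores the distinction between the kinds of book
        System.out.println(count("count(/*[local-name()='booklist']"
            + "/*[local-name()='book' and namespace-uri()='http://univNaSpResolver/fictionbook'])")); // 2.0
    }
}
```

Matching on `local-name()` alone discards exactly the distinction between science:book and fiction:book that namespaces exist to make. The `namespace-uri()` form is usually the safer choice, although it hardcodes the URIs in every expression.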

Hardcoded namespace resolution

You can provide the namespaces as hardcoded values, using a class like the one in Listing 4:

Listing 4. Hardcoded namespace resolution

import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

public class HardcodedNamespaceResolver implements NamespaceContext {

    /**
     * This method returns the uri for all prefixes needed. Wherever possible
     * it uses XMLConstants.
     *
     * @param prefix
     * @return uri
     */
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("No prefix provided!");
        } else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return "http://univNaSpResolver/book";
        } else if (prefix.equals("books")) {
            return "http://univNaSpResolver/booklist";
        } else if (prefix.equals("fiction")) {
            return "http://univNaSpResolver/fictionbook";
        } else if (prefix.equals("technical")) {
            return "http://univNaSpResolver/sciencebook";
        } else {
            return XMLConstants.NULL_NS_URI;
        }
    }

    public String getPrefix(String namespaceURI) {
        // Not needed in this context.
        return null;
    }

    public Iterator getPrefixes(String namespaceURI) {
        // Not needed in this context.
        return null;
    }
}

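Note that the namespace http://univNaSpResolver/sciencebook is bound to the prefix technical instead of science, the prefix the XML file uses. You will see the consequences in the example output (Listing 6). In Listing 5, the code that uses this resolver uses the new prefix.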

Listing 5. Example 1 with hardcoded namespace resolution

private static void example1(Document example)
        throws XPathExpressionException, TransformerException {
    sysout("\n*** First example - namespacelookup hardcoded ***");

    XPath xPath = XPathFactory.newInstance().newXPath();
    xPath.setNamespaceContext(new HardcodedNamespaceResolver());

    ...
    NodeList result1 = (NodeList) xPath.evaluate(
            "books:booklist/technical:book", example, XPathConstants.NODESET);
    ...
    NodeList result2 = (NodeList) xPath.evaluate(
            "books:booklist/fiction:book", example, XPathConstants.NODESET);
    ...
    String result = xPath.evaluate("books:booklist/technical:book/:author", example);
    ...
}

This is the output of this example.

Listing 6. Output of Example 1

*** First example - namespacelookup hardcoded ***
Using any namespaces results in a NodeList:
--> books:booklist/technical:book
Number of Nodes: 1
<?xml version="1.0" encoding="UTF-8"?>
<science:book xmlns:science="http://univNaSpResolver/sciencebook">
  <title xmlns="http://univNaSpResolver/book">Learning XPath</title>
  <author xmlns="http://univNaSpResolver/book">Michael Schmidt</author>
</science:book>
--> books:booklist/fiction:book
Number of Nodes: 2
<?xml version="1.0" encoding="UTF-8"?>
<fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
  <title xmlns="http://univNaSpResolver/book">Faust I</title>
  <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
</fiction:book>
<?xml version="1.0" encoding="UTF-8"?>
<fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
  <title xmlns="http://univNaSpResolver/book">Faust II</title>
  <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
</fiction:book>
The default namespace works also:
--> books:booklist/technical:book/:author
Michael Schmidt

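As you can see, XPath now finds the nodes. The advantage is that you can rename prefixes as you wish, which is what I did with the prefix science. The XML file uses the prefix science, while XPath uses a different prefix, technical. Because the URIs are the same, XPath finds the nodes. The disadvantage is that you have to maintain the namespaces in more places: the XML, perhaps the XSD, the XPath expressions, and the namespace context.

Also note the expression books:booklist/technical:book/:author. The empty prefix before :author makes the resolver return the default namespace. As the output shows, the XPath implementation the author used accepts this form. However, it is not valid XPath 1.0 syntax, which requires a non-empty prefix and treats an unprefixed name as "no namespace". A portable alternative is to bind an explicit prefix to http://univNaSpResolver/book.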
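If you prefer the hardcoded approach, you can replace the if/else chain of Listing 4 with a map. The sketch below is a hypothetical `MapNamespaceContext`, not the author's class. It also implements `getPrefix` and `getPrefixes`, which Listing 4 leaves returning null. Instead of relying on the `:author` form, it binds an explicit, made-up prefix `b` to the default namespace http://univNaSpResolver/book.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    /** Binds a prefix used in XPath expressions to a namespace URI; returns this for chaining. */
    public MapNamespaceContext bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("No prefix provided!");
        }
        // The xml and xmlns prefixes have fixed bindings
        if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
            return XMLConstants.XML_NS_URI;
        }
        if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix)) {
            return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
        }
        // Unbound prefixes yield "" (XMLConstants.NULL_NS_URI)
        return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("No namespace URI provided!");
        }
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                prefixes.add(entry.getKey());
            }
        }
        return Collections.unmodifiableList(prefixes).iterator(); // read-only, as the contract asks
    }

    // Listing 1 as a string
    public static final String XML =
          "<books:booklist xmlns:books='http://univNaSpResolver/booklist'"
        + " xmlns='http://univNaSpResolver/book'"
        + " xmlns:fiction='http://univNaSpResolver/fictionbook'>"
        + "<science:book xmlns:science='http://univNaSpResolver/sciencebook'>"
        + "<title>Learning XPath</title><author>Michael Schmidt</author>"
        + "</science:book>"
        + "<fiction:book><title>Faust I</title>"
        + "<author>Johann Wolfgang von Goethe</author></fiction:book>"
        + "</books:booklist>";

    /** Reads the science book's author using our own prefixes, not the document's. */
    public static String scienceAuthor() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new MapNamespaceContext()
                .bind("books", "http://univNaSpResolver/booklist")
                .bind("technical", "http://univNaSpResolver/sciencebook")
                .bind("b", "http://univNaSpResolver/book")); // explicit prefix for the default namespace
        return xPath.evaluate("books:booklist/technical:book/b:author", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(scienceAuthor()); // Michael Schmidt
    }
}
```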

Reading the namespaces from the document

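The namespaces and their prefixes are recorded in the XML file, so you can take them from there. The simplest way is to delegate the lookup to the document.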

Listing 7. Resolving namespaces directly from the document

import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

import org.w3c.dom.Document;

public class UniversalNamespaceResolver implements NamespaceContext {
    // the delegate
    private Document sourceDocument;

    /**
     * This constructor stores the source document to search the namespaces in
     * it.
     *
     * @param document
     *            source document
     */
    public UniversalNamespaceResolver(Document document) {
        sourceDocument = document;
    }

    /**
     * The lookup for the namespace uris is delegated to the stored document.
     *
     * @param prefix
     *            to search for
     * @return uri
     */
    public String getNamespaceURI(String prefix) {
        if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            // DOM asks for the default namespace with null instead of ""
            return sourceDocument.lookupNamespaceURI(null);
        } else {
            return sourceDocument.lookupNamespaceURI(prefix);
        }
    }

    /**
     * This method is not needed in this context, but can be implemented in a
     * similar way.
     */
    public String getPrefix(String namespaceURI) {
        return sourceDocument.lookupPrefix(namespaceURI);
    }

    public Iterator getPrefixes(String namespaceURI) {
        // not implemented yet
        return null;
    }
}

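Keep these points in mind:

- If the document is changed before XPath is used, the change is still reflected in the namespace lookup, because each lookup is delegated to the current version of the document at the moment it is needed.
- Namespaces and prefixes are looked up on the node used and its ancestors, in this case the node sourceDocument. With the code provided, this means you only get the namespaces declared on the root element. In our example, the namespace science is not found.
- The lookup is invoked while XPath evaluates the expression, so it costs some extra time.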
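The scope rule in the second point comes from DOM Level 3's `Node.lookupNamespaceURI`, which checks the node itself and then its ancestors. Asking the Document delegates to the root element, so science stays invisible, whereas asking the science:book element finds it. The sketch below shows this on Listing 1, using a hypothetical class `LookupScope`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LookupScope {

    // Listing 1 as a string
    public static final String XML =
          "<books:booklist xmlns:books='http://univNaSpResolver/booklist'"
        + " xmlns='http://univNaSpResolver/book'"
        + " xmlns:fiction='http://univNaSpResolver/fictionbook'>"
        + "<science:book xmlns:science='http://univNaSpResolver/sciencebook'>"
        + "<title>Learning XPath</title><author>Michael Schmidt</author>"
        + "</science:book>"
        + "<fiction:book><title>Faust I</title>"
        + "<author>Johann Wolfgang von Goethe</author></fiction:book>"
        + "</books:booklist>";

    /** Returns three lookups that show where DOM searches for a prefix. */
    public static String[] lookups() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        Node scienceBook = doc.getElementsByTagNameNS(
                "http://univNaSpResolver/sciencebook", "book").item(0);
        return new String[] {
            doc.lookupNamespaceURI("fiction"),        // declared on the root: found
            doc.lookupNamespaceURI("science"),        // declared deeper: null
            scienceBook.lookupNamespaceURI("science") // asked on the declaring node: found
        };
    }

    public static void main(String[] args) throws Exception {
        for (String uri : lookups()) {
            System.out.println(uri);
        }
    }
}
```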

Here is the example code:

Listing 8. Example 2 with namespace resolution directly from the document

private static void example2(Document example)
        throws XPathExpressionException, TransformerException {
    sysout("\n*** Second example - namespacelookup delegated to document ***");

    XPath xPath = XPathFactory.newInstance().newXPath();
    xPath.setNamespaceContext(new UniversalNamespaceResolver(example));

    try {
        ...
        NodeList result1 = (NodeList) xPath.evaluate(
                "books:booklist/science:book", example, XPathConstants.NODESET);
        ...
    } catch (XPathExpressionException e) {
        ...
    }
    ...
    NodeList result2 = (NodeList) xPath.evaluate(
            "books:booklist/fiction:book", example, XPathConstants.NODESET);
    ...
    String result = xPath.evaluate(
            "books:booklist/fiction:book[1]/:author", example);
    ...
}

The output of this example is:

Listing 9. Output of Example 2

*** Second example - namespacelookup delegated to document ***
Try to use the science prefix: no result
--> books:booklist/science:book
The resolver only knows namespaces of the first level!
To be precise: Only namespaces above the node, passed in the constructor.
The fiction namespace is such a namespace:
--> books:booklist/fiction:book
Number of Nodes: 2
<?xml version="1.0" encoding="UTF-8"?>
<fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
  <title xmlns="http://univNaSpResolver/book">Faust I</title>
  <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
</fiction:book>
<?xml version="1.0" encoding="UTF-8"?>
<fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
  <title xmlns="http://univNaSpResolver/book">Faust II</title>
  <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
</fiction:book>
The default namespace works also:
--> books:booklist/fiction:book[1]/:author
Johann Wolfgang von Goethe

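As you can see in the output, the namespace with the prefix science, declared on the book element, is not resolved. The evaluate method throws an XPathExpressionException, which Listing 8 catches. To work around this, you could extract the science:book node from the document and use that node as the delegate. However, this requires extra parsing of the document and is not elegant.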
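The workaround mentioned above, delegating to the science:book node instead of the document, might look like the following sketch. `NodeNamespaceResolver` is a hypothetical generalization of Listing 7 from Document to any Node. Unlike Listing 7, it returns `XMLConstants.NULL_NS_URI` for unknown prefixes, as the NamespaceContext Javadoc specifies.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeNamespaceResolver implements NamespaceContext {
    private final Node delegate;

    public NodeNamespaceResolver(Node delegate) {
        this.delegate = delegate;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("No prefix provided!");
        }
        // DOM asks for the default namespace with null, XPath with ""
        String lookup = XMLConstants.DEFAULT_NS_PREFIX.equals(prefix) ? null : prefix;
        String uri = delegate.lookupNamespaceURI(lookup); // searches the node, then its ancestors
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        return delegate.lookupPrefix(namespaceURI);
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singleton(prefix).iterator();
    }

    // Listing 1 as a string
    public static final String XML =
          "<books:booklist xmlns:books='http://univNaSpResolver/booklist'"
        + " xmlns='http://univNaSpResolver/book'"
        + " xmlns:fiction='http://univNaSpResolver/fictionbook'>"
        + "<science:book xmlns:science='http://univNaSpResolver/sciencebook'>"
        + "<title>Learning XPath</title><author>Michael Schmidt</author>"
        + "</science:book>"
        + "<fiction:book><title>Faust I</title>"
        + "<author>Johann Wolfgang von Goethe</author></fiction:book>"
        + "</books:booklist>";

    /** Counts the science books, resolving prefixes from the science:book element upward. */
    public static double countScienceBooks() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        // The extra lookup the text calls inelegant: find the declaring node first
        Node scienceBook = doc.getElementsByTagNameNS(
                "http://univNaSpResolver/sciencebook", "book").item(0);
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new NodeNamespaceResolver(scienceBook));
        return (Double) xPath.evaluate("count(books:booklist/science:book)", doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countScienceBooks()); // 1.0
    }
}
```

This approach only works when one node has all the needed declarations among itself and its ancestors. That holds here, but it does not hold in general.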

Reading the namespaces from the document and caching them

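The next version of NamespaceContext is better. It reads the namespaces only once, in advance, in the constructor, and answers every subsequent namespace request from that cache. As a consequence, later changes to the document are not reflected, because the list of namespaces is cached when the Java object is created.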

Listing 10. Caching namespace resolution from the document

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class UniversalNamespaceCache implements NamespaceContext {
    private static final String DEFAULT_NS = "DEFAULT";
    private Map<String, String> prefix2Uri = new HashMap<String, String>();
    private Map<String, String> uri2Prefix = new HashMap<String, String>();

    /**
     * This constructor parses the document and stores all namespaces it can
     * find. If toplevelOnly is true, only namespaces in the root are used.
     *
     * @param document
     *            source document
     * @param toplevelOnly
     *            restriction of the search to enhance performance
     */
    public UniversalNamespaceCache(Document document, boolean toplevelOnly) {
        // Start at the root element; unlike getFirstChild(), getDocumentElement()
        // skips a leading comment, processing instruction or DOCTYPE
        examineNode(document.getDocumentElement(), toplevelOnly);
        System.out.println("The list of the cached namespaces:");
        for (String key : prefix2Uri.keySet()) {
            System.out.println("prefix " + key + ": uri " + prefix2Uri.get(key));
        }
    }

    /**
     * A single node is read, the namespace attributes are extracted and stored.
     *
     * @param node
     *            to examine
     * @param attributesOnly
     *            if true, no recursion happens
     */
    private void examineNode(Node node, boolean attributesOnly) {
        NamedNodeMap attributes = node.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            storeAttribute((Attr) attribute);
        }

        if (!attributesOnly) {
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    examineNode(child, false);
                }
            }
        }
    }

    /**
     * This method looks at an attribute and stores it, if it is a namespace
     * attribute.
     *
     * @param attribute
     *            to examine
     */
    private void storeAttribute(Attr attribute) {
        // examine the attributes in namespace xmlns
        if (attribute.getNamespaceURI() != null
                && attribute.getNamespaceURI().equals(XMLConstants.XMLNS_ATTRIBUTE_NS_URI)) {
            // Default namespace xmlns="uri goes here"
            if (attribute.getNodeName().equals(XMLConstants.XMLNS_ATTRIBUTE)) {
                putInCache(DEFAULT_NS, attribute.getNodeValue());
            } else {
                // The defined prefixes are stored here
                putInCache(attribute.getLocalName(), attribute.getNodeValue());
            }
        }
    }

    private void putInCache(String prefix, String uri) {
        prefix2Uri.put(prefix, uri);
        uri2Prefix.put(uri, prefix);
    }

    /**
     * This method is called by XPath. It returns the default namespace, if the
     * prefix is null or "".
     *
     * @param prefix
     *            to search for
     * @return uri
     */
    public String getNamespaceURI(String prefix) {
        if (prefix == null || prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return prefix2Uri.get(DEFAULT_NS);
        } else {
            return prefix2Uri.get(prefix);
        }
    }

    /**
     * This method is not needed in this context, but can be implemented in a
     * similar way.
     */
    public String getPrefix(String namespaceURI) {
        return uri2Prefix.get(namespaceURI);
    }

    public Iterator getPrefixes(String namespaceURI) {
        // Not implemented
        return null;
    }
}

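Note the debug output in the code. The attributes of each examined node are read, and the namespace declarations among them are stored. When the constructor's toplevelOnly flag is true, the children are not examined. When it is false, examination of the children starts after the attributes have been stored. One more point about the code: in DOM, the Document node represents the document as a whole rather than its root element. To reach the element booklist and read its namespace declarations, you therefore have to descend exactly one level.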
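The step down from the Document node can be taken with `getFirstChild()` or `getDocumentElement()`. The two differ only when something precedes the root element. The sketch below uses a hypothetical input that starts with a comment, and the class `RootElementAccess` is likewise a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RootElementAccess {

    /** Hypothetical input: a root element preceded by a comment. */
    public static final String XML =
          "<!-- generated by some tool -->"
        + "<books:booklist xmlns:books='http://univNaSpResolver/booklist'/>";

    /** Returns {name of the first child, whether it lacks an attribute map, local name of the root}. */
    public static String[] inspect(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Node first = doc.getFirstChild();       // here: the comment node
        Node root = doc.getDocumentElement();   // always the root element
        return new String[] {
            first.getNodeName(),
            String.valueOf(first.getAttributes() == null), // only elements have attributes
            root.getLocalName()
        };
    }

    public static void main(String[] args) throws Exception {
        for (String value : inspect(XML)) {
            System.out.println(value); // #comment, true, booklist
        }
    }
}
```

With input like this, a namespace scan that starts at `getFirstChild()` would hand a comment node to its attribute loop. Because `getAttributes()` returns null for that node, the loop would fail with a NullPointerException.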

In this case, using the NamespaceContext is very simple:

Listing 11. Example 3 with cached namespace resolution (top level only)

private static void example3(Document example)
        throws XPathExpressionException, TransformerException {
    sysout("\n*** Third example - namespaces of toplevel node cached ***");

    XPath xPath = XPathFactory.newInstance().newXPath();
    xPath.setNamespaceContext(new UniversalNamespaceCache(example, true));

    try {
        ...
        NodeList result1 = (NodeList) xPath.evaluate(
                "books:booklist/science:book", example, XPathConstants.NODESET);
        ...
    } catch (XPathExpressionException e) {
        ...
    }
    ...
    NodeList result2 = (NodeList) xPath.evaluate(
            "books:booklist/fiction:book", example, XPathConstants.NODESET);
    ...
    String result = xPath.evaluate(
            "books:booklist/fiction:book[1]/:author", example);
    ...
}

This results in the following output:

Listing 12. Output of Example 3

*** Third example - namespaces of toplevel node cached ***
The list of the cached namespaces:
prefix DEFAULT: uri http://univNaSpResolver/book
prefix fiction: uri http://univNaSpResolver/fictionbook
prefix books: uri http://univNaSpResolver/booklist
Try to use the science prefix:
--> books:booklist/science:book
The cache only knows namespaces of the first level!
The fiction namespace is such a namespace:
--> books:booklist/fiction:book
Number of Nodes: 2
<?xml version="1.0" encoding="UTF-8"?>
<fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
  <title xmlns="http://univNaSpResolver/book">Faust I</title>
  <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
</fiction:book>
<?xml version="1.0" encoding="UTF-8"?>
<fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
  <title xmlns="http://univNaSpResolver/book">Faust II</title>
  <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
</fiction:book>
The default namespace works also:
--> books:booklist/fiction:book[1]/:author
Johann Wolfgang von Goethe

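This code finds only the namespaces of the root element. To be precise, it finds only the namespaces declared on the node that the constructor passes to the method examineNode. This speeds up the constructor, because it does not have to traverse the whole document. However, as the output shows, the science prefix cannot be resolved, and the XPath expression causes an exception (XPathExpressionException).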

Reading the namespaces from the document and all its elements and caching them

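This version reads all namespace declarations from the XML file, so even XPath expressions with the prefix science now work. One situation complicates this version. If a prefix is overloaded, that is, declared with different URIs on nested elements, the last declaration wins. In the real world, this is usually not a problem.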
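The "last one wins" behavior follows directly from storing every declaration in one flat map. Below is a minimal sketch of such a scan, simplified from the idea of Listing 10. It uses a hypothetical class `OverloadedPrefix` and a made-up input in which the prefix p is bound to two different URIs.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class OverloadedPrefix {

    /** Hypothetical input: prefix p is bound to two different URIs on nested elements. */
    public static final String XML =
          "<p:root xmlns:p='urn:outer'>"
        + "<p:child xmlns:p='urn:inner'/>"
        + "</p:root>";

    /** Collects the xmlns declarations of all elements into one flat prefix-to-URI map. */
    public static Map<String, String> collect(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Map<String, String> cache = new HashMap<>();
        collect(doc.getDocumentElement(), cache);
        return cache;
    }

    private static void collect(Element element, Map<String, String> cache) {
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attribute.getNamespaceURI())) {
                String prefix = XMLConstants.XMLNS_ATTRIBUTE.equals(attribute.getNodeName())
                        ? XMLConstants.DEFAULT_NS_PREFIX  // xmlns="..."
                        : attribute.getLocalName();       // xmlns:p="..."
                // Document order: a nested declaration overwrites the outer one
                cache.put(prefix, attribute.getNodeValue());
            }
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, cache);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(XML)); // {p=urn:inner}
    }
}
```

After the scan, p maps to urn:inner. An expression such as p:root resolved through this map would therefore look for root in urn:inner and find nothing, even though the root element uses p for urn:outer.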

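Using the NamespaceContext in this example is the same as in the previous one. The only difference is that the boolean toplevelOnly in the constructor must be set to false.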

Listing 13. Example 4 with cached namespace resolution (all levels)

private static void example4(Document example)
        throws XPathExpressionException, TransformerException {
    sysout("\n*** Fourth example - namespaces all levels cached ***");

    XPath xPath = XPathFactory.newInstance().newXPath();
    xPath.setNamespaceContext(new UniversalNamespaceCache(example, false));

    ...
    NodeList result1 = (NodeList) xPath.evaluate(
            "books:booklist/science:book", example, XPathConstants.NODESET);
    ...
    NodeList result2 = (NodeList) xPath.evaluate(
            "books:booklist/fiction:book", example, XPathConstants.NODESET);
    ...
    String result = xPath.evaluate(
            "books:booklist/fiction:book[1]/:author", example);
    ...
}

This results in the following output:

Listing 14. Output of Example 4

*** Fourth example - namespaces all levels cached ***
The list of the cached namespaces:
prefix science: uri http://univNaSpResolver/sciencebook
prefix DEFAULT: uri http://univNaSpResolver/book
prefix fiction: uri http://univNaSpResolver/fictionbook
prefix books: uri http://univNaSpResolver/booklist
Now the use of the science prefix works as well:
--> books:booklist/science:book
Number of Nodes: 1
<?xml version="1.0" encoding="UTF-8"?>
<science:book xmlns:science="http://univNaSpResolver/sciencebook">
  <title xmlns="http://univNaSpResolver/book">Learning XPath</title>
  <author xmlns="http://univNaSpResolver/book">Michael Schmidt</author>
</science:book>
The fiction namespace is resolved:
--> books:booklist/fiction:book
Number of Nodes: 2
<?xml version="1.0" encoding="UTF-8"?>
<fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
  <title xmlns="http://univNaSpResolver/book">Faust I</title>
  <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
</fiction:book>
<?xml version="1.0" encoding="UTF-8"?>
<fiction:book xmlns:fiction="http://univNaSpResolver/fictionbook">
  <title xmlns="http://univNaSpResolver/book">Faust II</title>
  <author xmlns="http://univNaSpResolver/book">Johann Wolfgang von Goethe</author>
</fiction:book>
The default namespace works also:
--> books:booklist/fiction:book[1]/:author
Johann Wolfgang von Goethe

Conclusion

You can choose among several ways of implementing namespace resolution, which may be better than hardcoding it:

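- If your XML is small and all namespaces are declared on the top element, delegating to the document is enough.
- If you have larger XML files with deep nesting and run many XPath evaluations, it is better to cache the list of namespaces.
- If you have no control over the XML file and the sender can use any prefixes they like, it may be better to be independent of their choices. You can write your own namespace resolution, as in Example 1 (HardcodedNamespaceResolver), and use your own prefixes in the XPath expressions.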

In all other cases, a NamespaceContext built from the namespaces in the XML file can make your code more generic and smaller.

Translated from: https://www.ibm.com/developerworks/java/library/x-nmspccontext/index.html

