XPath normalize-space() to return a sequence of normalized strings

Problem description

I need to use the XPath function normalize-space() to normalize the text I want to extract from an XHTML document: http://test.anahnarciso.com/clean_bigbook_0.html

I'm using the following expression:

//*[@slot="address"]/normalize-space(.)

This works perfectly in Qizx Studio, the tool I use to test XPath expressions:

    let $doc := doc('http://test.anahnarciso.com/clean_bigbook_0.html')
    return $doc//*[@slot="address"]/normalize-space(.)

This simple query returns a sequence of xs:string:

144 Hempstead Tpke
403 West St
880 Old Country Rd
8412 164th St
8412 164th St
1 Irving Pl
1622 McDonald Ave
255 Conklin Ave
22011 Hempstead Ave
7909 Queens Blvd
11820 Queens Blvd
1027 Atlantic Ave
1068 Utica Ave
1002 Clintonville St
1002 Clintonville St
1156 Hempstead Tpke
Route 49
10007 Rockaway Blvd
12694 Willets Point Blvd
343 James St

Now I want to use the same expression in my Java code:

String exp = "//*[@slot=\"address\"]/normalize-space(.)";
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(exp);
Object result = expr.evaluate(doc, XPathConstants.NODESET);

But the last line throws an exception:

Cannot convert XPath value to Java object: required class is org.w3c.dom.NodeList; supplied value has type xs:string

Obviously, I should change XPathConstants.NODESET to something else; I tried XPathConstants.STRING, but it only returns the first element of the sequence.

How can I obtain something like an array of Strings?

Thanks in advance.

Recommended answer

Your expression works in XPath 2.0, but is illegal in XPath 1.0 (which is what Java's standard javax.xml.xpath API implements). In XPath 1.0 a function call cannot appear as a step of a path, so the expression would have to be written as normalize-space(//*[@slot='address']).

Even then, in XPath 1.0, when normalize-space() is called on a node-set, only the first node (in document order) is used, so you get a single string rather than one per match.
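To make the first-node behavior concrete, here is a minimal sketch. It assumes the JDK's built-in XPath 1.0 engine is the one XPathFactory returns, and it uses a small hypothetical sample document (SAMPLE) rather than the page from the question; the class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class FirstNodeOnly {
    // Hypothetical sample markup with two matching elements (not the real page).
    public static final String SAMPLE =
        "<root>"
        + "<span slot='address'>  144   Hempstead Tpke </span>"
        + "<span slot='address'> 403  West St  </span>"
        + "</root>";

    // Evaluates the XPath 1.0 form of the expression and returns its string value.
    public static String firstNormalized(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate(
            "normalize-space(//*[@slot='address'])",
            new InputSource(new StringReader(xml)),
            XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        // Only the first matching node contributes to the result.
        System.out.println(firstNormalized(SAMPLE)); // prints "144 Hempstead Tpke"
    }
}
```

The second address never appears in the result, which matches what the question observed when switching to XPathConstants.STRING.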

To do what you want, you'll need either to use an XPath 2.0 compatible processor, or to traverse the resulting node-set and call normalize-space() on every node:

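import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;

XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr;

// Step 1: select the matching nodes (legal in XPath 1.0).
// `input` is assumed to be the already parsed document.
String select = "//*[@slot='address']";
expr = xpath.compile(select);
NodeList result = (NodeList) expr.evaluate(input, XPathConstants.NODESET);

// Step 2: compile once, then evaluate with each node as the context item.
String normalize = "normalize-space(.)";
expr = xpath.compile(normalize);

int length = result.getLength();
for (int i = 0; i < length; i++) {
    System.out.println(expr.evaluate(result.item(i), XPathConstants.STRING));
}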

...which prints exactly the output you listed.
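Since the question asks for something like an array of Strings rather than printed lines, the same loop can collect its results instead. The following is a sketch under assumptions: the class name NormalizedAddresses, the helpers parse and normalizedStrings, and the inline sample markup are all hypothetical and stand in for the real document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NormalizedAddresses {
    // Selects nodes with `select`, then normalizes each node's text separately.
    public static List<String> normalizedStrings(Document doc, String select) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(select, doc, XPathConstants.NODESET);
        XPathExpression normalize = xpath.compile("normalize-space(.)");

        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Each node is used as the context item for normalize-space(.).
            out.add((String) normalize.evaluate(nodes.item(i), XPathConstants.STRING));
        }
        return out;
    }

    // Parses an XML string into a DOM document (helper for the example only).
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample markup, not the page from the question.
        Document doc = parse(
            "<root>"
            + "<span slot='address'> 1   Irving Pl </span>"
            + "<span slot='address'>Route  49</span>"
            + "</root>");
        List<String> addresses = normalizedStrings(doc, "//*[@slot='address']");
        String[] asArray = addresses.toArray(new String[0]); // if an array is really needed
        for (String address : asArray) {
            System.out.println(address);
        }
    }
}
```

Returning a List keeps the caller free to convert to an array only when an API demands one.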
