XPath to get data that starts with a specific character or string
I need to extract certain text elements from the following code.
<div class="inhalt-links">
<h2>
Deutsche Verkehrswacht
<br>
Verkehrswacht Dortmund e. V.
<br>
</h2>
<h3>
Standnummer:
<span style="font-weight: normal;">4.E08</span>
</h3>
<div class="clear"></div>
<br>
Benediktinerstraße 82
<br>
44287 Dortmund
<br>
Deutschland
<br>
<br>
Tel.:+49 231 447687
<br>
Fax:+49 231 447136
<br>
E-Mail:info@verkehrswacht-dortmund.de
<br>
<a href="http://www.verkehrswacht-dortmund.de" class="url" target="_blank">www.verkehrswacht-dortmund.de</a>
<br>
<div class="social"></div>
<br>
</div>
To extract Tel.:+49 231 447687, I can use div[@class='inhalt-links']/text()[4]. For other details such as fax, e-mail, and website, I just need to change the position number of the text() node. However, these texts sometimes appear in a different order, as in the following code:
<div class="inhalt-links">
<h2>
DEW21
<br>
</h2>
<h3>
Standnummer:
<span style="font-weight: normal;">4.B56</span>
</h3>
<div class="clear"></div>
<br>
Günter-Samtlebe-Platz 1
<br>
44135 Dortmund
<br>
Postfach:104141
<br>
44041 Dortmund
<br>
Deutschland
<br>
<br>
Tel.:+49 231 544-0
<br>
Fax:+49 231 544-1130
<br>
E-Mail:vertrieb@dew21.de
<br>
<a href="http://www.dew21.de" class="url" target="_blank">www.dew21.de</a>
<br>
<div class="social"></div>
<br>
</div>
The XPath div[@class='inhalt-links']/text()[4] will select the text "44041 Dortmund" instead of Tel.:+49 231 544-0. Is there any XPath like "div[@class='inhalt-links']/text[starts with "Tel.:"]" to select the Tel.: element?
"Is there any XPath like "//div[@class='inhalt-links']/text[starts with "Tel.:"]" to select the Tel.: element?"
Sure, try this:
//div[@class='inhalt-links']/text()[starts-with(normalize-space(), 'Tel.:')]
This XPath returns the text node (rather than an element) whose content starts with the keyword Tel.: once leading and trailing whitespace has been removed*.
*) For reference, this is what normalize-space() does more precisely:
The
normalize-space
function strips leading and trailing white-space from a string, replaces sequences of whitespace characters by a single space, and returns the resulting string. [Mozilla Developer Network]
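As a rough illustration, the XPath above can be tried from Python. This is only a sketch: it assumes the third-party lxml library is installed, and the names SNIPPET, FIELD_XPATH, and extract_field are made up for this example and do not come from the question. The label is passed in as an XPath variable ($label), so the same expression can be reused for Tel.:, Fax:, and E-Mail: regardless of the order in which they appear.

```python
from lxml import html  # third-party library, assumed installed (pip install lxml)

# Second sample from the question, where position-based text()[4] picks the wrong line.
SNIPPET = """
<div class="inhalt-links">
  <h2>DEW21<br></h2>
  <h3>Standnummer: <span style="font-weight: normal;">4.B56</span></h3>
  <div class="clear"></div>
  <br>
  Günter-Samtlebe-Platz 1
  <br>
  44135 Dortmund
  <br>
  Postfach:104141
  <br>
  44041 Dortmund
  <br>
  Deutschland
  <br>
  <br>
  Tel.:+49 231 544-0
  <br>
  Fax:+49 231 544-1130
  <br>
  E-Mail:vertrieb@dew21.de
  <br>
  <a href="http://www.dew21.de" class="url" target="_blank">www.dew21.de</a>
  <br>
  <div class="social"></div>
  <br>
</div>
"""

# Same expression as in the answer, with the label supplied as an XPath variable.
FIELD_XPATH = (
    "//div[@class='inhalt-links']"
    "/text()[starts-with(normalize-space(), $label)]"
)


def extract_field(tree, label):
    """Return the text after `label` in the first matching text node, or None."""
    nodes = tree.xpath(FIELD_XPATH, label=label)
    if not nodes:
        return None
    # The raw text node still carries surrounding whitespace; strip it,
    # then drop the label itself to keep only the value.
    text = nodes[0].strip()
    return text[len(label):].strip()


tree = html.fromstring(SNIPPET)
for label in ("Tel.:", "Fax:", "E-Mail:"):
    print(label, extract_field(tree, label))
```

Because the match depends on the content of each text node and not on its position, the extra Postfach lines and the whitespace-only text nodes between tags do not affect the result.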