lxml XPath position() does not work
Question
I tried to scrape a page via XPath, but I could not get it to work as expected.
The page looks like this:
<tag1>
<tag2>
....
<div id=article>
<p> stuff1 </p>
<p> stuff2 </p>
<p> ...... </p>
<p> stuff30 </p>
I want to extract stuff1 through stuff30 as strings. Here is my Python code snippet:
import lxml.html
import urllib.request
html = urllib.request.urlopen('http://www.something.com/news/blah/').read()
root = lxml.html.fromstring(html)
content = root.xpath('string(//div[@id="article"]/p[position()=>1 and position()<=last()]/.)')
This code did not return anything.
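To reproduce this without the network, the sketch below parses a small inline page instead of calling urlopen; the 30-paragraph HTML is an assumed stand-in for the real article. With the lxml versions we are aware of, the =>1 comparison is not treated as "no match": the expression fails to compile and xpath() raises an lxml.etree.XPathError subclass (typically XPathEvalError).

```python
import lxml.html
from lxml import etree

# Assumed stand-in for the downloaded page: 30 paragraphs inside div#article.
html = (
    '<html><body><div id="article">'
    + "".join(f"<p> stuff{i} </p>" for i in range(1, 31))
    + "</div></body></html>"
)
root = lxml.html.fromstring(html)

# The expression from the question, with the "=>" typo left in place.
bad_expr = 'string(//div[@id="article"]/p[position()=>1 and position()<=last()]/.)'

try:
    root.xpath(bad_expr)
    error_name = None
except etree.XPathError as exc:
    # "=>" is not an XPath operator, so the whole expression is rejected.
    error_name = type(exc).__name__

print(error_name)
```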
If I replace the position() predicate with an individual element index, it works:
content = root.xpath('string(//div[@id="article"]/p[25]/.)')
This code correctly returns stuff25.
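The index form works because, in XPath, a bare number inside a predicate is shorthand for position() = number, so p[25] and p[position()=25] select the same node. A minimal check, assuming an inline 30-paragraph stand-in for the real page:

```python
import lxml.html

# Assumed stand-in for the downloaded page: 30 paragraphs inside div#article.
html = (
    '<html><body><div id="article">'
    + "".join(f"<p> stuff{i} </p>" for i in range(1, 31))
    + "</div></body></html>"
)
root = lxml.html.fromstring(html)

# A bare number in a predicate is shorthand for "position() = number".
by_index = root.xpath('string(//div[@id="article"]/p[25])')
by_position = root.xpath('string(//div[@id="article"]/p[position()=25])')

print(by_index.strip(), by_index == by_position)
```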
I don't want to run a for loop just for this. I believe there is a way to get my code to work with position(), but I'm not sure what's wrong with it.
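A range of paragraphs can be selected with a single, valid position() predicate and no explicit for statement: ending the path in /text() makes xpath() return a list of strings instead of a single string. This sketch uses an inline stand-in page and an arbitrary example range (paragraphs 2 to 4); both are assumptions for illustration.

```python
import lxml.html

# Assumed stand-in for the downloaded page: 30 paragraphs inside div#article.
html = (
    '<html><body><div id="article">'
    + "".join(f"<p> stuff{i} </p>" for i in range(1, 31))
    + "</div></body></html>"
)
root = lxml.html.fromstring(html)

# ">=" (not "=>") is the XPath "greater than or equal" operator.
expr = '//div[@id="article"]/p[position()>=2 and position()<=4]/text()'
texts = root.xpath(expr)  # a list of text nodes, one per selected <p>

# Trim the whitespace around each paragraph and join them into one string.
joined = " ".join(t.strip() for t in texts)
print(joined)
```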
Answer
That's because you have position()=>1; it should be position()>=1 (=> is not an XPath comparison operator):
content = root.xpath('string(//div[@id="article"]/p[position()>=1 and position()<=last()]/.)')
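will set content to stuff1. Note that string() converts only the first node of the selected node-set, which is why you get stuff1 alone rather than all 30 paragraphs. Also, position()>=1 and position()<=last() is true for every p, so the predicate selects the same nodes as a plain p step.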
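A sketch of that behaviour, assuming an inline 30-paragraph stand-in for the real page: wrapped in string(), the corrected expression yields only the first paragraph's text, while the same path without string() returns every matching p element, exactly as many as a plain p step.

```python
import lxml.html

# Assumed stand-in for the downloaded page: 30 paragraphs inside div#article.
html = (
    '<html><body><div id="article">'
    + "".join(f"<p> stuff{i} </p>" for i in range(1, 31))
    + "</div></body></html>"
)
root = lxml.html.fromstring(html)

fixed = '//div[@id="article"]/p[position()>=1 and position()<=last()]'

# string(node-set) converts only the first node in document order.
first_only = root.xpath(f"string({fixed})")

# Without string(), xpath() returns every matching element.
all_p = root.xpath(fixed)
plain_p = root.xpath('//div[@id="article"]/p')

# text_content() gives each paragraph's text; strip() trims the padding spaces.
all_text = [p.text_content().strip() for p in all_p]

print(first_only.strip(), len(all_p), len(plain_p))
print(all_text[0], all_text[-1])
```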