lxml XPath position() does not work

Problem description

I tried to scrape a page via XPath, but I could not get it to work as expected.

The page looks like this:

<tag1>
    <tag2>
          ....
              <div id=article>
                  <p> stuff1 </p>
                  <p> stuff2 </p>
                  <p> ...... </p>
                  <p> stuff30 </p>

I want to extract stuff1 through stuff30 as a string. Here is my Python code snippet.

import lxml.html
import urllib.request

html = urllib.request.urlopen('http://www.something.com/news/blah/').read()
root = lxml.html.fromstring(html)

content = root.xpath('string(//div[@id="article"]/p[position()=>1 and position()<=last()]/.)')

This code did not return anything.

If I replace the position() predicate with a single element index, it works.

content = root.xpath('string(//div[@id="article"]/p[25]/.)')

This code correctly returns stuff25.

I don't want to run a for loop just for this. I believe there is a way to make my code work with position(), but I am not sure what is wrong with it.

Recommended answer

That's because you wrote position()=>1, which is not a valid XPath operator. It should be position()>=1:

content = root.xpath('string(//div[@id="article"]/p[position()>=1 and position()<=last()]/.)')

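This sets content to stuff1 only, because the XPath 1.0 string() function, when given a node-set, returns the string value of just the first node in document order.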
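Since string() only ever yields the first paragraph, getting stuff1 through stuff30 as one string needs a different approach. Here is a minimal sketch, assuming the goal is the text of every p under the article div joined with newlines. The inline `html` sample is a hypothetical stand-in for the downloaded page, and the names `first_only` and `paragraphs` are illustrative, not from the original post. It does use a short generator expression in Python, but no per-index XPath query.

```python
import lxml.html

# Hypothetical stand-in for the page fetched with urllib in the question.
html = """
<html><body>
<div id="article">
  <p> stuff1 </p>
  <p> stuff2 </p>
  <p> stuff3 </p>
</div>
</body></html>
"""

root = lxml.html.fromstring(html)

# string() converts the node-set to the string value of its first node only.
first_only = root.xpath(
    'string(//div[@id="article"]/p[position()>=1 and position()<=last()])'
)

# Select the <p> elements themselves, then join their text in Python.
paragraphs = root.xpath('//div[@id="article"]/p')
content = "\n".join(p.text_content().strip() for p in paragraphs)

print(repr(first_only.strip()))
print(content)
```

The predicate `position()>=1 and position()<=last()` matches every p anyway, so the plain path `//div[@id="article"]/p` selects the same nodes. A position() range is only needed when you want a subset, for example `p[position()>=2 and position()<=5]`.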
