Parse paragraphs from HTML using lxml
Question
I am new to lxml and want to extract <p>PARAGRAPHS</p> and <li>PARAGRAPHS</li> from a given URL and use them for further steps.
I followed an example from a post and tried the following code, with no luck:
html = lxml.html('http://www.google.com/intl/en/about/corporate/index.html')
url = 'http://www.google.com/intl/en/about/corporate/index.html'
print html.parse.xpath('//p/text()')
I looked through the examples in the lxml.html documentation, but didn't find any that use a URL.
Could you give me a hint about which methods I should use? Thanks.
Recommended answer
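import lxml.html
# lxml.html.parse() accepts a filename, URL, or file-like object and returns an ElementTree
htmltree = lxml.html.parse('http://www.google.com/intl/en/about/corporate/index.html')
# Select the text nodes that are direct children of every <p> element
print(htmltree.xpath('//p/text()'))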
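The answer above only covers <p>, and '//p/text()' returns just the text nodes directly under each <p>, so text inside nested tags such as <a> or <b> is skipped. Below is a minimal sketch of one way to collect both <p> and <li> text, assuming Python 3 and lxml are installed. The helper names extract_paragraphs and fetch_paragraphs and the sample markup are hypothetical, not part of the original answer. The download step uses urllib instead of passing the URL to lxml.html.parse(), because, as far as I know, libxml2's built-in loader handles plain http but not https.

```python
from urllib.request import urlopen

import lxml.html


def extract_paragraphs(html_source):
    """Return the text of every <p> and <li> element, in document order.

    Hypothetical helper: accepts HTML as str or bytes.
    """
    root = lxml.html.fromstring(html_source)
    texts = []
    # The XPath union '//p | //li' yields matching elements in document order.
    for element in root.xpath('//p | //li'):
        # text_content() also gathers text nested in child tags such as <b>,
        # which '//p/text()' would leave out.
        text = element.text_content().strip()
        if text:
            texts.append(text)
    return texts


def fetch_paragraphs(url):
    """Download a page with urllib and extract its paragraphs (hypothetical helper)."""
    with urlopen(url) as response:
        return extract_paragraphs(response.read())


# Made-up sample markup so the sketch runs without network access.
sample = (
    "<html><body>"
    "<p>First <b>bold</b> paragraph.</p>"
    "<ul><li>Item one</li><li>Item two</li></ul>"
    "</body></html>"
)
print(extract_paragraphs(sample))
```

For a live page, calling fetch_paragraphs(url) with the URL from the question should return the same kind of list. Note that a <p> nested inside an <li> would be reported twice with this XPath, once for each element.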