Get all values of a specific key with XPath (Python web scraping)
Question
Suppose we have a web page containing:
<div class="specific-row" data-id="101736782"></div>
<div class="yellow-box-row" data-id="112376244"></div>
<div class="specific-row" data-id="179218312"></div>
<div class="vip-row" data-id="123749014"></div>
How can I get all the data-id values as a list, like ['101736782', '112376244', '179218312', '123749014']?
I tried tree.xpath:
import requests
from lxml import html
r = requests.get(url)
tree = html.fromstring(r.content)
tree.xpath("//div@data-id=['any']")
Recommended answer
I would try this:
from lxml import etree, html

doc = '<root><div class="specific-row" data-id="101736782"></div><div class="yellow-box-row" data-id="112376244"></div><div class="specific-row" data-id="179218312"></div><div class="vip-row" data-id="123749014"></div></root>'
root = etree.XML(doc)  # for real-world HTML, root = html.fromstring(doc) works in much the same way

# Option 1: select the div elements with XPath, then read the attribute from each
xpatheval = etree.XPathEvaluator(root)
divs = xpatheval('//div')
ids = [el.get('data-id') for el in divs]

# Option 2: if the cssselect package is installed, use a CSS attribute selector
divs = root.cssselect('[data-id]')
ids = [el.get('data-id') for el in divs]
# cssselect uses the same selector syntax as document.querySelectorAll('[data-id]') in browsers

# Option 3: ElementPath may be what you are looking for, see https://lxml.de/tutorial.html#elementpath
divs = root.findall('div[@data-id]')  # matches direct children of root that have a data-id attribute
ids = [el.get('data-id') for el in divs]
I used this link for help.
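As a more direct variant of the XPath approach, an attribute step (`/@data-id`) in the expression returns the attribute values themselves, so no loop over elements is needed. This is a minimal sketch, not part of the original answer: the inline `page` string is an assumption standing in for `r.content` from the question, and the class filter in the second query is only an illustration.

```python
from lxml import html

# Hypothetical sample page standing in for r.content from the question.
page = (
    '<html><body>'
    '<div class="specific-row" data-id="101736782"></div>'
    '<div class="yellow-box-row" data-id="112376244"></div>'
    '<div class="specific-row" data-id="179218312"></div>'
    '<div class="vip-row" data-id="123749014"></div>'
    '</body></html>'
)

tree = html.fromstring(page)

# '/@data-id' selects the attribute itself, so xpath() returns a list of strings.
ids = tree.xpath('//div/@data-id')

# Optional: restrict the match to divs with a particular class value.
specific_ids = tree.xpath('//div[@class="specific-row"]/@data-id')

print(ids)
print(specific_ids)
```

The values come back in document order as string-like objects, so they can be compared with or converted to plain `str` as needed.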