Python tree.xpath returns an empty list


Question

I'm having a hard time figuring out why the tree.xpath call in my code below returns an empty list. In this example, I'm just trying to retrieve the stock tickers under the "People Also Watch" banner on Yahoo Finance, which seems pretty trivial, but I haven't been able to make it work so far.

I'm copying the XPath from the browser's Inspect Element panel. I have also tried changing the XPath manually, for example by removing 'tbody', but that didn't work either. Any help would be really appreciated. Thank you.

import requests
from lxml import html


ticker = 'TSLA'
url = 'https://finance.yahoo.com/quote/'+str(ticker)+'?p='+str(ticker)
page = requests.get(url)
tree = html.fromstring(page.content)
tree.xpath('//*[@id="rec-by-symbol"]/table/tbody/tr[1]/td[1]/a')




Recommended answer

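You're trying to parse a page that the browser builds into its final HTML by running JavaScript. requests only downloads the raw source, and if you open the page's source you will see that it contains one big script tag holding all the data to be rendered. The element your XPath targets does not exist in that raw source, so tree.xpath has nothing to match and returns an empty list.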
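A quick way to confirm this before writing any XPath is to check whether the target id appears as a real element in the downloaded source. The sketch below uses only the standard library; `SAMPLE_SOURCE` is a made-up stand-in for `page.text`, and `IdCollector` and `static_ids` are hypothetical helper names, not part of requests or lxml.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the id attribute of every element present in the static markup."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def static_ids(html_text):
    """Return the set of element ids found in html_text without running any script."""
    parser = IdCollector()
    parser.feed(html_text)
    parser.close()
    return parser.ids


# Made-up stand-in for page.text: an empty container plus a script holding the data.
SAMPLE_SOURCE = """
<html><body>
<div id="app"></div>
<script>root.App.main = {"recommended": ["AAPL", "NIO"]};</script>
</body></html>
"""

ids = static_ids(SAMPLE_SOURCE)
print("app" in ids)            # True: this element is in the raw source
print("rec-by-symbol" in ids)  # False: it would only exist after the script runs
```

If the id is missing from the raw source, as it is here, no XPath run against that source can find it, whatever its exact form.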

There are two ways to deal with this situation:

1. Render the page and run XPath queries against it.

This means opening the page in a browser, taking the rendered DOM from it, and running the XPath against that DOM.

The best tool for this is selenium together with a webdriver (a utility that lets Python code control a browser).

Example code for your situation:

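from selenium import webdriver
from selenium.webdriver.common.by import By

# Start a Chrome session controlled by selenium
driver = webdriver.Chrome()

ticker = 'TSLA'
url = 'https://finance.yahoo.com/quote/' + ticker + '?p=' + ticker
driver.get(url)

# XPath copied from the browser inspector; it is evaluated against the rendered DOM
xpath = '//*[@id="rec-by-symbol"]/table/tbody/tr[1]/td[1]/a'
# find_elements_by_xpath was removed in Selenium 4; use find_elements with By.XPATH
found_nodes = driver.find_elements(By.XPATH, xpath)

for node in found_nodes:
    print(node.text)

# quit() closes every window and ends the driver session
driver.quit()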

You need to install selenium and have a suitable driver available. For Chrome, which I used in the example, that is chromedriver (you can get it here: https://chromedriver.chromium.org/). Selenium 4.6 and later can also fetch a matching driver automatically through Selenium Manager.

pip install selenium

2. Parse the script into an object (in particular the root.App.main node) and work with that.

This way is more complicated, but it does not need a browser.

Workflow:

a. Download the page via requests;
b. Extract the script that holds the target data (via regular expressions);
c. Load root.App.main as a JSON object (json.loads method);
d. Find the necessary nodes in that object.

I won't provide full code for this case, because it would mean writing almost the whole parser for your task.
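As a rough illustration only, steps b through d might look like the sketch below. It runs on a made-up `SAMPLE_SOURCE` string rather than the real page, and the keys `context` and `recommended` are hypothetical: the actual layout of root.App.main has to be discovered by inspecting the downloaded source. The regular expression also assumes that the assignment ends with `};` at the end of a line.

```python
import json
import re

# Made-up stand-in for page.text. The keys inside the object are hypothetical.
SAMPLE_SOURCE = """
<script>
(function (root) {
root.App.now = 1600000000000;
root.App.main = {"context": {"recommended": ["AAPL", "NIO", "AMZN"]}};
}(this));
</script>
"""

# Step b: capture the object literal assigned to root.App.main.
# Assumption: the assignment ends with "};" at the end of a line.
PATTERN = re.compile(
    r"root\.App\.main\s*=\s*(\{.*?\})\s*;\s*$",
    re.MULTILINE | re.DOTALL,
)


def extract_app_main(html_text):
    """Return the root.App.main payload found in html_text as a Python dict."""
    match = PATTERN.search(html_text)
    if match is None:
        raise ValueError("root.App.main assignment not found")
    # Step c: the captured text is JSON, so json.loads can parse it.
    return json.loads(match.group(1))


# Step d: walk the resulting dict to the nodes of interest (hypothetical path).
data = extract_app_main(SAMPLE_SOURCE)
symbols = data["context"]["recommended"]
print(symbols)  # ['AAPL', 'NIO', 'AMZN']
```

With a real page, `SAMPLE_SOURCE` would be replaced by `requests.get(url).text` (step a), and most of the work goes into finding the right path through the parsed object.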
