Blank List Returned When Using XPath with Morningstar Key Ratios
Problem description
I am trying to pull a piece of data from the Morningstar key ratios page for any given stock using XPath. I have the full path that returns a result in the XPath Helper toolbar add-on for Google Chrome, but when I plug it into my code I get a blank list returned.
How do I get the result that I want returned? Is this even possible? Am I using the wrong approach?
Any help is much appreciated!
Piece of Data that I want returned:
AMD Key Ratios Example:
My Code:
from lxml import html
import requests

page = requests.get('http://financials.morningstar.com/ratios/r.html?t=AMD&region=USA&culture=en_US')
tree = html.fromstring(page.content)
# Absolute path copied from the XPath Helper add-on in Chrome
rev = tree.xpath('/html/body/div[1]/div[3]/div[2]/div[1]/div[1]/div[1]/table/tbody/tr[2]/td[1]')
print(rev)
Result of code:
[]
Desired result from XPath Helper:
Thanks, Not Euler
This is one of those pages that downloads much of its content in stages. If you look for the item you want after using just requests, you will find that it is not yet available, as shown here.
>>> import requests
>>> url = 'http://financials.morningstar.com/ratios/r.html?t=AMD&region=USA&culture=en_US'
>>> page = requests.get(url).text
>>> '5,858' in page
False
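To see the same effect offline, the sketch below runs one lxml XPath against two hypothetical HTML snippets: one standing in for the initial HTML that requests receives (an empty placeholder container), and one standing in for the page after its later stages have filled in the table. The markup, the `ratios` container id, and the `following-sibling` XPath are illustrative assumptions, not Morningstar's actual HTML.

```python
from lxml import html

# Hypothetical stand-in for the HTML the server sends first:
# only an empty container, the ratios table has not been loaded yet.
INITIAL_HTML = """
<html><body>
  <div id="ratios"></div>
</body></html>
"""

# Hypothetical stand-in for the same page after the table has been
# inserted into the container by the later loading stages.
RENDERED_HTML = """
<html><body>
  <div id="ratios">
    <table>
      <tr><th id="i0">Revenue USD Mil</th><td>5,858</td></tr>
    </table>
  </div>
</body></html>
"""

# Text of the first data cell in the row whose header cell has id="i0".
XPATH = '//th[@id="i0"]/following-sibling::td[1]/text()'


def first_revenue_cell(page_html):
    """Parse page_html with lxml and return all matches for XPATH."""
    tree = html.fromstring(page_html)
    return tree.xpath(XPATH)


print(first_revenue_cell(INITIAL_HTML))   # prints []
print(first_revenue_cell(RENDERED_HTML))  # prints ['5,858']
```

The XPath is not wrong in either case. It simply has nothing to match until the content exists, which is why the question's code prints an empty list.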
One strategy for processing such pages is to use the selenium library. Here, selenium launches a copy of the Chrome browser, loads the URL, and then uses an XPath expression to locate the td element of interest. Finally, the number you want is available as the text property of that element.
>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> driver = webdriver.Chrome()
>>> driver.get(url)
>>> td = driver.find_element(By.XPATH, '//th[@id="i0"]/following-sibling::td[1]')
>>> td.text
'5,858'
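driver.get waits for the initial page load, but content fetched in later stages can still arrive after it returns. A somewhat more robust sketch, assuming Selenium 4 (where find_element_by_xpath has been removed in favour of find_element(By.XPATH, ...)) and a local Chrome install, waits explicitly for the cell before reading it. The function name read_cell_text, the 10-second timeout, and the default XPath are illustrative choices, and the Morningstar page layout may well have changed since this answer was written.

```python
def read_cell_text(url, xpath='//th[@id="i0"]/following-sibling::td[1]', timeout=10):
    """Open url in Chrome, wait until the element at xpath is visible, return its text.

    The Selenium imports live inside the function so that this module can be
    imported even where Selenium is not installed.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll for up to `timeout` seconds until the late-loading cell is
        # present and visible (element.text only reports visible text).
        cell = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.XPATH, xpath))
        )
        return cell.text
    finally:
        # Shut the browser down even if the wait times out.
        driver.quit()
```

Calling read_cell_text('http://financials.morningstar.com/ratios/r.html?t=AMD&region=USA&culture=en_US') should return the same text that td.text gave above, provided the page still has that structure. If the wait times out, Selenium raises TimeoutException instead of returning an empty result.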