python: get data from changing span class using lxml xpath

Question

I want to extract 'Return on Assets' from the WSJ website. However, my code is not robust enough to work under different conditions. I am able to extract the data for ticker 'SCGM' using the code below, but it fails for 'AASIA', where the value sits in <span class="marketDelta deltaType-negative"> instead.

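from lxml import html
import requests

StockData = ['SCGM', 'AASIA']
for ticker in StockData:
    # Download the financials page for each ticker
    page_wsj1 = requests.get('http://quotes.wsj.com/MY/' + ticker + '/financials')
    wsj1 = html.fromstring(page_wsj1.content)
    # Matches only spans whose class is exactly "marketDelta noChange"
    wsj_fig = wsj1.xpath('//span[@class="marketDelta noChange"]/text()')
    ROA = wsj_fig[25]  # position-based lookup; shifts when a span has another class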

This works for SCGM but not for AASIA, because the span class is different there. For SCGM, the HTML looks like this:

<tr> <td> <span class="data_lbl">Return on Assets</span> <span class="data_data"> <span class="marketDelta noChange">18.26</span> </span> </td> </tr>

For AASIA, the HTML looks like this:

<tr> <td> <span class="data_lbl">Return on Assets</span> <span class="data_data"> <span class="marketDelta deltaType-negative">-1.36</span> </span> </td> </tr>

How can I write code that works in both cases, or that points straight to 'Return on Assets'?

Recommended answer

//td[normalize-space(span) = "Return on Assets"]/span[@class = "data_data"]/span
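The expression above selects the cell by its 'Return on Assets' label rather than by the class of the value span, so it does not depend on whether that class is "marketDelta noChange" or "marketDelta deltaType-negative". A minimal sketch of how it might be used with lxml follows. It runs against the two HTML fragments quoted in the question instead of the live page, and the helper name extract_roa, the constant names, and the wrapping <table> element are assumptions added for illustration.

```python
from lxml import html

# Fragments copied from the question. The surrounding <table> is added here
# (an assumption) so that the parser sees well-formed table markup.
SCGM_HTML = (
    '<table><tr> <td> <span class="data_lbl">Return on Assets</span> '
    '<span class="data_data"> <span class="marketDelta noChange">18.26</span> '
    '</span> </td> </tr></table>'
)
AASIA_HTML = (
    '<table><tr> <td> <span class="data_lbl">Return on Assets</span> '
    '<span class="data_data"> <span class="marketDelta deltaType-negative">-1.36</span> '
    '</span> </td> </tr></table>'
)

# XPath from the answer: find the <td> whose first <span> reads
# "Return on Assets", then descend to the value span inside "data_data".
ROA_XPATH = ('//td[normalize-space(span) = "Return on Assets"]'
             '/span[@class = "data_data"]/span')


def extract_roa(markup):
    """Hypothetical helper: return the figure as a float, or None if missing."""
    tree = html.fromstring(markup)
    nodes = tree.xpath(ROA_XPATH)
    if not nodes:
        return None
    return float(nodes[0].text_content().strip())


for ticker, markup in (("SCGM", SCGM_HTML), ("AASIA", AASIA_HTML)):
    print(ticker, extract_roa(markup))
# prints:
# SCGM 18.26
# AASIA -1.36
```

With the live site, the same helper could be given page_wsj1.content from the question's requests.get call, assuming the page still uses the markup shown above.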
