Can't get text values using XPath in Python
Problem description
I'm trying to parse currencies from this bank website. In code:
import requests
import time
import logging
from retrying import retry
from lxml import html

logging.basicConfig(filename='info.log', format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')


@retry(wait_fixed=5000)
def fetch_data_from_nb_ved_ru():
    try:
        page = requests.get('http://www.nbu.com/exchange_rates')
        #print page.text
        tree = (html.fromstring(page.text))
        #fetched_ved_usd_buy = tree.xpath('//div[@class="exchangeRates"]/table/tbody/tr[5]/td[5]')
        fetched_ved_usd_buy = tree.xpath('/html/body/div[1]/div//div[7]/div/div/div[1]//text()')
        print fetched_ved_usd_buy
        fetched_ved_usd_sell = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[6]/td[6]/text()')).strip()
        fetched_ved_eur_buy = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[5]/text()')).strip()
        fetched_ved_eur_sell = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[6]/text()')).strip()
        fetched_cb_eur = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[4]/text()')).strip()
        fetched_cb_rub = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[18]/td[4]/text()')).strip()
        fetched_cb_usd = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[6]/td[4]/text()')).strip()
    except:
        logging.warning("NB VED UZ fetch failed")
        raise IOError("NB VED UZ fetch failed")
    return fetched_ved_usd_buy, fetched_ved_usd_sell, fetched_cb_usd, fetched_ved_eur_buy, fetched_ved_eur_sell,\
        fetched_cb_eur, fetched_cb_rub


while True:
    f = open('values_uzb.txt', 'w')
    ved_usd_buy, ved_usd_sell, cb_usd, ved_eur_buy, ed_eur_sell, cb_eur, cb_rub = fetch_data_from_nb_ved_ru()
    f.write(str(ved_usd_buy)+'\n'+str(ved_usd_sell)+'\n'+str(cb_usd)+'\n'+str(ved_eur_buy)+'\n'+str(ed_eur_sell)+'\n'
            + str(cb_eur)+'\n'+str(cb_rub))
    f.close()
    time.sleep(120)
But it always returns an empty string. However, if I do print page.text, I can see that the values are in their places.
I got that XPath from Firebug, and Chrome gives the same XPath.
I also tried to construct my own XPath:
//div[@class="exchangeRates"]/table/tbody/tr[5]/td[5]
but that does not work either.
Any suggestions? Thanks.
Recommended answer
I am not certain what you are looking for exactly, but this works:
tree.xpath("/html/body/div[1]/div[7]/div/div/div[1]//text()")
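A side note on reading the result: an lxml XPath query ending in text() returns a list of strings, so wrapping it in str(...).strip(), as the question's code does, yields the list's printed form (for example "[]") rather than the cell text. The helper below is a minimal sketch of one way to pull a single cleaned value out of such a list. The name first_text and its default parameter are hypothetical, not part of lxml.

```python
def first_text(results, default=""):
    """Return the first non-blank string from an XPath result list, stripped."""
    for item in results:
        text = str(item).strip()
        if text:
            return text
    # Nothing matched, or every match was whitespace only.
    return default


print(first_text(["\n  ", " 706.65 "]))  # first non-blank entry, stripped
print(repr(first_text([])))              # falls back to the default
print(str([]).strip())                   # what str() on an empty result gives
```

Returning a default for an empty list also makes a wrong XPath easy to detect, instead of silently writing "[]" to the output file.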
As for starting with the class exchangeRates, I found by using tree.xpath("//div[@class='exchangeRates']/table")[0].getchildren() that there is no tbody child of table, even though browsers say there is. See this SO question for an explanation. Removing tbody from your original xpath does work. However, the one you chose (td[5]) is empty, thus returning []. Try
tree.xpath("//div[@class='exchangeRates']/table/tr[5]/td[4]//text()")
# ['706.65']
or
tree.xpath("//div[@class='exchangeRates']/table/tr[6]/td[5]//text()")
# ['2638.00']
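The tbody behaviour described above can be reproduced without the live site. The following is a rough sketch on made-up markup: the SAMPLE string is an assumption standing in for the bank page, whose real structure and row positions differ. It shows that lxml parses the table as served, so a path containing tbody matches nothing, while the same path without it returns the cell text.

```python
from lxml import html

# Hypothetical markup standing in for the bank page; the real page differs.
SAMPLE = """
<html><body>
  <div class="exchangeRates">
    <table>
      <tr><td>USD</td><td>706.65</td><td></td></tr>
      <tr><td>EUR</td><td>2638.00</td><td></td></tr>
    </table>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE)
table = tree.xpath("//div[@class='exchangeRates']/table")[0]

# The source has no <tbody>, and lxml does not add one the way browsers do.
child_tags = [child.tag for child in table]

# Browser-style path (with tbody) versus the path that matches the parsed tree.
with_tbody = tree.xpath("//div[@class='exchangeRates']/table/tbody/tr[1]/td[2]/text()")
without_tbody = tree.xpath("//div[@class='exchangeRates']/table/tr[1]/td[2]/text()")

# A cell that exists but has no text also yields an empty list.
empty_cell = tree.xpath("//div[@class='exchangeRates']/table/tr[1]/td[3]/text()")

print(child_tags)
print(with_tbody)
print(without_tbody)
print(empty_cell)
```

When an XPath copied from a browser's developer tools returns nothing, printing the child tags of the nearest element that does match, as done here, is a quick way to see where the parsed tree diverges from what the browser displays.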