无法在python中使用XPATH获取文本值 [英] Can't get text values using XPATH in python

查看:113
本文介绍了无法在python中使用XPATH获取文本值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从此银行网站中解析货币.在代码中:

I'm trying to parse currencies from this bank website. In code:

import requests
import time
import logging
from retrying import retry
from lxml import html

logging.basicConfig(filename='info.log', format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

@retry(wait_fixed=5000)
def fetch_data_from_nb_ved_ru():
try:
    page = requests.get('http://www.nbu.com/exchange_rates')
    #print page.text
    tree = (html.fromstring(page.text))
    #fetched_ved_usd_buy = tree.xpath('//div[@class="exchangeRates"]/table/tbody/tr[5]/td[5]')
    fetched_ved_usd_buy = tree.xpath('/html/body/div[1]/div//div[7]/div/div/div[1]//text()')
    print fetched_ved_usd_buy
    fetched_ved_usd_sell = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[6]/td[6]/text()')).strip()
    fetched_ved_eur_buy = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[5]/text()')).strip()
    fetched_ved_eur_sell = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[6]/text()')).strip()
    fetched_cb_eur = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[4]/text()')).strip()
    fetched_cb_rub = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[18]/td[4]/text()')).strip()
    fetched_cb_usd = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[6]/td[4]/text()')).strip()
except:
    logging.warning("NB VED UZ fetch failed")
    raise IOError("NB VED UZ  fetch failed")
return fetched_ved_usd_buy, fetched_ved_usd_sell, fetched_cb_usd, fetched_ved_eur_buy, fetched_ved_eur_sell,\
    fetched_cb_eur, fetched_cb_rub

while True:
    f = open('values_uzb.txt', 'w')
    ved_usd_buy, ved_usd_sell, cb_usd, ved_eur_buy, ed_eur_sell, cb_eur, cb_rub = fetch_data_from_nb_ved_ru()
               f.write(str(ved_usd_buy)+'\n'+str(ved_usd_sell)+'\n'+str(cb_usd)+'\n'+str(ved_eur_buy)+'\n'+str(ed_eur_sell)+'\n'
        + str(cb_eur)+'\n'+str(cb_rub))

    f.close()
    time.sleep(120)

但是它总是返回空字符串,但是,如果我执行 print page.text ,我可以看到这些值就在它们的位置.我从萤火虫那里得到了xpath.Chrome提供了相同的xpath.试图构造自己的xpath//div [@ class ="exchangeRates"]/table/tbody/tr [5]/td [5] 但这对它无效.

But it always returns empty string, however if I do print page.text, i can see that the values are on their's places. I got that xpath from firebug. Chrome gives the same xpath. Tried to construct own xpath //div[@class="exchangeRates"]/table/tbody/tr[5]/td[5] but it happens to be not valid to.

有什么建议吗?谢谢.

推荐答案

我不确定您要查找的是什么,但这有效:

I am not certain what you are looking for exactly, but this works:

tree.xpath("/html/body/div[1]/div[7]/div/div/div[1]//text()")

关于从类 exchangeRates 开始,我发现是通过使用 tree.xpath("//div [@ class ='exchangeRates']/table")[0] .getchildren(),即使浏览器说存在,也没有 table tbody 子级.请参阅此SO问题以获取解释.从原始xpath删除 tbody 确实可行.但是,您选择的( td [5] )为空,因此返回 [] .试试

As for starting with the class exchangeRates, I found by using tree.xpath("//div[@class='exchangeRates']/table")[0].getchildren() that there is no tbody child of table, even though browsers say there is. See this SO question for an explanation. Removing tbody from your original xpath does work. However, the one you chose (td[5]) is empty, thus returning []. Try

tree.xpath("//div[@class='exchangeRates']/table/tr[5]/td[4]//text()")
# ['706.65']

tree.xpath("//div[@class='exchangeRates']/table/tr[6]/td[5]//text()")
# ['2638.00']

这篇关于无法在python中使用XPATH获取文本值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆