How can I check if either xpath exists and then return the value if text is present?


Question

I'm having trouble with the second r.html.xpath request. When there is a special deal on an item, the second Xpath changes from

//*[@id="priceblock_ourprice"]

to

//*[@id="priceblock_dealprice"]

This causes the script to fail, since the expected XPath no longer matches anything. How can I include this second XPath, which only shows up occasionally? I would like to check whether either XPath exists and, if so, return its text, or otherwise return N/A. The first URL searched has the ourprice XPath and the second URL has the dealprice XPath. What am I missing here?

from requests_html import HTMLSession
import pandas as pd

urls = ['http://amazon.com/dp/B01KZ6V00W',
'http://amazon.com/dp/B089FBPFHS'
          ]

def getPrice(url):
    s = HTMLSession()
    r = s.get(url)
    r.html.render(sleep=1,timeout=20)
    product = {
        'title': str(r.html.xpath('//*[@id="productTitle"]', first=True).text),
        'price': str(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True).text),
        'details': str(r.html.xpath('//*[@id="detailBulletsWrapper_feature_div"]', first=True).text)
    }
    res = {}
    for key in list(product):
        res[key] = product[key].replace('\n',' ')

    print(res)
    return res

prices = []
for url in urls:
    prices.append(getPrice(url))


df = pd.DataFrame(prices)
print(df.head(15))
df.to_csv("testfile.csv",index=False)
print(len(prices))

Traceback

  'price': str(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True).text),
AttributeError: 'NoneType' object has no attribute 'text'

Recommended answer

Why not use try and except to check whether the value exists? You get the error because no element matches the XPath, so the lookup returns None, and None has no text attribute.
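A minimal sketch of that idea, assuming the element lookup returns None when nothing matches (which is what the traceback in the question indicates). The helper name text_or_default is hypothetical and not part of requests_html:

```python
def text_or_default(element, default="N/A"):
    """Return element.text, or `default` when the lookup found nothing."""
    try:
        return element.text
    except AttributeError:
        # element is None (no match), so there is no .text to read
        return default
```

In the question's code, each lookup could be wrapped this way, for example text_or_default(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True)).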

I don't have requests_html installed, so I will show the code using the selenium module instead.

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
import pandas as pd

urls = ['http://amazon.com/dp/B01KZ6V00W', 'http://amazon.com/dp/B089FBPFHS']

# Note: this rebinds the name `webdriver` from the module to a Chrome driver instance.
webdriver = webdriver.Chrome()
old_price = ""


def getPrice(url):
    global old_price
    global webdriver

    webdriver.get(url)

    # Crude fixed wait so the page has time to load.
    sleep(5)

    title = webdriver.find_element(By.XPATH, "/html/body/div[2]/div[2]/div[7]/div[5]/div[4]/div[1]/div/h1/span").text

    try:
        # The "old price" element is only present when the item has a deal.
        old_price = webdriver.find_element(By.XPATH, "/html/body/div[2]/div[2]/div[7]/div[5]/div[4]/div[10]/div[1]/div/table/tbody/tr[1]/td[2]/span[1]").text
        price = webdriver.find_element(By.XPATH, "/html/body/div[2]/div[2]/div[7]/div[5]/div[1]/div[5]/div/div/div/div/div/form/div/div/div/div/div[1]/div/span[1]").text
        # Compare the amounts without the leading currency symbol.
        if old_price[1:] == price[1:]:
            deal_type = "normal"
        else:
            deal_type = "deal"

    except Exception:
        # A lookup above failed (element not found): treat the item as a normal product.
        price = webdriver.find_element(By.XPATH, "/html/body/div[2]/div[2]/div[7]/div[5]/div[1]/div[5]/div/div/div/div/div/form/div/div/div/div/div[1]/div/span[1]").text
        deal_type = "normal"
    
    print(old_price)
    print(title)
    print(price)
    print(deal_type)

    return price

prices = []

for url in urls:
    prices.append(getPrice(url))

print(prices)

df = pd.DataFrame(prices)
print(df.head(15))
df.to_csv("testfile.csv",index=False)
print(len(prices))

Let me explain:

The first 4 lines import the necessary modules, such as selenium and pandas. The next line stores the URLs. After that, webdriver = webdriver.Chrome() sets the browser to Chrome.

Next, in getPrice, we open the URL using webdriver.get(url).

Then we read the title from the element at the title's XPath.

The try block checks whether the XPath that shows the deal exists. If it does, it gets the old and new prices and marks the product as a deal when they differ. If the XPath for a deal does NOT exist, execution moves to the except block and the product is marked as a normal one.
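As an alternative to catching the exception, the same check can be sketched with find_elements, which returns an empty list when nothing matches. This is only a sketch under assumptions: the helper names find_text and classify are hypothetical, the driver is any Selenium-style object, and the string "xpath" is used as the locator strategy (the value that By.XPATH stands for) so that the snippet needs no imports.

```python
def find_text(driver, xpath, default=None):
    """Return the text of the first element matching `xpath`, else `default`."""
    # "xpath" is the locator strategy string that By.XPATH stands for.
    matches = driver.find_elements("xpath", xpath)
    return matches[0].text if matches else default


def classify(driver, old_price_xpath, price_xpath):
    """Return (price, deal_type), following the comparison rule described above."""
    old_price = find_text(driver, old_price_xpath)
    price = find_text(driver, price_xpath)
    if price is None:
        # Assumption: report a missing price as "N/A" instead of raising.
        return "N/A", "normal"
    # Compare the amounts without the leading currency symbol.
    if old_price is not None and old_price[1:] != price[1:]:
        return price, "deal"
    return price, "normal"
```

The two XPath arguments would be the old-price and current-price paths already used in getPrice.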

It then prints the price, title and deal type.

Finally, it runs the function for every URL, and saves it to a CSV file.

I hope this helps with your problem. I explained the code so that you can convert it to requests_html.
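For the requests_html side, one possible shape of the "either XPath, else N/A" check is sketched below. It assumes, as the traceback in the question suggests, that r.html.xpath(..., first=True) returns None when nothing matches. The names first_text and PRICE_XPATHS are hypothetical; the two XPaths are the ones from the question.

```python
def first_text(html, xpaths, default="N/A"):
    """Return the text of the first XPath in `xpaths` that matches, else `default`."""
    for xp in xpaths:
        element = html.xpath(xp, first=True)  # None when nothing matches
        if element is not None and element.text:
            # Flatten newlines, as the question's code does for every field.
            return element.text.replace("\n", " ")
    return default


# Candidate price locations, taken from the question.
PRICE_XPATHS = [
    '//*[@id="priceblock_ourprice"]',
    '//*[@id="priceblock_dealprice"]',
]
```

Inside the question's getPrice, the price entry could then be written as 'price': first_text(r.html, PRICE_XPATHS), and the title and details lookups could reuse the same helper with a one-item list.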
