XPath returns null

Problem description

I need to scrape the price of this page: https://www.asos.com/monki/monki-lisa-cropped-vest-top-with-ruched-side-in-black/prd/23590636?colourwayid=60495910&cid=2623

However, it always returns null:

My code:

'price': response.xpath('//*[contains(@class, "current-price")]').get()

Can someone help please?

Thanks!

The price seems to be loaded using XHR. How can I retrieve it?

Solution

Your problem is not the XPath expression; it's that the price is retrieved with XHR, so it is not in the HTML that Scrapy downloads.

If you open the page in scrapy shell and run view(response), you can see that the price is not present in the downloaded page.
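
The same check can be sketched without Scrapy. This is a minimal, standard-library illustration and not the method used in the answer: the two HTML strings are hypothetical stand-ins for the page as downloaded and the page after its scripts have run, and `first_text_by_class` is a helper name invented here. It loosely mirrors the `contains(@class, "current-price")` test from the question.

```python
from html.parser import HTMLParser


class FirstTextByClass(HTMLParser):
    """Record the first non-empty text that follows a tag whose class contains `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.capture = False  # True once a matching start tag has been seen
        self.text = None      # stays None when no element matches

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if self.text is None and self.needle in css_class:
            self.capture = True

    def handle_data(self, data):
        if self.capture and data.strip():
            self.text = data.strip()
            self.capture = False


def first_text_by_class(html, needle):
    """Return the matched text, or None if no element carries the class."""
    parser = FirstTextByClass(needle)
    parser.feed(html)
    parser.close()
    return parser.text


# Hypothetical markup: what the server sends vs. what the browser shows later.
STATIC_HTML = '<div id="product-price"></div>'
RENDERED_HTML = '<div id="product-price"><span class="current-price">£10.00</span></div>'

print(first_text_by_class(STATIC_HTML, "current-price"))
print(first_text_by_class(RENDERED_HTML, "current-price"))
```

The first call finds nothing because the element only exists after the browser has executed the page's scripts, which is the situation Scrapy is in when it parses the raw response.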

Look at the source of the original page and search for the price. The page stores the path of its price API in window.asos.pdp.config.stockPriceApiUrl.

Then use this URL to scrape the price:

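    import json
    import re

    import scrapy

    def parse(self, response):
        # The price API path is assigned in an inline script of the product page.
        match = re.search(r"window\.asos\.pdp\.config\.stockPriceApiUrl = '(.+?)'", response.text)
        if match is None:
            return  # the script variable was not found in this response
        price_url = 'https://www.asos.com' + match.group(1)
        # self.headers is not shown here; it must be defined on the spider.
        yield scrapy.Request(url=price_url,
                             method='GET',
                             callback=self.parse_price,
                             headers=self.headers)

    def parse_price(self, response):
        jsonresponse = json.loads(response.text)
        ...  # the rest of the parsing was elided in the original answer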
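
The URL extraction step can be tried on its own, outside a spider. The following is a rough sketch using only the standard library: the script variable name comes from the answer, while the sample HTML, the API path inside it, and the helper name `extract_price_url` are made up for illustration. It uses `urljoin` instead of the string concatenation shown above, which gives the same result for a path that starts with `/`.

```python
import re
from urllib.parse import urljoin

# Dots are escaped so they match literally; [^']+ stops at the closing quote.
PRICE_URL_PATTERN = re.compile(
    r"window\.asos\.pdp\.config\.stockPriceApiUrl\s*=\s*'([^']+)'"
)


def extract_price_url(html, base_url="https://www.asos.com"):
    """Return the absolute price API URL found in `html`, or None if absent."""
    match = PRICE_URL_PATTERN.search(html)
    if match is None:
        return None
    return urljoin(base_url, match.group(1))


# Hypothetical page fragment; the real path on the site may look different.
SAMPLE_HTML = (
    "<script>window.asos.pdp.config.stockPriceApiUrl = "
    "'/api/hypothetical/stockprice?productIds=23590636';</script>"
)

print(extract_price_url(SAMPLE_HTML))
```

Returning None when the pattern is missing avoids the AttributeError that `re.search(...).group(1)` raises if the page layout changes.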

I couldn't get around the 403 error with the headers I provided, but maybe you'll have more luck.
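
The answer does not show what `self.headers` contained. As an assumption only, a headers dictionary for this kind of request often imitates what a browser sends. The sketch below builds such a request with the standard library without sending it; every header value, both URLs in the usage lines, and the helper names are hypothetical, and nothing here is known to get past the 403.

```python
from urllib.request import Request


def browser_like_headers(referer):
    """Return headers resembling a browser's XHR call (illustrative values only)."""
    return {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Accept": "application/json, text/plain, */*",
        "Accept-Language": "en-GB,en;q=0.9",
        "Referer": referer,  # the product page that would trigger the XHR
    }


def build_price_request(price_url, product_url):
    """Create the request object; constructing it performs no network access."""
    return Request(price_url, headers=browser_like_headers(product_url), method="GET")


# Hypothetical URLs used only to show the call.
request = build_price_request(
    "https://www.asos.com/api/hypothetical/stockprice",
    "https://www.asos.com/hypothetical/product",
)
print(request.get_method(), request.full_url)
```

In a spider, the same dictionary could be assigned to `self.headers` and passed to `scrapy.Request` as in the snippet above.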

Edit:

To get the price from the JSON response, there is actually no need for json.loads; response.json() (available since Scrapy 2.2) parses the body directly:

    def parse_price(self, response):
        jsonresponse = response.json()[0]
        price = jsonresponse['productPrice']['current']['text']
        # You can also use jsonresponse.get() if you prefer
        print(price)

Output:

£10.00
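
The comment about `.get()` can be expanded into a version that does not raise when a key is missing. This is a sketch under assumptions: the sample body only copies the key names used in the answer, the real API response may contain more fields or a different shape, and `extract_price_text` is a name chosen here. It uses `json.loads` because there is no Scrapy response object in a standalone script.

```python
import json


def extract_price_text(body):
    """Return productPrice.current.text of the first item, or None if anything is missing."""
    data = json.loads(body)
    if not isinstance(data, list) or not data:
        return None
    first = data[0]
    if not isinstance(first, dict):
        return None
    # Each .get() falls back to an empty dict so the chain never raises KeyError.
    current = (first.get("productPrice") or {}).get("current") or {}
    return current.get("text")


# Hypothetical body shaped after the keys accessed in the answer.
SAMPLE_BODY = '[{"productPrice": {"current": {"text": "£10.00"}}}]'

print(extract_price_text(SAMPLE_BODY))
```

Inside `parse_price`, the same traversal could start from `response.json()` instead of `json.loads(body)`.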
