JSON URL sometimes returns a null response

Question

I'm scraping a website that loads product data from individual JSON files. I found the URLs of the JSON files by inspecting the network traffic.

The problem is this: when I follow the JSON URLs, most of them return a JSON result. But the JSON URLs of products that have special characters in them, e.g. é, return a null response. The data is displayed in the browser, but I can't seem to get the JSON response directly.

Any suggestions?

(I'm trying to find a similar website that behaves the same way so I can post it here as an example.)

Here is an example:

Product A URL: https://www.boozebud.com/p/hopnationbrewingco/thedamned

WORKS: A's JSON URL: https://www.boozebud.com/a/producturl/p/hopnationbrewingco/thedamned

Product B URL: https://www.boozebud.com/p/àbloc/superprestigenaturalblondebeer

RETURNS NULL: B's JSON URL: https://www.boozebud.com/a/producturl/p/àbloc/superprestigenaturalblondebeer

(Related to my previous unanswered question, "scrapy: dealing with special characters in url", which might need to be revised in light of this question.)

Recommended answer

The problem appears to be the request headers. The server seems to be very sensitive to at least the Content-Type header, which it may use internally to decode the incoming URL or something similar. Try sending the request like this, which is what the site's own JavaScript does:

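from scrapy import Request
# Inside a spider callback: request the percent-encoded URL with an explicit Content-Type header.
yield Request('https://www.boozebud.com/a/producturl/p/%C3%A0bloc/superprestigenaturalblondebeer',
              headers={"Content-Type": "application/json; charset=UTF-8"})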
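
Note that the URL in the request above is already percent-encoded (à becomes %C3%A0, its UTF-8 bytes). As a minimal sketch of how that encoding could be produced for any product URL, the standard-library snippet below may help. The helper names `encode_url` and `request_kwargs` are hypothetical and not part of scrapy or of the original answer; only the header value comes from the answer, and whether the server needs anything beyond it is an assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Header value taken from the answer above.
JSON_HEADERS = {"Content-Type": "application/json; charset=UTF-8"}


def encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the URL path as UTF-8 bytes."""
    parts = urlsplit(url)
    # Keep "/" and "%" so path separators and already-encoded sequences survive.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def request_kwargs(url: str) -> dict:
    """Hypothetical helper: keyword arguments that could be passed to a scrapy Request."""
    return {"url": encode_url(url), "headers": dict(JSON_HEADERS)}


if __name__ == "__main__":
    # "\u00e0" is the precomposed character à.
    raw = "https://www.boozebud.com/a/producturl/p/\u00e0bloc/superprestigenaturalblondebeer"
    print(encode_url(raw))
```

Inside a spider callback this could be used as `yield Request(**request_kwargs(product_json_url))`, where `product_json_url` stands for whatever URL the spider has extracted.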
