Requesting URLs with base64-encoded data


Question

I'm trying to request a URL that carries base64-encoded data, like so:

http://www.somepage.com/es_e/bla_bla#eyJkYXRhIjp7ImNvdW50cnlJZCI6IkVTIiwicmVnaW9uSWQiOiI5MjAiLCJkdXJhdGlvbiI6NywibWluUGVyc29ucyI6MX0sImNvbmZpZyI6eyJwYWdlIjoiMCJ9fQ==

What I do is build a JSON object, encode it as base64, and append it to a URL like this:

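import base64
import json
import scrapy
new_data = {"data": {"countryId": "ES", "regionId": "920", "duration": 7, "minPersons": 1}, "config": {"page": 2}}
json_data = json.dumps(new_data)
encoded = base64.b64encode(json_data.encode("utf-8")).decode("ascii")  # b64encode needs bytes in Python 3
new_url = "http://www.somepage.com/es_es/bla_bla#" + encoded
yield scrapy.Request(url=new_url, callback=self.parse)  # runs inside a spider method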

The problem is that Scrapy only crawls the http://www.somepage.com/es_es/bla_bla part of the URL, without the encoded data appended to it. However, if I paste new_url into the browser, it shows me the result I want, with the encoded data applied!

I don't know what's happening. Can anyone give me a hand?

Recommended answer

After a lot of searching, I read that this kind of URL, one with a # followed by extra data (i.e. my URL http://www.somepage.com/es_e/bla_bla#eyJkYXRhIjp7ImNvdW50cnlJZCI6IkVTIiwicmVnaW9uSWQiOiI5MjAiLCJkdXJhdGlvbiI6NywibWluUGVyc29ucyI6MX0sImNvbmZpZyI6eyJwYWdlIjoiMCJ9fQ==), is called a fragment URL. The fragment basically indicates a location within a resource, like an anchor (you can read about it here). The fragment is handled on the client side and is not sent to the server as part of the HTTP request, which is why only the part before the # was being requested.
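To make the fragment behaviour concrete, here is a minimal sketch using only the Python standard library. It splits a URL at the # and decodes the base64 JSON carried in the fragment. The helper name split_fragment and the small payload are assumptions made up for illustration; they are not part of the original question or of Scrapy.

```python
import base64
import json
from urllib.parse import urldefrag


def split_fragment(url):
    """Return (url_without_fragment, decoded_payload_or_None).

    Hypothetical helper: assumes the fragment, if present, is
    base64-encoded JSON as in the question.
    """
    base, fragment = urldefrag(url)
    if not fragment:
        return base, None
    payload = json.loads(base64.b64decode(fragment).decode("utf-8"))
    return base, payload


# Illustrative payload, shaped like the one in the question
payload = {"data": {"countryId": "ES"}, "config": {"page": "0"}}
fragment = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
url = "http://www.somepage.com/es_es/bla_bla#" + fragment

base, decoded = split_fragment(url)
print(base)     # the part that actually goes into the HTTP request
print(decoded)  # the data that stays on the client side
```

Everything after the # stays in the second value, so a page that depends on it has to read it with client-side code and fetch the real data through a separate request.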

Then, thanks to this post, I learned that this content has to be loaded by the page itself: the website makes its own requests to fetch the data (outgoing requests). So I looked for those outgoing requests using Firefox Developer Edition (any other tool that shows these requests, such as Tamper Data, will do) and built the URL that returns the HTML content I was looking for.

# The base64-encoded JSON is passed in the 'searchRequest=' query parameter instead of after the '#', and voilà!
"http://www.somewebsite.es/?controller=ajaxresults&action=getresults&searchRequest=eyJkYXRhIjp7ImNvdW50cnlJZCI6IkVTIiwicmVnaW9uSWQiOiI5MjAiLCJkdXJhdGlvbiI6N30sImNvbmZpZyI6eyJwYWdlIjoiMCJ9fQ=="

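As a rough sketch of how such a URL could be built programmatically, the snippet below encodes a payload and passes it as the searchRequest query parameter. The endpoint and parameter names are copied from the example URL above; the helper name build_search_url is an assumption for illustration. Note that urlencode percent-encodes the trailing = padding, which a server would normally decode back, but that behaviour is assumed here and not confirmed for this site.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the example URL in the answer
AJAX_ENDPOINT = "http://www.somewebsite.es/"


def build_search_url(payload):
    """Hypothetical helper: put base64-encoded JSON in the query string."""
    encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    query = urlencode({
        "controller": "ajaxresults",
        "action": "getresults",
        "searchRequest": encoded,  # sent to the server, unlike a '#' fragment
    })
    return AJAX_ENDPOINT + "?" + query


payload = {"data": {"countryId": "ES", "regionId": "920", "duration": 7},
           "config": {"page": "0"}}
url = build_search_url(payload)
print(url)

# Round trip: recover the payload from the query string
recovered = parse_qs(urlsplit(url).query)["searchRequest"][0]
print(json.loads(base64.b64decode(recovered)))
```

Inside a spider, the result could then be requested with yield scrapy.Request(url=build_search_url(payload), callback=self.parse), and the page number changed in the payload to walk through the results.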
I could also achieve this with the Selenium library, as you can see in this other post, but that isn't the best practice...
