URL encoding issues with curly braces

Question

I'm having issues getting data from GitHub Archive.

The main issue is encoding the {} and .. characters in my URL. Maybe I am misreading the GitHub API or misunderstanding how encoding works.

require 'open-uri'
require 'faraday'

conn = Faraday.new(:url => 'http://data.githubarchive.org/') do |faraday|
  faraday.request  :url_encoded             # form-encode POST params
  faraday.response :logger                  # log requests to STDOUT
  faraday.adapter  Faraday.default_adapter  # make requests with Net::HTTP
end

#query = '2015-01-01-15.json.gz' #this one works!!
query = '2015-01-01-{0..23}.json.gz' #this one doesn't work
encoded_query = URI.encode(query)

response = conn.get(encoded_query)
p response.body

Recommended answer

The GitHub Archive example for retrieving a range of files is:

wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz

The {0..23} part is not part of the URL. It is a range, 0 through 23, that the shell expands (Bash brace expansion) before wget runs, so wget receives one URL per value. You can see this by running the command with the -v flag, which prints:

wget -v http://data.githubarchive.org/2015-01-01-{0..1}.json.gz
--2015-06-11 13:31:07--  http://data.githubarchive.org/2015-01-01-0.json.gz
Resolving data.githubarchive.org... 74.125.25.128, 2607:f8b0:400e:c03::80
Connecting to data.githubarchive.org|74.125.25.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2615399 (2.5M) [application/x-gzip]
Saving to: '2015-01-01-0.json.gz'

2015-01-01-0.json.gz                                        100%[===========================================================================================================================================>]   2.49M  3.03MB/s   in 0.8s

2015-06-11 13:31:09 (3.03 MB/s) - '2015-01-01-0.json.gz' saved [2615399/2615399]

--2015-06-11 13:31:09--  http://data.githubarchive.org/2015-01-01-1.json.gz
Reusing existing connection to data.githubarchive.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 2535599 (2.4M) [application/x-gzip]
Saving to: '2015-01-01-1.json.gz'

2015-01-01-1.json.gz                                        100%[===========================================================================================================================================>]   2.42M   867KB/s   in 2.9s

2015-06-11 13:31:11 (867 KB/s) - '2015-01-01-1.json.gz' saved [2535599/2535599]

FINISHED --2015-06-11 13:31:11--
Total wall clock time: 4.3s
Downloaded: 2 files, 4.9M in 3.7s (1.33 MB/s)

In other words, the braces are expanded into separate URLs before any request is made, and wget then fetches each resulting URL. The expansion is a feature of the shell, not of wget or of the URL itself, which is easy to miss because examples usually show it as part of a wget command. For instance, in "All the Wget Commands You Should Know":

7. Download a list of sequentially numbered files from a server
wget http://example.com/images/{1..20}.jpg
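
To see why the request in the question fails, here is a small sketch of what percent-encoding does to the braces. It uses URI.encode_www_form_component from Ruby's standard library as a stand-in for URI.encode (which was removed in Ruby 3.0); that substitution is an assumption made for illustration, not what the question's code calls.

```ruby
require 'uri'

# The path from the question, with the shell-style brace range left in.
query = '2015-01-01-{0..23}.json.gz'

# Stand-in for the older URI.encode: the braces are percent-encoded,
# not expanded into 24 separate file names.
encoded = URI.encode_www_form_component(query)

puts encoded
```

The server is then asked for a single file whose name literally contains the encoded braces, and no such file exists in the archive.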

To do what you want, you need to iterate over the range in Ruby using something like this untested code:

0.upto(23) do |i|
  response = conn.get("/2015-01-01-#{ i }.json.gz")
  p response.body
end
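
If you would rather keep the brace notation in your own code, one option is to expand it yourself before making any requests. The following is a minimal sketch under stated assumptions: expand_brace_range is a hypothetical helper (not part of Ruby, Faraday, or the GitHub Archive API), and it only handles a single ascending integer range of the form {a..b}.

```ruby
# Hypothetical helper: expands one {a..b} integer range in a string,
# mimicking the simplest case of shell brace expansion.
def expand_brace_range(pattern)
  match = pattern.match(/\{(\d+)\.\.(\d+)\}/)
  # No range present: return the input unchanged, as a one-element list.
  return [pattern] unless match

  first = match[1].to_i
  last = match[2].to_i
  # Rebuild the string once per value, replacing the braces with the number.
  (first..last).map { |n| match.pre_match + n.to_s + match.post_match }
end

paths = expand_brace_range('/2015-01-01-{0..23}.json.gz')
puts paths.first(2)
puts paths.length
```

Each resulting path can then be passed to conn.get in the same way as in the loop above.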
