`open_http': 403 Forbidden (OpenURI::HTTPError) for the string "Steve_Jobs" but not for any other string

Question

I was going through the Ruby tutorials provided at http://ruby.bastardsbook.com/ and I encountered the following code:

require "open-uri"

remote_base_url = "http://en.wikipedia.org/wiki"
r1 = "Steve_Wozniak"
r2 = "Steve_Jobs"
f1 = "my_copy_of-" + r1 + ".html"
f2 = "my_copy_of-" + r2 + ".html"

# read the first url
remote_full_url = remote_base_url + "/" + r1
rpage = open(remote_full_url).read

# write the first file to disk
file = open(f1, "w")
file.write(rpage)
file.close

# read the second url
remote_full_url = remote_base_url + "/" + r2
rpage = open(remote_full_url).read

# write the second file to disk
file = open(f2, "w")
file.write(rpage)
file.close

# open a new file:
compiled_file = open("apple-guys.html", "w")

# reopen the first and second files again
k1 = open(f1, "r")
k2 = open(f2, "r")

compiled_file.write(k1.read)
compiled_file.write(k2.read)

k1.close
k2.close
compiled_file.close

The code fails with the following trace:

/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:277:in `open_http': 403 Forbidden (OpenURI::HTTPError)
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:518:in `open'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/open-uri.rb:30:in `open'
    from /Users/arkidmitra/tweetfetch/samecode.rb:11

My problem is not that the code fails but that whenever I change r2 to anything other than Steve_Jobs, it works. What is happening here?

Recommended answer

I think this happens for locked-down entries such as "Steve Jobs" and "Al Gore". The same book you are referring to explains it:

For some pages – such as Al Gore's locked-down entry – Wikipedia will not respond to a web request if a User-Agent isn't specified. The "User-Agent" typically refers to your browser, and you can see this by inspecting the headers you send for any page request in your browser. By providing a "User-Agent" key-value pair, (I basically use "Ruby" and it seems to work), we can pass it as a hash (I use the constant HEADERS_HASH in the example) as the second argument of the method call.

This is covered later in the book, at http://ruby.bastardsbook.com/chapters/web-crawling/
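As a minimal sketch of what the quoted passage describes, the request can carry a "User-Agent" header by passing a hash as the second argument. HEADERS_HASH and the value "Ruby" come from the quote. The helper names wikipedia_url and fetch_wikipedia_page are hypothetical, introduced only for illustration. The sketch assumes Ruby 2.5 or later, where URI.open is available. On the Ruby 1.8 shown in the trace, the equivalent call would be open(url, HEADERS_HASH).

```ruby
require "open-uri"

# Header hash passed as the second argument of the open call.
# "Ruby" is the value the book's author reports as working.
HEADERS_HASH = { "User-Agent" => "Ruby" }

# Hypothetical helper: builds the article URL the same way the question does.
def wikipedia_url(title, base_url = "http://en.wikipedia.org/wiki")
  base_url + "/" + title
end

# Hypothetical helper: fetches the page body, sending the User-Agent header.
# Assumes Ruby 2.5+ (URI.open); on Ruby 1.8 use open(url, HEADERS_HASH) instead.
def fetch_wikipedia_page(title)
  URI.open(wikipedia_url(title), HEADERS_HASH).read
end
```

In the original script, the line rpage = open(remote_full_url).read would then become rpage = fetch_wikipedia_page(r2), or simply gain HEADERS_HASH as a second argument. Whether Wikipedia still returns a 403 without the header may have changed since the question was asked.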
