How to GET a URL with User-Agent and timeout through some Proxy in Ruby?
Problem description
How do I GET a URL through a proxy, with a timeout of at most n seconds and a custom User-Agent?
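require 'nokogiri'
require 'net/http'
require 'uri'
def get_with_max_wait(param, proxy, timeout)
  url = "http://example.com/?p=#{param}"
  uri = URI.parse(url)
  proxy_uri = URI.parse(proxy)
  # Connect to the target host via the proxy; use the URL's own port instead of hard-coding 80
  http = Net::HTTP.new(uri.host, uri.port, proxy_uri.host, proxy_uri.port)
  http.open_timeout = timeout  # max seconds to open the connection
  http.read_timeout = timeout  # max seconds to wait on each read
  # Pass only path + query; Net::HTTP adds the host itself when talking to a proxy
  response = http.get(uri.request_uri)
  doc = Nokogiri::HTML(response.body)
  doc.at_css(".css .goes .here").content.strip
end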
The code above fetches a URL through a proxy with a timeout, but it doesn't send a User-Agent. How do I add one?
Use open-uri and pass the User-Agent as an option to open.
Below is an example that stores the User-Agent string in a variable and passes it to open, together with the proxy and a read timeout.
require 'nokogiri'
require 'open-uri'
user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2"
url = "http://www.somedomain.com/somepage/"
# Symbol keys are open-uri options; String keys are sent as HTTP request headers.
# URI.open replaces Kernel#open for URLs (removed in Ruby 3.0).
@doc = Nokogiri::HTML(URI.open(url,
                               proxy: 'http://(ip_address):(port)',
                               read_timeout: 10,
                               'User-Agent' => user_agent),
                      nil, "UTF-8")
OpenURI has a :read_timeout option for this, and recent Ruby versions also accept :open_timeout to bound the time spent establishing the connection.
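To bound both phases and avoid an exception on a slow proxy, the open-uri call can be wrapped in a method. This is a minimal sketch, assuming Ruby 2.5+ (for URI.open) and an open-uri that accepts :open_timeout; the method name fetch_with_open_uri and the "return nil on timeout" policy are hypothetical choices for illustration, not part of open-uri.

```ruby
require 'open-uri'
require 'net/http' # defines Net::OpenTimeout and Net::ReadTimeout

# Hypothetical helper: fetch +url+ through +proxy+ with a custom User-Agent.
# Symbol keys are open-uri options; the String key becomes a request header.
def fetch_with_open_uri(url, proxy, user_agent, timeout)
  URI.open(url,
           proxy: proxy,              # e.g. "http://host:port"
           open_timeout: timeout,     # seconds allowed to establish the connection
           read_timeout: timeout,     # seconds allowed for each read
           'User-Agent' => user_agent) { |io| io.read }
rescue Net::OpenTimeout, Net::ReadTimeout
  nil # assumption: the caller treats a timeout as "no page"
end
```

The return value is the page body as a String (or nil on timeout), which can be handed to Nokogiri::HTML as in the example above.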
You can review the OpenURI documentation at the link below.
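If you would rather keep Net::HTTP from the question, the User-Agent can go into the header hash that Net::HTTP#get accepts as its second argument. Below is a minimal sketch under that approach; the method name get_via_proxy is hypothetical, and it returns the raw Net::HTTPResponse so the body can still be parsed with Nokogiri as in the question.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET +url+ through an HTTP +proxy+ ("http://host:port"),
# sending +user_agent+ and limiting connect and each read to +timeout+ seconds.
def get_via_proxy(url, proxy, user_agent, timeout)
  uri = URI.parse(url)
  proxy_uri = URI.parse(proxy)
  http = Net::HTTP.new(uri.host, uri.port, proxy_uri.host, proxy_uri.port)
  http.open_timeout = timeout # seconds allowed to open the TCP connection
  http.read_timeout = timeout # seconds allowed for each socket read
  # Second argument of Net::HTTP#get: a hash of request headers.
  http.get(uri.request_uri, 'User-Agent' => user_agent)
end
```

For example, Nokogiri::HTML(get_via_proxy("http://example.com/?p=#{param}", proxy, user_agent, timeout).body) would plug straight into the question's parsing code. Timeouts surface as Net::OpenTimeout or Net::ReadTimeout, which the caller can rescue.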