Why does url.exists return FALSE when the URL does exist (RCurl)?

Question

For example:

library(RCurl)

if(url.exists("http://www.google.com")) {
    # Two ways to submit a query to google. Searching for RCurl
    getURL("http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=RCurl&btnG=Search")
    # Here we let getForm do the hard work of combining the names and values.
    getForm("http://www.google.com/search", hl="en", lr="",ie="ISO-8859-1", q="RCurl", btnG="Search")
    # And here if we already have the parameters as a list/vector.
    getForm("http://www.google.com/search", .params = c(hl="en", lr="", ie="ISO-8859-1", q="RCurl", btnG="Search"))
}

This is an example from the RCurl package manual. However, it does not work:

> url.exists("http://www.google.com")
[1] FALSE

I found an existing answer to this, "Rcurl: url.exists returns false when url does exists", which says the problem is that the default user agent is not useful. But I do not understand what a user agent is or how to use it.
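A user agent is the client's self-identification, sent in the User-Agent header of every HTTP request; some servers and network filters respond differently depending on its value. A minimal sketch, assuming the RCurl package is installed: the libcurl options useragent and followlocation are passed through the .opts argument of url.exists(). The User-Agent string and the wrapper name exists_with_ua are hypothetical illustrations, not part of RCurl.

```r
library(RCurl)

# Hypothetical browser-like User-Agent string; any descriptive value can be used
ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) R/RCurl"

# Hypothetical wrapper: check a URL while sending a custom User-Agent header.
# Extra curl options (for example proxy settings) are forwarded through `...`.
exists_with_ua <- function(url, ...) {
  url.exists(url, .opts = list(useragent = ua,
                               followlocation = TRUE,  # follow HTTP redirects
                               ...))
}
```

For example, exists_with_ua("http://www.google.com"), or exists_with_ua("http://www.google.com", proxy = "http://*******", proxyport = 8080) to combine it with proxy options like the ones in the answer.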

Also, this error happens when I work at my company. When I tried the same code at home, it worked fine, so I am guessing the cause is the proxy, or some other reason I have not realized.

I need to use RCurl to run my queries on Google and then extract information such as the title and description from the resulting pages. In this case, how do I use a user agent? Alternatively, can the httr package do this?
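On the httr question: httr can send both a custom user agent and proxy settings. A minimal sketch, assuming the httr package is installed; user_agent() and use_proxy() are httr's own configuration helpers, while the proxy host, port, credentials and the helper names url_ok and search_google are placeholders for illustration only.

```r
library(httr)

# Placeholder proxy details: replace with the values from your network settings
proxy_cfg <- use_proxy("http://proxy.example.com", port = 8080,
                       username = "user", password = "pass")
ua_cfg <- user_agent("Mozilla/5.0 (compatible; R httr script)")

# Hypothetical helper: TRUE if a HEAD request succeeds with a non-error status.
# Extra config objects (such as proxy_cfg) are passed through `...`.
url_ok <- function(url, ...) {
  resp <- tryCatch(HEAD(url, ua_cfg, ...), error = function(e) NULL)
  !is.null(resp) && !http_error(resp)
}

# Hypothetical helper: run a Google search; httr builds the query string from `query`
search_google <- function(q, ...) {
  GET("http://www.google.com/search", ua_cfg, ...,
      query = list(hl = "en", q = q))
}
```

At the office this would be called as url_ok("http://www.google.com", proxy_cfg) or search_google("RCurl", proxy_cfg), and content(resp, as = "text") returns the page HTML for further parsing.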

Answer

Thanks a lot for the help, everyone. I think I have just figured out how to do it: the important thing is the proxy. If I use:

> library(RCurl)
> opts <- list(
+   proxy         = "http://*******",
+   proxyusername = "*****",
+   proxypassword = "*****",
+   proxyport     = 8080
+ )
> url.exists("http://www.google.com", .opts = opts)
[1] TRUE
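Because the same script works at home (no proxy) but fails at the office, one option is to keep the proxy details out of the code and read them from environment variables, returning an empty option list when none are set. A minimal sketch in base R; the helper proxy_opts and the variable names PROXY_HOST, PROXY_PORT, PROXY_USER and PROXY_PASS are hypothetical choices, not anything RCurl reads by itself.

```r
# Hypothetical helper: build RCurl proxy options from environment variables.
# Returns list() when PROXY_HOST is unset, i.e. a direct connection.
proxy_opts <- function(host_var = "PROXY_HOST", port_var = "PROXY_PORT",
                       user_var = "PROXY_USER", pass_var = "PROXY_PASS") {
  host <- Sys.getenv(host_var)
  if (!nzchar(host)) return(list())

  # Port falls back to 8080 (the port used in the answer) when PROXY_PORT is unset
  opts <- list(proxy = host,
               proxyport = as.integer(Sys.getenv(port_var, "8080")))

  # Only add credentials when a user name is configured
  user <- Sys.getenv(user_var)
  if (nzchar(user)) {
    opts$proxyusername <- user
    opts$proxypassword <- Sys.getenv(pass_var)
  }
  opts
}
```

The same call, url.exists("http://www.google.com", .opts = proxy_opts()), then works unchanged on both networks, as long as the variables are set only on the office machine.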

Then it all works! If you use Windows 10, you can find your proxy settings under System > Proxy. Likewise:

> library(XML)
> site <- getForm("http://www.google.com.au", hl = "en",
+                 lr = "", q = "r-project", btnG = "Search", .opts = opts)
> htmlTreeParse(site, asText = TRUE)
$file
[1] "<buffer>"
.........

In getForm, opts needs to be passed as well. Two other posts (RCurl default proxy settings and Proxy setting for R) answer the same question. I have not yet tried extracting information from the result.
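For the title and description asked about in the question, one option is to parse the returned HTML with the XML package's htmlParse(), which (unlike htmlTreeParse() with its defaults) returns a document that XPath queries can run on. A minimal sketch, assuming the XML package is installed; the helper page_info is hypothetical, and it reads a page's title element and its meta name="description" tag, which is not the same as parsing individual results on a Google results page, whose markup may change.

```r
library(XML)

# Hypothetical helper: extract <title> and <meta name="description"> from HTML text
page_info <- function(html) {
  doc <- htmlParse(html, asText = TRUE)
  title <- xpathSApply(doc, "//title", xmlValue)
  desc  <- xpathSApply(doc, "//meta[@name='description']", xmlGetAttr, "content")
  # Use NA when the element is missing; [[1]] keeps only the first match
  list(title       = if (length(title)) title[[1]] else NA_character_,
       description = if (length(desc))  desc[[1]]  else NA_character_)
}
```

Applied to the answer's result, page_info(site) would return the title and description of the page fetched with getForm(..., .opts = opts).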