Setting "an informative User-Agent string" in getURL

Published: 2021/9/24 20:44:24 | Tags: r, wikipedia-api


Question

I am trying to access a Wikipedia page to get a list of pages, and I get the following error:

library(RCurl)
u <- "http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4"
getURL(u)
[1] "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.\n"

I would like to reach that page through the Wikipedia API, but I am not sure that would work.

The odd thing is that other pages are read without any problem, for example:

u <- "http://en.wikipedia.org/wiki/Wikipedia:Talk"
getURL(u)

Any suggestions?

Side note: in general I would rather not scrape wiki pages and would go through the API instead, but I fear these specific pages are not yet available through the API...
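For the API route the side note mentions, a minimal sketch follows. It assumes that the MediaWiki API's `list=allpages` query, with `apprefix` for the title prefix and `apnamespace` for the namespace (4 is the "Wikipedia:" project namespace, as in the original `namespace=4` URL), covers the same pages as Special:PrefixIndex. The helper name `prefix_index_api_url` is hypothetical, and the request itself still needs an informative User-Agent.

```r
# Hypothetical helper: builds a MediaWiki API URL that lists pages whose
# titles start with `prefix` in namespace `ns` (4 = "Wikipedia:" namespace).
prefix_index_api_url <- function(prefix, ns = 4, limit = "max",
                                 endpoint = "https://en.wikipedia.org/w/api.php") {
  params <- c(
    action      = "query",
    list        = "allpages",
    apprefix    = prefix,
    apnamespace = as.character(ns),
    aplimit     = limit,
    format      = "json"
  )
  # Percent-encode each value, then join as key=value pairs separated by "&"
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(endpoint, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

u_api <- prefix_index_api_url("tal")

# Fetching requires RCurl and network access; the API also expects a
# User-Agent with contact details (the string below is a placeholder):
# json <- RCurl::getURL(u_api, httpheader = c("User-Agent" = "MyScript/0.1 (me@example.org)"))
```

Titles on English Wikipedia normally begin with a capital letter, so a prefix such as "Tal" may match more reliably than "tal". Long lists are returned in batches, with a `continue` value to pass back for the next batch.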

Answer

According to the RCurl documentation, you can send additional headers by passing an httpheader argument:

getURL(u, httpheader = c('User-Agent' = "Informative string with your contact info"))
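To make the header reusable, here is a small sketch that builds the User-Agent once and sends it on every request. The layout `<tool>/<version> (<contact>) <library>` is roughly the shape Wikimedia's User-Agent policy recommends; the helpers `make_user_agent` and `fetch_page` and the tool name and contact details are hypothetical placeholders.

```r
# Hypothetical helper: builds a string like
# "PrefixLister/0.1 (https://example.org/contact; me@example.org) RCurl"
make_user_agent <- function(tool, version, contact, library = "RCurl") {
  sprintf("%s/%s (%s) %s", tool, version, contact, library)
}

# Hypothetical wrapper: always sends the informative User-Agent header.
# Needs the RCurl package and network access only when actually called.
fetch_page <- function(url, user_agent) {
  RCurl::getURL(url, httpheader = c("User-Agent" = user_agent))
}

ua <- make_user_agent("PrefixLister", "0.1",
                      "https://example.org/contact; me@example.org")

# html <- fetch_page(u, ua)
```

If the server redirects `http://` to `https://`, `getURL` may return an empty or redirect response, since it does not follow redirects by default; using the `https://` URL or adding `followlocation = TRUE` to the call should help.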
