How to properly set cookies to get URL content using httr
Problem description
I need to download information from a website that is protected with cookies. I get past this protection manually and then insert the cookies into httr. Here is a similar topic, but it does not solve my problem: (Copying cookie for httr)
library(httr)
url <- "http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ"
cook <- "_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; __utma=29983421.1729484844.1413489369.1413625619.1413627797.3; __utmb=29983421.7.10.1413627797; __utmc=29983421; __utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"
response <- GET(url, config(cookie = cook))
content(x = response, as = "text", encoding = "UTF-8")
But when I call content(), it returns a page saying that I am not logged in, just as it does when I send no cookie at all.
How can I solve this problem?
Test credentials: login mytest2, password qwerty12.
Solution
This would be the way to use set_cookies with GET in httr:
library(httr)
GET("http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ",
    set_cookies(`_SMIDA` = "7cf9ea4bfadb60bbd0950e2f8f4c279d",
                `__utma` = "29983421.138599299.1413649536.1413649536.1413649536.1",
                `__utmb` = "29983421.5.10.1413649536",
                `__utmc` = "29983421",
                `__utmt` = "1",
                `__utmz` = "29983421.1413649536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"))
That worked for me, or at least I think it did, since I cannot read the language: a table comes back with the same structure and no prompt to log in.
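The question starts from a single Cookie header string (cook). You don't have to retype each cookie as a separate set_cookies() argument. A small helper can split that string into a named character vector, which set_cookies() also accepts through its .cookies argument. This is a minimal sketch. parse_cookie_string() is a hypothetical name and not part of httr. The GET call is left commented out because it needs network access and a session cookie that is still valid.

```r
# Hypothetical helper (not part of httr): split a raw Cookie header string,
# as copied from the browser, into a named character vector.
parse_cookie_string <- function(cookie_string) {
  # Pairs are separated by ";" and usually followed by a space
  pairs <- trimws(strsplit(cookie_string, ";", fixed = TRUE)[[1]])
  # Ignore empty fragments and anything that is not a name=value pair
  pairs <- pairs[grepl("=", pairs, fixed = TRUE)]
  # Split on the FIRST "=" only: values such as __utmz contain "=" themselves
  eq <- regexpr("=", pairs, fixed = TRUE)
  keys <- substr(pairs, 1, eq - 1)
  values <- substr(pairs, eq + 1, nchar(pairs))
  stats::setNames(values, keys)
}

# The cookie string from the question
cook <- "_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; __utma=29983421.1729484844.1413489369.1413625619.1413627797.3; __utmb=29983421.7.10.1413627797; __utmc=29983421; __utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"
cookies <- parse_cookie_string(cook)
print(cookies)

# With httr installed and a still-valid session cookie:
# library(httr)
# response <- GET(url, set_cookies(.cookies = cookies))
```

Splitting on the first "=" only matters here. The Google Analytics __utmz value itself contains several "=" signs, and a naive split would truncate it.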
Unfortunately, the captcha on the login page prevents the use of RSelenium (or other similar crawling packages), so you'll have to keep grabbing those cookies manually (or use a utility to extract them from the browser session).
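One way to extract the cookies with a utility is to export the logged-in browser's cookies to a Netscape-format cookies.txt file and read that file from R. Several browser extensions can produce such a file; which tool you use is an assumption left open here. The sketch below assumes the usual 7-field, tab-separated layout. read_cookies_txt() is a hypothetical helper. It only does exact domain matching and does not check expiry dates. The demo file is made up.

```r
# Hypothetical helper (not part of httr): read the cookies for one domain from a
# Netscape-format cookies.txt export. Each data line has 7 tab-separated fields:
# domain, include-subdomains flag, path, secure flag, expiry, name, value.
read_cookies_txt <- function(path, domain) {
  lines <- readLines(path, warn = FALSE)
  # curl-style exports mark HttpOnly cookies with a "#HttpOnly_" prefix: keep them
  lines <- sub("^#HttpOnly_", "", lines)
  # Drop the remaining comment lines and blank lines
  lines <- lines[nzchar(trimws(lines)) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # strsplit() drops a trailing empty value, so an empty cookie yields 6 fields
  fields <- Filter(function(f) length(f) >= 6, fields)
  # Simplified matching: exact domain after stripping a leading "."
  # (not full RFC 6265 domain matching; expiry dates are not checked)
  cookie_domain <- sub("^\\.", "", vapply(fields, `[`, character(1), 1))
  fields <- fields[cookie_domain == domain]
  cookie_names <- vapply(fields, `[`, character(1), 6)
  cookie_values <- vapply(fields, function(f) if (length(f) >= 7) f[7] else "",
                          character(1))
  stats::setNames(cookie_values, cookie_names)
}

# Demo with a small, made-up export written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# Netscape HTTP Cookie File",
  "",
  ".smida.gov.ua\tTRUE\t/\tFALSE\t0\t__utmc\t29983421",
  "#HttpOnly_smida.gov.ua\tFALSE\t/\tFALSE\t0\t_SMIDA\t7cf9ea4bfadb60bbd0950e2f8f4c279d",
  ".example.com\tTRUE\t/\tFALSE\t0\tunrelated\tvalue"
), tmp)
smida_cookies <- read_cookies_txt(tmp, "smida.gov.ua")
print(smida_cookies)

# With httr installed and a still-valid session:
# library(httr)
# GET(url, set_cookies(.cookies = smida_cookies))
```

You still log in by hand because of the captcha. The export step only replaces copying each cookie value out of the browser's developer tools.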
Finally, you probably want to seriously consider changing those credentials now :-)