How to properly set cookies to get URL content using httr


Problem description

I need to download information from a website that is protected by cookies. I pass this protection manually (by logging in) and then hand the cookies to httr.

Here is a similar topic, but it does not solve my problem: (Copying cookie for httr)

library(httr)

# Page that is only served to a logged-in session
url <- "http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ"

# Cookie header copied from the browser after logging in manually
cook <- "_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; __utma=29983421.1729484844.1413489369.1413625619.1413627797.3; __utmb=29983421.7.10.1413627797; __utmc=29983421; __utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"

response <- GET(url, config(cookie = cook))

content(x = response, as = "text", encoding = "UTF-8")

When I call content(), the returned page says that I am not logged in (the same result I get without the cookie).

How can I solve this problem?

Test credentials are login: mytest2, pass: qwerty12

Solution

This is how to pass the cookies with set_cookies() in a GET request using httr:

library(httr)

# Each cookie is passed as its own named argument; backticks quote the
# names so that the leading underscores are accepted.
GET("http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ",
    set_cookies(`_SMIDA` = "7cf9ea4bfadb60bbd0950e2f8f4c279d",
                `__utma` = "29983421.138599299.1413649536.1413649536.1413649536.1",
                `__utmb` = "29983421.5.10.1413649536",
                `__utmc` = "29983421",
                `__utmt` = "1",
                `__utmz` = "29983421.1413649536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"))

That worked for me, or at least I think it did, as I cannot read the language: a table comes back with the same structure and there is no prompt to log in.
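If the cookies were copied from the browser as one "Cookie" header string, as in the question, they do not have to be retyped one by one. The sketch below splits such a string into a named character vector, which can then be handed to the `.cookies` argument of `set_cookies()`. `parse_cookie_string()` is a hypothetical helper written for this illustration, not part of httr, and the cookie values are the ones from the question, so they have almost certainly expired.

```r
# Hypothetical helper (not part of httr): turn a browser "Cookie" header
# string into a named character vector.
parse_cookie_string <- function(cookie_string) {
  # Individual cookies are separated by ";", usually followed by a space.
  pairs <- trimws(strsplit(cookie_string, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]

  # Split each pair at the FIRST "=" only, because values such as __utmz
  # contain further "=" characters.
  eq <- as.integer(regexpr("=", pairs, fixed = TRUE))
  has_eq <- eq > 0
  pairs <- pairs[has_eq]
  eq <- eq[has_eq]

  values <- substring(pairs, eq + 1)
  names(values) <- substring(pairs, 1, eq - 1)
  values
}

# Cookie string as given in the question.
cook <- "_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; __utma=29983421.1729484844.1413489369.1413625619.1413627797.3; __utmb=29983421.7.10.1413627797; __utmc=29983421; __utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"

cookies <- parse_cookie_string(cook)
print(names(cookies))

# With httr installed, the vector can be passed on like this
# (left commented out because it needs network access and a live session):
# response <- httr::GET(url, httr::set_cookies(.cookies = cookies))
```

Splitting at the first "=" only is the one detail that matters here; a plain `strsplit(pair, "=")` would cut the `__utmz` value into several pieces.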

Unfortunately, the captcha on login prevents the use of RSelenium (or other, similar crawling packages), so you'll have to continue grabbing those cookies manually (or use a utility to extract them from the session).

Finally, you probably want to seriously consider changing those credentials, now :-)
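One way to "use a utility" is to export the browser's cookies to a file and read that file from R. The sketch below assumes the export is in the tab-separated Netscape cookies.txt layout (domain, subdomain flag, path, secure flag, expiry, name, value), which curl and several browser extensions write. `read_cookies_txt()` is a hypothetical helper for this illustration, and the file contents are made-up sample data rather than a real session.

```r
# Hypothetical helper: read a Netscape-format cookies.txt file and return
# the cookies for one domain as a named character vector.
read_cookies_txt <- function(path, domain) {
  lines <- readLines(path, warn = FALSE)

  # curl marks HttpOnly cookies with a "#HttpOnly_" prefix; strip it first
  # so those entries are not discarded together with real comment lines.
  lines <- sub("^#HttpOnly_", "", lines)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]

  # Keep only complete entries with all seven tab-separated fields
  # (entries with an empty value are skipped by this simple sketch).
  fields <- strsplit(lines, "\t", fixed = TRUE)
  fields <- fields[lengths(fields) == 7]

  domains <- vapply(fields, `[`, character(1), 1)
  fields <- fields[grepl(domain, domains, fixed = TRUE)]

  values <- vapply(fields, `[`, character(1), 7)
  names(values) <- vapply(fields, `[`, character(1), 6)
  values
}

# Made-up sample file standing in for a real browser export.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# Netscape HTTP Cookie File",
  paste(c("smida.gov.ua", "FALSE", "/", "FALSE", "0", "_SMIDA", "EXAMPLE_SESSION_ID"), collapse = "\t"),
  paste(c(".example.org", "TRUE", "/", "FALSE", "0", "other", "ignored"), collapse = "\t")
), tmp)

cookies_from_file <- read_cookies_txt(tmp, "smida.gov.ua")
print(cookies_from_file)

# With httr installed, the result could be used as:
# httr::GET(url, httr::set_cookies(.cookies = cookies_from_file))
```

The session cookie still has to come from a manual login that passed the captcha; this only saves copying the values by hand.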
