Logging into websites using requests


Problem description


My previous question (logging in to website using requests) generated some awesome answers, and with them I was able to scrape a lot of sites. But the site I'm working on now is tricky. I don't know whether it's a website bug or something done intentionally, but I cannot scrape it.

Here is part of my code.

import requests
import re
from lxml import html
from multiprocessing.dummy import Pool as ThreadPool
from fake_useragent import UserAgent
import time
import ctypes

# Timestamped output file name, e.g. "13.01.2017_153000_Scraped data.txt"
now = time.strftime('%d.%m.%Y_%H%M%S_')
FileName = now + "Scraped data.txt"
fileW = open(FileName, "w")
# URL.txt holds one product URL per line
url = open('URL.txt', 'r').read().splitlines()
fileW.write("URL    Name    SKU Dimensions  Availability    MSRP    NetPrice")
fileW.write("\n")
count = 0
no_of_pools = 14
# A session keeps cookies across requests
r = requests.session()

payload = {
    "email":"I cant give them out in public",
    "password":"maybe I can share it privately if anyone can help me with it :)",
    "redirect":"true"
    }
rs = r.get("https://checkout.reginaandrew.com/store/checkout.ssp?fragment=login&is=login&lang=en_US&login=T#login-register")
rs = r.post("https://checkout.reginaandrew.com/store/checkout.ssp?fragment=login&is=login&lang=en_US&login=T#login-register",data=payload,headers={'Referer':"https://checkout.reginaandrew.com/store/my_account.ssp"})
rs = r.get("https://checkout.reginaandrew.com/store/my_account.ssp")
tree = html.fromstring(rs.content)
print(str(tree.xpath("//*[@id='site-header']/div[3]/nav/div[2]/div/div/a/@href")))

The problem is that even when I log in manually and open a product URL by typing it into the address bar, the browser session is not recognized as logged in.

The only way around that is to click a link on the page you are redirected to after logging in. Only then is the login recognized, and I can open specific URLs and see all the information.

The obstacle I ran into is that this link changes. The print statement in the code is:

print(str(tree.xpath("//*[@id='site-header']/div[3]/nav/div[2]/div/div/a/@href")))

This should have extracted the link, but it returns nothing.

Any ideas?

EDIT: with whitespace stripped out, rs.content is:

<!DOCTYPE html><html lang="en-US"><head><meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <link rel="shortcut icon" href="https://checkout.reginaandrew.com/c.1283670/store/img/favicon.ico" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
    <title></title>
    <!--[if !IE]><!-->
    <link rel="stylesheet" href="https://checkout.reginaandrew.com/c.1283670/store/css/checkout.css?t=1484321730904">
    <!--<![endif]-->
    <!--[if lte IE 9]>
    <link rel="stylesheet" href="https://checkout.reginaandrew.com/c.1283670/store/css_ie/checkout_2.css?t=1484321730904">
    <link rel="stylesheet" href="https://checkout.reginaandrew.com/c.1283670/store/css_ie/checkout_1.css?t=1484321730904">
    <link rel="stylesheet" href="https://checkout.reginaandrew.com/c.1283670/store/css_ie/checkout.css?t=1484321730904">
    <![endif]-->
    <!--[if lt IE 9]>
    <script src="/c.1283670/store/javascript/html5shiv.min.js"></script>
    <script src="/c.1283670/store/javascript/respond.min.js"></script>
    <![endif]-->
    <script>var SC=window.SC={ENVIRONMENT:{jsEnvironment:typeof nsglobal==='undefined'?'browser':'server'},isCrossOrigin:function(){return 'checkout.reginaandrew.com'!==document.location.hostname},isPageGenerator:function(){return typeof nsglobal!=='undefined'},getSessionInfo:function(key){var session=SC.SESSION||SC.DEFAULT_SESSION||{};return key?session[key]:session},getPublishedObject:function(key){return SC.ENVIRONMENT&&SC.ENVIRONMENT.published&&SC.ENVIRONMENT.published[key]?SC.ENVIRONMENT.published[key]:null}};function loadScript(data){'use strict';var element;if(data.url){element='<script src="'+data.url+'"></'+'script>'}else{element='<script>'+data.code+'</'+'script>'}if(data.seo_remove){document.write(element)}else{document.write('</div>'+element+'<div class="seo-remove">')}}
</script>
</head>
  <body>
    <noscript>
      <div class="checkout-layout-no-javascript-msg">
        <strong>Javascript is disabled on your browser.</strong><br>
        To view this site, you must enable JavaScript or upgrade to a JavaScript-capable browser.
      </div>
    </noscript>
    <div id="main" class="main"></div>
    <script>loadScript({url: '/c.1283670/store/checkout.environment.ssp?lang=en_US&cur=USD&t=' + (new Date().getTime())});
    </script>
    <script>if (!~window.location.hash.indexOf('login-register') && !~window.location.hash.indexOf('forgot-password') && 'login-register'){window.location.hash = 'login-register';}
    </script>
    <script src="/c.1283670/store/javascript/checkout.js?t=1484321730904">  </script>
    <script src="/cms/2/assets/js/postframe.js"></script>
    <script src="/cms/2/cms.js"></script>
    <script>SCM['SC.Checkout'].Configuration.currentTouchpoint = 'login';</script>
</body>
</html>

Solution

This is going to be quite tricky, and you might want to use a more sophisticated tool such as Selenium, which can emulate a real browser.
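One way to see why the XPath in the question comes back empty is to list the id attributes present in the HTML that requests actually receives. The sketch below uses only the standard library. IdCollector, ids_in_static_html and SAMPLE are illustrative names, and SAMPLE is a trimmed copy of the body shown in the question's EDIT, so treat it as a rough check rather than a full diagnosis.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute found in a piece of static HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in_static_html(markup):
    """Return the set of element ids present before any JavaScript runs."""
    parser = IdCollector()
    parser.feed(markup)
    return parser.ids


# Trimmed from the rs.content dump in the question (illustrative only).
SAMPLE = """
<body>
  <noscript>
    <div class="checkout-layout-no-javascript-msg">Javascript is disabled on your browser.</div>
  </noscript>
  <div id="main" class="main"></div>
</body>
"""

ids = ids_in_static_html(SAMPLE)
print(sorted(ids))            # prints ['main']
print("site-header" in ids)   # prints False
```

The static response contains only an empty "main" container and a noscript warning. The element with id "site-header" that the XPath targets is presumably built later by the page's JavaScript, which requests never executes.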

Otherwise, you will need to investigate which cookies or other forms of authentication the site requires for a login. Take note of all the cookies that are passed behind the scenes: it is not as simple as submitting a username and password. You can see what is being sent in the Network tab of your browser's developer tools.
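To compare what the script receives with what the Network tab shows, it can help to log which cookies each request adds to the session. This is a minimal sketch: new_cookies and trace_login are hypothetical helper names, session is assumed to be a requests.Session, and login_url and payload stand for the URL and form data from the question.

```python
def new_cookies(before, after):
    """Return the cookies that were added or changed between two snapshots."""
    return {name: value for name, value in after.items()
            if before.get(name) != value}


def trace_login(session, login_url, payload):
    """Print the cookies set by the login GET and POST (session: requests.Session)."""
    start = session.cookies.get_dict()

    session.get(login_url)
    after_get = session.cookies.get_dict()
    print("set by GET: ", new_cookies(start, after_get))

    response = session.post(login_url, data=payload)
    after_post = session.cookies.get_dict()
    print("set by POST:", new_cookies(after_get, after_post))

    return response
```

If the browser shows cookies that never appear in this output, they are probably set by JavaScript or by a request the script does not make, such as the link that has to be clicked after logging in.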

Finally, if you are worried that Selenium might be sluggish (it is; after all, it does the same thing a user does when opening a browser and clicking things), you can try something like CasperJS. The learning curve for CasperJS is considerably steeper than for Selenium, though, so you might want to try Selenium first.
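If speed is the main concern, one possible compromise is to log in once with Selenium and then hand the browser's cookies to a requests session for the bulk scraping. The helper below is a sketch under that assumption. copy_browser_cookies is a hypothetical name, browser_cookies is expected to be the list of dicts returned by Selenium's driver.get_cookies(), and whether this site accepts cookies reused this way has not been verified.

```python
def copy_browser_cookies(browser_cookies, session):
    """Copy cookies from driver.get_cookies() into a requests.Session."""
    for cookie in browser_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),  # empty string if the browser omitted it
            path=cookie.get("path", "/"),
        )
    return session
```

After logging in and clicking through in the browser, this would be called as copy_browser_cookies(driver.get_cookies(), requests.Session()), and the returned session could then fetch the product URLs directly.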
