Can't parse the username to make sure I'm logged in to a website


Question

I've written a Python script that logs in to a website and parses the username to confirm that the login really succeeded. The approach below gets me there, but only because the script uses hardcoded cookies copied from Chrome dev tools.

What I tried:

import requests
from bs4 import BeautifulSoup

url = 'https://secure.imdb.com/ap/signin?openid.pape.max_auth_age=0&openid.return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_pro_us&openid.mode=checkid_setup&siteState=eyJvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl9wcm9fdXMiLCJyZWRpcmVjdFRvIjoiaHR0cHM6Ly9wcm8uaW1kYi5jb20vIn0&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0'
signin = 'https://secure.imdb.com/ap/signin'
mainurl = 'https://pro.imdb.com/'

with requests.Session() as s:
    res = s.get(url,headers={"User-agent":"Mozilla/5.0"})
    soup = BeautifulSoup(res.text,"lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['email'] = 'some username'
    payload['password'] = 'some password'

    s.post(signin,data=payload,headers={
        "User-agent":"Mozilla/5.0",
        "Cookie": 'adblk=adblk_yes; ubid-main=130-2884709-6520735; _msuuid_518k2z41603=95C56F3B-E3C1-40E5-A47B-C4F7BAF2FF5D; _fbp=fb.1.1574621403438.97041399; pa=BCYm5GYAag-hj1CWg3cPXjfv2X6NGPUp6kLguepMku7Yf0W9-iSTjgmVNGmQLwUfJ5XJPHqlh84f%0D%0Agrd2voq0Q7TR_rdXU4T1BJw-1a-DdvCNSVuWSm50IXJDC_H4-wM_Qli_%0D%0A; uu=BCYnANeBBdnuTg3UKEVGDiO203C7KR0AQTdyE9Y_Y70vpd04N5QZ2bD3RwWdMBNMAJtdbRbPZMpG%0D%0AbPpC6vZvoMDzucwsE7pTQiKxY24Gr4_-0ONm7hGKPfPbMwvI1NYzy5ZhTIyIUqeVAQ7geCBiS5NS%0D%0A1A%0D%0A; session-id=137-0235974-9052660; session-id-time=2205351554; session-token=jsvzgJ4JY/TCgodelKegvXcqdLyAy4NTDO5/iEvk90VA8qWWEPJpiiRYAZe3V0EYVFlKq590mXU0OU9XMbAzwyKqXIzPLzKfLf3Cc3k0g/VQNTo6roAEa5IxmOGZjWrJuhkRZ1YgeF5uPZLcatWF1y5PFHqvjaDxQrf2LZbgRXF5N7vacTZ8maK0ciJmQEjh; csm-hit=tb:8HH0DWNBDVSWP881GYKG+s-8HH0DWNBDVSWP881GYKG|1574631571950&t:1574631571952&adb:adblk_yes'
        })

    r = s.get(mainurl,headers={
        "Cookie": 'adblk=adblk_yes; ubid-main=130-2884709-6520735; _msuuid_518k2z41603=95C56F3B-E3C1-40E5-A47B-C4F7BAF2FF5D; _fbp=fb.1.1574621403438.97041399; pa=BCYm5GYAag-hj1CWg3cPXjfv2X6NGPUp6kLguepMku7Yf0W9-iSTjgmVNGmQLwUfJ5XJPHqlh84f%0D%0Agrd2voq0Q7TR_rdXU4T1BJw-1a-DdvCNSVuWSm50IXJDC_H4-wM_Qli_%0D%0A; csm-hit=tb:KV47B1QVKP4DNB3QGY95+b-NM69W1Y35R7ARV0639V5|1574631544432&t:1574631544432&adb:adblk_yes; session-id=137-0235974-9052660; session-id-time=2205351554; session-token="EsIzROiSTmFDfXd5jnBPIBOpYG9jAu7tiWXDF8R52sUw5jS6OjddfOOQB+ytCmq0K3UnXs9wKBvQtkB4aVNsXieVbRcIUrKf3iPnYeJchbOlShMjg+MR+O7IQgPKkw0BKihdYQ1YIl7KQS8VeLxZjtzJ5sj5ocnY72fCKdwq/fGOjfieFYbe9Km3a8h++1GpC738JbwcVdpTG08v1pjhQKifqPQXnqhcyVKhi8CD1qk="; x-main="C1KbtQgFFBAYfwttdRSrU5CpCe@Fn6SPHnBTY6dO2ppimt@u1P1L7G0PueQMn6X3"; at-main=Atza|IwEBICfS3UKNp2mwmbyUPY1QzjXRHMcL6fjv2ND7BDXsZ1G-qDPJKsLJXeU9gJOvRpWsofSpOJCyhnap-bIOWCutU6VMIS9bn3UkNVRP8WFVqrs-CLB5opLbrEx6YxVGQlfaxx54gzuuGO4D30z-AgBpGe64_bn0K1iLOT3P3i7S3nBzvP_0AopwKlbU7SRnE5m21cVfVK7bwbtfZO4cf7DrpGcaHK4dlY5jKHPzNx_AR4ypqsEBFbHon36N1j8foty6wLJhFP1gNCvs24mVCec24TRho5ZXFDYqhLB-dw9V3XY1eq7q1QNgtAdYkDSJ6Mq1nllFu59WqIVs1Y3lLEaxDUExLtCt-VQArpS_hZtZR8C_kevhV01jEhWg8RUQaCdYTMwZHwa778MiEOrrrdGqFnR5; sess-at-main="tWwUfkZLx+mDAPqZo+J6yJlnjqBJvYJ0oVMS6/NcIKQ="; id=BCYhnxuM-3g3WFo4uvCv6C5LdGLJKaIcZj8E-rQwU_YsF991I3Tqe94W6IlU27FvaNcnuCyv5Te3%0D%0A0c3O1mMYhEE14wMdByo2SvGXkBS0A4oFMJMEIe0aC1X4fyNRwWYNZ72a6NDzAOqeDQi3_7sZZGH8%0D%0AxQ%0D%0A; uu=BCYsGSOaee6VbhMOMXpG3F_6i7cTIkPCN0S0_Jv7c3bVkUQ5gp9vqtfvVlOMOIOqXv-uHSTSibBp%0D%0ATO1e4tRpT1DolY2qkoOW8yICF7ZrXqAgont_ShTy8zVEg1wxWCxg3_XQX8r8_dGFCO4NWZiyLH-f%0D%0A2RpBF2IJLUSd8R4UCbbbtgo%0D%0A; sid=BCYp9inRAYR9sJgmF1FcA9Vgto81vmiCYHP_gEVv6r2ZdBtz1bKtOQg4_0iSwREudsZrPM8SHMUk%0D%0A5jFMp74veGrdwNTf8DONXPUCExLgkHzfeoZr-KHf4VbI7aI5TrJhqSioYbEhHYqm6q5RGrXfCVPr%0D%0AqA%0D%0A'
        })

    sauce = BeautifulSoup(r.text,"lxml")
    name = sauce.select_one("span.display-name").text
    print(name)

I tried the following to avoid the hardcoded cookies, but unfortunately it failed:

cookie_string = "; ".join([str(x)+"="+str(y) for x,y in s.cookies.items()])

This is how I tried to build the cookies automatically:

cookie_string = "; ".join([str(x)+"="+str(y) for x,y in s.cookies.items()])
s.post(signin,data=payload,headers={
    "User-agent":"Mozilla/5.0",
    "Cookie": cookie_string
    })
cookie_string_ano = "; ".join([str(x)+"="+str(y) for x,y in s.cookies.items()])
r = s.get(mainurl,headers={
    "Cookie": cookie_string_ano
    })

With the code above, cookie_string and cookie_string_ano come out as session-id=130-0171771-5726549; session-id-time=2205475101l and session-id=130-0171771-5726549; session-id-time=2205475101l; ubid-main=135-8050026-6353151, respectively.
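To see which cookies the automatic attempt is missing, the names in the browser's Cookie header can be compared with the names the session produced. The sketch below is only an illustration: the helper names `cookie_names` and `missing_cookies` are hypothetical, and the cookie values are shortened placeholders rather than real ones. The cookie names themselves are taken from the strings shown in the question.

```python
def cookie_names(header):
    """Return the set of cookie names found in a 'Cookie' header string."""
    names = set()
    for part in header.split(";"):
        # Split on the first '=' only, since values may contain '=' themselves.
        name, sep, _value = part.strip().partition("=")
        if sep and name:
            names.add(name)
    return names


def missing_cookies(browser_header, session_header):
    """Names the browser sent that the script's session does not have."""
    return sorted(cookie_names(browser_header) - cookie_names(session_header))


# Placeholder values; only the cookie names matter for this comparison.
browser_header = (
    "session-id=111; session-id-time=222; ubid-main=333; "
    "session-token=abc; at-main=def; x-main=ghi"
)
session_header = "session-id=111; session-id-time=222; ubid-main=333"

print(missing_cookies(browser_header, session_header))
# ['at-main', 'session-token', 'x-main']
```

If the authentication cookies (here session-token, at-main, x-main) never appear in the session, the POST most likely did not complete a real login, so rebuilding the header from s.cookies cannot reproduce the hardcoded one.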

How can I fetch the username without hardcoding cookies in the script?

Recommended answer

To fetch the cookies that Chrome dev tools shows, the Python script needs to interact with Google Chrome through the Chrome DevTools Protocol.

PyChromeDevTools is a Python package that lets a script read Chrome's cookies, which removes the need for hardcoded cookies. Reference: PyChromeDevTools.

Remember: IMDb explicitly forbids screen scraping. The IMDb Conditions of Use state:

Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.


Prerequisites:

  • First, add the Chrome path to your system environment variables.

    Next, run an instance of Google Chrome with the remote-debugging option. Reference: Remote debugging with Chrome Developer Tools.

    Use the following command in a command prompt or terminal to start that instance:

    chrome.exe --remote-debugging-port=9222 --user-data-dir=remote-profile
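Before running the script, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below queries the `/json` HTTP endpoint that Chrome serves on the remote-debugging port and lists the open targets. The function names `targets_url` and `list_targets` are hypothetical, and the host and port simply mirror the command above.

```python
import json
import urllib.error
import urllib.request


def targets_url(host="localhost", port=9222):
    """Build the URL of the DevTools HTTP endpoint that lists open targets."""
    return "http://{}:{}/json".format(host, port)


def list_targets(host="localhost", port=9222, timeout=3):
    """Return the list of debuggable targets, or None if Chrome is unreachable."""
    try:
        with urllib.request.urlopen(targets_url(host, port), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None


if __name__ == "__main__":
    targets = list_targets()
    if targets is None:
        print("Chrome is not reachable on port 9222; "
              "start it with --remote-debugging-port=9222")
    else:
        for target in targets:
            print(target.get("type"), target.get("url"))
```

If this prints the "not reachable" message, PyChromeDevTools will not be able to connect either, so it is worth fixing the Chrome launch first.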

    Once that Chrome instance is running, you can run the program as in the following example.

    import time
    import requests
    import PyChromeDevTools
    from bs4 import BeautifulSoup
    
    url = 'https://secure.imdb.com/ap/signin?openid.pape.max_auth_age=0&openid.return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_pro_us&openid.mode=checkid_setup&siteState=eyJvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl9wcm9fdXMiLCJyZWRpcmVjdFRvIjoiaHR0cHM6Ly9wcm8uaW1kYi5jb20vIn0&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0'
    signin = 'https://secure.imdb.com/ap/signin'
    mainurl = 'https://pro.imdb.com/'
    
    
    def parse_cookies(input_url):
        """Open input_url in the remote-debugging Chrome and return its cookies."""
        chrome = PyChromeDevTools.ChromeInterface()
        chrome.Network.enable()
        chrome.Page.enable()
        chrome.Page.navigate(url=input_url)
        time.sleep(2)  # crude wait for the page to load and set its cookies
    
        cookies = chrome.Network.getCookies()
    
        return cookies["result"]["cookies"]
    
    
    def get_cookies(parsed_cookie_string):
        """Join a list of cookie dicts into a 'name=value; name=value' string."""
        return "; ".join(
            "{}={}".format(sub_cookie['name'], sub_cookie['value'])
            for sub_cookie in parsed_cookie_string
        )
    
    
    with requests.Session() as s:
        res = s.get(url, headers={"User-agent": "Mozilla/5.0"})
        soup = BeautifulSoup(res.text, "lxml")
        payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
        payload['email'] = 'some username'
        payload['password'] = 'some password'
    
        cookie_string_for_post = parse_cookies(signin)
        print("Cookies for Post Request:\n ", cookie_string_for_post)
    
        cookie_string_for_get = parse_cookies(mainurl)
        print("Cookies for Get Request:\n ", cookie_string_for_get)
    
        post_req_cookies = get_cookies(cookie_string_for_post)
        print("Post Cookie_String:\n ", post_req_cookies)
    
        get_req_cookies = get_cookies(cookie_string_for_get)
        print("Get Cookie_String:\n ", get_req_cookies)
    
        s.post(signin, data=payload, headers={
            "User-agent": "Mozilla/5.0",
            "Cookie": post_req_cookies
        })
    
        r = s.get(mainurl, headers={
            "Cookie": get_req_cookies
        })
    
        sauce = BeautifulSoup(r.text, "lxml")
        name = sauce.select_one("span.display-name").text
        print("User-Name:", name)
    

    The script above defines two functions:

    • parse_cookies(input_url) # parses the IMDb cookies before and after login
    • get_cookies(parsed_cookie_string) # slices them into the { name=value; } pattern

    This is the result of the above script:

    Cookies for Post Request:
      [{'name': 'csm-hit', 'value': 'adb:adblk_no&t:1575551929829', 'domain': 'secure.imdb.com', 'path': '/', 'expires': 1636031929, 'size': 35, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'session-token', 'value': 'ojv7WWBxadoA7dlcquiw9uErP2rhrTH7rHbpVhoRy4T+qTDfhwZKdDt5jOeGfZp1TKvwtzTGuJ6pOltjNFPiIuP5Rd5Vw8/e1J3RY/iye5tEh7qoRC2NHF9wc003xKG3PPAAdmgf8/mv8GeLAOOKNgWKBTUeMre9xbj5GzXxZBPdXMZttHrMYqKKSuwWLpa0', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035367.931534, 'size': 205, 'httpOnly': True, 'secure': True, 'session': False}, {'name': '_msuuid_518k2z41603', 'value': '7EFA48D9-B808-4A94-AF25-DF946D700AE7', 'domain': '.imdb.com', 'path': '/', 'expires': 1607087673, 'size': 55, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'uu', 'value': 'BCYrG0JCGIzGSiHxLJnhMiZmYPKjX1M_R2SYqoaFp8H_0KTtNvuGu-u_h_WO9yjlPz2CTdiUs86i%0D%0Az7kP7F-mJu5OZVpOKhquJmQf7Ks8_flkk2XlZzTPnz7R4WTBpqeRfxQqr0M9q54Gvnd0f5s1lajr%0D%0AVA%0D%0A', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.37521, 'size': 174, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'ubid-main', 'value': '130-4270133-5864707', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035317.315112, 'size': 28, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'adblk', 'value': 'adblk_no', 'domain': '.imdb.com', 'path': '/', 'expires': 1607087639, 'size': 13, 'httpOnly': False, 'secure': False, 'session': False}, {'name': '_fbp', 'value': 'fb.1.1575551679007.40322953', 'domain': '.imdb.com', 'path': '/', 'expires': 1583327724, 'size': 31, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'session-id', 'value': '130-3480383-2108806', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.375339, 'size': 29, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'session-id-time', 'value': '2206271615', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.375396, 'size': 25, 'httpOnly': False, 'secure': True, 'session': 
False}]
    Cookies for Get Request:
      [{'name': 'vuid', 'value': 'pl1203459194.1031556308', 'domain': '.vimeo.com', 'path': '/', 'expires': 1638623938, 'size': 27, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'session-token', 'value': 'ojv7WWBxadoA7dlcquiw9uErP2rhrTH7rHbpVhoRy4T+qTDfhwZKdDt5jOeGfZp1TKvwtzTGuJ6pOltjNFPiIuP5Rd5Vw8/e1J3RY/iye5tEh7qoRC2NHF9wc003xKG3PPAAdmgf8/mv8GeLAOOKNgWKBTUeMre9xbj5GzXxZBPdXMZttHrMYqKKSuwWLpa0', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035367.931534, 'size': 205, 'httpOnly': True, 'secure': True, 'session': False}, {'name': '_msuuid_518k2z41603', 'value': '7EFA48D9-B808-4A94-AF25-DF946D700AE7', 'domain': '.imdb.com', 'path': '/', 'expires': 1607087673, 'size': 55, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'uu', 'value': 'BCYrG0JCGIzGSiHxLJnhMiZmYPKjX1M_R2SYqoaFp8H_0KTtNvuGu-u_h_WO9yjlPz2CTdiUs86i%0D%0Az7kP7F-mJu5OZVpOKhquJmQf7Ks8_flkk2XlZzTPnz7R4WTBpqeRfxQqr0M9q54Gvnd0f5s1lajr%0D%0AVA%0D%0A', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.37521, 'size': 174, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'ubid-main', 'value': '130-4270133-5864707', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035317.315112, 'size': 28, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'adblk', 'value': 'adblk_no', 'domain': '.imdb.com', 'path': '/', 'expires': 1607087639, 'size': 13, 'httpOnly': False, 'secure': False, 'session': False}, {'name': '_fbp', 'value': 'fb.1.1575551679007.40322953', 'domain': '.imdb.com', 'path': '/', 'expires': 1583327724, 'size': 31, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'session-id', 'value': '130-3480383-2108806', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.375339, 'size': 29, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'session-id-time', 'value': '2206271615', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.375396, 'size': 25, 'httpOnly': False, 'secure': True, 'session': False}]
    Post Cookie_String:
      csm-hit=adb:adblk_no&t:1575551929829; session-token=ojv7WWBxadoA7dlcquiw9uErP2rhrTH7rHbpVhoRy4T+qTDfhwZKdDt5jOeGfZp1TKvwtzTGuJ6pOltjNFPiIuP5Rd5Vw8/e1J3RY/iye5tEh7qoRC2NHF9wc003xKG3PPAAdmgf8/mv8GeLAOOKNgWKBTUeMre9xbj5GzXxZBPdXMZttHrMYqKKSuwWLpa0; _msuuid_518k2z41603=7EFA48D9-B808-4A94-AF25-DF946D700AE7; uu=BCYrG0JCGIzGSiHxLJnhMiZmYPKjX1M_R2SYqoaFp8H_0KTtNvuGu-u_h_WO9yjlPz2CTdiUs86i%0D%0Az7kP7F-mJu5OZVpOKhquJmQf7Ks8_flkk2XlZzTPnz7R4WTBpqeRfxQqr0M9q54Gvnd0f5s1lajr%0D%0AVA%0D%0A; ubid-main=130-4270133-5864707; adblk=adblk_no; _fbp=fb.1.1575551679007.40322953; session-id=130-3480383-2108806; session-id-time=2206271615
    Get Cookie_String:
      vuid=pl1203459194.1031556308; session-token=ojv7WWBxadoA7dlcquiw9uErP2rhrTH7rHbpVhoRy4T+qTDfhwZKdDt5jOeGfZp1TKvwtzTGuJ6pOltjNFPiIuP5Rd5Vw8/e1J3RY/iye5tEh7qoRC2NHF9wc003xKG3PPAAdmgf8/mv8GeLAOOKNgWKBTUeMre9xbj5GzXxZBPdXMZttHrMYqKKSuwWLpa0; _msuuid_518k2z41603=7EFA48D9-B808-4A94-AF25-DF946D700AE7; uu=BCYrG0JCGIzGSiHxLJnhMiZmYPKjX1M_R2SYqoaFp8H_0KTtNvuGu-u_h_WO9yjlPz2CTdiUs86i%0D%0Az7kP7F-mJu5OZVpOKhquJmQf7Ks8_flkk2XlZzTPnz7R4WTBpqeRfxQqr0M9q54Gvnd0f5s1lajr%0D%0AVA%0D%0A; ubid-main=130-4270133-5864707; adblk=adblk_no; _fbp=fb.1.1575551679007.40322953; session-id=130-3480383-2108806; session-id-time=2206271615
    User-Name: **Logged in user-name**
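The "Get Cookie_String" output above includes a vuid cookie that belongs to .vimeo.com, so the header sent to pro.imdb.com carries a cookie from an unrelated site. One possible refinement, sketched here under assumptions, is to keep only the cookies whose domain matches the target host. The names `domain_matches` and `cookie_header_for` are hypothetical, the cookie values are placeholders, and the matching rule is a simplification: a leading dot is treated as "this domain and its subdomains", anything else as an exact host match.

```python
def domain_matches(cookie_domain, host):
    """True if a cookie with this domain attribute would be sent to host."""
    if cookie_domain.startswith("."):
        bare = cookie_domain[1:]
        return host == bare or host.endswith("." + bare)
    # No leading dot: treat it as a host-only cookie (exact match).
    return host == cookie_domain


def cookie_header_for(cookies, host):
    """Build a Cookie header string from the cookie dicts that apply to host."""
    return "; ".join(
        "{}={}".format(c["name"], c["value"])
        for c in cookies
        if domain_matches(c["domain"], host)
    )


# Same dict shape as the cookie list printed above, with placeholder values.
cookies = [
    {"name": "vuid", "value": "v1", "domain": ".vimeo.com"},
    {"name": "session-id", "value": "s1", "domain": ".imdb.com"},
    {"name": "csm-hit", "value": "c1", "domain": "secure.imdb.com"},
]

print(cookie_header_for(cookies, "pro.imdb.com"))     # session-id=s1
print(cookie_header_for(cookies, "secure.imdb.com"))  # session-id=s1; csm-hit=c1
```

In the answer's script this would stand in for get_cookies, called with the host of the URL each request targets.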
    
