如何“网络抓取"一个使用python的包含弹出窗口的网站? [英] How to "webscrape" a site containing a popup window, using python?

查看:108
本文介绍了如何“网络抓取"一个使用python的包含弹出窗口的网站?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于没有针对此功能的api,我正在尝试使用python在Web上刮除etherscan站点的特定部分.基本上要转到此链接,然后需要验证,然后弹出一个弹出窗口.您可以在此处中看到.我需要抓的是这部分 0x0882477e7895bdc5cea7cb1552ed914ab157fe56 ,以防消息以图片中的消息开头.

I am trying to web scrape a certain part of the etherscan site with python, since there is no api for this functionality. Basically going to this link and one would need to press verify, after doing so a popup comes up which you can see here. What I need to scrape is this part 0x0882477e7895bdc5cea7cb1552ed914ab157fe56 in case the message starts with the message as seen in the picture.

我已经编写了下面的python脚本,以开始此操作,但是我不知道如何与该站点进行进一步的交互,以使该弹出窗口成为前台并抓取信息.这可能吗?

I've written the below python script that starts this off, but I don't know how it's possible to interact further with the site, in order to have that popup come to the foreground and scrape the information. Is this possible to do?

from bs4 import BeautifulSoup
from requests import get

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0','X-Requested-With': 'XMLHttpRequest',}
url = "https://etherscan.io/proxyContractChecker?a=0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"
response = get(url,headers=headers )
soup = BeautifulSoup(response.content,'html.parser')

谢谢

推荐答案

import requests
from bs4 import BeautifulSoup


def Main(url):
    with requests.Session() as req:
        r = req.get(url, headers={'User-Agent': 'Ahmed American :)'})
        soup = BeautifulSoup(r.content, 'html.parser')
        vs = soup.find("input", id="__VIEWSTATE").get("value")
        vsg = soup.find("input", id="__VIEWSTATEGENERATOR").get("value")
        ev = soup.find("input", id="__EVENTVALIDATION").get("value")
        data = {
            '__VIEWSTATE': vs,
            '__VIEWSTATEGENERATOR': vsg,
            '__EVENTVALIDATION': ev,
            'ctl00$ContentPlaceHolder1$txtContractAddress': '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48',
            'ctl00$ContentPlaceHolder1$btnSubmit': "Verify"
        }
        r = req.post(
            "https://etherscan.io/proxyContractChecker?a=0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48", data=data, headers={'User-Agent': 'Ahmed American :)'})
        soup = BeautifulSoup(r.content, 'html.parser')
        token = soup.find(
            "div", class_="alert alert-success").text.split(" ")[-1]
        print(token)


Main("https://etherscan.io/proxyContractChecker")

输出:

0x0882477e7895bdc5cea7cb1552ed914ab157fe56

这篇关于如何“网络抓取"一个使用python的包含弹出窗口的网站?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆