如何构建Etherscan网络大楼？ [英] How to build Etherscan webscraper?

查看：42 发布时间：2022/2/25 10:33:19 python-3.x web-scraping beautifulsoup web-crawler etherscan

本文介绍了如何构建Etherscan网络大楼？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在构建一个网络爬行器，它每隔30秒不断刷新一批以太扫描URL，如果发生了任何未考虑在内的新传输，它会向我发送电子邮件通知和指向以太扫描上相关地址的链接，以便我可以手动检查它们。

我想要跟踪的地址之一在这里：

https://etherscan.io/token/0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd?a=0xd071f6e384cf271282fc37eb40456332307bb8af

我到目前为止所做的工作：

from urllib.request import Request, urlopen
url = 'https://etherscan.io/token/0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd?a=0x94f52b6520804eced0accad7ccb93c73523af089'
req = Request(url, headers={'User-Agent': 'XYZ/3.0'})   # I got this line from another post since "uClient = uReq(URL)" and "page_html = uClient.read()" would not work (I beleive that etherscan is attemption to block webscraping or something?)
response = urlopen(req, timeout=20).read()
response_close = urlopen(req, timeout=20).close()
page_soup = soup(response, "html.parser")
Transfers_info_table_1 = page_soup.find("div", {"class": "table-responsive"})
print(Transfers_info_table_1)

有趣的是，当我运行此命令时，我得到以下输出：

<div class="table-responsive" style="visibility:hidden;">
<iframe frameborder="0" id="tokentxnsiframe" scrolling="no" src="" style="width: 100px; height: 600px; min-width: 100%;"></iframe>
</div>

我希望获得整个转移表的输出。我在这里做错了什么？

推荐答案

，因为表存在于iframe中。复制iFrame的src值，然后使用请求获取该URL的内容。

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import pandas as pd

url = 'https://etherscan.io/token/generic-tokentxns2?m=normal&contractAddress=0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd&a=0xd071f6e384cf271282fc37eb40456332307bb8af'
req = Request(url, headers={'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'})   # I got this line from another post since "uClient = uReq(URL)" and "page_html = uClient.read()" would not work (I beleive that etherscan is attemption to block webscraping or something?)
response = urlopen(req, timeout=20).read()
response_close = urlopen(req, timeout=20).close()
page_soup = soup(response, "html.parser")
Transfers_info_table_1 = page_soup.find("table", {"class": "table table-md-text-normal table-hover mb-4"})
df=pd.read_html(str(Transfers_info_table_1))[0]
df.to_csv("TransferTable.csv",index=False)

已生成CSV。

这篇关于如何构建Etherscan网络大楼？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何构建Etherscan网络大楼？ [英] How to build Etherscan webscraper?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

如何构建Etherscan网络大楼？ [英] How to build Etherscan webscraper?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭