Can't scrape the value of a certain field from a webpage using requests


Problem description


I'm trying to scrape the value of Balance from a webpage using the requests module. I've looked for the name Balance in dev tools and in the page source but couldn't find it anywhere. I'm hoping there is a way to grab the value of Balance from that webpage without using a browser simulator.

website address

Output I'm after:

I've tried with:

import requests
from bs4 import BeautifulSoup

link = 'https://tronscan.org/?fbclid=IwAR2WiSKZoTDPWX1ufaAIEg9vaA5oLj9Yd_RUfpjE6MWEQKRGBaK-L_JdtwQ#/contract/TCSPn1Lbdv62QfSCczbLdwupNoCFYAfUVL'

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"}

res = requests.get(link,headers=headers)
soup = BeautifulSoup(res.text,'lxml')
balance = soup.select_one("li:has(> p:contains('Balance'))").get_text(strip=True)
print(balance)

Solution

The page's HTML doesn't contain the balance because the page makes AJAX requests that return the information you want after the page has loaded. You can inspect these requests by opening the developer tools (press F12 in Chrome; it might be different in other browsers) and going to the Network tab, where the requests made by the page are listed.

In the Network tab, the request you want is account?address= followed by the code from the page's URL string. Mousing over it shows the complete URL for the AJAX request (highlighted in coral in the original screenshot), and the part of the response that holds the data you want is on the right (highlighted in turquoise).
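The URL pattern described above can be sketched in Python. The helper name api_url_from_page is hypothetical, and the sketch assumes the address is always the last segment of the URL fragment, which is only known to hold for the contract page in the question.

```python
from urllib.parse import urlsplit, urlencode

# Endpoint seen in the Network tab for this page.
API_ENDPOINT = "https://apilist.tronscan.org/api/account"


def api_url_from_page(page_url: str) -> str:
    """Build the AJAX URL from a tronscan page URL (hypothetical helper)."""
    # Everything after '#' is the fragment, e.g. '/contract/<address>'.
    fragment = urlsplit(page_url).fragment
    # Assumption: the address is the last segment of the fragment.
    address = fragment.rstrip("/").rsplit("/", 1)[-1]
    if not address:
        raise ValueError("no address found in the URL fragment")
    return f"{API_ENDPOINT}?{urlencode({'address': address})}"


page = "https://tronscan.org/#/contract/TCSPn1Lbdv62QfSCczbLdwupNoCFYAfUVL"
print(api_url_from_page(page))
```

Tracking parameters in the query string, such as fbclid, are ignored because only the fragment is read.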

You can look at the response by opening the request URL directly in your browser and finding tokenBalances.

In order to get the balance in Python you can run the following:

import requests

# API endpoint the page calls via AJAX; the address comes from the page URL
url = 'https://apilist.tronscan.org/api/account?address=TCSPn1Lbdv62QfSCczbLdwupNoCFYAfUVL'
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"}

response = requests.get(url, headers=headers)
# Parse the JSON body into a dict (keeps the Response object under its own name)
data = response.json()

# The first entry of tokenBalances holds the balance
balance = data['tokenBalances'][0]['balance']

print(balance)
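A slightly more defensive variant might look like the sketch below. It adds a timeout, checks the HTTP status, and returns None instead of raising when tokenBalances is missing or empty. The function names are hypothetical, and the response shape (a tokenBalances list whose entries have a balance key) is assumed from the answer above, not taken from any API documentation.

```python
URL = "https://apilist.tronscan.org/api/account?address=TCSPn1Lbdv62QfSCczbLdwupNoCFYAfUVL"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"}


def extract_balance(payload: dict):
    """Return the first token balance, or None if the list is missing or empty."""
    # Assumed shape: {"tokenBalances": [{"balance": ...}, ...]}
    balances = payload.get("tokenBalances") or []
    if not balances:
        return None
    return balances[0].get("balance")


def fetch_balance(url: str = URL, timeout: float = 10.0):
    """Fetch the account JSON and pull out the balance (performs a network call)."""
    # Imported here so extract_balance stays usable without requests installed.
    import requests

    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise on 4xx/5xx instead of trying to parse an error page as JSON.
    response.raise_for_status()
    return extract_balance(response.json())
```

Calling fetch_balance() performs the request; extract_balance can be tried on any response that has already been parsed.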
