How can I scrape the data from in between these span tags?


Question

I am attempting to scrape the figures shown on https://www.usdebtclock.org/world-debt-clock.html . However, because the numbers are constantly changing, I do not know how to collect this data. This is an example of what I am attempting to do.

import requests
from bs4 import BeautifulSoup

url = "https://www.usdebtclock.org/world-debt-clock.html"
response = requests.get(url)
site = BeautifulSoup(response.text, "html.parser")
data = site.find_all("span", id="X4a79R9BW")

print(data)

The result is "[ ]" when I am expecting "$19,987,137,284,731".

Is there something I can change in order to extract the number?

Recommended answer

BeautifulSoup cannot do this for you on its own, because the data you need is filled in by JavaScript after the page loads. BeautifulSoup only parses the HTML it is given; it does not execute JavaScript.
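To illustrate why the span comes back empty, here is a minimal sketch using only the standard library's `html.parser` (the same backend the question passes to BeautifulSoup). The `STATIC_HTML` string is a hypothetical stand-in for what the server might send, not the site's actual markup, and `SpanTextCollector` and `static_span_text` are names made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw HTML: the span is empty, and a script
# (never executed by a parser) would fill it in inside a real browser.
STATIC_HTML = """
<html><body>
<span id="X4a79R9BW"></span>
<script>
document.getElementById("X4a79R9BW").textContent = "$19,987,137,284,731";
</script>
</body></html>
"""


class SpanTextCollector(HTMLParser):
    """Collects the text found directly inside <span id=target_id>."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        # Script bodies also arrive here, but only as inert text outside the span.
        if self.inside:
            self.text += data


def static_span_text(html, target_id):
    """Return the span's text as a static parser sees it (no JavaScript run)."""
    collector = SpanTextCollector(target_id)
    collector.feed(html)
    collector.close()
    return collector.text.strip()


print(repr(static_span_text(STATIC_HTML, "X4a79R9BW")))
```

The parser finds the span but never runs the script, so the text it reports is whatever was in the raw HTML, which in this sketch is nothing.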

An alternative is to use a tool such as Selenium WebDriver, which drives a real browser and therefore runs the page's JavaScript:

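from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get('https://www.usdebtclock.org/world-debt-clock.html')
    # find_element_by_xpath was removed in Selenium 4.3; use find_element with a By locator
    elem2 = driver.find_element(By.XPATH, '//span[@id="X4a79R9BW"]')
    print(elem2.text)
finally:
    # quit() closes the browser window and stops the driver process
    driver.quit()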
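Because the number is filled in by JavaScript, the span's text may still be empty at the moment it is first read. One way to handle that is to poll until some text appears. The sketch below is a generic helper and not part of Selenium; `wait_for_text`, `read_text`, `timeout`, and `interval` are hypothetical names chosen for this example.

```python
import time


def wait_for_text(read_text, timeout=10.0, interval=0.5):
    """Call read_text() repeatedly until it returns non-blank text.

    read_text is any zero-argument callable returning a string, for
    example a lambda that looks up the span and returns its .text.
    Raises TimeoutError if nothing shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = read_text().strip()
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("element text was still empty after waiting")
        time.sleep(interval)
```

With the `driver` from the snippet above, `read_text` could be a lambda that finds the span and returns its `.text`. Selenium also ships its own explicit-wait class, `WebDriverWait`, which is likely the better fit when Selenium is already in use.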

If you have not used Selenium WebDriver before, you will need to follow its installation instructions first.

In particular, you will need to follow the instructions for downloading the browser driver of your choice (I use geckodriver for Firefox), and make sure the executable is on your PATH.

(I expect there are other Python-based alternatives as well.)
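Once the text has been retrieved, it is often more useful as a number than as a string. The following is a minimal sketch, assuming the figure is a whole-dollar amount formatted like the "$19,987,137,284,731" in the question; `parse_dollar_amount` is a hypothetical helper name, and other formats (decimals, other currencies) are deliberately rejected.

```python
import re

# Optional "$", then either comma-grouped digits or a plain run of digits.
_AMOUNT_RE = re.compile(r"^\$?\s*(\d{1,3}(?:,\d{3})+|\d+)$")


def parse_dollar_amount(text):
    """Convert a string such as "$19,987,137,284,731" to an int."""
    match = _AMOUNT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a whole-dollar amount: {text!r}")
    return int(match.group(1).replace(",", ""))


print(parse_dollar_amount("$19,987,137,284,731"))
```

Raising `ValueError` on unexpected input, rather than stripping every non-digit character, keeps an empty or half-loaded span from being silently turned into a wrong number.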
