PYTHON: How do I use BeautifulSoup to parse a table into a pandas dataframe


Problem description


I am trying to scrape the CDC website for the data on COVID-19 cases reported in the last 7 days: https://covid.cdc.gov/covid-data-tracker/#cases_casesinlast7days. I've tried to find the table by name, id, and class, and it always returns as None. When I print the scraped data, I can't manually locate the table in the HTML either, so I'm not sure what I'm doing wrong here. Once the data is imported, I need to populate a pandas DataFrame to later use for graphing purposes, and export the data table as a CSV.

Solution

You might as well request the data from the API directly (watch the Network tab in your browser's dev tools while refreshing the page):

import requests
import pandas as pd

# The endpoint that backs the CDC tracker page; the "id" parameter selects the dataset.
endpoint = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
data = requests.get(endpoint, params={"id": "US_MAP_DATA"}).json()
df = pd.DataFrame(data["US_MAP_DATA"])
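From there, exporting the DataFrame as a CSV for later graphing is one call. A minimal sketch; the payload below is hypothetical sample data standing in for the API response, since the real US_MAP_DATA field names are not shown here and may change:

```python
import pandas as pd

# Hypothetical stand-in for data["US_MAP_DATA"]: a list of per-state records.
# The real API's field names may differ.
sample_payload = [
    {"name": "Alabama", "tot_cases": 100, "new_cases07": 5},
    {"name": "Alaska", "tot_cases": 50, "new_cases07": 2},
]

df = pd.DataFrame(sample_payload)
df.to_csv("us_map_data.csv", index=False)  # write the table out for later use
print(df.shape)  # (2, 3)
```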



EDIT: Trying to make this answer more general and useful.

How did you discern that this was how to parse the data?

Firstly, you need to inspect the page (Ctrl + Shift + I) and navigate to the Network tab:


Secondly, you need to refresh the page to record network activity.

Where to look?

Check the XHR filter to limit the number of records (1);

Look through the records by clicking on them (2) and check their preview responses (3) to find out whether they contain the data you need.


It doesn't always work, but when it does, parsing data from the API directly is so much easier than writing scrapers with requests / bs4 / selenium, etc., and should be the first choice.
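For comparison, the BeautifulSoup route from the question title does work on static HTML; it fails on the CDC page only because the table there is rendered by JavaScript after load, so find() sees nothing. A sketch against an inline static table (hypothetical markup, not the CDC page's):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical static HTML. On the CDC page this table never appears in the
# raw page source, which is why find() returns None there.
html = """
<table id="cases">
  <tr><th>state</th><th>cases</th></tr>
  <tr><td>Alabama</td><td>100</td></tr>
  <tr><td>Alaska</td><td>50</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="cases")
rows = [[cell.get_text() for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")]

# First row becomes the header, the rest become the data.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```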

