Web scraping data from tables

Question

I want to scrape the annual income statement, balance sheet, and cash flow statement from this page: https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei and put them into a DataFrame. As you can see, clicking on different parts of the page changes the data shown. Can someone show me how to scrape the annual income statement? This is what I have so far. I can see the data in the soup, but I don't know how to get to it.

from bs4 import BeautifulSoup
import requests
import pandas as pd

df = pd.DataFrame()  # empty frame, meant to hold the scraped statement
url = 'https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei'
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows; Windows NT 6.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5'}
r = requests.get(url, headers=headers)
# Parse the returned HTML; the table data is somewhere inside this soup
soup = BeautifulSoup(r.text, 'html.parser')

Recommended answer

Why not just use pandas's read_html() function? It returns a list of DataFrames (df), one for each table that can be displayed by clicking on the options, among them the annual income statement:

import pandas as pd
# df is a list of DataFrames, one per table found on the page
df = pd.read_html("https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei")
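Since read_html() returns a list rather than a single DataFrame, the remaining step is picking out the table you want. The sketch below shows one way to do that. It runs against a small inline HTML snippet instead of the live page, and the snippet, its table layout, the "Item" column, and all the numbers are made-up placeholders, not real figures or the page's actual markup. It also assumes lxml (or BeautifulSoup with html5lib) is installed, which read_html() needs.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the fetched page: two small tables with placeholder numbers.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Item</th><th>2016</th><th>2015</th></tr></thead>
  <tbody>
    <tr><td>Revenue</td><td>100</td><td>90</td></tr>
    <tr><td>Net Income</td><td>10</td><td>9</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Item</th><th>2016</th><th>2015</th></tr></thead>
  <tbody>
    <tr><td>Total Assets</td><td>500</td><td>450</td></tr>
  </tbody>
</table>
"""

# read_html returns one DataFrame per <table> element it finds.
tables = pd.read_html(StringIO(SAMPLE_HTML))
print(len(tables))

# `match` keeps only the tables whose text matches the given string or regex,
# which avoids guessing the position of a table in the list.
income = pd.read_html(StringIO(SAMPLE_HTML), match="Net Income")[0]

# Use the (hypothetical) "Item" column as the row index for easy lookups.
income = income.set_index("Item")
print(income)
```

For a real page, the same calls should work on HTML fetched with requests and the headers from the question, by passing StringIO(r.text) to read_html(). Which text to use for `match`, and which list index holds the annual statement, has to be checked against the tables the page actually returns.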
