Web scraping data from tables
Question
I want to scrape the annual income statement, balance sheet, and cash flow statement from this page: https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei
and put them into a DataFrame. As you can see, you can change the data shown by clicking on different parts of the page. Can someone show me how to scrape the annual income statement? This is what I have so far. I can see the data in the soup, but I don't know how to get to it.
from bs4 import BeautifulSoup
import requests
import pandas as pd

# Empty frame that the scraped table is meant to fill
df = pd.DataFrame()
url = 'https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei'
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows; Windows NT 6.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5'}
r = requests.get(url, headers=headers)
# Parse the returned HTML with the built-in parser
soup = BeautifulSoup(r.text, 'html.parser')
Recommended answer
Why don't you just use pandas's read_html() function? It returns a list of data frames (df), one for each table that can be displayed by clicking on the options (among them the annual income statement):
import pandas as pd
df = pd.read_html("https://www.google.com/finance?q=NYSE%3AIBM&fstype=ii&ei")
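To make it clearer what that list looks like and how to pull one table out of it, here is a minimal sketch. It runs on a small made-up HTML snippet instead of the live page, so the table contents, the `HTML` variable, and the names `income` and `balance` are all assumptions for illustration; only `pd.read_html` and its `match` parameter come from pandas. It also assumes an HTML parser that pandas can use (such as lxml) is installed.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for a page that holds several financial tables.
HTML = """
<html><body>
<table id="income">
  <tr><th>Item</th><th>2016</th><th>2015</th></tr>
  <tr><td>Revenue</td><td>100</td><td>90</td></tr>
  <tr><td>Net Income</td><td>12</td><td>10</td></tr>
</table>
<table id="balance">
  <tr><th>Item</th><th>2016</th><th>2015</th></tr>
  <tr><td>Total Assets</td><td>500</td><td>480</td></tr>
</table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(HTML))

# Pick a table by its position in the list ...
income = tables[0]

# ... or keep only the tables whose text matches a pattern.
balance = pd.read_html(StringIO(HTML), match="Total Assets")[0]

print(len(tables))
print(income)
print(balance)
```

On a real page you would probably print each frame's `head()` first to see which index holds the annual income statement, since the order depends on how the page lays out its tables.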