Scrape tables into dataframe with BeautifulSoup

Question

I'm trying to scrape data from a coin catalog.

Here is one of the pages. I need to scrape this data into a DataFrame.

So far I have this code:

import bs4 as bs
import urllib.request
import pandas as pd

source = urllib.request.urlopen('http://www.gcoins.net/en/catalog/view/45518').read()
soup = bs.BeautifulSoup(source,'lxml')

table = soup.find('table', attrs={'class':'subs noBorders evenRows'})
table_rows = table.find_all('tr')

for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text for tr in td]
    print(row)                    # I need to save this data instead of printing it 

It produces the following output:

[]
['', '', '1882', '', '108,000', 'UNC', '—']
[' ', '', '1883', '', '786,000', 'UNC', '~ $3.99']
[' ', " 



							$('subGraph55337').on('click', function(event) {
								Lightview.show({
									href : '/en/catalog/ajax/subgraph?id=55337',
									rel : 'ajax',
									options : {
										autosize : true,
										topclose : true,
										ajax : {
											evalScripts : true
										}
									} 
								});
								event.stop();
								return false;
							});
						", '1884', '', '4,604,000', 'UNC', '~ $2.08–$4.47']
[' ', '', '1885', '', '1,314,000', 'UNC', '~ $3.20']
['', '', '1886', '', '444,000', 'UNC', '—']
[' ', '', '1888', '', '413,000', 'UNC', '~ $2.88']
[' ', '', '1889', '', '568,000', 'UNC', '~ $2.56']
[' ', " 



							$('subGraph55342').on('click', function(event) {
								Lightview.show({
									href : '/en/catalog/ajax/subgraph?id=55342',
									rel : 'ajax',
									options : {
										autosize : true,
										topclose : true,
										ajax : {
											evalScripts : true
										}
									} 
								});
								event.stop();
								return false;
							});
						", '1890', '', '2,137,000', 'UNC', '~ $1.28–$4.79']
['', '', '1891', '', '605,000', 'UNC', '—']
[' ', '', '1892', '', '205,000', 'UNC', '~ $4.47']
[' ', '', '1893', '', '754,000', 'UNC', '~ $4.79']
[' ', '', '1894', '', '532,000', 'UNC', '~ $3.20']
[' ', '', '1895', '', '423,000', 'UNC', '~ $2.40']
['', '', '1896', '', '174,000', 'UNC', '—']

But when I try to save it to a DataFrame and export it to Excel, the result contains only the last row:

         0
0         
1         
2     1896
3         
4  174,000
5      UNC
6        —

Recommended answer

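Try this. Collect every row in a list inside the loop, then build the DataFrame once after the loop, so earlier rows are not overwritten by the last one: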

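rows = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [cell.text for cell in td]    # 'cell' avoids shadowing the loop variable 'tr'
    rows.append(row)                    # keep every row instead of only the last one
# "..." is a placeholder: supply exactly one name per column
df = pd.DataFrame(rows, columns=["A", "B", ...])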
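
The same idea can be sketched end to end as below. This is only a minimal illustration, not the asker's exact page: the `HTML` string is a made-up stand-in for the catalog table so the code runs offline, `table_to_dataframe` is a hypothetical helper name, and the `Year`/`Mintage`/`Grade`/`Price` headers are assumed for the example. It also removes `<script>` elements before reading the cells, which is one possible way to keep the JavaScript seen in the question's output out of the data.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the catalog page, so the sketch runs offline.
HTML = """
<table class="subs noBorders evenRows">
  <tr><th>Year</th><th>Mintage</th><th>Grade</th><th>Price</th></tr>
  <tr><td>1882</td><td>108,000</td><td>UNC</td><td>-</td></tr>
  <tr><td>1884<script>$('x').on('click', function() {});</script></td>
      <td>4,604,000</td><td>UNC</td><td>~ $2.08</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Collect all table rows in a list, then build one DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "subs noBorders evenRows"})

    # Drop embedded scripts so their source never ends up in a cell.
    for script in table.find_all("script"):
        script.decompose()

    # Use the <th> cells as column names (assumed layout of this sample).
    header = [th.get_text(strip=True) for th in table.find_all("th")]

    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells and yields []
            rows.append(cells)

    return pd.DataFrame(rows, columns=header)


df = table_to_dataframe(HTML)
print(df)
```

From there, `df.to_excel("coins.xlsx", index=False)` would write the frame to an Excel file (the file name is arbitrary, and pandas needs an Excel writer such as openpyxl installed for this call).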
