Python - Web Scraping HTML table and printing to CSV

Published: 2021/12/17 14:01:17. Tags: python, html, csv, web-scraping, beautifulsoup


Problem description

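I'm pretty much brand new to Python, but I'm looking to build a web scraping tool that will scrape the data from an HTML table online and write it to a CSV in the same format.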

Here's a sample of the HTML table (it's huge, so I'll only provide a few rows).



```html
<div class="col-xs-12 tab-content">
  <div id="historical-data" class="tab-pane active">
    <div class="tab-header">
      <h2 class="pull-left bottom-margin-2x">Historical data for Bitcoin</h2>
      <div class="clear"></div>
      <div class="row">
        <div class="col-md-12">
          <div class="pull-left">
            <small>Currency in USD</small>
          </div>
          <div id="reportrange" class="pull-right">
            <i class="glyphicon glyphicon-calendar fa fa-calendar"></i>
            <span>Aug 16, 2017 - Sep 15, 2017</span>
            <b class="caret"></b>
          </div>
        </div>
      </div>
      <table class="table">
        <thead>
          <tr>
            <th class="text-left">Date</th>
            <th class="text-right">Open</th>
            <th class="text-right">High</th>
            <th class="text-right">Low</th>
            <th class="text-right">Close</th>
            <th class="text-right">Volume</th>
            <th class="text-right">Market Cap</th>
          </tr>
        </thead>
        <tbody>
          <tr class="text-right">
            <td class="text-left">Sep 14, 2017</td>
            <td>3875.37</td>
            <td>3920.60</td>
            <td>3153.86</td>
            <td>3154.95</td>
            <td>2,716,310,000</td>
            <td>64,191,600,000</td>
          </tr>
          <tr class="text-right">
            <td class="text-left">Sep 13, 2017</td>
            <td>4131.98</td>
            <td>4131.98</td>
            <td>3789.92</td>
            <td>3882.59</td>
            <td>2,219,410,000</td>
            <td>68,432,200,000</td>
          </tr>
          <tr class="text-right">
            <td class="text-left">Sep 12, 2017</td>
            <td>4168.88</td>
            <td>4344.65</td>
            <td>4085.22</td>
            <td>4130.81</td>
            <td>1,864,530,000</td>
            <td>69,033,400,000</td>
          </tr>
        </tbody>
      </table>
    </div>
  </div>
</div>
```

```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv

url = "enterURLhere"
page = requests.get(url)
pagetext = page.text

pricetable = {
    "Date" : [],
    "Open" : [],
    "High" : [],
    "Low" : [],
    "Close" : [],
    "Volume" : [],
    "Market Cap" : []
}

soup = BeautifulSoup(pagetext, 'html.parser')
file = open("test.csv", 'w')

for row in soup.find_all('tr'):
    for col in row.find_all('td'):
        print(col.text)
```
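As posted, the loop above only prints each cell and never writes anything to the opened `test.csv`. One possible way to finish it is sketched below, under some assumptions: the downloaded page is replaced by a trimmed inline copy of the sample table (`SAMPLE_HTML`), and `scrape_to_csv` is a hypothetical helper name, not part of any library. It keeps the question's `pricetable` dict, appends each row's `<td>` texts to the matching column list, and then writes the columns back out as rows with `csv.writer`. It parses with the built-in `html.parser`, so lxml is not needed.

```python
import csv

from bs4 import BeautifulSoup

# Assumption: a trimmed copy of the question's sample table stands in for
# the page that would normally come from requests.get(url).text
SAMPLE_HTML = """
<table class="table">
  <thead>
    <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th>
        <th>Close</th><th>Volume</th><th>Market Cap</th></tr>
  </thead>
  <tbody>
    <tr><td>Sep 14, 2017</td><td>3875.37</td><td>3920.60</td><td>3153.86</td>
        <td>3154.95</td><td>2,716,310,000</td><td>64,191,600,000</td></tr>
    <tr><td>Sep 13, 2017</td><td>4131.98</td><td>4131.98</td><td>3789.92</td>
        <td>3882.59</td><td>2,219,410,000</td><td>68,432,200,000</td></tr>
  </tbody>
</table>
"""


def scrape_to_csv(pagetext, path):
    """Hypothetical helper: fill a column-keyed dict from <td> cells, write it as CSV."""
    pricetable = {"Date": [], "Open": [], "High": [], "Low": [],
                  "Close": [], "Volume": [], "Market Cap": []}
    soup = BeautifulSoup(pagetext, "html.parser")
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # The header row holds only <th> cells, so it yields no <td> and is skipped
        if len(cells) != len(pricetable):
            continue
        for key, value in zip(pricetable, cells):
            pricetable[key].append(value)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(pricetable.keys())
        # zip(*columns) turns the per-column lists back into rows
        writer.writerows(zip(*pricetable.values()))
    return pricetable


table = scrape_to_csv(SAMPLE_HTML, "test.csv")
print(table["Close"])  # ['3154.95', '3882.59']
```

With a live page, `pagetext` would be `page.text` as in the question. Keeping the data column-keyed also matches what `pandas.DataFrame` accepts (a dict of equal-length lists), in case the data is headed there next.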

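```python
import csv
from bs4 import BeautifulSoup

# `html` is assumed to hold the page source, e.g. requests.get(url).text
tree = BeautifulSoup(html, "lxml")
table_tag = tree.select("table")[0]

# One list of cell texts per <tr>, covering header (th) and data (td) cells
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in table_tag.select("tr")]

# newline='' prevents blank lines between rows on Windows; the with-block
# makes sure the file is flushed and closed once writing is done
with open("table_data.csv", "w", newline='') as outfile:
    writer = csv.writer(outfile)
    for data in tab_data:
        writer.writerow(data)
        print(' '.join(data))
```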
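The `Date`, `Volume` and `Market Cap` cells themselves contain commas, which is one reason to write rows through `csv.writer` as this answer does, rather than joining strings by hand. A small standard-library-only illustration using one row from the sample table:

```python
import csv
import io

row = ["Sep 14, 2017", "3875.37", "3920.60", "3153.86",
       "3154.95", "2,716,310,000", "64,191,600,000"]

# Naive joining: the commas inside the date and the large numbers
# become extra column separators
naive = ",".join(row)
print(len(naive.split(",")))  # 14, not 7

# csv.writer (default QUOTE_MINIMAL) quotes any field that contains a comma
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue())  # "Sep 14, 2017",3875.37,3920.60,3153.86,3154.95,"2,716,310,000","64,191,600,000"

# Reading it back recovers exactly the original seven fields
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
print(parsed == row)  # True
```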

```python
from bs4 import BeautifulSoup

# `html` is assumed to hold the page source
soup = BeautifulSoup(html, "lxml")
table = soup.find('table')

# Collect every row as a list of cell texts (header and data cells alike)
list_of_rows = []
for row in table.find_all('tr'):
    list_of_cells = []
    for cell in row.find_all(["th", "td"]):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

for item in list_of_rows:
    print(' '.join(item))
```
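This second approach builds `list_of_rows` but only prints it, and the space-joined output cannot be split back into columns because values such as `Sep 14, 2017` and `Market Cap` contain spaces. A minimal sketch of writing the same list to CSV with `writerows()`, assuming a hypothetical inline `html` string trimmed to three columns in place of the real page source, and the built-in `html.parser` instead of lxml:

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source, trimmed to three columns
html = """
<table class="table">
  <tr><th>Date</th><th>Open</th><th>Volume</th></tr>
  <tr><td>Sep 12, 2017</td><td>4168.88</td><td>1,864,530,000</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Same nested loop as the answer above, written as a comprehension;
# get_text(strip=True) drops whitespace around each cell's text
list_of_rows = [[cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
                for row in table.find_all("tr")]

# writerows() writes the whole list of lists in one call
with open("table_data.csv", "w", newline="") as outfile:
    csv.writer(outfile).writerows(list_of_rows)

print(list_of_rows[1])  # ['Sep 12, 2017', '4168.88', '1,864,530,000']
```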

```
Date Open High Low Close Volume Market Cap
Sep 14, 2017 3875.37 3920.60 3153.86 3154.95 2,716,310,000 64,191,600,000
Sep 13, 2017 4131.98 4131.98 3789.92 3882.59 2,219,410,000 68,432,200,000
Sep 12, 2017 4168.88 4344.65 4085.22 4130.81 1,864,530,000 69,033,400,000
```
