Choosing pandas over xlsxwriter when working with Excel data
Question
Since pandas uses the xlsxwriter module, why bother using pandas when one can just use xlsxwriter directly?
Maybe a more direct question is: why should one consider replacing xlsxwriter with pandas when working with Excel data?
My goal with this question is to help people decide whether to use xlsxwriter or pandas when working with Excel data.
Recommended answer
One word: convenience. Reading from and writing to Excel spreadsheets is a very common task when dealing with data. As an example, here's how to create a dead-simple Excel file, taken from the xlsxwriter tutorial:
import xlsxwriter
# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('Expenses01.xlsx')
worksheet = workbook.add_worksheet()
# Some data we want to write to the worksheet.
expenses = (
    ['Rent', 1000],
    ['Gas', 100],
    ['Food', 300],
    ['Gym', 50],
)
# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0
# Iterate over the data and write it out row by row.
for item, cost in expenses:
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, cost)
    row += 1
# Write a total using a formula.
worksheet.write(row, 0, 'Total')
worksheet.write(row, 1, '=SUM(B1:B4)')
workbook.close()
Compare that with pandas:
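import pandas as pd
df = pd.DataFrame({
    'Amount': [1000, 100, 300, 50]
}, index=['Rent', 'Gas', 'Food', 'Gym'])
# Append a 'Total' row holding the computed sum (a plain value, not an Excel formula).
df.loc['Total', 'Amount'] = df['Amount'].sum()
# Keep the index so the row labels (Rent, Gas, ...) are written as the first column.
df.to_excel('Expenses01.xlsx')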
They are not exactly equivalent, of course. xlsxwriter creates a formula for the sum, whereas pandas writes the already-computed value, but the amount of boilerplate code you have to write with xlsxwriter is monstrous. df.to_excel is a simple command that dumps the DataFrame to Excel. You have little control over the resulting file, but depending on your requirements, you may not even need that control.
They are two libraries designed for two totally different purposes. The fact that pandas provides an integration with xlsxwriter doesn't mean that you should always pick one over the other. Use df.to_excel when you need convenience and xlsxwriter when you want fine control.
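The two approaches can also be combined. The following is a minimal sketch, assuming the xlsxwriter package is installed so that pandas can use it as the pd.ExcelWriter engine. The function name write_expenses, the file name Expenses02.xlsx, and the sheet name "Expenses" are illustrative choices, not part of the original answer. pandas dumps the data, and the underlying xlsxwriter worksheet is then used to add a live SUM formula:

```python
import pandas as pd


def write_expenses(path="Expenses02.xlsx"):
    """Write the expenses table with pandas, then add a SUM formula via xlsxwriter."""
    df = pd.DataFrame(
        {"Amount": [1000, 100, 300, 50]},
        index=["Rent", "Gas", "Food", "Gym"],
    )

    # engine="xlsxwriter" requires the xlsxwriter package to be installed.
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        # Convenience: pandas writes the header row and the data rows.
        df.to_excel(writer, sheet_name="Expenses")

        # Fine control: grab the xlsxwriter worksheet that pandas created.
        worksheet = writer.sheets["Expenses"]

        # The header is in row 0, so the data occupies rows 1..len(df)
        # (zero-indexed), which is B2..B5 in Excel notation here.
        total_row = len(df) + 1
        worksheet.write(total_row, 0, "Total")
        worksheet.write_formula(total_row, 1, f"=SUM(B2:B{len(df) + 1})")

    return path
```

Calling write_expenses() should produce a workbook whose Total cell is a formula, so it updates if the amounts are later edited in Excel, while the data rows themselves still come from a single to_excel call.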