How can I write a CSV file with multiple header lines using pandas to_csv()?

Question

Consider a data frame with a date column as an index and three columns x, y and z with some observations. I want to write the contents of this data frame to a .csv file. I know I can use df.to_csv for this; however, I would like to add a second header line with the units. In this example, the desired .csv file would look something like this:

date,x,y,z
(yyyy-mm-dd),(s),(m),(kg)
2014-03-12,1,2,3
2014-03-13,4,5,6
...


Recommended answer

This doesn't produce the exact output in your example, but it's close. You can use multi-index columns to store the second header (the units) with the column labels:

>>> import pandas as pd
>>> columns = pd.MultiIndex.from_tuples(
...     zip(['date', 'x', 'y', 'z'],
...         ['(yyyy-mm-dd)', '(s)', '(m)', '(kg)']))
>>> data = [['2014-03-12', 1, 2, 3],
...         ['2014-03-13', 4, 5, 6]]
>>> df = pd.DataFrame(data, columns=columns)
>>> df
          date   x   y    z
  (yyyy-mm-dd) (s) (m) (kg)
0   2014-03-12   1   2    3
1   2014-03-13   4   5    6
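If your data already lives in a DataFrame shaped like the one in the question (date as the index, plain labels for x, y and z), the two-level columns can also be attached afterwards. A minimal sketch, assuming a hypothetical `units` dict that maps each label to its unit string; `reset_index()` turns the date index into an ordinary column so it gets a unit too:

```python
import pandas as pd

# Hypothetical starting point: "date" is the index, columns have one header row.
df = pd.DataFrame(
    {"x": [1, 4], "y": [2, 5], "z": [3, 6]},
    index=pd.Index(["2014-03-12", "2014-03-13"], name="date"),
)

# Hypothetical mapping from column label to unit string.
units = {"date": "(yyyy-mm-dd)", "x": "(s)", "y": "(m)", "z": "(kg)"}

# Move the index into a regular column so it is labelled like the others.
flat = df.reset_index()

# Pair every label with its unit to build the two-level column index.
flat.columns = pd.MultiIndex.from_arrays(
    [flat.columns, [units[c] for c in flat.columns]]
)
print(flat)
```

The data values are untouched; only the column labels change, so the numeric columns keep their integer dtype.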

Storing the second header this way allows your columns to keep the correct type (e.g., column x should be an integer type):

>>> df.dtypes
date  (yyyy-mm-dd)    object
x     (s)              int64
y     (m)              int64
z     (kg)             int64
dtype: object

If you had stored the second header as a row in the DataFrame, your column dtypes would become object, which you probably don't want.
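A quick way to see this is to build the same sample values with the units stored as an ordinary first row instead (the approach the answer advises against):

```python
import pandas as pd

# The approach to avoid: the units are just the first data row.
with_units_row = pd.DataFrame(
    [["(yyyy-mm-dd)", "(s)", "(m)", "(kg)"],
     ["2014-03-12", 1, 2, 3],
     ["2014-03-13", 4, 5, 6]],
    columns=["date", "x", "y", "z"],
)

# x, y and z now mix strings with integers, so pandas stores them as object.
print(with_units_row.dtypes)
```

Any arithmetic on x, y or z would then first require dropping the units row and converting the columns back to numbers.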

Writing the DataFrame in CSV format produces something very similar to your example:

>>> df.to_csv('out.csv', index=False)
>>> print(open('out.csv').read(), end='')
date,x,y,z
(yyyy-mm-dd),(s),(m),(kg)
,,,
2014-03-12,1,2,3
2014-03-13,4,5,6

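The only difference is the extra line of commas, which is how pandas separates multi-row headers from the actual rows of data (newer pandas versions may omit this line when no index is written, so check the output of your version). The multi-row header allows the CSV file to be read back into an equivalent DataFrame: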

>>> df2 = pd.read_csv('out.csv', header=[0, 1])
>>> df2
          date   x   y    z
  (yyyy-mm-dd) (s) (m) (kg)
0   2014-03-12   1   2    3
1   2014-03-13   4   5    6
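If you need exactly the layout from the question, with no separator line and the date index as the first column, one alternative (not part of the answer above) is to write the two header lines yourself and then let `to_csv` append only the data rows via `header=False`. A minimal sketch, assuming a hypothetical `units` list with one entry per output column; an `io.StringIO` buffer stands in for a real file opened with `open('out.csv', 'w', newline='')`:

```python
import io

import pandas as pd

# Hypothetical data shaped like the question: "date" is the index.
df = pd.DataFrame(
    {"x": [1, 4], "y": [2, 5], "z": [3, 6]},
    index=pd.Index(["2014-03-12", "2014-03-13"], name="date"),
)
# Hypothetical unit strings, one per output column (index column first).
units = ["(yyyy-mm-dd)", "(s)", "(m)", "(kg)"]

buf = io.StringIO()  # an open text file handle works the same way
# Write both header lines by hand. This does no CSV quoting, so it
# assumes no label or unit contains a comma or a quote character.
buf.write(",".join([df.index.name, *df.columns]) + "\n")
buf.write(",".join(units) + "\n")
# Append the data rows only; the index is still written as the first column.
df.to_csv(buf, header=False)
text = buf.getvalue()
print(text)

# Reading it back: skip the units line (row 1) so x, y, z parse as integers.
back = pd.read_csv(io.StringIO(text), skiprows=[1], index_col="date")
```

The trade-off is that the units are no longer attached to the DataFrame after reading it back this way; reading with `header=[0, 1]` instead would keep them as a second column level.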

Note: I found a lot of this information scattered throughout this SO question.
