How to write/read a Pandas DataFrame with MultiIndex from/to an ASCII file?
Question
I want to be able to create a Pandas DataFrame with MultiIndexes for both the row and the column indexes, and read it from an ASCII text file. My data looks like:
import numpy as np
from pandas import DataFrame, MultiIndex

# Three-level column index
col_indx = MultiIndex.from_tuples([('A', 'B', 'C'), ('A', 'B', 'C2'), ('A', 'B', 'C3'),
                                   ('A', 'B2', 'C'), ('A', 'B2', 'C2'), ('A', 'B2', 'C3'),
                                   ('A', 'B3', 'C'), ('A', 'B3', 'C2'), ('A', 'B3', 'C3'),
                                   ('A2', 'B', 'C'), ('A2', 'B', 'C2'), ('A2', 'B', 'C3'),
                                   ('A2', 'B2', 'C'), ('A2', 'B2', 'C2'), ('A2', 'B2', 'C3'),
                                   ('A2', 'B3', 'C'), ('A2', 'B3', 'C2'), ('A2', 'B3', 'C3')],
                                  names=['one', 'two', 'three'])
# Three-level row index
row_indx = MultiIndex.from_tuples([(0, 'North', 'M'),
                                   (1, 'East', 'F'),
                                   (2, 'West', 'M'),
                                   (3, 'South', 'M'),
                                   (4, 'South', 'F'),
                                   (5, 'West', 'F'),
                                   (6, 'North', 'M'),
                                   (7, 'North', 'M'),
                                   (8, 'East', 'F'),
                                   (9, 'South', 'M')],
                                  names=['n', 'location', 'sex'])
# Random integers in [0, 10), one per (row, column) pair
size = len(row_indx), len(col_indx)
data = np.random.randint(0, 10, size)
df = DataFrame(data, index=row_indx, columns=col_indx)
print(df)
I've tried df.to_csv() and read_csv(), but they don't keep the index.
I was thinking of maybe creating a new format using extra delimiters. For example, using a row of ---------------- to mark the end of the column indexes and a | to mark the end of a row index. So it would look like this:
one             | A  A  A  A  A  A  A  A  A  A2 A2 A2 A2 A2 A2 A2 A2 A2
two             | B  B  B  B2 B2 B2 B3 B3 B3 B  B  B  B2 B2 B2 B3 B3 B3
three           | C  C2 C3 C  C2 C3 C  C2 C3 C  C2 C3 C  C2 C3 C  C2 C3
-----------------------------------------------------------------------
n  location sex :
0  North    M   | 2  3  9  1  0  6  5  9  5  9  4  4  0  9  6  2  6  1
1  East     F   | 6  2  9  2  7  0  0  3  7  4  8  1  3  2  1  7  7  5
2  West     M   | 5  8  9  7  6  0  3  0  2  5  0  3  9  6  7  3  4  9
3  South    M   | 6  2  3  6  4  0  4  0  1  9  3  6  2  1  0  6  9  3
4  South    F   | 9  6  0  0  6  1  7  0  8  1  7  6  2  0  8  1  5  3
5  West     F   | 7  9  7  8  2  0  4  3  8  9  0  3  4  9  2  5  1  7
6  North    M   | 3  3  5  7  9  4  2  6  3  2  7  5  5  5  6  4  2  9
7  North    M   | 7  4  8  6  8  4  5  7  9  0  2  9  1  9  7  9  5  6
8  East     F   | 1  6  5  3  6  4  6  9  6  9  2  4  2  9  8  4  2  4
9  South    M   | 9  6  6  1  3  1  3  5  7  4  8  6  7  7  8  9  2  3
Does Pandas have a way to write/read DataFrames to/from ASCII files with MultiIndexes?
Recommended answer
Not sure which version of pandas you are using, but with 0.7.3 you can export your DataFrame to a TSV file and retain the indices by doing this:
df.to_csv('mydf.tsv', sep='\t')
You need to export to TSV rather than CSV because the column headers contain , characters. This should solve the first part of your question.
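To see why commas in the header cause trouble, here is a minimal sketch that uses no pandas at all. It assumes, as the literal_eval step later in this answer suggests, that each column label ends up in the file as the string form of its tuple. The variable names and the naive split() calls are only for illustration; a quoting-aware CSV reader would behave differently.

```python
from ast import literal_eval

# Two column labels, written out as the string form of their tuples
# (an assumption about what the header cells look like in the file).
labels = [('A', 'B', 'C'), ('A', 'B', 'C2')]
header_cells = [str(label) for label in labels]

csv_line = ",".join(header_cells)
tsv_line = "\t".join(header_cells)

# A naive comma split cuts through the tuples themselves ...
csv_cells = csv_line.split(",")
# ... while a tab split gives back exactly one cell per column label.
tsv_cells = tsv_line.split("\t")

print(len(csv_cells), len(tsv_cells))

# Each tab-separated cell can be turned back into the original tuple.
recovered = [literal_eval(cell) for cell in tsv_cells]
print(recovered == labels)
```

With a tab as the separator, the commas inside each label are just ordinary characters, so every label stays in one piece.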
The second part is a bit trickier since, as far as I can tell, you need to know beforehand what you want your DataFrame to contain. In particular, you need to know:
- Which columns in your TSV represent the row MultiIndex
- That the rest of the columns should also be converted to a MultiIndex
To illustrate this, let's read the TSV file we saved above back into a new DataFrame:
In [1]: t_df = read_table('mydf.tsv', index_col=[0,1,2])
In [2]: all(t_df.index == df.index)
Out[2]: True
So we managed to read mydf.tsv into a DataFrame that has the same row index as the original df. But:
In [3]: all(t_df.columns == df.columns)
Out[3]: False
The reason is that pandas (as far as I can tell) has no way of parsing the header row correctly into a MultiIndex. As I mentioned above, if you know beforehand that your TSV file header represents a MultiIndex, then you can do the following to fix this:
In [4]: from ast import literal_eval
In [5]: t_df.columns = MultiIndex.from_tuples(t_df.columns.map(literal_eval).tolist(),
names=['one','two','three'])
In [6]: all(t_df.columns == df.columns)
Out[6]: True
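The session above reflects pandas 0.7.3. For comparison, more recent pandas releases write one header row per column level and let read_csv accept lists for both header and index_col, so the round trip may not need literal_eval at all. The sketch below is not part of the original answer: it assumes a recent pandas version, uses a smaller made-up frame, and writes to an in-memory buffer in place of mydf.tsv.

```python
import io

import numpy as np
import pandas as pd

# A smaller, hypothetical frame with the same shape of problem:
# MultiIndex on both the rows and the columns.
col_indx = pd.MultiIndex.from_tuples(
    [('A', 'B'), ('A', 'B2'), ('A2', 'B'), ('A2', 'B2')],
    names=['one', 'two'])
row_indx = pd.MultiIndex.from_tuples(
    [(0, 'North', 'M'), (1, 'East', 'F'), (2, 'West', 'M')],
    names=['n', 'location', 'sex'])
data = np.arange(12).reshape(3, 4)
df = pd.DataFrame(data, index=row_indx, columns=col_indx)

# Write tab-separated text to an in-memory buffer (stand-in for a file).
buf = io.StringIO()
df.to_csv(buf, sep='\t')
buf.seek(0)

# header lists the rows that hold the column levels;
# index_col lists the columns that hold the row levels.
t_df = pd.read_csv(buf, sep='\t', header=[0, 1], index_col=[0, 1, 2])

print(t_df.index.equals(df.index))
print(t_df.columns.equals(df.columns))
```

You still need to know how many levels each index has, which matches the two points listed earlier; the difference is that this knowledge is passed to the reader as arguments instead of being applied afterwards.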