Reading back tuples from a CSV file with pandas

Question

Using pandas, I exported a DataFrame whose cells contain tuples of strings to a CSV file. The resulting file has the following structure:

index,colA
1,"('a','b')"
2,"('c','d')"

Now I want to read it back using read_csv. However, whatever I try, pandas interprets the values as strings rather than tuples. For instance:

In []: import pandas as pd
       df = pd.read_csv('test',index_col='index',dtype={'colA':tuple})
       df.loc[1,'colA']
Out[]: "('a','b')"

Is there a way of telling pandas to do the right thing? Preferably without heavy post-processing of the DataFrame: the actual table has 5000 rows and 2500 columns.

Recommended answer

Storing tuples in a column isn't usually a good idea; a lot of the advantages of using Series and DataFrames are lost. That said, you could use the converters argument of read_csv to post-process each string as it is read:

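>>> import ast
>>> import pandas as pd
>>> # ast.literal_eval turns the string "('a','b')" into the tuple ('a', 'b')
>>> df = pd.read_csv("sillytup.csv", converters={"colA": ast.literal_eval})
>>> df
   index    colA
0      1  (a, b)
1      2  (c, d)

[2 rows x 2 columns]
>>> df.colA.iloc[0]
('a', 'b')
>>> type(df.colA.iloc[0])
<class 'tuple'>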
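
Since the real table in the question has thousands of columns, writing the converters dict by hand is impractical. The following is a minimal sketch of one way to build it from the header row, assuming every column except index holds tuple literals. The second column colB and the in-memory csv_text are made up for illustration and do not come from the original file.

```python
import ast
import io

import pandas as pd

# Same layout as the file in the question, held in memory for the example.
# colB is a hypothetical extra column added to show the multi-column case.
csv_text = '''index,colA,colB
1,"('a','b')","('e','f')"
2,"('c','d')","('g','h')"
'''

# Read only the header (nrows=0 parses no data rows) to learn the column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)

# One converter per data column; ast.literal_eval only accepts Python literals.
converters = {
    name: ast.literal_eval for name in header.columns if name != "index"
}

df = pd.read_csv(io.StringIO(csv_text), index_col="index", converters=converters)
print(type(df.loc[1, "colA"]))
```

Note that converters run in Python for every cell, so on a 5000 x 2500 table this is likely to be noticeably slower than a plain read_csv call.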

But I'd probably change things at source to avoid storing tuples in the first place.
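
The answer does not say what "changing things at source" would look like. One possible reading, sketched here under the assumption that every tuple has the same length, is to spread each tuple over separate scalar columns before writing the CSV. The column names colA_0 and colA_1 are hypothetical.

```python
import io

import pandas as pd

# A small frame shaped like the one in the question.
df = pd.DataFrame(
    {"colA": [("a", "b"), ("c", "d")]},
    index=pd.Index([1, 2], name="index"),
)

# Expand each tuple into its own scalar columns (hypothetical names).
flat = pd.DataFrame(
    df["colA"].tolist(), index=df.index, columns=["colA_0", "colA_1"]
)

# Round-trip through CSV text; no converters are needed on the way back.
buffer = io.StringIO()
flat.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="index")

# Rebuild tuples only where they are actually required.
pairs = list(zip(restored["colA_0"], restored["colA_1"]))
print(pairs)
```

With this layout the values stay plain strings in ordinary columns, so the usual vectorized Series operations remain available.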
