Why do my lists become strings after saving to csv and re-opening? Python

Question

I have a DataFrame in which each row contains a sentence followed by a list of part-of-speech tags, created with spaCy:

df.head()

   question             POS_tags            
0  A title for my ...   [DT, NN, IN,...]  
1  If one of the ...    [IN, CD, IN,...]  

When I write the DataFrame to a csv file (encoding='utf-8') and re-open it, the data format appears to have changed: the POS tags now appear between quotes ' ', like this:

df.head()

   question             POS_tags                    
0  A title for my ...   ['DT', 'NN', 'IN',...]  
1  If one of the ...    ['IN', 'CD', 'IN',...]  

When I now try to use the POS tags for some operations, it turns out they are no longer lists but have become strings that even include the quotation marks. They still look like lists, but they are not. This becomes clear when doing:

q = df['POS_tags']
q = list(q)
print(q)

This results in:

["['DT', 'NN', 'IN']"]

What is happening here?

I want the column 'POS_tags' to contain lists even after saving to csv and re-opening. Alternatively, I want to apply an operation to the column 'POS_tags' that restores the same lists spaCy originally created. Any advice on how to do this?

Recommended answer

To preserve the exact structure of the DataFrame, an easy solution is to serialize it in pickle format with to_pickle instead of using csv. CSV is a plain-text format, so it throws away all information about data types: a Python list is written out as its string representation and has to be reconstructed manually after re-import. One drawback of pickle is that it is not human-readable.

import pandas as pd

# Save to pickle
df.to_pickle('pickle-file.pkl')
# Save with compression
df.to_pickle('pickle-file.pkl.gz', compression='gzip')

# Load pickle from disk
df = pd.read_pickle('pickle-file.pkl')   # or...
df = pd.read_pickle('pickle-file.pkl.gz', compression='gzip')


Fixing lists after importing from CSV

If you've already imported from CSV, this should convert the POS_tags column from strings to Python lists:

from ast import literal_eval
df['POS_tags'] = df['POS_tags'].apply(literal_eval)
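
As a minimal sketch of the whole round trip, the example below writes a small DataFrame to an in-memory CSV, shows that a plain read returns strings, and then parses the column while reading by passing literal_eval through the converters argument of pd.read_csv. The sample rows and the variable names (buffer, csv_text, as_strings, as_lists) are made up for illustration and are not from the original question.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical sample data standing in for the spaCy output.
df = pd.DataFrame({
    "question": ["A title for my ...", "If one of the ..."],
    "POS_tags": [["DT", "NN", "IN"], ["IN", "CD", "IN"]],
})

# Write to an in-memory CSV; each list is stored as its string representation.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# A plain read gives strings that merely look like lists.
as_strings = pd.read_csv(io.StringIO(csv_text))
print(type(as_strings["POS_tags"][0]))

# Parsing the column while reading restores real Python lists.
as_lists = pd.read_csv(
    io.StringIO(csv_text),
    converters={"POS_tags": literal_eval},
)
print(type(as_lists["POS_tags"][0]))
```

Using converters avoids a separate apply step after loading. literal_eval only parses Python literals, so it is a safer choice here than eval.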
