python pandas: list of integers as individual values of DataFrame
Problem description
Question: How to 'pd.read_csv' so that the values in a given column are of type list (a list in each row of a column)?
When creating a DataFrame (from a dict, see below), individual values are of type list. The problem: After writing the DataFrame to a file and reading from the file back to a DataFrame, I get a string instead of a list.
import pandas as pd
dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
           "neg": [[58], [1332, 753, 716, 782], [187]],
           "pos": [[96], [659, 661, 705, 1228], [1414]]}
df = pd.DataFrame(dict2df)
The value is a list
type(df.loc[0, 'neg']) == list # --> True
type(df.loc[0, 'neg']) == str # --> False
df.loc[1, 'neg'][-1] == 782 # --> True
Write to file
df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)
Read from file
df = pd.read_csv('DataFrame.txt', sep='\t')
The value is a string, not a list
type(df.loc[0, 'neg']) == list # --> False
type(df.loc[0, 'neg']) == str # --> True
df.loc[1, 'neg'][-1] == 782 # --> False
Of course, it's possible to convert between the two data types, but it's computationally expensive and needs extra work (see below)
def convert_StringList2ListOfInt(string2convert):
    # Strip the surrounding brackets, then parse each comma-separated item.
    return [int(ele) for ele in string2convert[1:-1].split(',')]

def DataFrame_StringOfInts2ListOfInts(df, cols2convert_list):
    for column in cols2convert_list:
        # Build the parsed column under a temporary name, then swap it in.
        column_temp = column + "_temp"
        df[column_temp] = df[column].apply(convert_StringList2ListOfInt)
        df[column] = df[column_temp]
        df = df.drop(column_temp, axis=1)
    return df

df = DataFrame_StringOfInts2ListOfInts(df, ['neg', 'pos'])
What would be a better (more pythonic) solution? It would be very convenient to iterate over the Integers in the list without having to convert them back and forth. Thank you for your support!!
Recommended answer
You can use ast.literal_eval() to convert the strings to lists.
A basic example of ast.literal_eval() -
>>> import ast
>>> l = ast.literal_eval('[10,20,30]')
>>> type(l)
<class 'list'>
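ast.literal_eval() only parses Python literals (numbers, strings, lists, tuples, dicts and the like) and raises an error for anything else, which is what makes it a safer choice than eval() for text read back from a file. The following is a minimal sketch of that behaviour; the helper name parse_int_list and the sample strings are illustrative assumptions, not part of the original question or answer.

```python
import ast


def parse_int_list(text):
    """Parse a string such as '[1, 2, 3]' into a list (hypothetical helper)."""
    value = ast.literal_eval(text)
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {type(value).__name__}")
    return value


# A list literal stored as text is turned back into a real list.
parsed = parse_int_list('[1332, 753, 716, 782]')

# An expression that is not a plain literal is rejected instead of executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

The isinstance check is optional; it only guards against cells that parse to some other literal, such as a bare integer.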
For your case, you can pass it to Series.apply, so that each element in the series is evaluated (safely). Example -
df = pd.read_csv('DataFrame.txt', sep='\t')
import ast
df['neg_list'] = df['neg'].apply(ast.literal_eval)
df = df.drop('neg',axis=1)
df['pos_list'] = df['pos'].apply(ast.literal_eval)
df = df.drop('pos',axis=1)
Demo -
In [15]: import pandas as pd
In [16]: dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
....: "neg": [[58], [1332, 753, 716, 782], [187]],
....: "pos": [[96], [659, 661, 705, 1228], [1414]]}
In [17]: df = pd.DataFrame(dict2df)
In [18]: df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)
In [19]: newdf = pd.read_csv('DataFrame.txt', sep='\t')
In [20]: newdf['neg']
Out[20]:
0                     [58]
1    [1332, 753, 716, 782]
2                    [187]
Name: neg, dtype: object
In [21]: newdf['neg'][0]
Out[21]: '[58]'
In [22]: import ast
In [23]: newdf['neg_list'] = newdf['neg'].apply(ast.literal_eval)
In [24]: newdf = newdf.drop('neg',axis=1)
In [25]: newdf['pos_list'] = newdf['pos'].apply(ast.literal_eval)
In [26]: newdf = newdf.drop('pos',axis=1)
In [27]: newdf
Out[27]:
         euNOG               neg_list               pos_list
0  ENOG410IF52                   [58]                   [96]
1      KOG2956  [1332, 753, 716, 782]  [659, 661, 705, 1228]
2      KOG1997                  [187]                 [1414]
In [28]: newdf['neg_list'][0]
Out[28]: [58]
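As a possible variation on the demo above, the conversion can also happen while the file is being read, by giving pd.read_csv a converters mapping from column name to parsing function. This is a sketch rather than part of the original answer: the in-memory tsv_text is an assumed stand-in for the DataFrame.txt written in the question, and df_lists is an illustrative name.

```python
import ast
import io

import pandas as pd

# Assumed stand-in for DataFrame.txt: tab-separated, with lists stored as text.
tsv_text = (
    "euNOG\tneg\tpos\n"
    "ENOG410IF52\t[58]\t[96]\n"
    "KOG2956\t[1332, 753, 716, 782]\t[659, 661, 705, 1228]\n"
    "KOG1997\t[187]\t[1414]\n"
)

# Each cell of 'neg' and 'pos' is passed through ast.literal_eval during parsing,
# so the columns hold lists as soon as the DataFrame is created.
df_lists = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    converters={"neg": ast.literal_eval, "pos": ast.literal_eval},
)

print(type(df_lists.loc[0, "neg"]))
print(df_lists.loc[1, "neg"][-1])
```

With a real file, the io.StringIO(...) argument would be replaced by the file path. This keeps the original column names, so no extra columns need to be added and dropped afterwards.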