python pandas: list of integers as individual values of DataFrame


Question

Question: How can I use 'pd.read_csv' so that the values in a given column are of type list (a list in each row of the column)?

When creating a DataFrame (from a dict, see below), the individual values are of type list. The problem: after writing the DataFrame to a file and reading the file back into a DataFrame, I get a string instead of a list.

import pandas as pd
dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"], 
           "neg": [[58], [1332, 753, 716, 782], [187]], 
           "pos": [[96], [659, 661, 705, 1228], [1414]]}
df = pd.DataFrame(dict2df)

The value is a list

type(df.loc[0, 'neg']) == list # --> True
type(df.loc[0, 'neg']) == str # --> False
df.loc[1, 'neg'][-1] == 782 # --> True

Write to file

df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)

Read from file

df = pd.read_csv('DataFrame.txt', sep='\t')

The value is a string, not a list

type(df.loc[0, 'neg']) == list # --> False
type(df.loc[0, 'neg']) == str # --> True
df.loc[1, 'neg'][-1] == 782 # --> False

Of course, it's possible to convert between the two data types, but that is computationally expensive and needs extra work (see below).

def convert_StringList2ListOfInt(string2convert):
    return [int(ele) for ele in string2convert[1:-1].split(',')]

def DataFrame_StringOfInts2ListOfInts(df, cols2convert_list):
    for column in cols2convert_list:
        column_temp = column + "_temp"
        df[column_temp] = df[column].apply(convert_StringList2ListOfInt)
        df[column] = df[column_temp]
        df = df.drop(column_temp, axis=1)
    return df
df = DataFrame_StringOfInts2ListOfInts(df, ['neg', 'pos'])

What would be a better (more pythonic) solution? It would be very convenient to iterate over the integers in the list without having to convert them back and forth. Thank you for your support!

Recommended answer

You can use ast.literal_eval() to convert the strings to lists.

A simple example of ast.literal_eval() -

>>> import ast
>>> l = ast.literal_eval('[10,20,30]')
>>> type(l)
<class 'list'>
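As a side note on why ast.literal_eval() is usually preferred over eval() for this kind of data: it only accepts Python literals (lists, numbers, strings and similar) and rejects anything else. The snippet below is a minimal sketch of that behaviour; the variable names are made up for illustration.

```python
import ast

# A literal list in string form parses into a real list of ints.
parsed = ast.literal_eval("[1332, 753, 716, 782]")

# Anything that is not a plain literal (here: a function call) is rejected
# with a ValueError instead of being executed.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True

print(parsed, rejected)
```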

For your case, you can pass it to Series.apply, so that each element in the series is evaluated (safely). Example -

df = pd.read_csv('DataFrame.txt', sep='\t')
import ast
df['neg_list'] = df['neg'].apply(ast.literal_eval)
df = df.drop('neg',axis=1)
df['pos_list'] = df['pos'].apply(ast.literal_eval)
df = df.drop('pos',axis=1)
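If you would rather get lists directly out of pd.read_csv, as the question asks, one option is the converters parameter of read_csv, which applies a function to each value of the named columns while parsing. The sketch below is an illustration, not part of the original answer: it reads the same tab-separated content from an in-memory string (tsv_text, a stand-in for 'DataFrame.txt'), and list_columns and df_lists are illustrative names.

```python
import ast
import io

import pandas as pd

# Stand-in for 'DataFrame.txt': the same tab-separated content, kept in memory.
tsv_text = (
    "euNOG\tneg\tpos\n"
    "ENOG410IF52\t[58]\t[96]\n"
    "KOG2956\t[1332, 753, 716, 782]\t[659, 661, 705, 1228]\n"
    "KOG1997\t[187]\t[1414]\n"
)

# Columns whose cells hold list literals; each cell is parsed while reading.
list_columns = ["neg", "pos"]
df_lists = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    converters={col: ast.literal_eval for col in list_columns},
)

print(type(df_lists.loc[0, "neg"]))
print(df_lists.loc[1, "neg"][-1])
```

This keeps the original column names, so no temporary columns or drop calls are needed. The parsing cost is still paid once per cell, just inside read_csv.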

Demo -

In [15]: import pandas as pd

In [16]: dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
   ....:            "neg": [[58], [1332, 753, 716, 782], [187]],
   ....:            "pos": [[96], [659, 661, 705, 1228], [1414]]}

In [17]: df = pd.DataFrame(dict2df)

In [18]: df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)

In [19]: newdf = pd.read_csv('DataFrame.txt', sep='\t')

In [20]: newdf['neg']
Out[20]:
0                     [58]
1    [1332, 753, 716, 782]
2                    [187]
Name: neg, dtype: object

In [21]: newdf['neg'][0]
Out[21]: '[58]'

In [22]: import ast

In [23]: newdf['neg_list'] = newdf['neg'].apply(ast.literal_eval)

In [24]: newdf = newdf.drop('neg',axis=1)

In [25]: newdf['pos_list'] = newdf['pos'].apply(ast.literal_eval)

In [26]: newdf = newdf.drop('pos',axis=1)

In [27]: newdf
Out[27]:
         euNOG               neg_list               pos_list
0  ENOG410IF52                   [58]                   [96]
1      KOG2956  [1332, 753, 716, 782]  [659, 661, 705, 1228]
2      KOG1997                  [187]                 [1414]

In [28]: newdf['neg_list'][0]
Out[28]: [58]
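If the file does not have to be human-readable text, another way to avoid converting back and forth is to skip CSV altogether and store the DataFrame with to_pickle / read_pickle, which keeps the Python objects in each cell as they are. This is a minimal sketch under that assumption; the temporary directory and the file name DataFrame.pkl are chosen here purely for illustration. Pickle files should only be loaded from sources you trust.

```python
import os
import tempfile

import pandas as pd

df_orig = pd.DataFrame({
    "euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
    "neg": [[58], [1332, 753, 716, 782], [187]],
    "pos": [[96], [659, 661, 705, 1228], [1414]],
})

# Round-trip through a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "DataFrame.pkl")
    df_orig.to_pickle(pickle_path)
    df_back = pd.read_pickle(pickle_path)

# The cells are still lists, so no string parsing is required.
print(type(df_back.loc[0, "neg"]))
print(df_back.loc[1, "neg"][-1])
```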
