How to use lists as values in a pandas DataFrame?


Question

I have a dataframe in which a subset of the columns needs entries with multiple values. Below is a dataframe with a "runtimes" column that holds the runtimes of a program under various conditions:

import pandas

df = [{"condition": "a", "runtimes": [1, 1.5, 2]}, {"condition": "b", "runtimes": [0.5, 0.75, 1]}]
df = pandas.DataFrame(df)

This produces the following dataframe:

  condition        runtimes
0         a     [1, 1.5, 2]
1         b  [0.5, 0.75, 1]

How can I work with this dataframe so that pandas treats these values as numeric lists? For example, how do I calculate the mean of the "runtimes" column across the rows?

df["runtimes"].mean()

gives the error: "Could not convert [1, 1.5, 2, 0.5, 0.75, 1] to numeric"
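The error arises because the column holds Python list objects (object dtype): summing lists concatenates them, which is why the message shows all six numbers in one list. A minimal sketch, assuming "mean across the rows" may mean either one mean per row or one overall mean: `Series.apply` covers the per-row case, and `Series.explode` (pandas 0.25+) flattens the lists for an overall mean.

```python
import pandas as pd

df = pd.DataFrame([
    {"condition": "a", "runtimes": [1, 1.5, 2]},
    {"condition": "b", "runtimes": [0.5, 0.75, 1]},
])

# Per-row mean: apply a plain Python function to each list in the object column.
row_means = df["runtimes"].apply(lambda xs: sum(xs) / len(xs))

# Overall mean: explode() puts each list element on its own row (still object
# dtype), so cast to float before calling mean().
overall_mean = df["runtimes"].explode().astype(float).mean()

print(row_means.tolist())  # [1.5, 0.75]
print(overall_mean)        # 1.125
```

Both work, but every operation has to go through Python-level list handling, which is the friction the answer below addresses.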

It would also be useful to serialize such dataframes as CSV files, where a list like [1, 1.5, 2] is converted into "1,1.5,2" so that it is still a single entry in the CSV file.
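For the CSV part, one possible approach (a sketch, not the only option) is to join each list into a string before writing and split it again after reading. The default minimal quoting in `to_csv` wraps any field containing the delimiter in double quotes, so each joined list stays a single CSV entry. The in-memory `io.StringIO` buffer here stands in for a real file path, and converting back to `float` assumes all values are numeric.

```python
import io
import pandas as pd

df = pd.DataFrame([
    {"condition": "a", "runtimes": [1, 1.5, 2]},
    {"condition": "b", "runtimes": [0.5, 0.75, 1]},
])

# Join each list into one comma-separated string; to_csv quotes the field
# because it contains the delimiter, so it stays a single CSV entry.
out = df.assign(runtimes=df["runtimes"].apply(lambda xs: ",".join(map(str, xs))))
buf = io.StringIO()
out.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)

# Reading back: the quoted field arrives as one string; split it and convert
# each piece to float (assumes the lists only contain numbers).
back = pd.read_csv(io.StringIO(csv_text))
back["runtimes"] = back["runtimes"].apply(lambda s: [float(x) for x in s.split(",")])
print(back["runtimes"].tolist())
```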

Answer

It feels like you're trying to make Pandas into something it is not. If you always have exactly 3 runtimes, you could make 3 columns. However, the more Pandas-esque approach is to normalize your data (no matter how many trials you have) into something like this:

import pandas as pd

df = [{"condition": "a", "trial": 1, "runtime": 1},
      {"condition": "a", "trial": 2, "runtime": 1.5},
      {"condition": "a", "trial": 3, "runtime": 2},
      {"condition": "b", "trial": 1, "runtime": .5},
      {"condition": "b", "trial": 2, "runtime": .75},
      {"condition": "b", "trial": 3, "runtime": 1}]
df = pd.DataFrame(df)
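If the data already arrives in the list-column form from the question, the long table can be built with `DataFrame.explode` (pandas 0.25+). This is a sketch that assumes each value's position in its list is its trial number; `groupby(...).cumcount()` numbers rows within each condition starting at 0, hence the `+ 1`.

```python
import pandas as pd

wide = pd.DataFrame([
    {"condition": "a", "runtimes": [1, 1.5, 2]},
    {"condition": "b", "runtimes": [0.5, 0.75, 1]},
])

# One row per list element; reset the repeated index to a clean 0..n-1 range.
long_df = (
    wide.explode("runtimes")
    .reset_index(drop=True)
    .rename(columns={"runtimes": "runtime"})
)
# explode() leaves object dtype, so convert to a numeric column.
long_df["runtime"] = long_df["runtime"].astype(float)

# Assumption: trial number = position in the original list (1-based).
long_df["trial"] = long_df.groupby("condition").cumcount() + 1
long_df = long_df[["condition", "trial", "runtime"]]
print(long_df)
```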

Then you can do:

print(df.groupby('condition').mean())

           trial  runtime
condition                
a            2.0     1.50
b            2.0     0.75
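The long table does not rule out the three-column layout mentioned earlier. A sketch using `DataFrame.pivot` reshapes it into one row per condition and one column per trial, which can be handy for display or for row-wise statistics:

```python
import pandas as pd

df = pd.DataFrame([
    {"condition": "a", "trial": 1, "runtime": 1},
    {"condition": "a", "trial": 2, "runtime": 1.5},
    {"condition": "a", "trial": 3, "runtime": 2},
    {"condition": "b", "trial": 1, "runtime": 0.5},
    {"condition": "b", "trial": 2, "runtime": 0.75},
    {"condition": "b", "trial": 3, "runtime": 1},
])

# One row per condition, one column per trial number.
wide = df.pivot(index="condition", columns="trial", values="runtime")
print(wide)

# Row-wise mean across the trial columns (one value per condition).
print(wide.mean(axis=1))
```

Keeping the long table as the primary form and pivoting only when needed means the number of trials per condition never has to be fixed in advance.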

The idea is to keep the data tabular, with only one value per cell. If you want to apply functions to nested lists, you should be using plain Python lists rather than Pandas dataframes.
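For comparison, a minimal sketch of the plain-list approach, assuming a dict keyed by condition (a hypothetical layout, not from the original) and the standard-library `statistics.mean`:

```python
from statistics import mean

# Hypothetical layout: the same runtimes kept as plain lists, keyed by condition.
runtimes = {"a": [1, 1.5, 2], "b": [0.5, 0.75, 1]}

# Nested-list operations are ordinary Python: one mean per condition.
means = {cond: mean(values) for cond, values in runtimes.items()}
print(means)
```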
