How can I get the statistics of all columns, including those with a nested structure of numerical values, in a dataframe, list or array?


Question


What is the best way to get simple descriptive statistics for any column in a dataframe (or list or array), nested or not? I am looking for a sort of advanced df.describe() that also handles nested structures of numerical values.


In my case, I have a dataframe with many columns. Some columns hold a numerical list in each row (in my case a time series), which is the nested structure. It does not matter that it is a dataframe; other structures are covered by the question as well, since converting between them is fast.

By nested structures I mean, for example:

  • a list of arrays,
  • an array of arrays,
  • a series of lists,
  • a dataframe with nested numerical lists in some columns (my case)


of which you need to get simple descriptive statistics.
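To make the listed structures concrete, here is a small hypothetical sketch (the data values and the names `list_of_arrays`, `array_of_arrays`, `series_of_lists` and `value` are illustrative; only `nestedColumn` comes from the answer below). It builds each structure from the same three ragged time series, assuming NumPy and pandas are installed; ragged rows need an object-dtype container when stored in a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three short numeric time series of different lengths.
list_of_arrays = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0]), np.array([6.0, 7.0, 8.0, 9.0])]

# Ragged "array of arrays": rows of unequal length need an object-dtype container.
array_of_arrays = np.empty(len(list_of_arrays), dtype=object)
for i, row in enumerate(list_of_arrays):
    array_of_arrays[i] = row

# "Series of lists": one Python list per row.
series_of_lists = pd.Series([row.tolist() for row in list_of_arrays])

# DataFrame with a plain numeric column and a nested column (the asker's case).
df = pd.DataFrame({"value": [10.0, 20.0, 30.0], "nestedColumn": series_of_lists})
print(df)
```

Converting between these forms is a one-liner in each direction, which is why the question treats them as interchangeable.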

Requirements

df.describe() 


will only give me the statistics of the numerical columns, not those of the columns that contain a list of numerical values. I cannot get the statistics just by applying

from scipy import stats
stats.describe(arr)


either, which is the solution given in "How can I get descriptive statistics of a NumPy array?" for a non-nested array.
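The limitation can be reproduced with a small hypothetical frame (the column `value` and the data are illustrative assumptions). By default `df.describe()` summarizes only numeric columns, so the object column holding lists is silently dropped, while `scipy.stats.describe` works on a flat numeric array or on one inner list at a time.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical frame: 'value' is numeric, 'nestedColumn' holds one list per row.
df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0],
    "nestedColumn": [[1, 2, 3], [4, 5], [6, 7, 8, 9]],
})

# By default df.describe() summarizes only numeric columns,
# so the object column holding lists does not appear at all.
summary = df.describe()
print(list(summary.columns))  # ['value']

# scipy.stats.describe works on a flat numeric array ...
flat = stats.describe(np.array([1.0, 2.0, 3.0, 4.0]))
print(flat.nobs, flat.mean)  # 4 2.5

# ... and on a single inner list, but the column itself is an
# object array of lists, not a flat numeric array.
first_row = stats.describe(df["nestedColumn"].iloc[0])
print(first_row.mean)  # 2.0
```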

Recommended answer


My first approach is to compute the statistics of each numerical list first and then compute statistics of those statistics again; for example, the mean of the means or the mean of the variances also carries useful information. In this first approach, I take the specific column that holds the nested numerical lists as a Series of lists. Nested arrays or plain lists might need a small adjustment; this is untested.

from scipy import stats

NESTEDSTRUCTURE = df['nestedColumn']

# Describe each inner list once, then describe every DescribeResult field across rows
per_row = [stats.describe(x) for x in NESTEDSTRUCTURE]
[stats.describe([row[i] for row in per_row]) for i in range(6)]


gives you the statistics of the statistics for a nested column, one DescribeResult per field. If you only want the statistics of the per-row means (which include the mean of all means), you can use

stats.describe([a[2] for a in [stats.describe(x) for x in NESTEDSTRUCTURE]])
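The same two-stage idea can also be sketched in pandas alone. This is a hypothetical variant, not the answer's code: it uses `pandas.Series.describe` instead of `scipy.stats.describe`, so the per-row fields are count, mean, std, min, quartiles and max rather than variance, skewness and kurtosis. The example data is illustrative.

```python
import pandas as pd

# Hypothetical nested column: one numeric list per row.
nested = pd.Series([[1, 2, 3], [4, 5], [6, 7, 8, 9]], name="nestedColumn")

# Stage 1: one describe() per inner list; each row of per_row holds
# count, mean, std, min, 25%, 50%, 75% and max for one list.
per_row = pd.DataFrame([pd.Series(x, dtype=float).describe() for x in nested])

# Stage 2: describe those per-row statistics again ("stats of the stats").
stats_of_stats = per_row.describe()

# Mean of the per-row means: (2.0 + 4.5 + 7.5) / 3
mean_of_means = stats_of_stats.loc["mean", "mean"]
print(mean_of_means)
```

Keeping the per-row table around also makes it easy to inspect individual rows, for example to find the time series with the largest spread.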

because position 2 is the mean field of


DescribeResult(nobs=, minmax=(, ), mean=, variance=, skewness=, kurtosis=)
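Because the result is a named tuple, the fields can also be read by name, which avoids remembering that index 2 means the mean. A minimal sketch with illustrative data:

```python
from scipy import stats

# Hypothetical nested data: one numeric list per row.
nested = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

per_row = [stats.describe(x) for x in nested]

# DescribeResult fields can be read by name: index 2 and .mean are the same value.
assert per_row[0][2] == per_row[0].mean

# Statistics of the per-row means, written with the field name instead of a[2].
means_summary = stats.describe([r.mean for r in per_row])
print(means_summary.mean)  # mean of the per-row means
```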


I expect there is a better descriptive-statistics approach that handles nested structures of numerical values automatically; this is just a workaround.
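One way to get closer to an "advanced df.describe()" is a small wrapper that detects nested columns and describes their pooled inner values. This is a hypothetical helper (`describe_all` is not a pandas function), and it uses a different statistic from the stats-of-stats workaround above: it flattens every list with `Series.explode` (available since pandas 0.25) and describes all values together, so it answers "what do the values look like overall" rather than "how do the per-row summaries vary".

```python
import numpy as np
import pandas as pd

def describe_all(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: like df.describe(), but a column whose cells are all
    lists, tuples or arrays of numbers is described over its pooled inner values."""
    out = {}
    for col in df.columns:
        s = df[col]
        if len(s) > 0 and s.map(lambda v: isinstance(v, (list, tuple, np.ndarray))).all():
            # Nested column: explode gives one row per inner value.
            out[col] = s.explode().astype(float).describe()
        elif pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s):
            out[col] = s.describe()
        # Any other column (e.g. plain strings) is skipped, as df.describe() does.
    return pd.DataFrame(out)


df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0],
    "nestedColumn": [[1, 2, 3], [4, 5], [6, 7, 8, 9]],
    "label": ["a", "b", "c"],
})
result = describe_all(df)
print(result)
```

Empty inner lists become NaN after `explode` and are ignored by `describe`, so they lower nothing but the row count.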
