How do I find the median using pandas on a dataset?

Question

I have a DataFrame called data with three columns: Date, Segment and Metric. I am doing the following:

import pandas

data = pandas.read_csv("Filename.csv")
ave = data.groupby('Segment').mean()  # works
ave = data.groupby('Segment').median()  # gives error
ave['median'] = data.groupby('Segment').median()

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 1453, in __setitem__
    self._set_item(key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/frame.py", line 1488, in _set_item
    NDFrame._set_item(self, key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 301, in _set_item
    self._data.set(key, value)
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 616, in set
    assert(value.shape[1:] == self.shape[1:])
AssertionError

Answer

What error are you getting?

ave = data.groupby('Segment').median()

I think that should work. Maybe something in your data is causing the error, such as NaNs, but I'm just guessing. You could try applying your own median function to see whether you can work around the cause of the error, something like:

import numpy as np

def mymed(group):
    # Drop missing values before taking the median
    return np.median(group.dropna())

ave = data.groupby('Segment')['Metric'].apply(mymed)
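To test the NaN guess, here is a minimal sketch on hypothetical data (made-up values, with one missing Metric in segment "a") that compares `mymed` with the built-in groupby median. pandas' `GroupBy.median` already excludes missing values, so on data like this the two should agree; `mymed` is then mostly useful as a diagnostic or as a template for a custom aggregation.

```python
import numpy as np
import pandas as pd

def mymed(group):
    # Drop missing values explicitly, then take the median of what remains
    return np.median(group.dropna())

# Hypothetical sample data: segment "a" contains one NaN
data = pd.DataFrame({
    "Segment": ["a", "a", "a", "b", "b"],
    "Metric": [1.0, np.nan, 3.0, 4.0, 8.0],
})

# Custom median via apply vs. the built-in groupby median (which skips NaN)
custom = data.groupby("Segment")["Metric"].apply(mymed)
builtin = data.groupby("Segment")["Metric"].median()
print(custom)
print(builtin)
```

If both agree on your real data too, NaNs are probably not what is causing the error.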

It would be easier if you could provide some sample data which replicates the error.
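As a starting point, here is a sketch with a small hypothetical dataset (made-up values using the question's column names). It selects the Metric column before reducing, so a non-numeric column such as an unparsed Date string is never passed to `median()`; in recent pandas versions such a column can make the whole-frame reduction raise. If the intent of `ave['median'] = ...` was to get the mean and median side by side, `agg` with a list of function names returns both as columns of one table, one row per segment. Assigning the whole grouped DataFrame into a single column, as the question does, appears to be what trips the shape assertion in the traceback.

```python
import pandas as pd

# Hypothetical sample data; Date is a plain string column, as it would be
# after read_csv without parse_dates
data = pd.DataFrame({
    "Date": ["2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04", "2013-01-05"],
    "Segment": ["a", "a", "a", "b", "b"],
    "Metric": [1.0, 2.0, 10.0, 4.0, 6.0],
})

# Reduce only the numeric column, so Date never reaches median()
med = data.groupby("Segment")["Metric"].median()

# Mean and median side by side, one row per segment
stats = data.groupby("Segment")["Metric"].agg(["mean", "median"])
print(stats)
```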

Here is a different approach: you can add the median back to your original DataFrame. For the Metric column, this becomes:

data['metric_median'] = data.groupby('Segment')['Metric'].transform('median')
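A minimal sketch of what `transform` produces, on hypothetical data with the question's column names. Unlike `data.groupby('Segment').median()`, which has one row per segment, `transform('median')` returns one value per original row, aligned with the DataFrame's index, so it can be assigned directly as a new column.

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    "Segment": ["a", "a", "a", "b", "b"],
    "Metric": [1.0, 2.0, 10.0, 4.0, 6.0],
})

# Each row receives the median of its own segment
data["metric_median"] = data.groupby("Segment")["Metric"].transform("median")
print(data)
```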

Whether it's useful to have the group's median attached to each data point depends on what you want to do afterwards.
