Group by with numpy.mean


Question

How do I calculate the mean for each workerid below? Below is my sample NumPy ndarray: column 0 is the workerid, column 1 is the latitude, and column 2 is the longitude.
I want to calculate the mean latitude and longitude for each workerid, keeping everything in NumPy (ndarray) without converting to Pandas.

import numpy


class WorkerPatientScores:
    '''
    I read from the Patient and Worker tables in SchedulingOptimization.
    '''
    def __init__(self, dist_weight=1):
        # Columns: [workerid, latitude, longitude]
        self.a = [[25302, 32.133598100000000, -94.395845200000000],
                  [25302, 32.145095132560200, -94.358041585705600],
                  [25302, 32.160400000000000, -94.330700000000000],
                  [25305, 32.133598100000000, -94.395845200000000],
                  [25305, 32.115095132560200, -94.358041585705600],
                  [25305, 32.110400000000000, -94.330700000000000],
                  [25326, 32.123598100000000, -94.395845200000000],
                  [25326, 32.125095132560200, -94.358041585705600],
                  [25326, 32.120400000000000, -94.330700000000000],
                  [25341, 32.173598100000000, -94.395845200000000],
                  [25341, 32.175095132560200, -94.358041585705600],
                  [25341, 32.170400000000000, -94.330700000000000],
                  [25376, 32.153598100000000, -94.395845200000000],
                  [25376, 32.155095132560200, -94.358041585705600],
                  [25376, 32.150400000000000, -94.330700000000000]]

        ndarray = numpy.array(self.a)
        # Keep only the latitude and longitude columns.
        nd1 = ndarray[:, 1:3]
        # Averaging over axis 0 pools every row, so this is the mean of
        # all workers together, not one mean per workerid.
        mean_tuple = numpy.mean(nd1, 0)
        print(mean_tuple)


WorkerPatientScores()

The output of the above is:

[ 32.14303108 -94.36152893]
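The printed pair is the mean of all 15 rows: numpy.mean(nd1, 0) averages down axis 0, so every workerid is pooled together. As an illustration (not the asker's code), the sketch below reuses the question's sample data with trailing zeros dropped and contrasts the pooled mean with the mean of a single worker, whose rows are selected first with a boolean mask on column 0.

```python
import numpy

# Sample data from the question: columns are [workerid, latitude, longitude].
a = numpy.array([
    [25302, 32.1335981, -94.3958452],
    [25302, 32.1450951325602, -94.3580415857056],
    [25302, 32.1604, -94.3307],
    [25305, 32.1335981, -94.3958452],
    [25305, 32.1150951325602, -94.3580415857056],
    [25305, 32.1104, -94.3307],
    [25326, 32.1235981, -94.3958452],
    [25326, 32.1250951325602, -94.3580415857056],
    [25326, 32.1204, -94.3307],
    [25341, 32.1735981, -94.3958452],
    [25341, 32.1750951325602, -94.3580415857056],
    [25341, 32.1704, -94.3307],
    [25376, 32.1535981, -94.3958452],
    [25376, 32.1550951325602, -94.3580415857056],
    [25376, 32.1504, -94.3307],
])

# axis=0 averages down every row, so all workers are pooled together.
pooled = a[:, 1:].mean(axis=0)

# A boolean mask on column 0 keeps only worker 25302's rows before averaging.
worker_25302 = a[a[:, 0] == 25302, 1:].mean(axis=0)

print(pooled)        # the pooled pair printed in the question
print(worker_25302)  # mean latitude and longitude for workerid 25302 only
```

Repeating the mask for each ID returned by numpy.unique(a[:, 0]) gives one mean per worker, which is the idea the answer builds on.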

Answer

You can use some creative array slicing and the where function to solve this problem.

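import numpy

# `a` must be the data as a 2-D NumPy array (e.g. a = numpy.array(self.a));
# a plain Python list does not support the a[:, 0] column slice.
means = {}
for i in numpy.unique(a[:, 0]):          # each distinct workerid
    tmp = a[numpy.where(a[:, 0] == i)]   # rows belonging to that worker
    means[i] = (numpy.mean(tmp[:, 1]), numpy.mean(tmp[:, 2]))  # (mean lat, mean lon)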

The slice [:, 0] is a handy way to extract a column (here the first one, the workerid) from a 2-D array. To get the means, we find the unique IDs in the first column; then, for each ID, we use where to select that worker's rows and average their latitude and longitude columns. The end result is a dict of tuples: the keys are the IDs (floats, because numpy.array stores the mixed integer and float data as float64) and the values are tuples holding the mean of the other two columns. When I run it, it produces the following dict:

{25302.0: (32.1463644108534, -94.36152892856853),
 25305.0: (32.11969774418673, -94.36152892856853),
 25326.0: (32.12303107752007, -94.36152892856853),
 25341.0: (32.17303107752007, -94.36152892856853),
 25376.0: (32.15303107752007, -94.36152892856853)}
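If the result should stay an ndarray rather than a dict, as the question asks, one option is to drop the Python loop and use numpy.unique(..., return_inverse=True) together with numpy.bincount(..., weights=...). This is a different technique from the where loop above, offered as a sketch: the function name group_means is hypothetical, and it assumes column 0 holds the group key and the data is the question's sample.

```python
import numpy

# Sample data from the question: columns are [workerid, latitude, longitude].
a = numpy.array([
    [25302, 32.1335981, -94.3958452],
    [25302, 32.1450951325602, -94.3580415857056],
    [25302, 32.1604, -94.3307],
    [25305, 32.1335981, -94.3958452],
    [25305, 32.1150951325602, -94.3580415857056],
    [25305, 32.1104, -94.3307],
    [25326, 32.1235981, -94.3958452],
    [25326, 32.1250951325602, -94.3580415857056],
    [25326, 32.1204, -94.3307],
    [25341, 32.1735981, -94.3958452],
    [25341, 32.1750951325602, -94.3580415857056],
    [25341, 32.1704, -94.3307],
    [25376, 32.1535981, -94.3958452],
    [25376, 32.1550951325602, -94.3580415857056],
    [25376, 32.1504, -94.3307],
])


def group_means(a):
    """Hypothetical helper: one row per workerid, [workerid, mean lat, mean lon].

    Assumes `a` is a 2-D float array whose column 0 is the group key.
    """
    # ids: sorted unique workerids; inverse[k]: position in ids of row k's workerid.
    ids, inverse = numpy.unique(a[:, 0], return_inverse=True)
    # Number of rows for each workerid.
    counts = numpy.bincount(inverse)
    # Per-group sums of each value column, divided by the group sizes.
    lat = numpy.bincount(inverse, weights=a[:, 1]) / counts
    lon = numpy.bincount(inverse, weights=a[:, 2]) / counts
    return numpy.column_stack((ids, lat, lon))


result = group_means(a)
print(result)
```

Each row of result matches one entry of the dict above, ordered by workerid because numpy.unique returns sorted keys. bincount makes one pass per value column instead of one mask per worker, which may matter when there are many workers. The longitude means are identical for every worker simply because each worker in the sample has the same three longitudes.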
