Numpy Mean Structured Array

Question

Suppose that I have a structured array of students (strings) and test scores (ints), where each entry is the score that a specific student received on a specific test. Each student has multiple entries in this array, naturally.

import numpy
# 'U50' (Unicode string) replaces the legacy 'a50' bytes alias so the names print as plain text
grades = numpy.array([('Mary', 96), ('John', 94), ('Mary', 88), ('Edgar', 89), ('John', 84)],
                     dtype=[('student', 'U50'), ('score', 'i')])

print(grades)
#[('Mary', 96) ('John', 94) ('Mary', 88) ('Edgar', 89) ('John', 84)]

How do I easily compute the average score of each student? In other words, how do I take the mean of the array in the 'score' dimension? I'd like to do

grades.mean('score')

and have Numpy return

[('Mary', 92), ('John', 89), ('Edgar', 89)]

but Numpy complains

TypeError: an integer is required

Is there a Numpy-esque way to do this easily? I think it might involve taking a view of the structured array with a different dtype. Any help would be appreciated. Thanks.

>>> grades = numpy.zeros(5, dtype=[('student', 'U50'), ('score', 'i'), ('testid', 'i')])
>>> grades[0] = ('Mary', 96, 1)
>>> grades[1] = ('John', 94, 1)
>>> grades[2] = ('Mary', 88, 2)
>>> grades[3] = ('Edgar', 89, 1)
>>> grades[4] = ('John', 84, 2)
>>> numpy.mean(grades, 'testid')
TypeError: an integer is required

Recommended answer

NumPy isn't designed to be able to group rows together and apply aggregate functions to those groups. You could:

  • use itertools.groupby and reconstruct the array;
  • use Pandas, which is based on NumPy and is great at grouping; or
  • add another dimension to the array for the test id (so this case would be a 2x3 array, because it looks like there were two tests).
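
As a rough sketch of the third option, the scores could be stored in a plain 2-D float array with one row per test and one column per student. The names `students`, `scores`, and `mean_per_student` are illustrative, and using NaN to mark Edgar's missing second test is an assumption about how to handle gaps, not something the question specifies.

```python
import numpy as np

# Hypothetical layout: one row per test, one column per student.
students = np.array(['Mary', 'John', 'Edgar'])
scores = np.array([[96.0, 94.0, 89.0],      # test 1
                   [88.0, 84.0, np.nan]])   # test 2 (no score for Edgar)

# Average down the test axis, skipping the missing entry.
mean_per_student = np.nanmean(scores, axis=0)

for name, avg in zip(students, mean_per_student):
    print(name, avg)
```

With this layout the per-student mean is a single reduction along axis 0, and a per-test mean would be the same call with `axis=1`.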

Here's the itertools solution, but as you can see it's quite complicated and inefficient. I'd recommend one of the other two methods.

from itertools import groupby
from operator import itemgetter
import numpy as np
np.array([(k, np.array(list(g), dtype=grades.dtype).view(np.recarray)['score'].mean())
          for k, g in groupby(np.sort(grades, order='student').view(np.recarray),
                              itemgetter('student'))], dtype=grades.dtype)
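
For comparison, the Pandas option might look roughly like the following. This is a minimal sketch that assumes pandas is installed and that the names are stored as Unicode strings (`'U50'`); the variable names `df` and `mean_scores` are illustrative.

```python
import numpy as np
import pandas as pd

grades = np.array(
    [('Mary', 96), ('John', 94), ('Mary', 88), ('Edgar', 89), ('John', 84)],
    dtype=[('student', 'U50'), ('score', 'i4')])

# A structured array converts to a DataFrame with one column per field.
df = pd.DataFrame.from_records(grades)

# Group rows by student and average the scores within each group.
# sort=False keeps the students in order of first appearance.
mean_scores = df.groupby('student', sort=False)['score'].mean()
print(mean_scores)
```

The result is a Series indexed by student name, and the means are floats rather than being cast back to the integer `score` dtype.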
