Numpy Mean Structured Array
Problem description
Suppose that I have a structured array of students (strings) and test scores (ints), where each entry is the score that a specific student received on a specific test. Each student has multiple entries in this array, naturally.
import numpy
grades = numpy.array([('Mary', 96), ('John', 94), ('Mary', 88), ('Edgar', 89), ('John', 84)],
                     dtype=[('student', 'U50'), ('score', 'i')])
print(grades)
# [('Mary', 96) ('John', 94) ('Mary', 88) ('Edgar', 89) ('John', 84)]
How do I easily compute the average score of each student? In other words, how do I take the mean of the array in the 'score' dimension? I'd like to do
grades.mean('score')
and have NumPy return
[('Mary', 92), ('John', 89), ('Edgar', 89)]
but NumPy complains:
TypeError: an integer is required
Is there a Numpy-esque way to do this easily? I think it might involve taking a view of the structured array with a different dtype. Any help would be appreciated. Thanks.
>>> grades = numpy.zeros(5, dtype=[('student', 'U50'), ('score', 'i'), ('testid', 'i')])
>>> grades[0] = ('Mary', 96, 1)
>>> grades[1] = ('John', 94, 1)
>>> grades[2] = ('Mary', 88, 2)
>>> grades[3] = ('Edgar', 89, 1)
>>> grades[4] = ('John', 84, 2)
>>> numpy.mean(grades, 'testid')
TypeError: an integer is required
Recommended answer
NumPy isn't designed to be able to group rows together and apply aggregate functions to those groups. You could:
- use itertools.groupby and reconstruct the array;
- use Pandas, which is based on NumPy and is great at grouping; or
- add another dimension to the array for the test id (so this case would be a 2x3 array, because it looks like there were two tests).
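The last option might look something like the sketch below. It assumes the test ids from the question's second snippet, stores the scores as floats, and uses NaN for Edgar's missing second test; the names `students`, `scores`, and `means` are only illustrative.

```python
import numpy as np

# Column order of the score table (illustrative, taken from the question's data).
students = ['Mary', 'John', 'Edgar']

# Rows are tests (test 1, test 2); columns follow the order of `students`.
# Edgar has no score for test 2, so that cell is NaN (an assumption of this sketch).
scores = np.array([
    [96.0, 94.0, 89.0],
    [88.0, 84.0, np.nan],
])

# Average over the test axis, ignoring missing scores.
means = np.nanmean(scores, axis=0)

for name, mean in zip(students, means):
    print(name, mean)
```

With this layout the per-student average is a plain reduction along one axis, at the cost of keeping the student names in a separate list.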
Here's the itertools solution (it sorts by student first, because groupby only groups consecutive rows), but as you can see it's quite complicated and inefficient. I'd recommend one of the other two methods.
import numpy as np
from itertools import groupby
from operator import itemgetter
np.array([(k, np.array(list(g), dtype=grades.dtype).view(np.recarray)['score'].mean())
          for k, g in groupby(np.sort(grades, order='student').view(np.recarray),
                              itemgetter('student'))], dtype=grades.dtype)
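For comparison, the Pandas route could be sketched roughly as follows. This assumes pandas is installed and uses the two-field array from the question, with a 'U50' string field as my own choice; `df` and `means` are illustrative names.

```python
import numpy as np
import pandas as pd

grades = np.array(
    [('Mary', 96), ('John', 94), ('Mary', 88), ('Edgar', 89), ('John', 84)],
    dtype=[('student', 'U50'), ('score', 'i4')],
)

# A structured array converts to a DataFrame with one column per field.
df = pd.DataFrame.from_records(grades)

# Group the rows by student and average each group's scores.
# sort=False keeps the students in order of first appearance.
means = df.groupby('student', sort=False)['score'].mean()
print(means)
```

The result is a Series of float means indexed by student name, so `means['Mary']` gives Mary's average directly.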