Summing Arrays by Characteristics in Python


Problem description

I'm wondering what the most efficient way is to sum elements of an array by given characteristics. For example, I have 1000 draws of data, and what I'm looking for is the sum of each draw (column) across sexes for a given year-disease (i.e., the draws are by sex, year, and disease, and I want the sum of both sexes for each year and disease).

import numpy as np
year = np.repeat((1980, 1990, 2000, 2010), 10)
sex = np.array(['male', 'female']*20)
disease = np.repeat(('d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8'), 5)
draws = np.random.normal(0, 1, size=(sex.shape[0], 1000))

Any thoughts on how to get an array that will be shape (20, 1000) that has the sum of the draw across both sexes for a given year-disease? I will also need to be able to do this in situations where the data isn't perfectly square (there are disease-years which only have 1 sex).

Solution

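import numpy as np
import itertools
import csv

# Example data from the question: one row per (year, sex, disease) record.
year = np.repeat((1980, 1990, 2000, 2010), 10)
sex = np.array(['male', 'female']*20)
disease = np.repeat(('d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8'), 5)
draws = np.random.normal(0, 1, size=(sex.shape[0], 1000))

# Distinct values of each grouping variable.
years = np.unique(year)
diseases = np.unique(disease)

# Map each (year, disease) pair to the column-wise sum of its rows,
# i.e. the sum over sexes. Pairs absent from the data get a zero vector.
draw_sums = dict(((y, d), draws[(year == y) & (disease == d)].sum(axis=0))
                 for y, d in itertools.product(years, diseases))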

This results in a dict associating each (year, disease) pair with the corresponding sum of the draws. To write draw_sums to a CSV file, you could do something like this:

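# newline='' is what the csv module recommends for files opened in Python 3.
with open('/tmp/test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Header: the two key columns, then one column per draw.
    writer.writerow(['year', 'disease'] + ['draw{i}'.format(i=i) for i in range(1, 1001)])
    # Each key is a (year, disease) tuple; each value is the array of summed draws.
    for year_disease, sums in sorted(draw_sums.items()):
        writer.writerow(list(year_disease) + sums.tolist())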
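
Since the question asks for an array rather than a dict, here is a rough sketch of one alternative, not part of the answer above: encode each row's (year, disease) pair as an integer group id with np.unique, then accumulate rows with np.add.at. The names combined, group_ids, group_sums, group_years and group_diseases are made up for illustration. Only pairs that actually occur in the data get a row, so it should also cope with year-diseases that have a single sex.

```python
import numpy as np

# Same example data as in the question.
year = np.repeat((1980, 1990, 2000, 2010), 10)
sex = np.array(['male', 'female'] * 20)
disease = np.repeat(('d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8'), 5)
draws = np.random.normal(0, 1, size=(sex.shape[0], 1000))

# Encode each grouping column as integer codes, then combine into one key per row.
years, year_codes = np.unique(year, return_inverse=True)
diseases, disease_codes = np.unique(disease, return_inverse=True)
combined = year_codes.ravel() * len(diseases) + disease_codes.ravel()

# Keep only the (year, disease) combinations that occur in the data.
group_keys, group_ids = np.unique(combined, return_inverse=True)
group_ids = group_ids.ravel()

# Unbuffered scatter-add: row i of draws is added to row group_ids[i] of group_sums.
group_sums = np.zeros((len(group_keys), draws.shape[1]))
np.add.at(group_sums, group_ids, draws)

# Recover the year and disease label for each output row.
group_years = years[group_keys // len(diseases)]
group_diseases = diseases[group_keys % len(diseases)]
```

Row k of group_sums then holds the summed draws for (group_years[k], group_diseases[k]). Unlike the itertools.product version, this does not produce all-zero entries for pairs that never appear together.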
