Summing Data in a Data Frame in Distinct Categories
Problem description
I have created an Excel spreadsheet with data and exported it to a CSV file. I would like to add up the data per ethnicity for each distinct year. I have tried to create a data index and to compute a total sum for each ethnicity, but I have not been able to hold or store the results. I have tried methods on df as well as 'for' loops so that I can hold the data per ethnicity, but I have received error messages. The original Excel sheet has one row per show, tied to a specific year, with a column for each ethnicity. I am unable to sum the columns per year per ethnicity.
Should I use a for loop or an if statement to step through specific years, and is my approach the correct process?
#this is the first method I have tried
import pandas as pd
import numpy as np
from google.colab import files
uploaded = files.upload()
# df = pd.read_csv('/content/drive/My Drive/allTheaterDataV2.csv')
import io
df = pd.read_csv(io.BytesIO(uploaded['allTheaterDataV2.csv']))
# Dataset is now stored in a pandas DataFrame
#create list that contains the specific season that we want to reference
# print(df)
data = pd.DataFrame(allTheaterDataV2)
dataindex = [20082009, 20102011, 20112012, 20122013, 20132014, 20142015]
print(dataindex)
df.loc['total',:] = df.sum(axis=0)
print(df.loc[1:42, ['ASIAM','AFRAM','LAT','CAU','OTH']].sum())
# The second method I have tried is included below
for i in dataindex:
    # create a new data frame that stores the data per year
    hold_ASIAM = df[df.index == i]
    # allows for data for each season to be contained together
    ETHtotalASIAM = df['ASIAM'].sum()
    hold_ASIAM.append(ETHtotalASIAM)
    print(hold_ASIAM)
I expect the output to give me the total (some number) per ethnicity (e.g. AFRAM) per year (e.g. 20082009), but the actual output is the error "name 'allTheaterDataV2' is not defined".
Recommended answer
This should work.
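The NameError in the question comes from the line `data = pd.DataFrame(allTheaterDataV2)`: `allTheaterDataV2` only appears inside the filename string, so Python has no variable by that name. `pd.read_csv` already returns a DataFrame, so that line can simply be deleted and `df` used directly. The minimal sketch below assumes `uploaded` behaves like the filename-to-bytes dict that Colab's `files.upload()` returns; the CSV rows are hypothetical placeholders, not the asker's data.

```python
import io

import pandas as pd

# Stand-in for the dict returned by files.upload() in Colab (filename -> bytes).
# The CSV content is hypothetical sample data.
uploaded = {
    "allTheaterDataV2.csv": (
        b"ID,Season,AFRAM,ASIAM,CAU,LAT,OTH\n"
        b"Show A,20082009,2,0,48,1,0\n"
        b"Show B,20082009,4,1,25,1,0\n"
        b"Show C,20102011,3,2,30,4,1\n"
    )
}

# read_csv already returns a DataFrame, so df *is* the data;
# there is no separate variable called allTheaterDataV2 to wrap.
df = pd.read_csv(io.BytesIO(uploaded["allTheaterDataV2.csv"]))

# Total per ethnicity over all rows (what the question's df.loc[...].sum() aims at)
ethnicity_totals = df[["ASIAM", "AFRAM", "LAT", "CAU", "OTH"]].sum()
print(ethnicity_totals)
```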
import pandas as pd
df = pd.DataFrame({'ID':['Billy Elliot','next to normal','shrek','guys and dolls',
'west side story', 'pal joey'],
'Season' : [20082009,20082009,20082009,
20082009,20082009,20082009],
'AFRAM' : [2,0,4,4,0,1],
'ASIAM' : [0,0,1,0,0,0],
'CAU' : [48,10,25,24,28,20],
'LAT' : [1,0,1,3,18,0],
'OTH' : [0,0,0,0,0,0]})
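print(df)  # column order below is from an older pandas that sorted dict keys; newer versions keep the order given above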
# AFRAM ASIAM CAU ID LAT OTH Season
# 0 2 0 48 Billy Elliot 1 0 20082009
# 1 0 0 10 next to normal 0 0 20082009
# 2 4 1 25 shrek 1 0 20082009
# 3 4 0 24 guys and dolls 3 0 20082009
# 4 0 0 28 west side story 18 0 20082009
# 5 1 0 20 pal joey 0 0 20082009
# drop the ID column since it is just a string
df = df.drop(['ID'], axis = 1)
# group by season and sum the remaining (ethnicity) columns
df = df.groupby('Season').sum()
print(df)
# AFRAM ASIAM CAU LAT OTH
# Season
# 20082009 11 1 155 23 0
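On the for-loop question: a loop can work if it filters on the `Season` column rather than on the index. In the question's loop, `df.index == i` compares row numbers (0, 1, 2, ...) with season codes, so it never matches, and `df['ASIAM'].sum()` sums the whole column regardless of season (also, `DataFrame.append` was removed in pandas 2.0). The sketch below uses hypothetical two-season data in the same layout as the example above and builds the per-season table both with a loop and with `groupby`.

```python
import pandas as pd

# Hypothetical two-season sample in the same layout as the answer's example
df = pd.DataFrame({
    "ID": ["Show A", "Show B", "Show C", "Show D"],
    "Season": [20082009, 20082009, 20102011, 20102011],
    "AFRAM": [2, 4, 3, 5],
    "ASIAM": [0, 1, 2, 0],
    "CAU": [48, 25, 30, 22],
    "LAT": [1, 1, 4, 2],
    "OTH": [0, 0, 1, 0],
})
ethnicities = ["AFRAM", "ASIAM", "CAU", "LAT", "OTH"]

# Loop version: select each season's rows by the Season column, then sum them
loop_totals = {}
for season in df["Season"].unique():
    season_rows = df[df["Season"] == season]
    loop_totals[season] = season_rows[ethnicities].sum()
# Rows = seasons, columns = ethnicities
loop_result = pd.DataFrame(loop_totals).T

# groupby version: the same table in one call, one row per season
grouped = df.groupby("Season")[ethnicities].sum()
print(grouped)

# Look up a single total, e.g. AFRAM in the 20082009 season
print(grouped.loc[20082009, "AFRAM"])
```

Both produce the same numbers; `groupby` is usually the more idiomatic choice, while an explicit loop is mainly useful if each season needs extra per-season processing.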