Pandas: creating dataframe rows from other dataframe information


Question

I'm working with aggregated data, which I need to dis-aggregate in order to process it further. The original df contains a value 'no. of students' per row and I need one row in the new df per student:

Original df:

                faculty A   faculty B   faculty x
male students           2           7       ...
female students         4           3       ...

New df:

 No.           gender  faculty   ...
 1             m       A
 2             m       A
 3             f       A

and so on. The original df contains some more information (like nationality and regional info), but that could be dealt with the same way as with gender, etc. Obviously I'd start by transposing (df.T), but then the fun begins... I'm quite the beginner, any pointer would be very welcome.

Recommended answer

I think the easiest way to "disaggregate" the data is to use a generator expression to simply enumerate all the desired rows:

(key for key, val in series.items() for i in range(val))


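import re

import pandas as pd

df = pd.DataFrame({'faculty A': [2, 4], 'faculty B': [7, 3]},
                  index=['male students', 'female students'])
# Shorten the labels: 'faculty A' -> 'A', 'male students' -> 'm'
df.columns = [re.sub(r'faculty ', '', col) for col in df.columns]
df.index = ['m', 'f']
# stack() gives a Series keyed by (gender, faculty) with the counts as values
series = df.stack()
# Emit each (gender, faculty) key once per student counted
# (Series.iteritems() was removed in pandas 2.0; use items())
df = pd.DataFrame(
    (key for key, val in series.items() for i in range(val)),
    columns=['gender', 'faculty'])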

yields

   gender faculty
0       m       A
1       m       A
2       m       B
3       m       B
4       m       B
5       m       B
6       m       B
7       m       B
8       m       B
9       f       A
10      f       A
11      f       A
12      f       A
13      f       B
14      f       B
15      f       B
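
For larger counts, the same expansion can probably be done without a Python-level loop by repeating row labels. The following is only a sketch of that alternative, not part of the original answer; the names `counts`, `long`, `students` and the column name `n` are made up for illustration, and the labels are assumed to have been shortened already.

```python
import pandas as pd

# Aggregated counts, with labels already shortened as in the answer above.
counts = pd.DataFrame({"A": [2, 4], "B": [7, 3]}, index=["m", "f"])

# Long format: one row per (gender, faculty) pair, with its count in column "n".
long = counts.stack().rename_axis(["gender", "faculty"]).reset_index(name="n")

# Repeat each row label "n" times, select those rows, and drop the count.
students = (
    long.loc[long.index.repeat(long["n"]), ["gender", "faculty"]]
    .reset_index(drop=True)
)
```

The result has one row per student in the same order as the generator-based version, and extra attributes such as nationality would simply become additional index levels before `stack()`.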



PS. The above shows it is possible to "disaggregate" the data, but are you sure you want to do that? Disaggregation seems rather inefficient. If one of the values is a million, then you would end up with a million duplicate rows...

Instead of disaggregating, you might be better off finding a way to perform your computation on the aggregated data.
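
As a rough illustration of what working on the aggregated data might look like, many summary statistics can be computed straight from the counts table. The quantities below (totals and the female share per faculty) are assumed examples, since the question does not say which computation is actually needed.

```python
import pandas as pd

# Same aggregated table as above: rows are genders, columns are faculties.
counts = pd.DataFrame({"A": [2, 4], "B": [7, 3]}, index=["m", "f"])

# Number of students per faculty and per gender, with no row expansion.
total_per_faculty = counts.sum(axis=0)
total_per_gender = counts.sum(axis=1)

# Fraction of female students in each faculty.
female_share = counts.loc["f"] / total_per_faculty
```

These stay the same size no matter how large the individual counts are, which is the efficiency argument made above.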
