Pandas: create an ordinal ascending value column within a group


Question

I have a dataframe 'df' that consists of:

col1 = datetime64[ns]
col2 = object
col3 = object
col4 = object

I would like to sort the dataframe by 'col1', then group by 'col2'. Finally, I would like to create an ordinal value (1, 2, 3) within each 'col2' group, ordered by 'col1'. If a 'col2' group contains 4 rows, the values for those rows in the new column would be [1, 2, 3, 4].

I know there is a 'rank()' method in pandas, and I can use:

df['newcol'] = df.groupby(['col2'])['col1'].rank()

But this doesn't seem to give me a column on the original dataframe whose ordinal values run [1, 2, 3] within each group only. How can I get that?
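One possible reason the 'rank()' call above looks wrong is tie handling: by default, 'rank()' uses method="average", so equal 'col1' values within a group share a fractional rank. The sketch below illustrates this on made-up data (the sample rows and the 'rank_default' / 'rank_first' column names are assumptions, not from the question) and shows that method="first" yields distinct 1..n values.

```python
import pandas as pd

# Hypothetical sample data; only the column names col1/col2 come from the question.
df = pd.DataFrame({
    "col1": pd.to_datetime(["2016-01-03", "2016-01-01", "2016-01-01", "2016-01-02"]),
    "col2": ["x", "x", "x", "y"],
})

# Default rank (method="average"): tied timestamps share the mean of their positions.
df["rank_default"] = df.groupby("col2")["col1"].rank()

# method="first" breaks ties by order of appearance, giving distinct 1..n per group.
df["rank_first"] = df.groupby("col2")["col1"].rank(method="first").astype(int)

print(df)
```

In group "x" the two rows dated 2016-01-01 both receive 1.5 under the default method, whereas method="first" assigns them 1 and 2.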

Recommended answer

Are you trying to achieve something like this? It is difficult to tell without sample data and a desired result.

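import random

import pandas as pd

# Build reproducible sample data: three columns of random letters plus a daily timestamp.
random.seed(0)
df = pd.DataFrame({col: [random.choice(list('abc')) for i in range(10)] for col in list('ABC')})
df['timestamp'] = pd.date_range('2016-1-1', periods=len(df))

# Sort chronologically, then number the distinct 'B' values within each 'A' group (1-based).
df.sort_values('timestamp', inplace=True)
df['rank'] = \
    df.groupby('A')['B'].transform(lambda group: group.astype('category').cat.codes + 1)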

>>> df
   A  B  C  timestamp rank
0  c  c  a 2016-01-01    2
1  c  b  c 2016-01-02    1
2  b  a  c 2016-01-03    1
3  a  c  c 2016-01-04    1
4  b  b  b 2016-01-05    2
5  b  a  a 2016-01-06    1
6  c  c  b 2016-01-07    2
7  a  c  b 2016-01-08    1
8  b  c  c 2016-01-09    3
9  b  c  c 2016-01-10    3
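If the goal is instead a plain running counter within each group, as the [1, 2, 3, 4] example in the question suggests, 'groupby(...).cumcount()' may be a closer fit. This is a minimal sketch under that reading of the question, not part of the original answer; the sample rows and the 'ordinal' column name are assumptions.

```python
import pandas as pd

# Hypothetical sample data; only the column names col1/col2 come from the question.
df = pd.DataFrame({
    "col1": pd.to_datetime(
        ["2016-01-04", "2016-01-01", "2016-01-03", "2016-01-02", "2016-01-05"]
    ),
    "col2": ["a", "b", "a", "a", "b"],
})

# Sort by timestamp first so the counter follows chronological order.
df = df.sort_values("col1")

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current order,
# so adding 1 gives a 1-based ordinal within each col2 group.
df["ordinal"] = df.groupby("col2").cumcount() + 1

# Display grouped together for readability.
print(df.sort_values(["col2", "col1"]))
```

Because the counter depends on row order rather than on the values themselves, rows with identical 'col1' timestamps still get distinct numbers.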
