pandas: create an ordinal ascending value column within a group
Question
I have a dataframe 'df' that consists of:
col1 = datetime64[ns]
col2 = object
col3 = object
col4 = object
I would like to sort the dataframe by 'col1', then group by 'col2'. Finally, I would like to create an ordinal value (1,2,3) within each 'col2' group, ordered by 'col1'. If a 'col2' group has 4 rows, the values for those rows in the new column would be [1,2,3,4].
I know there is a 'rank()' in pandas, and I can use
df['newcol'] = df.groupby(['col2'])['col1'].rank()
But this does not give me a column on the original dataframe with ordinal values numbered like [1,2,3] within each group only. How can I get that?
Recommended answer
Are you trying to achieve something like this? It is difficult to tell without sample data and the desired result.
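import random
import pandas as pd

random.seed(0)
# Build three columns of random letters plus a daily timestamp column.
df = pd.DataFrame({col: [random.choice(list('abc')) for i in range(10)] for col in list('ABC')})
df['timestamp'] = pd.date_range('2016-1-1', periods=len(df))
df.sort_values('timestamp', inplace=True)
# Within each group of 'A', number the distinct values of 'B' in sorted order, starting at 1.
df['rank'] = df.groupby('A')['B'].transform(lambda group: group.astype('category').cat.codes + 1)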
>>> df
A B C timestamp rank
0 c c a 2016-01-01 2
1 c b c 2016-01-02 1
2 b a c 2016-01-03 1
3 a c c 2016-01-04 1
4 b b b 2016-01-05 2
5 b a a 2016-01-06 1
6 c c b 2016-01-07 2
7 a c b 2016-01-08 1
8 b c c 2016-01-09 3
9 b c c 2016-01-10 3
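Note that the example above numbers the distinct values of 'B' within each 'A' group, so repeated values share a number. If the goal is instead a running counter (1,2,3,...) per group in timestamp order, as the question describes, one possible sketch is to sort first and then use groupby(...).cumcount(). The sample data below is hypothetical and only borrows the column names 'col1' and 'col2' from the question; 'newcol_rank' is an illustrative name for an equivalent result built on rank(method='first').

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'col1': pd.to_datetime(['2016-01-03', '2016-01-01', '2016-01-02',
                            '2016-01-05', '2016-01-04', '2016-01-06']),
    'col2': ['x', 'y', 'x', 'y', 'x', 'x'],
})

# Sort by the timestamp so that rows within each group are in chronological order.
df = df.sort_values('col1')

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current order;
# adding 1 turns that into the ordinal 1, 2, 3, ...
df['newcol'] = df.groupby('col2').cumcount() + 1

# Alternative: rank with method='first' breaks ties by order of appearance,
# so every row in a group gets a distinct rank. rank() returns floats, hence the cast.
df['newcol_rank'] = df.groupby('col2')['col1'].rank(method='first').astype(int)

print(df)
```

With cumcount() the numbering depends only on row order after sorting, so it stays 1..n even when two rows in a group share the same timestamp. The default rank() averages tied values, which is why it can produce non-integer results such as 1.5.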