How to sequence row based on another row?
I am trying to convert a formula from Excel to pandas.
The DataFrame looks like this:
Column A Column B
H
H
H
J
J
J
J
K
K
I want to fill column B to increment while the value in column A remains the same. In the example above, this would be:
Column A Column B
H 1
H 2
H 3
J 1
J 2
J 3
J 4
K 1
K 2
In Excel, the formula would be =IF(A2<>A1,1,B1+1), entered in B2 and filled down.
How can I apply this formula in pandas?
This can be done using the following vectorised method:
Code:
>>> import pandas as pd
>>> df = pd.DataFrame({'A':['H', 'H', 'H', 'J', 'J', 'J', 'J', 'K', 'K']})
>>> df['B'] = df.groupby((df['A'].shift(1) != df['A']).cumsum()).cumcount() + 1
Output:
>>> df
A B
0 H 1
1 H 2
2 H 3
3 J 1
4 J 2
5 J 3
6 J 4
7 K 1
8 K 2
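As a sanity check, the Excel formula can also be transliterated into a plain loop and compared against the vectorised result. This is only a sketch: the helper name excel_style_counter is made up for illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': ['H', 'H', 'H', 'J', 'J', 'J', 'J', 'K', 'K']})


def excel_style_counter(values):
    """Loop version of =IF(A2<>A1,1,B1+1). Hypothetical helper, for illustration only."""
    counts = []
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if counts and value == previous:
            # Same value as the row above: continue the count (B1+1).
            counts.append(counts[-1] + 1)
        else:
            # Value changed (or first row): restart at 1.
            counts.append(1)
        previous = value
    return counts


loop_result = excel_style_counter(df['A'])

# The vectorised one-liner from the answer, converted to a list for comparison.
vectorised = (
    df.groupby((df['A'].shift(1) != df['A']).cumsum()).cumcount() + 1
).tolist()

print(loop_result)
print(loop_result == vectorised)
```

The loop is easier to read next to the Excel formula, but the groupby version avoids Python-level iteration and is the one to prefer on large frames.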
Explanation:
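First, we use df['A'].shift(1) != df['A'] to compare each value in column A with the value in the row above it. The first row has nothing above it, so it is compared with NaN and comes out True. This yields: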
>>> df['A'] != df['A'].shift(1)
0 True
1 False
2 False
3 True
4 False
5 False
6 False
7 True
8 False
Name: A, dtype: bool
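Next, we use cumsum() to take the cumulative sum of that boolean column. Each True counts as 1, so the total goes up by one at the start of every run of equal values, which gives each run its own number: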
>>> (df['A'] != df['A'].shift(1)).cumsum()
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 3
8 3
Name: A, dtype: int32
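The two steps so far can be kept in a named intermediate, which makes the run labels easy to inspect. This is a minimal sketch: the variable names run_id and run_lengths are assumptions for illustration, and Series.ne is used as an equivalent spelling of the != comparison above.

```python
import pandas as pd

df = pd.DataFrame({'A': ['H', 'H', 'H', 'J', 'J', 'J', 'J', 'K', 'K']})

# True wherever the value differs from the row above; the running total
# then labels each run of equal values with 1, 2, 3, ...
run_id = df['A'].ne(df['A'].shift()).cumsum()

# Grouping by the run label shows how many rows each run contains.
run_lengths = df.groupby(run_id).size()

print(run_id.tolist())
print(run_lengths.tolist())
```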
Now, we can group by these run numbers and use GroupBy.cumcount() as usual to enumerate the rows within each group in ascending order, adding 1 so that the count starts at 1 instead of 0. Note that we can't just use df.groupby('A').cumcount(), because if, for example, we had:
>>> df
A
0 H
1 H
2 H
3 J
4 J
5 J
6 J
7 K
8 K
9 H
This would give us:
>>> df.groupby('A').cumcount() + 1
0 1
1 2
2 3
3 1
4 2
5 3
6 4
7 1
8 2
9 4
dtype: int64
Note that the final row is 4 and not 1 as expected, because groupby('A') puts the last H in the same group as the first three.
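For comparison, here is a short sketch that runs both approaches on the ten-row frame with the trailing H. The names df2, by_value and by_run are made up for this example.

```python
import pandas as pd

df2 = pd.DataFrame({'A': ['H', 'H', 'H', 'J', 'J', 'J', 'J', 'K', 'K', 'H']})

# Grouping by value: every H lands in one group, so the last H continues at 4.
by_value = df2.groupby('A').cumcount() + 1

# Grouping by run: the trailing H starts a new run, so it restarts at 1.
run_id = (df2['A'].shift(1) != df2['A']).cumsum()
by_run = df2.groupby(run_id).cumcount() + 1

print(by_value.tolist())
print(by_run.tolist())
```

Only the run-based grouping matches the Excel formula, which looks solely at the row directly above.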