pandas replace only part of a column
Question
This is my input:
import pandas as pd
import numpy as np
list1 = [10,79,6,38,4,557,12,220,46,22,45,22]
list2 = [4,3,23,6,234,47,312,2,426,42,435,23]
df = pd.DataFrame({'A' : list1, 'B' : list2}, columns = ['A', 'B'])
df['C'] = np.where(df['A'] > df['B'].shift(-2), 1, np.nan)
print(df)
that produces this output:
      A    B    C
0    10    4  NaN
1    79    3  1.0
2     6   23  NaN
3    38    6  NaN
4     4  234  NaN
5   557   47  1.0
6    12  312  NaN
7   220    2  1.0
8    46  426  NaN
9    22   42  NaN
10   45  435  NaN
11   22   23  NaN
What I need to do is to change column 'C' to be a set of three 1's in a row, non-overlapping. The desired output is:
      A    B    C
0    10    4  NaN
1    79    3  1.0
2     6   23  1.0
3    38    6  1.0
4     4  234  NaN
5   557   47  1.0
6    12  312  1.0
7   220    2  1.0
8    46  426  NaN
9    22   42  NaN
10   45  435  NaN
11   22   23  NaN
So, rows 2, 3, and 6 change from NaN to 1.0. Row 7 already has a 1.0 and it is ignored. Rows 8 and 9 need to stay NaN because row 7 is the last entry of the previous set.
I don't know if there is a better way to build column 'C' that would do this at creation.
I have tried several versions of fillna and ffill, none of them worked for me.
It seems very convoluted but I tried to isolate the row id's for each 1.0 with this line:
print (df.loc[df['C'] == 1])
Which correctly outputs this:
     A   B    C
1   79   3  1.0
5  557  47  1.0
7  220   2  1.0
Even though I know that information, I don't know how to proceed from there.
Thank you so much for your help in advance, David
Recommended answer
Faster version (thanks to b2002):
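# Index labels of the rows where C is already set
ii = df[pd.notnull(df.C)].index
# Gaps between consecutive set rows
dd = np.diff(ii)
# A set row starts a new group when it is more than 2 rows after the previous one
jj = [ii[i] for i in range(1, len(ii)) if dd[i-1] > 2]
# The first set row always starts a group
jj = [ii[0]] + jj
for ci in jj:
    # Positional write into the underlying array: assumes the default 0..n-1 index,
    # and may fail where Copy-on-Write makes .values read-only
    df.C.values[ci:ci+3] = 1.0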
First get the indices of all your starting points, i.e. all your points that are 1.0 and have two NaN following, by looking at the differences between the points that are not null in the C column (the first index is included by default). Then iterate over those indices and use loc to change slices of your C column:
ii = df[pd.notnull(df.C)].index
dd = np.diff(ii)
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
jj = [ii[0]] + jj
for ci in jj:
    # .loc slices by label and includes the end label, so this covers 3 rows
    df.loc[ci:ci+2, 'C'] = 1.0
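The two versions use different slice bounds (ci:ci+3 on the array, ci:ci+2 with loc) because label slices include their end point while positional slices do not. A minimal sketch of that difference on a throwaway Series, where the names s, by_label and by_position are made up for illustration:

```python
import numpy as np
import pandas as pd

# Six empty rows with the default 0..5 index
s = pd.Series([np.nan] * 6)

by_label = s.copy()
by_label.loc[1:3] = 1.0       # label slice: rows 1, 2 and 3 (end included)

by_position = s.copy()
by_position.iloc[1:4] = 1.0   # positional slice: rows 1, 2 and 3 (end excluded)

# Both writes touch the same three rows
print(by_label.equals(by_position))
```

With a default integer index the labels and positions coincide, which is the only reason both versions give the same result here.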
Result:
      A    B    C
0    10    4  NaN
1    79    3  1.0
2     6   23  1.0
3    38    6  1.0
4     4  234  NaN
5   557   47  1.0
6    12  312  1.0
7   220    2  1.0
8    46  426  NaN
9    22   42  NaN
10   45  435  NaN
11   22   23  NaN
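The difference-based test only compares each 1.0 with the 1.0 just before it, so it can miss a 1.0 that falls right after a completed set (for example, 1.0 at rows 1, 3 and 4 gives one start, row 1, and row 4 never starts a set). If such a row should start a new set, which is an assumption about the intended behaviour, a single left-to-right scan is one alternative. This is only a sketch, and fill_runs is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd

def fill_runs(flags, width=3):
    """Start a run of `width` ones at every set row not covered by an earlier run."""
    out = np.full(len(flags), np.nan)
    next_free = 0  # first position not covered by a previous run
    for pos in np.flatnonzero(flags.notna().to_numpy()):
        if pos >= next_free:
            out[pos:pos + width] = 1.0  # the slice is clipped at the end of the array
            next_free = pos + width
    return pd.Series(out, index=flags.index)

list1 = [10, 79, 6, 38, 4, 557, 12, 220, 46, 22, 45, 22]
list2 = [4, 3, 23, 6, 234, 47, 312, 2, 426, 42, 435, 23]
df = pd.DataFrame({'A': list1, 'B': list2})
df['C'] = np.where(df['A'] > df['B'].shift(-2), 1, np.nan)
df['C'] = fill_runs(df['C'])
print(df)
```

On the data from the question this gives the same column C as the result shown above. It works by position, so it does not depend on the index being 0..n-1.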