pandas: how to derive values for a new column based on another column
Question
I have a dataframe with a column in which each value is a list. I want to derive a new column that only considers lists whose size is greater than 1 and assigns a unique integer to the corresponding row as an id.
A sample dataframe looks like this:
document_no_list  cluster_id
[1,2,3]           1
[4,5,6,7]         2
[8]               NaN
[9,10]            3
The cluster_id column only considers the 1st, 2nd and 4th rows, each of which holds a list with more than one element, and assigns a unique integer id to the corresponding cell. The 3rd row, whose list has a single element, is left as NaN.
I am wondering how to do that in pandas.
Recommended answer
We can use np.random.choice with replace=False to draw unique random values, and .loc to assign them only to the rows whose list has more than one element, i.e.
import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})
x = df['document_no_list'].apply(len) > 1  # True where the list has more than one element
df.loc[x, 'Cluster'] = np.random.choice(range(len(df)), x.sum(), replace=False)
Output (the ids are drawn at random, so they vary between runs):
  document_no_list  Cluster
0        [1, 2, 3]      2.0
1     [4, 5, 6, 7]      1.0
2              [8]      NaN
3          [9, 10]      3.0
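If the random ids need to be reproducible, one option is a seeded generator. The following is only a sketch, not part of the original answer: it swaps np.random.choice for NumPy's np.random.default_rng, and the seed value 0, the name `mask`, and the choice to draw ids from 1..len(df) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# Boolean mask: True where the list has more than one element.
mask = df['document_no_list'].apply(len) > 1

# Seeded generator; the seed 0 is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)

# Draw one distinct id per selected row from the candidates 1..len(df).
candidates = np.arange(1, len(df) + 1)
ids = rng.choice(candidates, size=int(mask.sum()), replace=False)

# Rows outside the mask stay NaN, which makes the column float.
df.loc[mask, 'Cluster'] = ids
```

Because the candidate pool has len(df) values and at most len(df) rows are selected, sampling without replacement always has enough values to draw from.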
If you want consecutive numbers instead, you can use
df.loc[x,'Cluster'] = np.arange(x.sum())+1
  document_no_list  Cluster
0        [1, 2, 3]      1.0
1     [4, 5, 6, 7]      2.0
2              [8]      NaN
3          [9, 10]      3.0
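The ids above are stored as floats because the column contains NaN. If integer ids are preferred, a possible variant (an assumption on my part, not from the original answer) is to number the selected rows with a cumulative sum over the boolean mask and convert to the nullable Int64 dtype. The name `mask` is again only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# Boolean mask: True where the list has more than one element.
mask = df['document_no_list'].apply(len) > 1

# The running count of True values is 1, 2, 2, 3; keep it only where
# the mask is True, then store it as nullable integers.
df['Cluster'] = mask.cumsum().where(mask).astype('Int64')

print(df)
```

With this variant the unselected row holds a missing value (pd.NA) rather than NaN, and the ids remain integers.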
Hope this helps.