pandas: how to derive values for a new column based on another column

Question

I have a dataframe with a column in which each value is a list. I want to derive a new column that only considers lists whose size is greater than 1 and assigns a unique integer to the corresponding row as an id. A sample dataframe looks like this:

document_no_list    cluster_id
[1,2,3]             1
[4,5,6,7]           2
[8]                 nan
[9,10]              3 

The column cluster_id only considers the 1st, 2nd and 4th rows, each of which holds a list with more than one element, and assigns a unique integer id to the corresponding cell in the column. The 3rd row, whose list has a single element, is left as NaN.

I am wondering how to do that in pandas.

Recommended answer

We can use np.random.choice to draw unique random values and .loc to assign them to the qualifying rows, i.e.

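import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1,2,3],[4,5,6,7],[8],[9,10]]})

# Boolean mask: True for rows whose list has more than one element
x = df['document_no_list'].apply(len) > 1

# Draw x.sum() distinct integers from 0..len(df)-1 and assign them to the masked rows
df.loc[x, 'Cluster'] = np.random.choice(range(len(df)), x.sum(), replace=False)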

Output (the random ids differ from run to run):


 document_no_list  Cluster
0        [1, 2, 3]      2.0
1     [4, 5, 6, 7]      1.0
2              [8]      NaN
3          [9, 10]      3.0
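
Because the ids above are drawn from range(len(df)), they change on every run and can include 0. The following is a minimal sketch, assuming you want reproducible ids that start at 1. It uses NumPy's Generator API. The seed value and the names rng and mask are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# True for rows whose list has more than one element
mask = df['document_no_list'].apply(len) > 1

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Sample mask.sum() distinct ids from 1..len(df), without replacement
df.loc[mask, 'Cluster'] = rng.choice(
    np.arange(1, len(df) + 1), size=mask.sum(), replace=False
)
```

Rows outside the mask stay NaN, so the column is stored as float, as in the output above.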

If you want consecutive numbers instead, you can use

df.loc[x,'Cluster'] =  np.arange(x.sum())+1


 document_no_list  Cluster
0        [1, 2, 3]      1.0
1     [4, 5, 6, 7]      2.0
2              [8]      NaN
3          [9, 10]      3.0
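
The ids above show up as floats (1.0, 2.0, ...) because the column also contains NaN. If integer ids are preferred, one possible variation (a sketch, not part of the original answer) is to build the consecutive ids with a cumulative sum over the mask and convert to pandas' nullable Int64 dtype. The name mask is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# True for rows whose list has more than one element
mask = df['document_no_list'].apply(len) > 1

# cumsum() counts the qualifying rows seen so far (1, 2, 2, 3);
# where() blanks out rows outside the mask; Int64 keeps integers alongside missing values
df['Cluster'] = mask.cumsum().where(mask).astype('Int64')
```

With this approach the missing entry is shown as <NA> rather than NaN, and the remaining ids are stored as integers.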

Hope this helps.
