Finding unique combinations of columns from a dataframe

Problem description

In the data set below, I need to find the unique sequences (combinations of age and maritalstatus) and assign each one a serial number.

Data set:

user    age maritalstatus   product
A   Young   married 111
B   young   married 222
C   young   Single  111
D   old single  222
E   old married 111
F   teen    married 222
G   teen    married 555
H   adult   single  444
I   adult   single  333

Expected output:

young   married     0
young   single      1
old     single      2
old     married     3
teen    married     4
adult   single      5

After finding the unique values as shown above, if I pass a new user like the one below,

user age maritalstatus  
X     young  married 

it should return the matching products as a list:

X : [111, 222]

If there is no matching sequence, as below,

user age maritalstatus  
Y     adult  married 

it should return an empty list:

Y : []

Recommended answer

First select only the columns needed for the output and call drop_duplicates, then add a new column with range:

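# Store the unique combinations in a new variable so the original df
# (with the user and product columns) stays available for the later steps
df1 = df[['age','maritalstatus']].drop_duplicates()
df1['no'] = range(len(df1.index))
print (df1)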
     age maritalstatus  no
0  Young       married   0
1  young       married   1
2  young        Single   2
3    old        single   3
4    old       married   4
5   teen       married   5
7  adult        single   6

To convert all values to lowercase first, so that 'Young' and 'young' count as the same value:

# Lowercase both columns before dropping duplicates; df itself is left unchanged
df1 = df[['age','maritalstatus']].apply(lambda x: x.str.lower()).drop_duplicates()
df1['no'] = range(len(df1.index))
print (df1)
     age maritalstatus  no
0  young       married   0
2  young        single   1
3    old        single   2
4    old       married   3
5   teen       married   4
7  adult        single   5
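
If the serial number is wanted on every row of the original data rather than in a separate table of unique combinations, one possible alternative is groupby with ngroup. This is only a sketch and not part of the answer above: the dataframe is rebuilt here from the question's sample data, and the helper name keys is chosen for illustration.

```python
import pandas as pd

# Sample data from the question (mixed case on purpose)
df = pd.DataFrame({
    'user': list('ABCDEFGHI'),
    'age': ['Young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'Single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
    'product': [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

# Lowercased copy of the key columns, so 'Young' and 'young' match
keys = df[['age', 'maritalstatus']].apply(lambda x: x.str.lower())

# sort=False numbers the groups in order of first appearance
df['no'] = keys.groupby(['age', 'maritalstatus'], sort=False).ngroup()
print(df[['user', 'no']])
```

Rows that share a combination receive the same number, so users A and B both get the number assigned to young/married.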

For the product lookup, first convert the two columns to lowercase in the original, full dataframe:

df[['age','maritalstatus']] = df[['age','maritalstatus']].apply(lambda x: x.str.lower())
print (df)
  user    age maritalstatus  product
0    A  young       married      111
1    B  young       married      222
2    C  young        single      111
3    D    old        single      222
4    E    old       married      111
5    F   teen       married      222
6    G   teen       married      555
7    H  adult        single      444
8    I  adult        single      333

Then use merge and convert the unique product values to a list:

df2 = pd.DataFrame([{'user':'X', 'age':'young', 'maritalstatus':'married'}])
print (df2)
     age maritalstatus user
0  young       married    X

a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[111, 222]


df2 = pd.DataFrame([{'user':'X', 'age':'adult', 'maritalstatus':'married'}])
print (df2)
     age maritalstatus user
0  adult       married    X

a = pd.merge(df, df2, on=['age','maritalstatus'])['product'].unique().tolist()
print (a)
[]
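
When only one new user is checked at a time, the same lookup can be sketched with a boolean mask instead of building a one-row dataframe and merging. The function name products_for is hypothetical, and the data is the lowercased sample from above.

```python
import pandas as pd

# Lowercased sample data, as produced by the step above
df = pd.DataFrame({
    'user': list('ABCDEFGHI'),
    'age': ['young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
    'product': [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

def products_for(df, age, maritalstatus):
    """Return the unique products for one (age, maritalstatus) pair.

    Hypothetical helper: returns an empty list when nothing matches.
    """
    mask = ((df['age'] == age.lower())
            & (df['maritalstatus'] == maritalstatus.lower()))
    return df.loc[mask, 'product'].unique().tolist()

print(products_for(df, 'young', 'married'))
print(products_for(df, 'adult', 'married'))
```

The first call prints the products of users A and B; the second finds no matching rows and prints an empty list.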

But if a new column is needed, use transform:

df['prod'] = df.groupby(['age', 'maritalstatus'])['product'].transform('unique')
print (df)
  user    age maritalstatus  product        prod
0    A  young       married      111  [111, 222]
1    B  young       married      222  [111, 222]
2    C  young        single      111       [111]
3    D    old        single      222       [222]
4    E    old       married      111       [111]
5    F   teen       married      222  [222, 555]
6    G   teen       married      555  [222, 555]
7    H  adult        single      444  [444, 333]
8    I  adult        single      333  [444, 333]

a = (pd.merge(df, df2, on=['age','maritalstatus'])
       .groupby('user_y')['product']
       .apply(lambda x: x.unique().tolist())
       .to_dict())
print (a)
{'X': [111, 222]}

Detail:

print (pd.merge(df, df2, on=['age','maritalstatus']))
  user_x    age maritalstatus  product user_y
0      A  young       married      111      X
1      B  young       married      222      X
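
Because merge defaults to an inner join, a new user with no matching combination produces no rows and would be missing from the dictionary instead of mapping to an empty list. A possible variation, sketched here with a hypothetical new_users frame holding both X and Y, is a left merge starting from the new users:

```python
import pandas as pd

# Lowercased sample data from the question
df = pd.DataFrame({
    'user': list('ABCDEFGHI'),
    'age': ['young', 'young', 'young', 'old', 'old',
            'teen', 'teen', 'adult', 'adult'],
    'maritalstatus': ['married', 'married', 'single', 'single', 'married',
                      'married', 'married', 'single', 'single'],
    'product': [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

# Hypothetical frame with several new users at once
new_users = pd.DataFrame({
    'user': ['X', 'Y'],
    'age': ['young', 'adult'],
    'maritalstatus': ['married', 'married'],
})

# how='left' keeps every new user; unmatched rows get NaN in 'product'
merged = new_users.merge(df, on=['age', 'maritalstatus'],
                         how='left', suffixes=('', '_known'))

# Drop the NaN placeholders, then collect the unique products per new user
result = (merged.groupby('user')['product']
                .apply(lambda x: x.dropna().astype(int).unique().tolist())
                .to_dict())
print(result)
```

The astype(int) call is there because the NaN introduced by the left merge turns the product column into floats.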
