Extract string from a dataframe comparing to a list


Question


I am trying to extract strings from a column of a pandas DataFrame, where the strings to match are stored in a list. I tried df.str.extract(list1), but I got an "unhashable type" error. I guess the way I compare the list to the DataFrame is not correct.

Input:

Col 1   Col 2
1       The date
2       Three has come
3       Mail Sent
4       Done Deal

Desired output:

Col 1   Col 2           Col 3
1       The date        NaN
2       Three has come  Three has
3       Mail Sent       Mail
4       Done Deal       Done

My list is as follows:

List1 = ['Three has' , 'Mail' , 'Done' , 'Game' , 'Time has come']
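The error can be reproduced with a minimal sketch, assuming the Col 2 values from the example above. Series.str.extract expects a single regular-expression string as its pat argument; a list is not one, and the "unhashable type" TypeError most likely comes from Python's re module when pandas tries to compile the list. Building one pattern string from the list avoids the problem:

```python
import pandas as pd

s = pd.Series(['The date', 'Three has come', 'Mail Sent', 'Done Deal'])
List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']

# Passing the list itself as the pattern fails: pat must be one regex string
try:
    s.str.extract(List1)
    error = None
except TypeError as exc:
    error = exc
print(repr(error))

# One pattern string built from the list: "|" is regex alternation ("or")
pattern = "(" + "|".join(List1) + ")"
print(s.str.extract(pattern, expand=False).tolist())
```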

Answer


You can use str.extract with all values in the list joined by |, which means "or" in a regex:

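import pandas as pd

df = pd.DataFrame({'Col 1': [1, 2, 3, 4],
                   'Col 2': ['The date', 'Three has come', 'Mail Sent', 'Done Deal']})

List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']
# Join all list values with "|" (regex "or") inside one capture group
df['Col 3'] = df['Col 2'].str.extract("(" + "|".join(List1) + ")", expand=False)
print (df)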
   Col 1           Col 2      Col 3
0      1        The date        NaN
1      2  Three has come  Three has
2      3       Mail Sent       Mail
3      4       Done Deal       Done
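The pattern above is built from the raw list entries. A slightly more defensive sketch (an addition, not part of the original answer) escapes each entry with re.escape so characters such as . or + are matched literally, and puts longer entries first: regex alternation takes the first alternative that matches at a given position, not the longest, so a short entry could otherwise hide a longer one that starts the same way.

```python
import re

import pandas as pd

df = pd.DataFrame({'Col 1': [1, 2, 3, 4],
                   'Col 2': ['The date', 'Three has come', 'Mail Sent', 'Done Deal']})
List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']

# Escape each entry so regex metacharacters (".", "+", "(", ...) match literally
escaped = [re.escape(item) for item in List1]
# Longest first, so a longer entry wins over a shorter one with the same start
escaped.sort(key=len, reverse=True)

pattern = "(" + "|".join(escaped) + ")"
df['Col 3'] = df['Col 2'].str.extract(pattern, expand=False)
print(df)
```

For this particular list the result is the same as the unescaped version; the changes only matter when list entries contain regex metacharacters or are prefixes of one another.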

Another solution:

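List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']

# Concatenate every list entry that occurs as a substring of the cell
# (several matches are joined with no separator)
df['Col 3'] = df['Col 2'].apply(lambda x: ''.join([L for L in List1 if L in x]))
# Replace empty strings (no match) with NaN
df['Col 3'] = df['Col 3'].mask(df['Col 3'] == '')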
print (df)
   Col 1           Col 2      Col 3
0      1        The date        NaN
1      2  Three has come  Three has
2      3       Mail Sent       Mail
3      4       Done Deal       Done
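In the second solution, ''.join glues together every entry found, so a cell containing two list items (for example "Mail Done") would get "MailDone". If only one match per row is wanted, a minimal sketch could keep the first entry in list order instead; the helper name first_match is hypothetical and not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col 1': [1, 2, 3, 4],
                   'Col 2': ['The date', 'Three has come', 'Mail Sent', 'Done Deal']})
List1 = ['Three has', 'Mail', 'Done', 'Game', 'Time has come']


def first_match(text, candidates):
    """Return the first candidate (in list order) found in text, else NaN."""
    return next((c for c in candidates if c in text), np.nan)


# Series.apply passes each cell as the first argument; args supplies the rest
df['Col 3'] = df['Col 2'].apply(first_match, args=(List1,))
print(df)
```

Because it returns NaN directly, no separate mask step is needed.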
