Value matching between two DataFrames using pandas in Python
Problem description
Hi, I have two DataFrames like the ones below.
DF1
Alpha | Numeric | Special
and | 1 | @
or | 2 | #
lol ok | 4 | &
DF2 with single column
Content
boy or girl
school @ morn
pyc LoL ok student
Chandra
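To experiment with the example, the two frames can be rebuilt roughly as follows. This is only a sketch: the question shows printed values, so storing Numeric as integers is an assumption.

```python
import pandas as pd

# DF1: three keyword columns (Numeric as int is an assumption).
df1 = pd.DataFrame({
    "Alpha": ["and", "or", "lol ok"],
    "Numeric": [1, 2, 4],
    "Special": ["@", "#", "&"],
})

# DF2: a single free-text column to search in.
df2 = pd.DataFrame({
    "Content": ["boy or girl", "school @ morn", "pyc LoL ok student", "Chandra"],
})
```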
I want to check whether any column in DF1 contains a keyword that appears in the Content column of DF2, and the output should go into a new DataFrame. This is what I tried:
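import pandas as pd

# Assumption: the snippet expects two-level columns (position, name),
# so they are built here on a copy of df1.
df1_pos = df1.copy()
df1_pos.columns = pd.MultiIndex.from_arrays(
    [list(range(len(df1.columns))), df1.columns])

# One row per df1 cell, with the column's position and name
df11 = (df1_pos.unstack()
               .reset_index(level=2, drop=True)
               .rename_axis(('col_order', 'col_name'))
               .dropna()
               .reset_index(name='val_low'))

# One row per whitespace-separated word of Content
df22 = (df2['Content'].str.split(expand=True)
                      .stack()
                      .rename('val')
                      .reset_index(level=1, drop=True)
                      .rename_axis('idx')
                      .reset_index())
df22['val_low'] = df22['val'].str.lower()

# Keep the first matching keyword per Content row
df = (pd.merge(df22, df11, on='val_low', how='left')
        .dropna(subset=['col_name'])
        .sort_values(['idx', 'col_order'])
        .drop_duplicates(['idx']))
df = (pd.concat([df2, df.set_index('idx')], axis=1)
        .fillna({'col_name': 'Other'})[['val', 'col_name', 'Content']])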
But it does not handle the space inside the keyword lol ok.
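The reason is that the attempt tokenizes Content on whitespace before matching, so a two-word keyword can never equal a single token. A minimal sketch of the effect, using `explode` as a stand-in for the `split(expand=True).stack()` step:

```python
import pandas as pd

content = pd.Series(["pyc LoL ok student"], name="Content")

# Lowercase, split on whitespace, and put one word per row.
tokens = content.str.lower().str.split().explode().tolist()
print(tokens)

# "lol ok" was split into "lol" and "ok", so an equality match cannot find it.
print("lol ok" in tokens)
```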
expected_output_DF
val col_name Content
0 or Alpha boy or girl
1 @ Special school @ morn
2 lol ok Alpha pyc LoL ok student
3 NaN Other Chandra
Can someone help me solve this?
Recommended answer
Use str.cat + str.extract to pull the keyword out of Content. Then use map to look up the column names, and assign the results as new columns on df2 (pd.concat is another way to join them).
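import re

i = df1.stack().astype(str)            # all df1 cells as strings, indexed by (row, column)
j = i.reset_index(level=0, drop=True)  # keep only the column name in the index
m = dict(zip(j.values, j.index))       # value -> column name
v = i.str.cat(sep='|')                 # alternation pattern for the regex
df2['val'] = df2.Content.str.extract(r'\s(' + v + r')\s', flags=re.I, expand=False)
df2['col_name'] = df2['val'].str.lower().map(m).fillna('Other')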
df2
Content val col_name
0 boy or girl or Alpha
1 school @ morn @ Special
2 pyc LoL ok student LoL ok Alpha
3 Chandra NaN Other
Details
- i and j are setup variables used to build the mapping.
- m is the mapping from values to column names.
- v is the regex pattern passed to str.extract for keyword extraction. I use re.I to ignore case. Because the pattern is wrapped in \s...\s, a keyword is matched only when it has whitespace on both sides.
- I use map + fillna to map the extracted values to column names via m.
v
'and|1|@|or|2|#|lol ok|4|&'
m
{'#': 'Special',
'&': 'Special',
'1': 'Numeric',
'2': 'Numeric',
'4': 'Numeric',
'@': 'Special',
'and': 'Alpha',
'lol ok': 'Alpha',
'or': 'Alpha'}
df2['val']
0 or
1 @
2 LoL ok
3 NaN
Name: val, dtype: object
df2['col_name']
0 Alpha
1 Special
2 Alpha
3 Other
Name: col_name, dtype: object
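The answer's pattern needs whitespace on both sides of a keyword and inserts the df1 values into the regex unescaped. If keywords can appear at the start or end of Content, or contain regex metacharacters, a variant along these lines may be more robust. This is a sketch, not part of the original answer: `match_keywords` and its `text_col` parameter are hypothetical names, and the lookarounds plus `re.escape` are swapped-in techniques.

```python
import re
import pandas as pd


def match_keywords(df1, df2, text_col="Content"):
    """Hypothetical helper: tag each df2 row with the first df1 keyword found."""
    # Map each lowercased keyword to the df1 column it came from.
    lookup = {}
    for col in df1.columns:
        for value in df1[col].dropna().astype(str):
            lookup.setdefault(value.lower(), col)

    # Try longer keywords first so a multi-word keyword wins over a shorter overlap.
    keywords = sorted(lookup, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keywords)

    # Lookarounds instead of \s...\s: the keyword may sit at the start or end
    # of the text, but must not be glued to other non-space characters.
    pattern = r"(?<!\S)(" + alternation + r")(?!\S)"

    out = df2.copy()
    out["val"] = out[text_col].str.extract(pattern, flags=re.I, expand=False)
    out["col_name"] = out["val"].str.lower().map(lookup).fillna("Other")
    # Column order follows the expected output in the question.
    return out[["val", "col_name", text_col]]


# Usage with the frames from the question.
keywords_df = pd.DataFrame({
    "Alpha": ["and", "or", "lol ok"],
    "Numeric": [1, 2, 4],
    "Special": ["@", "#", "&"],
})
text_df = pd.DataFrame({
    "Content": ["boy or girl", "school @ morn", "pyc LoL ok student", "Chandra"],
})
print(match_keywords(keywords_df, text_df))
```

The helper works on a copy, so df2 is left unchanged, and the lookup keys are lowercased up front so that mixed-case keywords in df1 still map correctly.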