How to add a new DataFrame column named after a dictionary key if one of its values is found in a column's text string


Problem description

I have a dataframe in which one column contains text information.

print(df):

...   | ... |  Text                         |

...   | ... |  StringA. StringB. StringC    |
...   | ... |  StringZ. StringY. StringX    |
...   | ... |  StringL. StringK. StringJ    |
...   | ... |  StringA. StringZ. StringJ    |

I also have a dictionary with the following contents:

d = {'Dogs': ['StringA', 'StringL'], 'Cats': ['StringB', 'StringZ', 'StringJ'], 'Birds': ['StringK', 'StringY']}

I have about 100 dictionary keys, each with 4 or more values.

What I am hoping to do is create an extra column in the dataframe for each key in the dictionary, and place a "1" in that column whenever any of that key's values appears in the text.

So the output I am trying to get is:

print(df):

...   | ... |  Text                         |   Dogs   |   Cats    |   Birds

...   | ... |  StringA. StringB. StringC    |   1      |   1       |   0
...   | ... |  StringZ. StringY. StringX    |   0      |   1       |   1
...   | ... |  StringL. StringK. StringJ    |   1      |   1       |   1
...   | ... |  StringA. StringZ. StringJ    |   1      |   1       |   0

The issue is that I'm not sure how to search for the values within the text column and then write a 1 to the matching key's column when one is found. Any help would be much appreciated! Thanks!

Recommended answer

import pandas as pd

d = {'Dogs': ['StringA', 'StringL'],'Cats': ['StringB', 'StringZ', 'StringJ'],'Birds': ['StringK', 'StringY']}
df = pd.DataFrame({'Text': ['StringA. StringB. StringC', 'StringZ. StringY. StringX', 'StringL. StringK. StringJ',
                            'StringA. StringZ. StringJ']})

# Add one 0/1 column per dictionary key
for k, v in d.items():
    # 1 if any of the key's values occurs as a substring of the row's text, else 0
    df[k] = df['Text'].apply(lambda text: int(any(s in text for s in v)))

# Output
                        Text  Dogs  Cats  Birds
0  StringA. StringB. StringC     1     1      0
1  StringZ. StringY. StringX     0     1      1
2  StringL. StringK. StringJ     1     1      1
3  StringA. StringZ. StringJ     1     1      0

This solution may be suboptimal if the strings are long or there are many of them. In that case you may have to add an additional column built with some sort of trie data structure.
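Before reaching for a trie, one cheaper thing to try is pandas' vectorized string matching. The sketch below is not part of the original answer: it assumes the values should still be matched as plain substrings, joins each key's values into one regular expression, and escapes them with `re.escape` so that punctuation in a value is not read as regex syntax. Whether it is actually faster on your data is something to measure rather than assume.

```python
import re
import pandas as pd

d = {'Dogs': ['StringA', 'StringL'],
     'Cats': ['StringB', 'StringZ', 'StringJ'],
     'Birds': ['StringK', 'StringY']}
df = pd.DataFrame({'Text': ['StringA. StringB. StringC',
                            'StringZ. StringY. StringX',
                            'StringL. StringK. StringJ',
                            'StringA. StringZ. StringJ']})

for key, values in d.items():
    # One alternation pattern per key, e.g. "StringA|StringL"
    pattern = '|'.join(re.escape(v) for v in values)
    # str.contains returns a boolean Series; cast it to 0/1
    df[key] = df['Text'].str.contains(pattern, regex=True).astype(int)
```

If the Text column can contain missing values, `str.contains` also accepts an `na` argument that sets the result for those rows.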

That said, the solution above should work for most moderately sized cases.
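For the larger case, a simpler stand-in for a trie is a reverse lookup from value to key, so that each text is scanned once instead of once per value. This is only a sketch under assumptions the question does not confirm: each value belongs to exactly one key, and values appear in the text as whole tokens separated by punctuation or spaces. Unlike the substring check above, a value such as 'StringA' would then not match inside a longer token like 'StringAB'. The names `value_to_key` and `keys_in` are made up for illustration.

```python
import re
import pandas as pd

d = {'Dogs': ['StringA', 'StringL'],
     'Cats': ['StringB', 'StringZ', 'StringJ'],
     'Birds': ['StringK', 'StringY']}
df = pd.DataFrame({'Text': ['StringA. StringB. StringC',
                            'StringZ. StringY. StringX',
                            'StringL. StringK. StringJ',
                            'StringA. StringZ. StringJ']})

# Invert the dictionary once: value -> key (assumes no value is shared between keys)
value_to_key = {value: key for key, values in d.items() for value in values}

def keys_in(text):
    """Return the set of dictionary keys whose values occur as whole tokens in text."""
    tokens = re.findall(r'\w+', text)  # assumed tokenisation: runs of word characters
    return {value_to_key[t] for t in tokens if t in value_to_key}

# Scan each text once, then fan the result out into one 0/1 column per key
found = df['Text'].apply(keys_in)
for key in d:
    df[key] = found.apply(lambda keys, key=key: int(key in keys))
```

On the sample data this gives the same columns as the substring version, because every value there is a complete token.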
