Multiple Phrases Matching Python Pandas

Question

This follows up on my previous question, Singular and plural phrase matching in pandas. Since the answers there did not give me the functionality I expected, I am posting the approach I have followed and what I actually need to achieve.

Below are the two phrase datasets and my code.

import pandas as pd

ingredients = pd.Series(["vanilla extract", "walnut", "oat", "egg", "almond"])

df = pd.DataFrame(["1 teaspoons vanilla extract", "2 eggs", "3 cups chopped walnuts", "4 cups rolled oats", "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup", "6 ounces smoke-flavored almonds, finely chopped", "sdfgsfgsf", "fsfgsgsfgfg"])

What I need is simply to match the phrases in the ingredients Series against the phrases in the DataFrame. As pseudocode:

If an ingredient (singular or plural) is found in a phrase in the DataFrame, return that ingredient. Otherwise, return False.
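A direct, row-by-row sketch of that pseudocode, assuming "singular or plural" can be handled by plain case-insensitive substring matching (the singular "egg" is a substring of "eggs", "walnut" of "walnuts"). find_ingredient is a hypothetical helper name; it returns the first ingredient, in the Series' order, that occurs in the phrase.

```python
import pandas as pd

ingredients = pd.Series(["vanilla extract", "walnut", "oat", "egg", "almond"])
df = pd.DataFrame(["1 teaspoons vanilla extract", "2 eggs", "3 cups chopped walnuts",
                   "4 cups rolled oats",
                   "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup",
                   "6 ounces smoke-flavored almonds, finely chopped",
                   "sdfgsfgsf", "fsfgsgsfgfg"])


def find_ingredient(phrase, ingredients):
    """Hypothetical helper: first ingredient contained in phrase, else False."""
    text = phrase.lower()
    for ing in ingredients:
        # Substring test, so the singular "egg" also matches the plural "eggs".
        if ing.lower() in text:
            return ing
    return False


# Run once per DataFrame row, so every row gets its own result.
df["existence"] = df[0].apply(lambda phrase: find_ingredient(phrase, ingredients))
print(df)
```

Plain substring matching can also hit unrelated words (for example "oat" inside "goat" or "boat"); a regular expression with word boundaries would be stricter.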

I developed the following code from the instructions given in the other question I asked.

results=ingredients.apply(lambda x: any(df[0].str.lower().str.contains(x.lower())))
df["existence"]=results
df

The problem with my code is that it only checks as many rows as there are items in the Series and then stops looking. The result I actually need is as follows:

    0                                            existence
0   1 teaspoons vanilla extract                  vanilla
1   2 eggs                                       egg
2   3 cups chopped walnuts                       walnut
3   4 cups rolled oats                           oat
4   1 (10.75 ounce) can.....                     False
5   6 ounces smoke-flavored almonds.....         almond
6   sdfgsfgsf                                    False
7   fsfgsgsfgfg                                  False

Can anyone tell me how to achieve this? I have spent days testing it with no luck. Thank you, everyone.
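A plausible explanation of the behaviour described in the question (this is a reading of the code, not something stated in the thread): ingredients.apply loops over the five ingredients, not over the DataFrame rows, so results holds one boolean per ingredient answering "does this ingredient occur anywhere in df?". When that Series is assigned as a column, pandas aligns it to df by index label. The snippet reuses the question's data to make this visible.

```python
import pandas as pd

ingredients = pd.Series(["vanilla extract", "walnut", "oat", "egg", "almond"])
df = pd.DataFrame(["1 teaspoons vanilla extract", "2 eggs", "3 cups chopped walnuts",
                   "4 cups rolled oats",
                   "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup",
                   "6 ounces smoke-flavored almonds, finely chopped",
                   "sdfgsfgsf", "fsfgsgsfgfg"])

# apply() runs once per ingredient, so results holds 5 booleans
# indexed 0..4 (the ingredients' index), not one value per df row.
results = ingredients.apply(lambda x: any(df[0].str.lower().str.contains(x.lower())))
print(results.index.tolist())  # [0, 1, 2, 3, 4]

# Assigning a Series aligns on the index: rows 0-4 of df receive those
# booleans, and rows 5-7 have no matching label, so they become NaN.
df["existence"] = results
print(df["existence"].isna().sum())  # 3
```

The True values in rows 0-4 therefore describe ingredients, not the phrases in those rows. Looping over the rows of df (for example with apply on df[0]) and testing every ingredient inside that loop avoids the mismatch.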

Answer

Check out NumPy's string operations (numpy.char):

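In [131]:

import numpy as np

df.columns = ['val']
# lower-cased phrases and the ingredient names as NumPy string arrays
V = df.val.str.lower().values.astype(str)
K = ingredients.values.astype(str)
# join the names of all ingredients found in each phrase
df['existence'] = list(map(''.join, np.where(np.char.count(V, K[..., np.newaxis]),
                                             K[..., np.newaxis], '').T))
print(df)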
                                                 val        existence
0                        1 teaspoons vanilla extract  vanilla extract
1                                             2 eggs              egg
2                             3 cups chopped walnuts           walnut
3                                 4 cups rolled oats              oat
4  1 (10.75 ounce) can Campbell's Condensed Cream...                 
5    6 ounces smoke-flavored almonds, finely chopped           almond
6                                          sdfgsfgsf                 
7                                        fsfgsgsfgfg     

It works in two steps:

In [138]:
# step 1: count how often each ingredient occurs in each phrase (5 x 8)
np.char.count(V, K[...,np.newaxis])
Out[138]:
array([[1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0]])
In [139]:
# step 2: where an ingredient is found, take its name; transpose to one row per phrase
np.where(np.char.count(V, K[...,np.newaxis]),
         K[...,np.newaxis], '').T
Out[139]:
array([['vanilla extract', '', '', '', ''],
       ['', '', '', 'egg', ''],
       ['', 'walnut', '', '', ''],
       ['', '', 'oat', '', ''],
       ['', '', '', '', ''],
       ['', '', '', '', 'almond'],
       ['', '', '', '', ''],
       ['', '', '', '', '']], 
      dtype='|S15')
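The two steps can be checked on their own, and the result brought closer to what the question asked for. This is a sketch in Python 3 under two assumptions that go beyond the answer: rows with no match get False (as in the question's expected output) instead of an empty string, and when several ingredients match one phrase their names are joined with ", " rather than run together as ''.join does. The names counts, names and joined are illustrative.

```python
import numpy as np
import pandas as pd

ingredients = pd.Series(["vanilla extract", "walnut", "oat", "egg", "almond"])
df = pd.DataFrame(["1 teaspoons vanilla extract", "2 eggs", "3 cups chopped walnuts",
                   "4 cups rolled oats",
                   "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup",
                   "6 ounces smoke-flavored almonds, finely chopped",
                   "sdfgsfgsf", "fsfgsgsfgfg"], columns=["val"])

V = df["val"].str.lower().to_numpy().astype(str)  # 8 phrases, shape (8,)
K = ingredients.to_numpy().astype(str)            # 5 ingredients, shape (5,)

# Step 1: K[:, np.newaxis] has shape (5, 1); broadcasting it against V gives a
# (5, 8) matrix where counts[i, j] = occurrences of ingredient i in phrase j.
counts = np.char.count(V, K[:, np.newaxis])

# Step 2: keep the ingredient name where it occurs, '' elsewhere, then
# transpose so each row corresponds to one phrase: shape (8, 5).
names = np.where(counts > 0, K[:, np.newaxis], "").T

# Join all matches per phrase with ", " and use False when nothing matched.
joined = [", ".join(n for n in row if n) for row in names]
df["existence"] = [j if j else False for j in joined]
print(df)
```

Because the whole (5, 8) matrix is built at once, memory grows with the number of ingredients times the number of phrases; for very large inputs the row-by-row loop may be the safer choice.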
