Multiple Phrases Matching Python Pandas
Question
This refers to my previous question, Singular and plural phrase matching in pandas. The answers there did not give me the functionality I expected, so I am posting the approach I followed and what I actually need to achieve.
Below are the two phrase datasets and the code.
import pandas as pd
ingredients = pd.Series(["vanilla extract","walnut","oat","egg","almond"])
df = pd.DataFrame(["1 teaspoons vanilla extract","2 eggs","3 cups chopped walnuts","4 cups rolled oats","1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup","6 ounces smoke-flavored almonds, finely chopped","sdfgsfgsf","fsfgsgsfgfg"])
All I need is to match the phrases in the ingredients Series against the phrases in the DataFrame. As pseudocode:
If an ingredient (singular or plural) is found in a phrase in the DataFrame, return the ingredient. Otherwise, return False.
I wrote the following code based on the instructions given in the other question I asked.
results=ingredients.apply(lambda x: any(df[0].str.lower().str.contains(x.lower())))
df["existence"]=results
df
The problem with my code is that it produces only one result per item in the Series (five values) instead of one per row of the DataFrame, and each result is just True or False rather than the matched ingredient. The result I really need is as follows:
   0                                         existence
0  1 teaspoons vanilla extract               vanilla
1  2 eggs                                    egg
2  3 cups chopped walnuts                    walnut
3  4 cups rolled oats                        oat
4  1 (10.75 ounce) can.....                  False
5  6 ounces smoke-flavored almonds.....      almond
6  sdfgsfgsf                                 False
7  fsfgsgsfgfg                               False
Can anyone tell me how to achieve this? I have spent days testing it without any luck. Thank you, everyone.
Recommended answer
Check out numpy string operations:
In [131]:
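import numpy as np
df.columns = ['val']
V = df.val.str.lower().values.astype(str)
K = ingredients.values.astype(str)
# count every ingredient in every phrase, keep the names that occur, join them per row
df['existence'] = list(map(''.join, np.where(np.char.count(V, K[...,np.newaxis]),
                                             K[...,np.newaxis], '').T))
print(df)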
   val                                                existence
0  1 teaspoons vanilla extract                        vanilla extract
1  2 eggs                                             egg
2  3 cups chopped walnuts                             walnut
3  4 cups rolled oats                                 oat
4  1 (10.75 ounce) can Campbell's Condensed Cream...
5  6 ounces smoke-flavored almonds, finely chopped    almond
6  sdfgsfgsf
7  fsfgsgsfgfg
There are 2 steps:
In [138]:
# check whether each ingredient is found in each phrase
np.char.count(V, K[...,np.newaxis])
Out[138]:
array([[1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0]])
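The shape of that result comes from broadcasting: the ingredients are turned into a column, so every ingredient is counted in every phrase. Here is a minimal sketch of that step on a smaller, made-up pair of arrays (the data and the names K_col and counts are only for illustration, not part of the answer):

```python
import numpy as np

# Small illustrative inputs: 3 phrases, 2 ingredients
V = np.array(["2 eggs", "3 cups chopped walnuts", "sdfgsfgsf"])
K = np.array(["walnut", "egg"])

# Turn the ingredients into a column of shape (2, 1)
K_col = K[..., np.newaxis]

# Broadcasting (2, 1) against (3,) gives one row per ingredient
# and one column per phrase: shape (2, 3)
counts = np.char.count(V, K_col)

print(K_col.shape, V.shape, counts.shape)
print(counts)
```

Each row of counts belongs to one ingredient and each column to one phrase, which is why the answer transposes the array before joining the names per phrase.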
In [139]:
#if it is found, grab its name
np.where(np.char.count(V, K[...,np.newaxis]),
K[...,np.newaxis], '').T
Out[139]:
array([['vanilla extract', '', '', '', ''],
       ['', '', '', 'egg', ''],
       ['', 'walnut', '', '', ''],
       ['', '', 'oat', '', ''],
       ['', '', '', '', ''],
       ['', '', '', '', 'almond'],
       ['', '', '', '', ''],
       ['', '', '', '', '']],
      dtype='|S15')
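If you want False for the rows without a match, as in the expected output of the question, one possible alternative is a regular expression instead of the numpy substring count. This is only a sketch under a few assumptions that are not in the answer above: plurals are formed with a plain trailing "s", matches must be whole words, and the helper name first_ingredient is made up for illustration.

```python
import re
import pandas as pd

ingredients = pd.Series(["vanilla extract", "walnut", "oat", "egg", "almond"])
df = pd.DataFrame({"val": [
    "1 teaspoons vanilla extract",
    "2 eggs",
    "3 cups chopped walnuts",
    "4 cups rolled oats",
    "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup",
    "6 ounces smoke-flavored almonds, finely chopped",
    "sdfgsfgsf",
    "fsfgsgsfgfg",
]})

# Try longer names first so a longer phrase wins over a shorter one inside it
names = sorted(ingredients, key=len, reverse=True)

# Assumption: the plural is just the singular plus an optional "s";
# \b keeps the match on whole-word boundaries
pattern = r"\b(" + "|".join(re.escape(n) for n in names) + r")s?\b"
regex = re.compile(pattern, flags=re.IGNORECASE)

def first_ingredient(text):
    """Return the first ingredient found in text (lowercased), or False."""
    match = regex.search(text)
    return match.group(1).lower() if match else False

df["existence"] = [first_ingredient(t) for t in df["val"]]
print(df)
```

Unlike the substring count, this returns only the first ingredient in each phrase and will not match an ingredient name embedded in a longer word, so pick whichever behaviour fits your data.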