How to select sub-strings based on the presence of word pairs? Python
Problem description
I have a large number of sentences, from which I want to extract sub-sentences that start with certain word combinations. For example, I want to extract sentence segments that begin with "what does" or "what is", etc. (essentially dropping the words that appear before the word pair). Both the sentences and the word pairs are stored in a DataFrame:
   Sentence                                     First2
0  If this is a string what does it say?        can I
1  And this is a string, should it say more?    should it
2  This is yet another string.                  what does
3  etc. etc.                                    etc. etc
The result I want from the above example would be:
0 what does it say?
1 should it say more?
2
The most obvious solution (at least to me), shown below, does not work. It only uses the first word pair b to go over all the sentences r, but not the other b's.
a = df['Sentence']
b = df['First2']

# The function seems to loop over all r's but only over the first b:
def func(r):
    for x in b:
        if x in r:
            s = r[r.index(x):]
            return s
        else:
            return ''

df['Segments'] = a.apply(func)
It seems that looping over two DataFrame columns simultaneously in this way does not work. Is there a more efficient and effective way to do this?
Recommended answer
I believe there is a bug in your code:
        else:
            return ''
Because this else branch sits inside the for loop, 'func' returns immediately if the first comparison is not a match, so the remaining word pairs are never checked. That might be why the code does not return any matches.
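To make the early return concrete, here is a minimal sketch in plain Python (no pandas). The helper name `func_with_trace` and the `checked` list are hypothetical additions for illustration only; they record which word pairs are actually compared before the function returns.

```python
# Hypothetical helper, not part of the original code: it records every
# word pair that is compared before the function returns.
def func_with_trace(sentence, pairs, checked):
    for pair in pairs:
        checked.append(pair)
        if pair in sentence:
            return sentence[sentence.index(pair):]
        else:
            # Reached on the very first mismatch, so the loop never continues.
            return ''

pairs = ['can I', 'should it', 'what does']
checked = []
result = func_with_trace('If this is a string what does it say?', pairs, checked)

print(repr(result))  # ''
print(checked)       # ['can I']
```

Only 'can I' is ever compared, even though 'what does' appears later in the list and in the sentence.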
Sample code with the fix:
a = df['Sentence']
b = df['First2']

# Try every word pair; return '' only after all of them have failed to match.
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            s = sentence[sentence.index(first_two):]
            return s
    return ''

df['Segments'] = a.apply(func)
Output:
df:
{
'First2': ['can I', 'should it', 'what does'],
'Segments': ['what does it say? ', 'should it say more?', ''],
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string. ' ]
}
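As a possible alternative to the explicit loop, the word pairs could be combined into a single regular expression. This is only a sketch of a different technique, not the answer's method: the names `pattern` and `segment` are made up for illustration, and the data is held in plain lists rather than a DataFrame so that the example runs on its own.

```python
import re

sentences = [
    'If this is a string what does it say?',
    'And this is a string, should it say more?',
    'This is yet another string.',
]
first_twos = ['can I', 'should it', 'what does']

# One alternation pattern; re.escape guards against regex metacharacters in a pair.
pattern = re.compile('|'.join(re.escape(p) for p in first_twos))

def segment(sentence):
    """Return the sentence from the earliest matching word pair onward, or ''."""
    match = pattern.search(sentence)
    return sentence[match.start():] if match else ''

segments = [segment(s) for s in sentences]
print(segments)
# ['what does it say?', 'should it say more?', '']
```

The same function should work with df['Sentence'].apply(segment). One difference in behavior: the loop version returns the segment for the first pair in list order that matches, while this version returns the segment starting at the earliest match position in the sentence. Both do plain substring matching, so a pair can also match inside longer words.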