Text search using Python
Question
I am working on a text search project and using TextBlob to search for sentences in a text. TextBlob efficiently pulls out all the sentences that contain the keywords. However, for effective research I also want to pull out the sentence before and the sentence after each match, and I cannot figure out how to do that.
Below is the code I am using:
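from textblob import TextBlob

def extract_sents(text, word):
    # Keywords arrive as one comma-separated string, e.g. "cat,dog"
    search_words = set(word.split(','))
    # Lowercase the text so that lowercase keywords match regardless of case
    sents = ''.join([s.lower() for s in text])
    blob = TextBlob(sents)
    # Keep every sentence that shares at least one word with the keywords
    matches = [str(s) for s in blob.sentences if search_words & set(s.words)]
    print(search_words)
    print(matches)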
Recommended answer
If you want to get the sentences before and after the match, you can either write a loop that remembers the previous sentence, or use slices, like [from:to], on the blob.sentences list.
The best way might be to use the built-in enumerate function.
match_region = [list(map(str, blob.sentences[i-1:i+2]))  # from previous to next sentence
                for i, s in enumerate(blob.sentences)    # i is the index, s is the element
                if search_words & set(s.words)]          # same as your condition
Here, blob.sentences[i-1:i+2] extracts the sublist spanning from index i-1 (inclusive) to index i+2 (exclusive), and map turns the elements of that list into strings.
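To make the index arithmetic concrete, here is a minimal sketch that uses a plain list of strings as a stand-in for blob.sentences (the sample sentences are made up for illustration). It shows what enumerate yields and which elements the slice picks for a match in the middle of the list.

```python
# Plain strings stand in for the Sentence objects in blob.sentences (assumption).
sentences = ["Intro.", "Cats purr.", "Dogs bark.", "Birds sing.", "Outro."]

# enumerate yields (index, element) pairs.
pairs = list(enumerate(sentences))
print(pairs[2])  # (2, 'Dogs bark.')

# For a match at index 2, index i-1 is included and index i+2 is excluded.
i = 2
window = sentences[i - 1:i + 2]
print(window)  # ['Cats purr.', 'Dogs bark.', 'Birds sing.']
```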
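Note: You might want to replace i-1 with max(0, i-1). Otherwise, when the first sentence matches, i-1 is -1, which Python interprets as the index of the last element; for a list of three or more sentences the slice then comes out empty, and for a shorter list it can silently drop sentences. If i+2 is greater than the list's length, on the other hand, this is not a problem, because slices are clipped at the end of the list.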
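The following is a rough sketch of the clamped version, not the answer's exact code. To stay runnable without TextBlob it assumes a naive regex-based splitter in place of blob.sentences; the helper names split_sentences and matches_with_context are hypothetical.

```python
import re

def split_sentences(text):
    # Naive stand-in for blob.sentences (assumption): split after ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def matches_with_context(text, keywords):
    """Return one [previous, match, next] window per matching sentence."""
    sentences = split_sentences(text)
    search_words = {w.strip().lower() for w in keywords.split(",")}
    regions = []
    for i, s in enumerate(sentences):
        words = set(re.findall(r"\w+", s.lower()))
        if search_words & words:
            # max(0, i - 1) keeps the start index from wrapping around to -1
            regions.append(sentences[max(0, i - 1):i + 2])
    return regions

text = "Cats purr. Dogs bark. Birds sing. Fish swim."
print(matches_with_context(text, "cats"))  # [['Cats purr.', 'Dogs bark.']]
print(matches_with_context(text, "fish"))  # [['Birds sing.', 'Fish swim.']]

# Without the clamp, a match at index 0 gives an empty window here:
print(split_sentences(text)[-1:2])  # []
```

A match in the first or last sentence simply yields a shorter window, since there is no neighbor on that side.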