Python: Find a list of words in a text and return its index


Question

I have to process a plain-text document, looking for a list of words and returning a text window around each word found. I'm using NLTK.

I found posts on Stack Overflow where they use regular expressions to find words, but they only print the matches without getting their indices. I don't think using regular expressions is right, because I have to find specific words.

Recommended answer

This may be what you are looking for:

  • You can use str.index or str.find:

Contents of the file:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi sollicitudin tortor et velit venenatis molestie. Morbi non nibh magna, quis tempor metus. 
Vivamus vehicula velit sit amet neque posuere id hendrerit sem venenatis. Nam vitae felis sem. Mauris ultricies congue mi, eu ornare massa convallis nec. 
Donec volutpat molestie velit, scelerisque porttitor dui suscipit vel. Etiam feugiat feugiat nisl, vitae commodo ligula tristique nec. Fusce bibendum fermentum rutrum.

>>> a = open("file.txt").read()

>>> print(a.index("vitae"))
232
>>> print(a.find("vitae"))
232
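
The two methods return the same index when the word is present and differ only when it is missing: str.find returns -1, while str.index raises ValueError. A minimal sketch of that difference, assuming a short in-memory string in place of file.txt:

```python
# Sample text held in memory; it stands in for the contents of file.txt.
text = "Nam vitae felis sem. Etiam feugiat nisl, vitae commodo."

# Both methods return the index of the first occurrence.
first_with_find = text.find("vitae")
first_with_index = text.index("vitae")

# For a missing word, find returns -1 ...
missing_with_find = text.find("lorem")

# ... while index raises ValueError, which has to be caught.
try:
    text.index("lorem")
    index_raised = False
except ValueError:
    index_raised = True

print(first_with_find, first_with_index, missing_with_find, index_raised)
# prints: 4 4 -1 True
```

str.find is usually the more convenient of the two inside a loop, since the -1 result can be tested without a try/except.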

--Edit--

OK, if the same word occurs at multiple indices, try using a generator:

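def all_occurences(text, word):
    # Yield the start index of every non-overlapping occurrence of word in text.
    initial = 0
    while True:
        initial = text.find(word, initial)
        if initial == -1:
            return
        yield initial
        # Continue searching just past the current match.
        initial += len(word)


>>> print(list(all_occurences(open("file.txt").read(), "vitae")))
[232, 408]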
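
The question also asks for a list of words and a text window around each hit, which the answer does not cover. One possible sketch follows. The function name find_windows, the width parameter, and the character-based window are assumptions made for illustration, not part of the original answer. It also swaps str.find for re.finditer with \b word boundaries, because str.find matches substrings (it would find "vel" inside "velit"); re.escape keeps each search word literal.

```python
import re


def find_windows(text, words, width=20):
    """Yield (word, index, window) for each whole-word match in text.

    `width` is the number of characters kept on each side of the match
    (a hypothetical parameter chosen for this sketch).
    """
    # Escape each word so it is matched literally, and wrap the
    # alternation in \b so that only whole words match.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        yield match.group(0), match.start(), text[start:end]


sample = "Nam vitae felis sem. Donec velit vel. Etiam nisl, vitae commodo."
hits = list(find_windows(sample, ["vitae", "vel"], width=5))
for word, index, window in hits:
    print(index, word, repr(window))
```

On this sample the matches fall at indices 4, 33 and 50; the "vel" inside "velit" is skipped because of the word boundaries. Since the question mentions NLTK, its nltk.Text.concordance method may also be worth a look, as it prints each occurrence of a word together with surrounding context.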
