Navigating text file searches in Python

Question

Here is a sample of the text file I am working with:

<Opera>

Tristan/NNP
and/CC
Isolde/NNP
and/CC
the/DT
fatalistic/NN
horns/VBZ
The/DT
passionate/JJ
violins/NN
And/CC
ominous/JJ
clarinet/NN
;/:

The capital letters after the forward slashes are weird tags. I want to be able to search the file for something like "NNP,CC,NNP" and, for this segment, have the program return "Tristan and Isolde": the three consecutive words whose tags match those three tags in order.

The problem I am having is that I want the search string to be entered by the user, so it will always be different.
I can read the file and find one match, but I do not know how to count backwards from that point to print the first word, or how to check whether the next tag matches.

Answer

Your source text looks like it may have been produced by the Natural Language Toolkit (NLTK).

Using NLTK, you could tokenize the text, split each token into a (word, part_of_speech) tuple, and iterate through the n-grams to find those that match the pattern:

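import nltk

# In practice the pattern could come from the user, e.g. input('Tags: ')
pattern = 'NNP,CC,NNP'
pattern = [pat.strip() for pat in pattern.split(',')]
text = '''Tristan/NNP and/CC Isolde/NNP and/CC the/DT fatalistic/NN horns/VBZ
          The/DT passionate/JJ violins/NN And/CC ominous/JJ clarinet/NN ;/:'''
# The text is already tokenized: every whitespace-separated token is word/TAG,
# so str.split() is enough (word_tokenize would split the ';' in ';/:' off).
tagged_tokens = [nltk.tag.str2tuple(token) for token in text.split()]
# Slide a window of len(pattern) tokens over the text; nltk.ngrams replaces
# the nltk.ingrams function from older NLTK releases.
for ngram in nltk.ngrams(tagged_tokens, len(pattern)):
    if all(pos == pat for (word, pos), pat in zip(ngram, pattern)):
        print(' '.join(word for word, pos in ngram))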

yields

Tristan and Isolde
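
The code above hard-codes both the pattern and the text, while the question asks for a user-entered pattern searched against a file. Below is a minimal plain-Python sketch of that, with no NLTK dependency. It assumes the file has one word/TAG token per line as in the sample, and it loads every token into a list first: once the tokens are in a list, "counting backwards" and "checking the next tag" become simple index arithmetic on a slice. The names `parse_tagged_lines`, `find_tag_sequences` and `main` are hypothetical helpers, not part of any library.

```python
def parse_tagged_lines(lines):
    """Turn word/TAG lines into a list of (word, tag) tuples.

    Assumes one token per line, as in the sample file; blank lines and
    lines without a '/' (such as '<Opera>') are skipped.
    """
    tokens = []
    for line in lines:
        line = line.strip()
        if '/' not in line:
            continue
        # rpartition splits on the LAST slash, so a word that itself
        # contains '/' keeps it; the tag is whatever follows.
        word, _, tag = line.rpartition('/')
        tokens.append((word, tag))
    return tokens


def find_tag_sequences(tokens, pattern):
    """Return every run of consecutive words whose tags equal `pattern`."""
    n = len(pattern)
    if n == 0:
        return []
    matches = []
    # Each start index i where a window of n tokens still fits; the
    # window tokens[i:i + n] gives both the first word and the next tags.
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if [tag for _, tag in window] == pattern:
            matches.append(' '.join(word for word, _ in window))
    return matches


def main():
    """Interactive entry point: prompt for a file name and a tag pattern."""
    path = input('File to search: ')
    raw = input('Tags to search for (e.g. NNP,CC,NNP): ')
    # Upper-case the input so 'nnp,cc,nnp' also works; drop empty entries.
    pattern = [p.strip().upper() for p in raw.split(',') if p.strip()]
    with open(path, encoding='utf-8') as f:
        tokens = parse_tagged_lines(f)
    for words in find_tag_sequences(tokens, pattern):
        print(words)
```

Calling `main()` and entering `NNP,CC,NNP` against the sample file should print `Tristan and Isolde`. Loading the whole file into a list keeps the indexing simple; for a very large file you could instead keep only the last `len(pattern)` tokens in a `collections.deque(maxlen=len(pattern))`.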


Related link:

  • Categorizing and Tagging Words (chapter 5 of the NLTK book)
