Use concordance to find hyphenated words

Problem description

I was able to reproduce the expected output from this book, page 4, "Searching Text". When I applied the same approach to my own text, I got "No matches", which is not what I expected. I think I'm tokenizing at the wrong level (words instead of characters), but I'm not sure how to correct that. Any suggestions? The output I want is every hyphen lined up vertically, with its surrounding context.

>>> f = open('hyphen.txt')
>>> raw = f.read()
>>> import nltk
>>> tokens = nltk.word_tokenize(raw)
>>> text = nltk.Text(tokens)
>>> text.concordance("-")
No matches
>>> text
<Text: Fog Air-Flow Switch stuck off ? Bubble Tower...>

(Python 3.4.3)
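The Text repr above shows "Air-Flow" as a single token, which suggests why the lookup fails: concordance searches for tokens equal to the query, and no token is a bare "-". The following is a minimal sketch of that mismatch using only plain Python lists. The token list is a hypothetical stand-in modelled on the repr, not actual tokenizer output.

```python
# Hypothetical token list, modelled on the Text repr shown above.
tokens = ["Fog", "Air-Flow", "Switch", "stuck", "off", "?"]

# A token-level lookup for "-" needs a token that is exactly "-".
exact_matches = [t for t in tokens if t == "-"]

# The hyphen only occurs inside a longer token.
hyphenated = [t for t in tokens if "-" in t]

print(exact_matches)  # []
print(hyphenated)     # ['Air-Flow']
```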

Edit

I think I'm close using regular expressions, but I don't know how to remove the 'NoneType' objects. Any suggestions?

The output I'd like to see would look like this:

                 Fog Air-Flow Switch stuck off?
      Bubble Tower Check-Valve stuck closed?
           Chamber Drain-Trap broken, dry, or missing?
         Chamber Exhaust-Vent blocked or restricted?
 etc.

It's okay if the context is wider than the sentence containing the hyphen. All that matters to me is that the hyphens line up vertically, each with its surrounding context.
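The alignment described here could also be sketched with the standard re module alone. This is an illustrative sketch, not the asker's code: it assumes one sentence per input line, and the helper name align_on_hyphen and the "No hyphen here" sample line are hypothetical. The 'NoneType' objects mentioned in the edit are presumably the None that re.search returns for a line without a match, so the sketch skips those lines explicitly.

```python
import re

def align_on_hyphen(lines):
    """Keep lines with an intra-word hyphen, padded so the first such hyphen shares one column."""
    hyphen = re.compile(r"(?<=\w)-(?=\w)")  # a hyphen with word characters on both sides
    hits = []
    for line in lines:
        line = line.strip()
        match = hyphen.search(line)
        if match is not None:  # re.search returns None when nothing matches
            hits.append((match.start(), line))
    if not hits:
        return []
    width = max(pos for pos, _ in hits)  # rightmost hyphen position
    return [" " * (width - pos) + line for pos, line in hits]

# Sample sentences taken from the desired output above, plus one hypothetical
# line without a hyphen to show that it is dropped.
sample = [
    "Fog Air-Flow Switch stuck off?",
    "Bubble Tower Check-Valve stuck closed?",
    "No hyphen here",
    "Chamber Drain-Trap broken, dry, or missing?",
]
for row in align_on_hyphen(sample):
    print(row)
```

Padding each line by the distance to the rightmost hyphen keeps the full sentence as context. Truncating to a fixed width on each side, as concordance does, would be a variation on the same idea.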

Recommended answer

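The code needs a small change: pass the raw string to nltk.Text instead of the word tokens, so that each character, including "-", becomes its own token.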

import nltk
f = open("/path/to/file")  # path of the file
raw = f.read()
# nltk.Text iterates over the raw string, so each character becomes a token
text = nltk.Text(raw)
text.concordance("-")

Required output:
