Tokenize words in a list of sentences in Python

Problem description

I currently have a file that contains a list that looks like this:

example = ['Mary had a little lamb' , 
           'Jack went up the hill' , 
           'Jill followed suit' ,    
           'i woke up suddenly' ,
           'it was a really bad dream...']

"example" is a list of such sentences, and I want the output to look like:

mod_example = ["'Mary' 'had' 'a' 'little' 'lamb'" , 'Jack' 'went' 'up' 'the' 'hill' ....] and so on. I need the sentences to stay separate, with each word tokenized, so that I can compare each word from a sentence of mod_example (one sentence at a time, using a for loop) with a reference sentence.

I tried this:

for sentence in example:
    text3 = sentence.split()
    print(text3)

and got the following output:

['it', 'was', 'a', 'really', 'bad', 'dream...']

How do I get this for all the sentences? It keeps overwriting. Also, please mention whether my approach is right. The result should remain a list of sentences, with the words in each sentence tokenized. Thanks.

Recommended answer

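You could use the word tokenizer in NLTK (http://nltk.org/api/nltk.tokenize.html) with a list comprehension; see http://docs.python.org/2/tutorial/datastructures.html#list-comprehensions. The list comprehension builds one token list per sentence and collects them all in a new list, so no result overwrites the previous one.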

>>> from nltk.tokenize import word_tokenize
>>> example = ['Mary had a little lamb' , 
...            'Jack went up the hill' , 
...            'Jill followed suit' ,    
...            'i woke up suddenly' ,
...            'it was a really bad dream...']
>>> tokenized_sents = [word_tokenize(i) for i in example]
>>> for i in tokenized_sents:
...     print(i)
... 
['Mary', 'had', 'a', 'little', 'lamb']
['Jack', 'went', 'up', 'the', 'hill']
['Jill', 'followed', 'suit']
['i', 'woke', 'up', 'suddenly']
['it', 'was', 'a', 'really', 'bad', 'dream', '...']
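
If installing NLTK is not an option, the same "one token list per sentence" idea can be sketched with the standard library alone. The block below is only an illustration written for Python 3, not part of the original answer: simple_tokenize is a hypothetical helper that roughly imitates splitting punctuation from words with a regular expression (it is not NLTK's algorithm), and reference is a made-up reference sentence used to show the word-by-word comparison the question asks about.

```python
import re

example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

# Plain str.split(): collecting every result in a list keeps all sentences,
# instead of rebinding a single variable on each pass of the loop.
split_sents = [sentence.split() for sentence in example]


def simple_tokenize(sentence):
    """Hypothetical helper: return runs of word characters or runs of punctuation."""
    return re.findall(r"\w+|[^\w\s]+", sentence)


regex_sents = [simple_tokenize(sentence) for sentence in example]

# Made-up reference sentence, used only to illustrate the comparison step.
reference = 'Jack and Jill went up the hill'
reference_words = set(reference.split())

for tokens in regex_sents:
    # Keep the words of this sentence that also occur in the reference sentence.
    shared = [word for word in tokens if word in reference_words]
    print(tokens, '->', shared)
```

With str.split() the last token of the final sentence stays as 'dream...', while the regular expression separates it into 'dream' and '...', which is closer to the NLTK output shown above. Which behaviour is preferable depends on whether punctuation should count when comparing against the reference sentence.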
