How can I create my own corpus in the Python Natural Language Toolkit?


Question

I have recently expanded the names corpus in NLTK and would like to know how I can turn the two files I have (male.txt, female.txt) into a corpus so that I can access them using the existing nltk.corpus methods. Does anyone have any suggestions?

Many thanks, James.

Recommended answer

As the README says, the names corpus is not in the public domain, so you should email any changes you make to the corpus author (the address is in that file). Apart from that detail of law and courtesy, you can simply replace either or both of those files with your own. They are in a perfectly simple format: one name per line, with comment lines allowed (and ignored) that start with '#'.
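To make the file format concrete, here is a minimal sketch in plain Python that reads a file laid out as described above: one name per line, with blank lines and lines starting with '#' skipped. The helper name `read_names` and the sample file contents are illustrative assumptions, not part of NLTK.

```python
import tempfile
from pathlib import Path


def read_names(path):
    """Return the names in a one-name-per-line file.

    Blank lines and comment lines (starting with '#') are skipped.
    read_names is a hypothetical helper, not an NLTK function.
    """
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


# Demonstrate on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "male.txt"
    sample.write_text("# extra names\nAaron\n\nAbel\n", encoding="utf-8")
    sample_names = read_names(sample)

print(sample_names)
```

A file that passes through a reader like this unchanged should be safe to drop in place of the original male.txt or female.txt.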

To install a totally new corpus rather than just tweaking an existing one, you could start with the docs given here.
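As one possible starting point for a separate corpus, the sketch below points NLTK's `WordListCorpusReader` at a directory holding the two files. The temporary directory and the sample names are assumptions made for illustration; in practice the root would be wherever your own male.txt and female.txt live. Passing `ignore_lines_startswith="#"` explicitly is a choice made here so that comment lines are dropped regardless of the reader's default.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader import WordListCorpusReader

# Assumed layout: a directory containing male.txt and female.txt.
# A temporary directory with made-up names stands in for the real files.
corpus_root = Path(tempfile.mkdtemp())
(corpus_root / "male.txt").write_text("# my additions\nAaron\nAbel\n", encoding="utf-8")
(corpus_root / "female.txt").write_text("Abby\nAda\n", encoding="utf-8")

# Build a corpus reader over the two word-list files.
my_names = WordListCorpusReader(
    str(corpus_root), ["male.txt", "female.txt"], encoding="utf-8"
)

file_ids = my_names.fileids()

# Ask the reader to skip lines that start with '#'.
male = my_names.words("male.txt", ignore_lines_startswith="#")
all_names = my_names.words(ignore_lines_startswith="#")

print(file_ids)
print(male)
print(all_names)
```

The resulting object supports the same `fileids()` and `words()` calls as the bundled names corpus, so code written against `nltk.corpus.names` should need little change beyond swapping the reader.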
