How to parse large data with NLTK's Stanford POS tagger in Python

Question

I am working on a program that uses NLTK and the Stanford POS tagger to parse files containing many words. Most of my files are processed fine, but a few fail with the following error:

OSError: Java command failed : ['C:\\Program Files\\Java\\jdk1.8.0_60\\bin\\java.exe', '-mx1000m', '-cp',

After some research, I found that the problem is that the program runs out of memory at runtime. One possible solution is to split each problematic file into two and process the parts separately. However, this is not a permanent, long-term solution for my program, so I would like to increase the process memory instead.
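
The split-and-process-separately workaround described above can also be automated instead of done by hand. The following is a minimal sketch, not part of the original question, that feeds tokenized sentences to a tagger in fixed-size batches, assuming (as the question reports) that smaller inputs succeed where a whole file fails. The names batches and tag_in_batches and the default batch_size of 500 are illustrative assumptions; tag_sents stands for any callable with the shape of an NLTK tagger's tag_sents method.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# A tagged sentence is a list of (word, tag) pairs, the shape NLTK taggers
# return from tag_sents().
TaggedSentence = List[Tuple[str, str]]


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def tag_in_batches(
    sentences: Sequence[List[str]],
    tag_sents: Callable[[List[List[str]]], List[TaggedSentence]],
    batch_size: int = 500,  # hypothetical default; tune to your memory limit
) -> List[TaggedSentence]:
    """Tag tokenized sentences one batch at a time.

    Each call to `tag_sents` only sees `batch_size` sentences, so no single
    tagging call has to handle the whole file at once.
    """
    tagged: List[TaggedSentence] = []
    for batch in batches(sentences, batch_size):
        tagged.extend(tag_sents(list(batch)))
    return tagged
```

With a real tagger you would pass its bound method, for example tag_in_batches(sents, tagger.tag_sents). In NLTK's StanfordTagger each tag_sents call launches a separate Java process, so smaller batches trade memory per process for extra JVM start-up time.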

I found this article about allocating memory via over-commit, but that solution seems to be for Linux. I am working on Windows 8 and can't find the file sysctl.conf. Can anyone tell me how to increase the available memory in my Windows environment?

Thanks

Recommended answer

After some searching, I increased the maximum heap size of the Java process that runs the Stanford POS tagger. The command is:

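nltk.internals.config_java(options='-Xmx2G')

Note that Java options are case-sensitive: the maximum-heap flag is -Xmx, and -xmx is rejected as an unrecognized option.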

I restarted the program, and it worked.
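
A related option, sketched here under the assumption of a recent NLTK 3.x: StanfordPOSTagger accepts a java_options argument whose default is '-mx1000m' (the value visible in the error message in the question), and in the NLTK versions I know of, the tagger applies that value itself on each tagging call, which may override a global config_java setting. Passing the larger heap to the constructor is therefore a more direct alternative. The helper names jvm_heap_option and make_tagger are hypothetical, and the model and jar paths are placeholders you must supply.

```python
import re

# This sketch only accepts a positive integer followed by a k, m or g suffix
# (either case); Java itself also accepts a bare byte count.
_HEAP_SIZE = re.compile(r"[1-9][0-9]*[kKmMgG]")


def jvm_heap_option(size: str = "2g") -> str:
    """Return a Java maximum-heap flag such as '-Xmx2g'.

    The flag name is case-sensitive: Java accepts -Xmx, not -xmx.
    """
    if not _HEAP_SIZE.fullmatch(size):
        raise ValueError(f"not a JVM heap size: {size!r}")
    return "-Xmx" + size


def make_tagger(model_path: str, jar_path: str, heap: str = "2g"):
    """Build a StanfordPOSTagger whose Java process gets a larger heap.

    Requires NLTK plus the Stanford POS tagger model and jar on disk.
    """
    # Imported lazily so jvm_heap_option() works without NLTK installed.
    from nltk.tag.stanford import StanfordPOSTagger

    return StanfordPOSTagger(
        model_path,
        path_to_jar=jar_path,
        java_options=jvm_heap_option(heap),  # replaces the default '-mx1000m'
    )
```

For example, make_tagger(model_path, jar_path, heap="2G") with your own paths. The heap still has to fit in the physical memory Windows can give the JVM, so very large files may need both a bigger heap and smaller inputs.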
