Java or Python for Natural Language Processing

Question

I would like to know which programming language is better for natural language processing: Java or Python? I have found lots of questions and answers about it, but I am still unsure which one to use.

I also want to know which NLP library to use for Java, since there are many (LingPipe, GATE, OpenNLP, Stanford NLP). For Python, most programmers recommend NLTK.

But if I want to do some text processing or information extraction on unstructured data (just free-form plain English text) to get useful information, what is the best option? Java or Python? Which library is suitable?

Update

What I want to do is extract useful product information from unstructured data (e.g. users post advertisements for mobiles or laptops in many different forms, in not very standard English).

Recommended answer

Java vs Python for NLP is very much a matter of preference or necessity. Depending on the company or project, you'll need to use one or the other, and often there isn't much of a choice unless you're heading the project.

Other than NLTK (www.nltk.org), there are other libraries for text processing in Python:

  • TextBlob: http://textblob.readthedocs.org/en/dev/
  • Gensim: http://radimrehurek.com/gensim/
  • Pattern: http://www.clips.ua.ac.be/pattern
  • spaCy: http://spacy.io
  • Orange: http://orange.biolab.si/features/
  • PyNLPl: https://github.com/proycon/pynlpl

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=natural+language+processing&submit=search)
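For the task in the question (pulling product details out of informal ads), it can help to try a rule-based baseline in plain Python before committing to any of the libraries above. The following is only a minimal sketch using the standard `re` module; the brand list, the patterns, and the function name `extract_product_info` are hypothetical choices made for illustration, and real ads would need far broader rules or a trained model.

```python
import re

# Hypothetical brand list and patterns, chosen only for illustration.
BRANDS = ("samsung", "apple", "nokia", "dell", "lenovo")
PRICE_RE = re.compile(r"(?:\$|usd\s*)(\d+(?:\.\d{1,2})?)", re.IGNORECASE)
STORAGE_RE = re.compile(r"(\d+)\s*(gb|tb)\b", re.IGNORECASE)


def extract_product_info(ad_text):
    """Return a dict of fields found in a free-form ad; missing fields are None."""
    lowered = ad_text.lower()
    # First brand from the list that appears as a whole word, if any.
    brand = next((b for b in BRANDS if re.search(rf"\b{b}\b", lowered)), None)
    price = PRICE_RE.search(ad_text)
    storage = STORAGE_RE.search(ad_text)
    return {
        "brand": brand,
        "price": float(price.group(1)) if price else None,
        "storage": f"{storage.group(1)}{storage.group(2).upper()}" if storage else None,
    }


if __name__ == "__main__":
    print(extract_product_info("selling my SAMSUNG fone, 64 gb, only $150 !!"))
```

A baseline like this makes it easier to judge what a heavier toolkit would actually add, whichever language it is written in.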

For Java, there are tons of others, but here's another list:

  • Freeling: http://nlp.lsi.upc.edu/freeling/
  • OpenNLP: http://opennlp.apache.org/
  • LingPipe: http://alias-i.com/lingpipe/
  • Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/ (comes with wrappers for other languages, python included)
  • CogComp NLP: https://github.com/CogComp/cogcomp-nlp

For a nice comparison of basic string processing, see http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html

For a useful comparison of GATE vs UIMA vs OpenNLP, see https://www.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP?version=4

If you're uncertain which language to choose for NLP, I'd personally say "any language that will give you the desired analysis/output"; see the question "Which language or tools to learn for natural language processing?".

Here's a pretty recent (2017) list of NLP tools: https://github.com/alvations/awesome-community-curated-nlp

An older list of NLP tools (2013): http://web.archive.org/web/20130703190201/http://yauhenklimovich.wordpress.com/2013/05/20/tools-nlp

Beyond language processing tools, you will very likely need machine learning tools to incorporate into your NLP pipelines.

There's a whole range in Python and Java, and once again it comes down to preference and whether the libraries are user-friendly enough:

Machine learning libraries in Python:

  • Sklearn (Scikit-learn): http://scikit-learn.org/stable/
  • Milk: http://luispedro.org/software/milk
  • Scipy: http://www.scipy.org/
  • Theano: http://deeplearning.net/software/theano/
  • PyML: http://pyml.sourceforge.net/
  • pyBrain: http://pybrain.org/
  • GraphLab Create (commercial tool, but free academic license for 1 year): https://dato.com/products/create/

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=machine+learning&submit=search)
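To illustrate how a machine learning library can be plugged into an NLP pipeline, here is a rough sketch with scikit-learn from the list above. It assumes scikit-learn is installed; the toy ads, the labels, and the helper name `train_ad_classifier` are invented for the example, and TF-IDF with logistic regression is just one plausible combination, not a recommendation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration only.
ads = [
    "selling samsung phone 64gb dual sim",
    "iphone for sale, cracked screen, unlocked",
    "nokia mobile with charger, good battery",
    "dell laptop 15 inch 8gb ram",
    "lenovo thinkpad notebook, new keyboard",
    "gaming laptop with ssd and i7 cpu",
]
labels = ["phone", "phone", "phone", "laptop", "laptop", "laptop"]


def train_ad_classifier(texts, targets):
    """Fit a TF-IDF + logistic regression pipeline on raw ad texts."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, targets)
    return model


if __name__ == "__main__":
    clf = train_ad_classifier(ads, labels)
    # Predict the product category of an unseen ad.
    print(clf.predict(["cheap laptop, 4gb ram"]))
```

The pipeline object keeps vectorization and classification together, so raw strings go in and labels come out; with only six training examples the predictions are not meaningful beyond showing the mechanics.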

With the recent (2015) deep learning tsunami in NLP, you could also consider: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software

I'll avoid listing deep learning tools out of non-favoritism / neutrality.

Other Stack Overflow questions that also ask for NLP/ML tools:
