Extracting terms with contextual relevance (noun phrases) from text in a .NET project

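Problem description

I would like to figure out a good way to perform term extraction on freeform consumer text data. Ideally, I could extract noun phrases of at least two words that carry some context about how they are used. This is my ideal set of requirements:

  • Noun phrase extraction
  • Easy integration in a .NET project
  • No third-party service integration

I've already done some research, and my notes are below.

There are many NLP libraries out there. The big contenders appear to be NLTK and OpenNLP. Both support tokenizing text data and extracting, among other things, noun phrases. However, neither is implemented in .NET, so some kind of IPC layer would be required. Both also have fairly steep learning curves.

SharperNLP is a C# port of OpenNLP. It had a brief flurry of activity in 2006, but not much since then.

Here are some notes from someone who attempted to integrate NLTK into a .NET implementation using IronPython:

Open Source NLP in C# 3.5 using NLTK

The easiest solution I've found so far is the SQL Server Integration Services (SSIS) Term Extraction Transformation. It was very simple to configure and get running, and it extracted meaningful noun phrases with a high degree of accuracy. However, it has a number of limitations:

  • It's an SSIS package, which is great for parsing text after the fact but not in real time.
  • It requires a SQL Server Enterprise license.
  • It only supports English, with no plans to support other languages.

To close, I realize my requirements may be a little too strict, so please don't hesitate to answer with any type of solution that at least extracts noun phrase sentence fragments.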

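Solution

I've done some research and found an easy way to use OpenNLP in a .NET project with the help of a tool called IKVM.NET. For more information on how to convert the OpenNLP jars into a .NET assembly, see the following OpenNLP wiki post:

A quick guide to using OpenNLP from .NET
https://cwiki.apache.org/confluence/display/OPENNLP/Introduction+to+using+openNLP+in+.NET+Projects

For more information on my solution, check out the following post:

Extracting noun phrases with contextual relevance in .NET using OpenNLP
http://randonom.com/blog/2012/08/extracting-noun-phrases-with-contextual-relevance-in-net-using-opennlp/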
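
Once a chunker has tagged each token, collecting noun phrases of at least two words is a small grouping step. The sketch below is written in Python for brevity rather than C#, and it is not taken from the linked posts. It assumes the chunker emits one BIO-style tag per token (`B-NP`, `I-NP`, `O`, and so on), which is the scheme OpenNLP's chunker uses. The function name `noun_phrases`, the `min_words` parameter, and the hand-written sample tags are illustrative assumptions, not output from a real model.

```python
from typing import List


def noun_phrases(tokens: List[str], chunk_tags: List[str], min_words: int = 2) -> List[str]:
    """Group tokens into noun phrases based on BIO chunk tags.

    A phrase starts at a "B-NP" tag and continues through consecutive
    "I-NP" tags. Phrases shorter than min_words are discarded.
    """
    if len(tokens) != len(chunk_tags):
        raise ValueError("tokens and chunk_tags must have the same length")

    phrases: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Keep the phrase collected so far only if it is long enough.
        if len(current) >= min_words:
            phrases.append(" ".join(current))
        current.clear()

    for token, tag in zip(tokens, chunk_tags):
        if tag == "B-NP":
            flush()                 # a new noun phrase begins here
            current.append(token)
        elif tag == "I-NP" and current:
            current.append(token)   # continue the open noun phrase
        else:
            flush()                 # any other tag closes the phrase
    flush()
    return phrases


# Hand-written sample input (hypothetical, not real chunker output).
tokens = ["The", "battery", "life", "is", "great", "but",
          "the", "touch", "screen", "froze"]
tags = ["B-NP", "I-NP", "I-NP", "B-VP", "B-ADJP", "O",
        "B-NP", "I-NP", "I-NP", "B-VP"]

print(noun_phrases(tokens, tags))
```

Raising `min_words` filters out shorter phrases, and single-word phrases such as a lone pronoun are dropped by the default of two, which matches the "at least two words" requirement in the question.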
