List of part-of-speech tags per sentence with the Stanford NLP POS Tagger in C#


Question

Using the POS Tagger of Stanford NLP for .NET, I'm trying to extract a detailed list of part-of-speech tags per sentence.

e.g.: "Have a look over there. Look at the car!"

Have/VB a/DT look/NN over/IN there/RB ./. Look/VB at/IN the/DT car/NN !/.

For each token I need:

  • POS text: "Have"
  • POS tag: "VB"
  • Position in the original text

I managed to achieve this by accessing the private fields of the result via reflection.

I know it's ugly, inefficient and generally bad practice, but it's the only way I have found so far. Hence my question: is there a built-in way to access this information?

// rawText is the input string; _posTagger is an already-loaded MaxentTagger.
using (var streamReader = new StringReader(rawText))
{
    // Split the raw text into sentences, each a list of tokens.
    var tokenizedSentences = MaxentTagger.tokenizeText(streamReader).toArray();

    foreach (ArrayList tokenizedSentence in tokenizedSentences)
    {
        var taggedSentence = _posTagger.tagSentence(tokenizedSentence).toArray();

        for (int index = 0; index < taggedSentence.Length; index++)
        {
            var partOfSpeech = ((StringLabel) (taggedSentence[index]));
            var posText = partOfSpeech.value();

            // Workaround: read the tag and offsets from private fields via reflection.
            var posTag = ReflectionHelper.GetInstanceField(typeof (TaggedWord), partOfSpeech, "tag") as string;
            var posBeginPosition = (int)ReflectionHelper.GetInstanceField(typeof (StringLabel), partOfSpeech, "beginPosition");
            var posEndPosition = (int)ReflectionHelper.GetInstanceField(typeof (StringLabel), partOfSpeech, "endPosition");

            // process the pos
        }
    }
}

ReflectionHelper:

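// Requires: using System; using System.Reflection;
public static class ReflectionHelper
{
    // Reads the named field (public or non-public) declared on 'type' from 'instance'.
    // Returns null if 'type' declares no such field.
    public static object GetInstanceField(Type type, object instance, string fieldName)
    {
        const BindingFlags bindFlags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static;

        var field = type.GetField(fieldName, bindFlags);
        return field != null ? field.GetValue(instance) : null;
    }
}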

Recommended answer

The solution is pretty easy. Just cast each part of speech (taggedSentence[index]) to a TaggedWord. You can then read these values through its getters beginPosition(), endPosition(), tag() and value(), with no reflection needed.
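As a minimal sketch of that cast, the loop from the question could look roughly like the following. The PosToken class and the PosTagExample.Tag method are hypothetical names introduced only for illustration, and the namespaces assume the IKVM-based Stanford NLP .NET build used in the question; the tagger is assumed to be loaded elsewhere.

```csharp
using System.Collections.Generic;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.tagger.maxent;

public static class PosTagExample
{
    // Hypothetical container for the values the question asks for.
    public sealed class PosToken
    {
        public string Text { get; set; }
        public string Tag { get; set; }
        public int Begin { get; set; }
        public int End { get; set; }
    }

    // 'tagger' is assumed to be an already-loaded MaxentTagger.
    public static List<PosToken> Tag(MaxentTagger tagger, string rawText)
    {
        var tokens = new List<PosToken>();

        // In-memory Java reader over the input string.
        var reader = new java.io.StringReader(rawText);

        foreach (java.util.List sentence in MaxentTagger.tokenizeText(reader).toArray())
        {
            foreach (var item in tagger.tagSentence(sentence).toArray())
            {
                // The cast suggested in the answer: each element is a TaggedWord.
                var word = (TaggedWord) item;

                tokens.Add(new PosToken
                {
                    Text = word.value(),
                    Tag = word.tag(),
                    Begin = word.beginPosition(),
                    End = word.endPosition()
                });
            }
        }

        return tokens;
    }
}
```

For the example sentence, the first entry would be expected to carry the text "Have" and the tag "VB"; whether the begin and end offsets are populated may depend on the library version and the tokenizer in use.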
