Determine the temporality of a sentence with POS tagging


Problem description

Given a series of sentences, I want to find out whether an action has already been carried out or will be carried out. For example: "I will prescribe this medication" versus "I prescribed this medication", or "He had already taken the stuff" versus "he may take the stuff later".

I was trying a tidytext approach and decided to simply look for past participle versus future participle verbs. However, when I POS tag, the only verb types I get are "Verb intransitive", "Verb (usu participle)" and "Verb (transitive)". How can I tell past verbs from future verbs, or is there another POS tagger I can use?

I am keen to use tidytext because I cannot install rJava, which some of the other text mining packages depend on.

Recommended answer

Look at the morphological features from the udpipe annotation. These are stored in the feats column of the annotation, and you can add them as extra columns to the dataset with cbind_morphological. All the features are defined at https://universaldependencies.org/u/feat/index.html. In the output below, prescribed from the sentence 'I prescribed this medication' is tagged as past tense, as are the words had and taken from 'He had already taken the stuff'.

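library(udpipe)

# Example sentences, one document per row
x <- data.frame(doc_id = 1:4, 
                text = c("I will prescribe this medication", 
                         "I prescribed this medication", 
                         "He had already taken the stuff", 
                         "he may take the stuff later"), 
                stringsAsFactors = FALSE)
# Annotate with the English model (given a language name, udpipe() downloads the model if needed)
anno <- udpipe(x, "english")
# Split the feats column into separate morph_* columns
anno <- cbind_morphological(anno)

# Show the columns that matter for tense
anno[, c("doc_id", "token", "lemma", "feats", "morph_verbform", "morph_tense")]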

 doc_id      token      lemma                                                  feats morph_verbform morph_tense
      1          I          I             Case=Nom|Number=Sing|Person=1|PronType=Prs           <NA>        <NA>
      1       will       will                                           VerbForm=Fin            Fin        <NA>
      1  prescribe  prescribe                                           VerbForm=Inf            Inf        <NA>
      1       this       this                               Number=Sing|PronType=Dem           <NA>        <NA>
      1 medication medication                                            Number=Sing           <NA>        <NA>
      2          I          I             Case=Nom|Number=Sing|Person=1|PronType=Prs           <NA>        <NA>
      2 prescribed  prescribe                       Mood=Ind|Tense=Past|VerbForm=Fin            Fin        Past
      2       this       this                               Number=Sing|PronType=Dem           <NA>        <NA>
      2 medication medication                                            Number=Sing           <NA>        <NA>
      3         He         he Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs           <NA>        <NA>
      3        had       have                       Mood=Ind|Tense=Past|VerbForm=Fin            Fin        Past
      3    already    already                                                   <NA>           <NA>        <NA>
      3      taken       take                               Tense=Past|VerbForm=Part           Part        Past
      3        the        the                              Definite=Def|PronType=Art           <NA>        <NA>
      3      stuff      stuff                                            Number=Sing           <NA>        <NA>
      4         he         he Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs           <NA>        <NA>
      4        may        may                                           VerbForm=Fin            Fin        <NA>
      4       take       take                                           VerbForm=Inf            Inf        <NA>
      4        the        the                              Definite=Def|PronType=Art           <NA>        <NA>
      4      stuff      stuff                                            Number=Sing           <NA>        <NA>
      4      later      later                                                   <NA>           <NA>        <NA>
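
The annotation above can be reduced to one label per sentence. The following is only a minimal sketch, not part of the original answer: `classify_temporality` is a hypothetical helper, the word lists for future and modal auxiliaries are assumptions chosen for these four examples, and the labels ("future", "possible", "past", "unknown") are made up for illustration. It relies only on the doc_id, token and morph_tense columns shown above, and the demo data frame copies the verb rows from that output so the example runs without downloading a model.

```r
# Hypothetical helper: assign one temporality label per document.
# Assumes the columns doc_id, token and morph_tense from the annotation above.
classify_temporality <- function(anno) {
  future_tokens <- c("will", "shall", "'ll")                 # assumed future markers
  modal_tokens  <- c("may", "might", "can", "could", "should")  # assumed modal markers

  per_doc <- split(anno, anno$doc_id)
  labels <- vapply(per_doc, function(d) {
    tok <- tolower(d$token)
    if (any(tok %in% future_tokens)) {
      "future"      # an explicit future auxiliary is present
    } else if (any(tok %in% modal_tokens)) {
      "possible"    # a modal makes the action uncertain
    } else if (any(d$morph_tense == "Past", na.rm = TRUE)) {
      "past"        # at least one verb is tagged Tense=Past
    } else {
      "unknown"
    }
  }, character(1))

  data.frame(doc_id = names(labels),
             temporality = unname(labels),
             stringsAsFactors = FALSE)
}

# Verb rows copied from the annotation output above
anno_demo <- data.frame(
  doc_id      = c(1, 1, 2, 3, 3, 4, 4),
  token       = c("will", "prescribe", "prescribed", "had", "taken", "may", "take"),
  morph_tense = c(NA, NA, "Past", "Past", "Past", NA, NA),
  stringsAsFactors = FALSE
)

result <- classify_temporality(anno_demo)
print(result)
```

With a real annotation you would call `classify_temporality(anno)` directly. Matching on tokens is deliberately crude: it would also fire on the noun "will", so filtering on the upos column first may be worth considering.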
