Determine the temporality of a sentence with POS tagging
Question
From a series of sentences, I want to find out whether an action has already been carried out or will be carried out.
For example:
"I will prescribe this medication"
versus "I prescribed this medication"
or "He had already taken the stuff"
versus "he may take the stuff later"
I was trying a tidytext approach and decided to simply look for past participle versus future participle verbs. However, when I POS tag, the only types of verbs I get are "Verb intransitive", "Verb (usu participle)" and "Verb (transitive)". How can I get an idea of past or future verbs, or is there another POS tagger I can use?
I am keen to use tidytext because I cannot install rJava, which some of the other text mining packages use.
Recommended answer
Look at the morphological features from the udpipe annotation. These are put in the feats column of the annotation, and you can add them as extra columns in the dataset by using cbind_morphological.
All the features are defined at https://universaldependencies.org/u/feat/index.html
You'll see below that prescribed from the sentence 'I prescribed this medication' is past tense, as are the words had and taken from 'He had already taken the stuff'.
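library(udpipe)

# The four example sentences, one document per sentence
x <- data.frame(doc_id = 1:4,
                text = c("I will prescribe this medication",
                         "I prescribed this medication",
                         "He had already taken the stuff",
                         "he may take the stuff later"),
                stringsAsFactors = FALSE)

# Annotate with the English model (udpipe() downloads it if it is not available locally)
anno <- udpipe(x, "english")

# Expand the feats column into separate morph_* columns
anno <- cbind_morphological(anno)

# Show only the columns relevant to tense
anno[, c("doc_id", "token", "lemma", "feats", "morph_verbform", "morph_tense")]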
doc_id  token       lemma       feats                                                   morph_verbform  morph_tense
1       I           I           Case=Nom|Number=Sing|Person=1|PronType=Prs              <NA>            <NA>
1       will        will        VerbForm=Fin                                            Fin             <NA>
1       prescribe   prescribe   VerbForm=Inf                                            Inf             <NA>
1       this        this        Number=Sing|PronType=Dem                                <NA>            <NA>
1       medication  medication  Number=Sing                                             <NA>            <NA>
2       I           I           Case=Nom|Number=Sing|Person=1|PronType=Prs              <NA>            <NA>
2       prescribed  prescribe   Mood=Ind|Tense=Past|VerbForm=Fin                        Fin             Past
2       this        this        Number=Sing|PronType=Dem                                <NA>            <NA>
2       medication  medication  Number=Sing                                             <NA>            <NA>
3       He          he          Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs  <NA>            <NA>
3       had         have        Mood=Ind|Tense=Past|VerbForm=Fin                        Fin             Past
3       already     already     <NA>                                                    <NA>            <NA>
3       taken       take        Tense=Past|VerbForm=Part                                Part            Past
3       the         the         Definite=Def|PronType=Art                               <NA>            <NA>
3       stuff       stuff       Number=Sing                                             <NA>            <NA>
4       he          he          Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs  <NA>            <NA>
4       may         may         VerbForm=Fin                                            Fin             <NA>
4       take        take        VerbForm=Inf                                            Inf             <NA>
4       the         the         Definite=Def|PronType=Art                               <NA>            <NA>
4       stuff       stuff       Number=Sing                                             <NA>            <NA>
4       later       later       <NA>                                                    <NA>            <NA>
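To turn these token-level features into one label per sentence, something like the following rough sketch might work. It is not part of udpipe: the helper name classify_temporality, the list of modal lemmas, and the three labels are all assumptions made for illustration, and the rule (any Tense=Past token means the action happened, otherwise a finite modal plus an infinitive means it has not happened yet) is a simple heuristic that will misjudge many real sentences. So that the sketch runs without downloading a model, the input is typed in by hand from the verb rows of the output above.

```r
# Hypothetical helper (not a udpipe function): one temporality label per document.
# `anno` needs the columns doc_id, lemma, morph_verbform and morph_tense,
# as produced by cbind_morphological(). The modal list is an assumption.
classify_temporality <- function(anno,
                                 modal_lemmas = c("will", "shall", "may", "might")) {
  docs <- split(anno, anno$doc_id)
  labels <- vapply(docs, function(d) {
    # Any token marked Tense=Past (finite verb or participle)
    has_past <- any(d$morph_tense == "Past", na.rm = TRUE)
    # A finite modal auxiliary such as "will" or "may"
    has_modal <- any(d$lemma %in% modal_lemmas & d$morph_verbform == "Fin",
                     na.rm = TRUE)
    # An infinitive, i.e. the main verb governed by the modal
    has_inf <- any(d$morph_verbform == "Inf", na.rm = TRUE)
    if (has_past) {
      "carried out"
    } else if (has_modal && has_inf) {
      "not yet carried out"
    } else {
      "undetermined"
    }
  }, character(1))
  data.frame(doc_id = names(labels),
             temporality = unname(labels),
             stringsAsFactors = FALSE)
}

# Verb rows copied by hand from the annotation output shown above
anno_verbs <- data.frame(
  doc_id         = c(1, 1, 2, 3, 3, 4, 4),
  token          = c("will", "prescribe", "prescribed", "had", "taken", "may", "take"),
  lemma          = c("will", "prescribe", "prescribe", "have", "take", "may", "take"),
  morph_verbform = c("Fin", "Inf", "Fin", "Fin", "Part", "Fin", "Inf"),
  morph_tense    = c(NA, NA, "Past", "Past", "Past", NA, NA),
  stringsAsFactors = FALSE
)

result <- classify_temporality(anno_verbs)
print(result)
```

With these rows, sentences 2 and 3 are labelled "carried out" and sentences 1 and 4 "not yet carried out". On real data you would pass the full anno data frame instead of the hand-typed one, and you would probably want to refine the rule, for example by using the dependency columns of the annotation to check that the modal actually attaches to the verb of interest.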