Tagging part of speech for a particular word in R

Problem description

I have a column A containing sentences and a column B containing words. For each row, I want to find the part of speech that the column B word has in the sentence in column A.

I am trying to get the parts of speech for each sentence in a text file. Currently I can only get them for a single sentence, using the following code. Please suggest code for this.

library(stringr)
library(NLP)
library(openNLP)

s <- unlist(lapply(posText, function(x) { str_split(x, "\n") }))

tagPOS <-  function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}

tagged_str <-  tagPOS(s)

Recommended answer

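Using lapply, you can tag multiple sentences by calling tagPOS() once per sentence. Passing the whole vector at once, as in tagPOS(s), makes as.String() paste all the sentences into one string, which is then annotated as a single sentence. Since you didn't provide reproducible data, I created my own.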

Code

# Reproducible data: quotes from Wuthering Heights by Emily Brontë
posText<- "I gave him my heart, and he took and pinched it to death; and flung it back to me.
           People feel with their hearts, Ellen, and since he has destroyed mine, I have not power to feel for him."

library(stringr)
# Split into sentences on newline characters and trim the leading indentation
s <- str_trim(unlist(str_split(posText, "\n")))

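library(NLP)
library(openNLP)

# Create the annotators once; reloading the Maxent models on every call is wasteful
word_token_annotator <- Maxent_Word_Token_Annotator()
pos_tag_annotator <- Maxent_POS_Tag_Annotator()

# Tag one sentence; returns the "word/TAG" string and the vector of tags
tagPOS <- function(x, ...) {
  s <- as.String(x)
  # Treat the whole input as one sentence spanning all of its characters
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, pos_tag_annotator, a2)
  # Keep only the word annotations (drop the sentence annotation)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}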

# Tag each sentence separately, then stack the results into a data frame
result <- lapply(s, tagPOS)
result <- as.data.frame(do.call(rbind, result))

The output is a data frame with two columns. The first, POStagged, holds each sentence with every word followed by its tag, separated by "/". The second, POStags, holds the tags in the order in which the words appear in the sentence.
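Because do.call(rbind, ...) stacks lists, both columns of result are list columns, which is why POStags prints as comma-separated values. If plain character columns are easier to work with, a minimal sketch is to build the data frame directly with vapply. The helper name tagged_to_df() is hypothetical; it only assumes that each element of its input is the list(POStagged = ..., POStags = ...) that tagPOS() returns.

```r
# Hypothetical helper: turn the list returned by lapply(s, tagPOS) into a
# data frame with ordinary character columns instead of list columns.
tagged_to_df <- function(tagged, sentences) {
  data.frame(
    sentence  = sentences,
    # one "word/TAG word/TAG ..." string per sentence
    POStagged = vapply(tagged, function(x) x$POStagged, character(1)),
    # collapse each tag vector into a single comma-separated string
    POStags   = vapply(tagged, function(x) paste(x$POStags, collapse = ", "),
                       character(1)),
    stringsAsFactors = FALSE
  )
}
```

With the answer's code, tagged_to_df(lapply(s, tagPOS), s) would take the place of the do.call(rbind, ...) step and also keeps the original sentence next to its tags. The printed output that follows comes from the original do.call(rbind, ...) version.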

Output:

> print(result)
                                                                                                                                                                                     POStagged
1                             I/PRP gave/VBD him/PRP my/PRP$ heart/NN ,/, and/CC he/PRP took/VBD and/CC pinched/VBD it/PRP to/TO death/NN ;/: and/CC flung/VBD it/PRP back/RB to/TO me/PRP ./.
2 People/NNS feel/VBP with/IN their/PRP$ hearts/NNS ,/, Ellen/NNP ,/, and/CC since/IN he/PRP has/VBZ destroyed/VBN mine/NN ,/, I/PRP have/VBP not/RB power/NN to/TO feel/VB for/IN him/PRP ./.
                                                                                                 POStags
1        PRP, VBD, PRP, PRP$, NN, ,, CC, PRP, VBD, CC, VBD, PRP, TO, NN, :, CC, VBD, PRP, RB, TO, PRP, .
2 NNS, VBP, IN, PRP$, NNS, ,, NNP, ,, CC, IN, PRP, VBZ, VBN, NN, ,, PRP, VBP, RB, NN, TO, VB, IN, PRP, .

> 
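The answer tags every sentence but does not yet pick out the tag of a particular word, which is what column B needs. A minimal sketch in base R, assuming each sentence is available as a space-separated "word/TAG" string like those printed above. The helper tag_of_word() and the vector B of column B words are hypothetical names introduced here, not part of openNLP. Each token is split at its last "/", so tokens such as ./. and tags such as PRP$ parse correctly.

```r
# Hypothetical helper (not part of openNLP): return the POS tag(s) that `word`
# received in a "word/TAG word/TAG ..." string such as a POStagged value.
tag_of_word <- function(pos_tagged, word, ignore_case = TRUE) {
  tokens <- strsplit(pos_tagged, " ", fixed = TRUE)[[1]]
  tokens <- tokens[nzchar(tokens)]        # drop empty pieces from repeated spaces
  words  <- sub("/[^/]*$", "", tokens)    # text before the last "/"
  tags   <- sub("^.*/", "", tokens)       # text after the last "/"
  if (ignore_case) {
    hit <- tolower(words) == tolower(word)
  } else {
    hit <- words == word
  }
  tags[hit]                               # character(0) if the word is absent
}

# Example using the tagged sentences printed in the output above (column A)
tagged_A <- c(
  "I/PRP gave/VBD him/PRP my/PRP$ heart/NN ,/, and/CC he/PRP took/VBD and/CC pinched/VBD it/PRP to/TO death/NN ;/: and/CC flung/VBD it/PRP back/RB to/TO me/PRP ./.",
  "People/NNS feel/VBP with/IN their/PRP$ hearts/NNS ,/, Ellen/NNP ,/, and/CC since/IN he/PRP has/VBZ destroyed/VBN mine/NN ,/, I/PRP have/VBP not/RB power/NN to/TO feel/VB for/IN him/PRP ./."
)
B <- c("heart", "feel")  # hypothetical column B words, one per sentence

# A word can occur more than once in a sentence, so keep one vector per row
tags_B <- mapply(tag_of_word, tagged_A, B, SIMPLIFY = FALSE, USE.NAMES = FALSE)
```

Here "feel" occurs twice in the second sentence, so its entry holds both of its tags. With the answer's list-column result, unlist(result$POStagged) should give the character vector to use in place of tagged_A. Matching is case-insensitive by default; pass ignore_case = FALSE to treat, say, "I" and "i" as different words.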
