Keyword in context (kwic) for skipgrams?


Question

I do keyword-in-context analysis with quanteda for ngrams and tokens, and it works well. I now want to do the same for skipgrams, to capture the context of "barriers to entry" but also "barriers to [...] [and] entry".

The following code returns an empty kwic object, but I don't know what I did wrong. doc.corpus refers to the text document. I also tried the tokenized version, but nothing changes.

The result is:

"kwic object with 0 rows"

x <- tokens("barriers entry")
ntoken_test <- tokens_ngrams(x, n = 2, skip = 0:4, concatenator = " ")
twic_skipgram <-  kwic(doc.corpus, pattern = list(ntoken_test), window=20, valuetype= "glob")

twic_skipgram

Recommended answer

Probably the easiest way is to use a wildcard to represent the "skip".

library("quanteda")
## Package version: 2.1.1

txt <- c(
  "There are barriers to entry.",
  "Also barriers against entry.",
  "Just barriers entry."
)

# for skip of 1
kwic(txt, phrase("barriers * entry"))
##                                                     
##  [text1, 3:5] There are |   barriers to entry    | .
##  [text2, 2:4]      Also | barriers against entry | .

# for skip of 0 and 1
kwic(txt, phrase(c("barriers * entry", "barriers entry")))
##                                                     
##  [text1, 3:5] There are |   barriers to entry    | .
##  [text2, 2:4]      Also | barriers against entry | .
##  [text3, 2:3]      Just |     barriers entry     | .
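
The question asked for skips of 0 to 4, and the same wildcard idea can be extended to that range by generating the patterns instead of typing them out. The following is only a sketch: the names `skips`, `patterns`, and `res` are made up for illustration, and it assumes that in glob matching each `*` stands for exactly one token, so a skip of k needs k wildcards between the two words.

```r
library("quanteda")

txt <- c(
  "There are barriers to entry.",
  "Also barriers against entry.",
  "Just barriers entry."
)

# one pattern per skip length: k wildcards between "barriers" and "entry"
skips <- 0:4
patterns <- vapply(
  skips,
  function(k) paste(c("barriers", rep("*", k), "entry"), collapse = " "),
  character(1)
)
patterns

# phrase() turns each string into a multi-token pattern
res <- kwic(tokens(txt), pattern = phrase(patterns), window = 20)
res
```

Note that with the default tokenizer settings punctuation is kept as tokens, so a `*` can also match a comma or a full stop; removing punctuation in `tokens()` first may be preferable, depending on the corpus.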
