Unable to get tm_map to use mc.cores argument


Question

I have a large corpus with over 10M documents. Whenever I try to run a transformation over multiple cores using the mc.cores argument, I get this error:

Error in FUN(content(x), ...) : unused argument (mc.cores = 10)

I have 15 cores available in my current hosted RStudio instance.

# I have a corpus
> inspect(corpus[1])
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 1

[[1]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 46

> length(corpus)
[1] 10255313

Watch what happens when I try to apply transformations with tm_map:

library(tidyverse)
library(qdap)
library(stringr)
library(tm)
library(textstem)
library(stringi)
library(SnowballC)

For example:

> corpus <- tm_map(corpus, content_transformer(replace_abbreviation), mc.cores = 10)
Error in FUN(content(x), ...) : unused argument (mc.cores = 10)

I tried adding lazy = T:

corpus <- tm_map(corpus, content_transformer(replace_abbreviation), mc.cores = 10, lazy = T) # read the documentation, still don't really get what this does

After that transformation, accessing a document fails with the same error:

> corpus[[1]][1]
Error in FUN(content(x), ...) : unused argument (mc.cores = 10)

Whereas before the transformation I got:

> corpus.beforetransformation[[1]][1]
$content
[1] "here is some text"

What am I doing wrong here? How can I use the mc.cores argument to make use of more of my processors?

Reproducible example:

sometext <- c("cats dogs rabbits", "oranges banannas pears", "summer fall winter") %>% 
  data.frame(stringsAsFactors = F) %>% DataframeSource %>% VCorpus

corpus.example <- tm_map(sometext, content_transformer(replace_abbreviation), mc.cores = 2, lazy = T)
corpus.example[[1]][1]

Recommended answer

From the tm documentation, try the following:

options(mc.cores = 10)  # or whatever
tm_parLapply_engine(parallel::mclapply)  # mclapply gets the number of cores from global options
tm_map(sometext, content_transformer(replace_abbreviation))
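
As for why the original call fails: the error message shows FUN(content(x), ...) being called, which suggests that mc.cores was forwarded through ... to the transformation function instead of being consumed by tm_map. The base-R sketch below reproduces that mechanism without tm; apply_to_each and shout are hypothetical stand-ins for illustration, not tm or qdap functions.

```r
# Hypothetical stand-in for a mapper that forwards `...` to FUN,
# mirroring the FUN(content(x), ...) call shown in the error message.
apply_to_each <- function(x, FUN, ...) lapply(x, FUN, ...)

# Hypothetical transformation with no mc.cores parameter and no `...`.
shout <- function(txt) toupper(txt)

docs <- list("cats dogs rabbits", "summer fall winter")

# Forwarding an argument that FUN does not accept raises an
# "unused argument" error; capture its message instead of stopping.
msg <- tryCatch(
  apply_to_each(docs, shout, mc.cores = 2),
  error = function(e) conditionMessage(e)
)
print(msg)

# Without the stray argument the call succeeds.
ok <- apply_to_each(docs, shout)
print(ok[[1]])
```

This would also be consistent with what the question observed with lazy = T: the tm_map call returned quietly, and the same error only surfaced later when a document was accessed, presumably because the transformation was deferred until then.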
