R: running out of memory using `strsplit`

Question

I am running out of memory using strsplit (presumably); here is the code:

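# For each field, keep the part before the first separator and
# store the original value in a new column named <field><suffix>.
split.fields <- function (frame, fields, split, suffix, ...) {
  for (field in fields) {
    # use the `split` argument rather than a hard-coded "@"
    v <- sapply(strsplit(frame[[field]],split,...),"[",1)
    frame[[paste0(field,suffix)]] <- frame[[field]]
    frame[[field]] <- v
  }
  frame
}
split.version <- function (frame, fields)
  split.fields(frame, fields, split="@", suffix="Ver", fixed=TRUE)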
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 238165 12.8     467875   25   407500 21.8
Vcells 369492  2.9     905753    7   905631  7.0
> frame <- data.frame(browser = sample(c("Chrome@28","Chrome@27","Firefox@21","Firefox@22","IE@9","IE@8"), 1e7, replace=TRUE), stringsAsFactors=FALSE)
> str(frame)
'data.frame':   10000000 obs. of  1 variable:
 $ browser: chr  "IE@8" "Chrome@27" "Chrome@27" "Chrome@27" ...
> object.size(frame)
80000992 bytes
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   240555 12.9     467875  25.0   407500  21.8
Vcells 10373979 79.2   34109873 260.3 40534688 309.3
> system.time(frame <- split.version(frame,"browser"))
   user  system elapsed 
 73.700   0.872  74.831 
> object.size(frame)
160001248 bytes
> str(frame)
'data.frame':   10000000 obs. of  2 variables:
 $ browser   : chr  "IE" "Chrome" "Chrome" "Chrome" ...
 $ browserVer: chr  "IE@8" "Chrome@27" "Chrome@27" "Chrome@27" ...
> gc()
           used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells   264888  14.2   16652260 889.4  31376740 1675.7
Vcells 20459856 156.1   95461025 728.4 119226749  909.7

This all looks more or less reasonable except that the R process's RSS is now 1.6G.

This appears to imply that the 1675.7Mb of Ncells in the max used column have not been returned to the OS.

I don't care much about the OS not getting the RAM back; what I do care about is that, to process 80M of data, R allocated 1.6G (and on my real data it runs out of the available physical RAM).

Is there a way to make this more memory efficient?

E.g., maybe converting the character vector to a factor and operating on its levels would help?

R version 3.0.1 (2013-05-16) -- "Good Sport"
Platform: x86_64-pc-linux-gnu (64-bit)
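
The factor idea raised above could be sketched roughly as follows. This is only an illustration, not something taken from the original post: the name `split_fields_by_levels` is hypothetical, and the memory benefit is an expectation (only the handful of distinct strings gets split, instead of all 1e7 rows) that has not been measured here.

```r
# Hypothetical helper: split each distinct value once via factor levels,
# then map the shortened levels back onto the rows.
split_fields_by_levels <- function(frame, fields, split = "@", suffix = "Ver") {
  for (field in fields) {
    f <- factor(frame[[field]])   # integer codes plus a small set of unique levels
    lev <- levels(f)
    # Split only the unique strings; keep the part before the first separator.
    head_part <- vapply(strsplit(lev, split, fixed = TRUE), `[`, character(1), 1L)
    frame[[paste0(field, suffix)]] <- frame[[field]]   # keep the original value
    frame[[field]] <- head_part[as.integer(f)]         # NA rows stay NA
  }
  frame
}
```

It would be called the same way as `split.version`, e.g. `frame <- split_fields_by_levels(frame, "browser")`. Building the factor still needs one pass over the whole column, so how much this helps depends on how few distinct values the data has.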

Recommended answer

How about using `substr` and `regexpr`:

x <- c("Chrome@28","Chrome@27","Firefox@21","IE@8")
substr(x,1,regexpr("@",x)-1)
[1] "Chrome"  "Chrome"  "Firefox" "IE" 
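
To plug this into the data-frame helper from the question, something along these lines might work. It is a sketch under assumptions: `split_fields_substr` is a hypothetical name, the separator is treated as a fixed string, and strings without a separator are kept whole (which is what the `strsplit` version does, whereas the bare `substr(x,1,regexpr("@",x)-1)` would return `""` for them).

```r
# Hypothetical variant of split.fields that avoids building a list of
# split pieces: find the first separator and cut the string there.
split_fields_substr <- function(frame, fields, split = "@", suffix = "Ver") {
  for (field in fields) {
    x <- frame[[field]]
    # Position of the first separator; -1 when absent, NA when x is NA.
    pos <- as.integer(regexpr(split, x, fixed = TRUE))
    out <- substr(x, 1L, pos - 1L)
    # Keep strings that contain no separator unchanged.
    no_sep <- !is.na(pos) & pos < 0L
    out[no_sep] <- x[no_sep]
    frame[[paste0(field, suffix)]] <- x   # keep the original value
    frame[[field]] <- out
  }
  frame
}
```

Usage would be `frame <- split_fields_substr(frame, "browser")`. Everything here works on plain character and integer vectors, so it should avoid the many small list cells that `strsplit` allocates, though the actual saving on 1e7 rows has not been checked here.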
