R memory not released in Windows

Problem description

I am using RStudio on Windows 7 and I have a problem releasing memory to the OS. My code is below. In a for loop:

  • I read data through an API call to the Census.gov website, using the acs package, and save them to a .csv file via the temporary object table.
  • I remove table (usual size: a few MB) and use the pryr package to check memory usage.

According to mem_used(), R always returns to a constant memory usage after table is removed. According to Windows Task Manager, however, the memory allocated to rsession.exe (not RStudio) increases at every iteration and eventually crashes the R session. Calling gc() does not help. I have read many similar questions, but the only way to free the memory seems to be restarting the R session, which seems silly. Any suggestions?

   library(acs)
   library(pryr) 
   # for loop to extract tables from the API and save them as .csv files
   for (i in 128:length(tablecodes)) {
           tryCatch({table <- acs.fetch(table.number = tablecodes[i],endyear = 2014, span=5, 
                 geography = geo.make(state = "NY", county = "*", tract = "*"), 
                 key = "e24539dfe0e8a5c5bf99d78a2bb8138abaa3b851",col.names="pretty")},
             error = function(e){print("Table skipped") })

    # if the table is actually fetched then we save it 
    if (exists("table", mode="S4")) {         
         print(paste("Table",i,"fetched"))
         if (!is.na(table)){
                   write.csv(estimate(table),paste("./CENSUS_tables/NY/",tablecodes[i],".csv",sep = ""))       
         }
    print(mem_used())  
    print(mem_change(rm(table)))
    gc()
    }
   }

Recommended answer

I was able to confirm that the memory problem exists on Windows 7 (running via VMware Fusion on Mac OS X). It also appears to exist on Mac OS X, though memory usage grows quite gradually there [unconfirmed, but indicative of a memory leak]. Mac OS X is slightly tricky to judge because the OS compresses memory when it sees high usage.

In light of the above, my proposal is to split the set of tables into smaller groups when you download from the US Census Bureau. Why? Looking at the code, you are downloading the data only to store it in .csv files. Hence the short-term workaround is to break up the list of tables you are downloading. Your program should then be able to complete successfully across a set of runs.

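One option is to create a wrapper script and have it perform N runs, each of which invokes a separate R session; i.e. the wrapper calls N R sessions serially, each session downloading N files.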

nb. Based on your code and the observed memory usage, my sense is that you are downloading a lot of tables, so splitting the work across R sessions may be the best option.

nb. The following should work under Cygwin on Windows 7.

Example: Download the primary tables 01 to 27 - if they do not exist, skip...

#!/bin/bash

#Ref: https://censusreporter.org/topics/table-codes/
# Params: Primary Table Year Span

for CensusTableCode in $(seq -w 1 27)
do
  R --no-save -q --slave < ./PullCensus.R --args B"$CensusTableCode"001 2014 5
done
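
For machines without a Unix-like shell, the same wrapper idea can be sketched in R itself. This is only an illustration, not part of the original answer: `run_in_fresh_sessions` and its `runner` argument are hypothetical names, and it assumes the per-table script is saved as PullCensus.R in the working directory and takes the table code, end year and span as its arguments.

```r
# Sketch: launch one fresh R session per table code, so that whatever memory a
# session accumulates is handed back to the OS when that session exits.
# `run_in_fresh_sessions` and `runner` are illustrative names (assumptions).

run_in_fresh_sessions <- function(table_codes,
                                  script = "PullCensus.R",
                                  end_year = 2014,
                                  span = 5,
                                  runner = system2) {
  # Use the Rscript binary that belongs to the running R installation.
  rscript <- file.path(R.home("bin"), "Rscript")
  statuses <- vapply(table_codes, function(code) {
    # c() coerces the numeric year and span to character strings.
    args <- c(script, code, end_year, span)
    # By default system2() returns the exit status of the child process.
    as.integer(runner(rscript, args))
  }, integer(1))
  names(statuses) <- table_codes
  statuses
}

# Primary tables 01 to 27, matching the shell loop: B01001, B02001, ...
codes <- sprintf("B%02d001", 1:27)
# statuses <- run_in_fresh_sessions(codes)   # uncomment to run the downloads
```

A non-zero entry in the returned vector marks a table whose session exited with an error, which makes it easy to retry only those codes.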

PullCensus.R

if (!require(acs)) install.packages("acs")
if (!require(pryr)) install.packages("pryr")

# You can obtain a US Census key from the developer site
# "e24539dfe0e8a5c5bf99d78a2bb8138abaa3b851"
api.key.install(key = "** Secret**")

setwd("~/dev/stackoverflow/37264919")

# Extract Table Structure
#
# B = Detailed Column Breakdown
# 19 = Income (Households and Families)
# 001 =
# A - I = Race
#

args <- commandArgs(trailingOnly = TRUE) # trailingOnly=TRUE means that only your arguments are returned

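if (length(args) != 0) {
  tableCodes <- args[1]
  # Command-line arguments arrive as character strings, so convert the numbers
  defEndYear <- as.integer(args[2])
  defSpan <- as.integer(args[3])
} else {
  tableCodes <- c("B02001")
  defEndYear <- 2014
  defSpan <- 5
}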

# for loop to extract tables from the API and save them as .csv files
for (i in 1:length(tableCodes))
{
  tryCatch(
    table <- acs.fetch(table.number = tableCodes[i],
                       endyear = defEndYear,
                       span = defSpan,
                       geography = geo.make(state = "NY",
                                            county = "*",
                                            tract = "*"),
                       col.names = "pretty"),
    error = function(e) { print("Table skipped")} )

  # if the table is actually fetched then we save it
  if (exists("table", mode = "S4"))
  {
    print(paste("Table", i, "fetched"))
    if (!is.na(table))
    {
      write.csv(estimate(table), paste(defEndYear,"_",tableCodes[i], ".csv", sep = ""))
    }
    print(mem_used())
    print(mem_change(rm(table)))
    gc(reset = TRUE)
    print(mem_used())
  }
}
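
A possible variation on the fetch step, offered as a sketch rather than as the answer's method: instead of assigning inside tryCatch() and then testing exists("table", mode = "S4"), the handler can return NULL so that a failed fetch is detected directly. `fetch_or_null`, `fetcher` and `demo_fetcher` are hypothetical names; in the script above the fetcher would be a function wrapping acs.fetch().

```r
# Sketch: return NULL on failure instead of testing exists("table").
# `fetch_or_null` and `fetcher` are illustrative names (assumptions).

fetch_or_null <- function(code, fetcher) {
  tryCatch(
    fetcher(code),
    error = function(e) {
      message("Table ", code, " skipped: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in fetcher for demonstration only: it fails for one code.
demo_fetcher <- function(code) {
  if (code == "B00000") stop("no such table")
  data.frame(code = code, estimate = 1, stringsAsFactors = FALSE)
}

for (code in c("B19001", "B00000", "B19001A")) {
  result <- fetch_or_null(code, demo_fetcher)
  if (is.null(result)) next   # failed fetch: nothing to save for this code
  message("Table ", code, " fetched with ", nrow(result), " row(s)")
}
```

This also avoids naming the result table, which shadows the base R function of the same name.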

I hope the above helps by way of example. It is one approach. ;-)

T.

I'll take a look at the package source to see if I can see what is actually wrong. Alternatively, you may be able to narrow it down yourself and file a bug against the package.

My sense is that it might help to provide a working code example to frame the above workaround. Why? The intent is to provide an example that people can use to test and consider what is happening. It also makes it easier to understand your question and intent.

Essentially, as I understand it, you are bulk-downloading US Census data from the US Census website. The table codes specify which data you wish to download. So I created a set of table codes and tested the memory usage to see whether memory is consumed as you described.

library(acs)
library(pryr)
library(tigris)
library(stringr)  # to pad fips codes
library(maptools)

# You can obtain a US Census key from the developer site
# "e24539dfe0e8a5c5bf99d78a2bb8138abaa3b851"
api.key.install(key = "<INSERT KEY HERE>")

# Table Codes
#
# While Census Reporter hopes to save you from the details, you may be
# interested to understand some of the rationale behind American Community
# Survey table identifiers.
#
# Detailed Tables
#
# The bulk of the American Community Survey is the over 1400 detailed data
# tables. These tables have reference codes, and knowing how the codes are
# structured can be helpful in knowing which table to use.
#
# Codes start with either the letter B or C, followed by two digits for the
# table subject, then 3 digits that uniquely identify the table. (For a small
# number of technical tables the unique identifier is 4 digits.) In some cases
# additional letters for racial iterations and Puerto Rico-specific tables.
#
# Full and Collapsed Tables
#
# Tables beginning with B have the most detailed column breakdown, while a
# C table for the same numbers will have fewer columns. For example, the
# B02003 table ("Detailed Race") has 71 columns, while the "collapsed
# version," C02003 has only 19 columns. While your instinct may be to want
# as much data as possible, sometimes choosing the C table can simplify
# your analysis.
#
# Table subjects
#
# The first two digits after B/C indicate the broad subject of a table.
# Note that many tables have more than one subject, but this reflects the
# main subject.
#
# 01 Age and Sex
# 02 Race
# 03 Hispanic Origin
# 04 Ancestry
# 05 Foreign Born; Citizenship; Year of Entry; Nativity
# 06 Place of Birth
# 07 Residence 1 Year Ago; Migration
# 08 Journey to Work; Workers' Characteristics; Commuting
# 09 Children; Household Relationship
# 10 Grandparents; Grandchildren
# 11 Household Type; Family Type; Subfamilies
# 12 Marital Status and History
# 13 Fertility
# 14 School Enrollment
# 15 Educational Attainment
# 16 Language Spoken at Home and Ability to Speak English
# 17 Poverty
# 18 Disability
# 19 Income (Households and Families)
# 20 Earnings (Individuals)
# 21 Veteran Status
# 22 Transfer Programs (Public Assistance)
# 23 Employment Status; Work Experience; Labor Force
# 24 Industry; Occupation; Class of Worker
# 25 Housing Characteristics
# 26 Group Quarters
# 27 Health Insurance
#
# Three groups of tables reflect technical details about how the Census is
# administered. In general, you probably don't need to look at these too
# closely, but if you need to check for possible weaknesses in your data
# analysis, they may come into play.
#
# 00 Unweighted Count
# 98 Quality Measures
# 99 Imputations
#
# Race and Latino Origin
#
# Many tables are provided in multiple racial tabulations. If a table code
# ends in a letter from A-I, that code indicates that the table universe is
# restricted to a subset based on responses to the race or
# Hispanic/Latino-origin questions.
#
# Here is a guide to those codes:
#
#   A White alone
#   B Black or African American Alone
#   C American Indian and Alaska Native Alone
#   D Asian Alone
#   E Native Hawaiian and Other Pacific Islander Alone
#   F Some Other Race Alone
#   G Two or More Races
#   H White Alone, Not Hispanic or Latino
#   I Hispanic or Latino


setwd("~/dev/stackoverflow/37264919")

# Extract Table Structure
#
# B = Detailed Column Breakdown
# 19 = Income (Households and Families)
# 001 =
# A - I = Race
#
tablecodes <- c("B19001", "B19001A", "B19001B", "B19001C", "B19001D",
                "B19001E", "B19001F", "B19001G", "B19001H", "B19001I" )

# for loop to extract tables from the API and save them as .csv files
for (i in 1:length(tablecodes))
{
  print(tablecodes[i])
  tryCatch(
    table <- acs.fetch(table.number = tablecodes[i],
                       endyear = 2014,
                       span = 5,
                       geography = geo.make(state = "NY",
                                            county = "*",
                                            tract = "*"),
                       col.names = "pretty"),
    error = function(e) { print("Table skipped")} )

  # if the table is actually fetched then we save it
  if (exists("table", mode="S4"))
  {
    print(paste("Table", i, "fetched"))
    if (!is.na(table))
    {
      write.csv(estimate(table), paste("T",tablecodes[i], ".csv", sep = ""))
    }
    print(mem_used())
    print(mem_change(rm(table)))
    gc()
    print(mem_used())
  }
}

Runtime output

> library(acs)
> library(pryr)
> library(tigris)
> library(stringr)  # to pad fips codes
> library(maptools)
> # You can obtain a US Census key from the developer site
> # "e24539dfe0e8a5c5bf99d78a2bb8138abaa3b851"
> api.key.install(key = "...secret...")
> 
...
> setwd("~/dev/stackoverflow/37264919")
> 
> # Extract Table Structure
> #
> # B = Detailed Column Breakdown
> # 19 = Income (Households and Families)
> # 001 =
> # A - I = Race
> #
> tablecodes <- c("B19001", "B19001A", "B19001B", "B19001C", "B19001D",
+                 "B19001E", "B19001F", "B19001G", "B19001H", "B19001I" )
> 
> # for loop to extract tables from API and save them on API
> for (i in 1:length(tablecodes))
+ {
+   print(tablecodes[i])
+   tryCatch(
+     table <- acs.fetch(table.number = tablecodes[i],
+                        endyear = 2014,
+                        span = 5,
+                        geography = geo.make(state = "NY",
+                                             county = "*",
+                                             tract = "*"),
+                        col.names = "pretty"),
+     error = function(e) { print("Table skipped")} )
+ 
+   # if the table is actually fetched then we save it
+   if (exists("table", mode="S4"))
+   {
+     print(paste("Table", i, "fetched"))
+     if (!is.na(table))
+     {
+       write.csv(estimate(table), paste("T",tablecodes[i], ".csv", sep = ""))
+     }
+     print(mem_used())
+     print(mem_change(rm(table)))
+     gc()
+     print(mem_used())
+   }
+ }
[1] "B19001"
[1] "Table 1 fetched"
95.4 MB
-1.88 MB
93.6 MB
[1] "B19001A"
[1] "Table 2 fetched"
95.4 MB
-1.88 MB
93.6 MB
[1] "B19001B"
[1] "Table 3 fetched"
95.5 MB
-1.88 MB
93.6 MB
[1] "B19001C"
[1] "Table 4 fetched"
95.5 MB
-1.88 MB
93.6 MB
[1] "B19001D"
[1] "Table 5 fetched"
95.5 MB
-1.88 MB
93.6 MB
[1] "B19001E"
[1] "Table 6 fetched"
95.5 MB
-1.88 MB
93.6 MB
[1] "B19001F"
[1] "Table 7 fetched"
95.5 MB
-1.88 MB
93.6 MB
[1] "B19001G"
[1] "Table 8 fetched"
95.5 MB
-1.88 MB
93.6 MB
[1] "B19001H"
[1] "Table 9 fetched"
95.5 MB
-1.88 MB
93.6 MB
[1] "B19001I"
[1] "Table 10 fetched"
95.5 MB
-1.88 MB
93.6 MB
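
The output above shows only what R's own allocator reports, which is the figure the question says stays constant. To put it next to the figure the OS sees, something like the following sketch could be run inside the loop. It uses base R only; `r_heap_mb` and `process_rss_mb` are hypothetical helper names, and the OS-side reading assumes a Unix-like `ps` on the PATH (Linux or macOS). On Windows the equivalent number is the one Task Manager shows for rsession.exe.

```r
# Sketch: compare what R's garbage collector reports with what the OS reports
# for the same process. `r_heap_mb` and `process_rss_mb` are illustrative
# names (assumptions), not functions from pryr or acs.

r_heap_mb <- function() {
  # gc() returns a matrix; its second column is the memory in use, in Mb,
  # for cons cells (Ncells) and vector storage (Vcells).
  sum(gc()[, 2])
}

process_rss_mb <- function(pid = Sys.getpid()) {
  # Assumption: a Unix-like `ps` is available. Returns NA when it is not,
  # or when its output cannot be parsed as a single number (in kilobytes).
  out <- tryCatch(
    suppressWarnings(system2("ps", c("-o", "rss=", "-p", pid),
                             stdout = TRUE, stderr = FALSE)),
    error = function(e) character(0)
  )
  kb <- suppressWarnings(as.numeric(trimws(out)))
  if (length(kb) != 1 || is.na(kb)) return(NA_real_)
  kb / 1024
}

x <- numeric(5e6)            # roughly 40 MB of doubles
before <- r_heap_mb()
rm(x)
after <- r_heap_mb()

cat(sprintf("R heap before rm: %.1f Mb, after: %.1f Mb\n", before, after))
cat(sprintf("Process resident memory: %.1f Mb\n", process_rss_mb()))
```

If the heap figure returns to its baseline on every iteration while the process figure keeps climbing, the growth is happening outside the memory that gc() manages, which is consistent with the leak suspected above.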


Output files

>ll
total 8520
drwxr-xr-x@ 13 hidden  staff   442B Oct 17 20:41 .
drwxr-xr-x@ 40 hidden  staff   1.3K Oct 17 23:17 ..
-rw-r--r--@  1 hidden  staff   4.4K Oct 17 23:43 37264919.R
-rw-r--r--@  1 hidden  staff   492K Oct 17 23:50 TB19001.csv
-rw-r--r--@  1 hidden  staff   472K Oct 17 23:51 TB19001A.csv
-rw-r--r--@  1 hidden  staff   414K Oct 17 23:51 TB19001B.csv
-rw-r--r--@  1 hidden  staff   387K Oct 17 23:51 TB19001C.csv
-rw-r--r--@  1 hidden  staff   403K Oct 17 23:51 TB19001D.csv
-rw-r--r--@  1 hidden  staff   386K Oct 17 23:51 TB19001E.csv
-rw-r--r--@  1 hidden  staff   402K Oct 17 23:51 TB19001F.csv
-rw-r--r--@  1 hidden  staff   393K Oct 17 23:52 TB19001G.csv
-rw-r--r--@  1 hidden  staff   465K Oct 17 23:44 TB19001H.csv
-rw-r--r--@  1 hidden  staff   417K Oct 17 23:44 TB19001I.csv
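
The table-code structure described in the script comments (B or C, a two-digit subject, a three- or four-digit table identifier, and an optional race letter A to I) can be checked with a small parser before any download is attempted. This is a sketch under those stated rules only: `parse_table_code` is a hypothetical name, and the Puerto Rico-specific variants mentioned in the comments are not handled.

```r
# Sketch: split an ACS table code into the parts described in the script
# comments. `parse_table_code` is an illustrative name (assumption).

parse_table_code <- function(codes) {
  # B or C, two-digit subject, three- or four-digit table id, optional A-I.
  pattern <- "^([BC])([0-9]{2})([0-9]{3,4})([A-I]?)$"
  m <- regmatches(codes, regexec(pattern, codes))
  rows <- lapply(seq_along(codes), function(i) {
    parts <- m[[i]]
    if (length(parts) == 0) parts <- rep(NA_character_, 5)  # no match
    data.frame(code = codes[i], type = parts[2], subject = parts[3],
               table_id = parts[4], race = parts[5],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# The ten codes used in the test script: B19001 plus race iterations A to I.
tablecodes <- paste0("B19001", c("", LETTERS[1:9]))
print(parse_table_code(tablecodes))
```

Codes that do not fit the pattern come back with NA in every parsed column, so they can be filtered out before the fetch loop runs.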
