通过引用在函数中添加大量列 [英] adding a large number of columns within a function by reference

查看:135
本文介绍了通过引用在函数中添加大量列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在这种情况下,我无法弄清data.table的情况:

  fooFun < tbl,totCols){
tbl [,paste0(col,1:totCols):= 0]
}

从一个空的1-col数据表开始。

 然后通过引用添加99列:

p>

 > fooFun(tbl,99)
> col1col1col1col1col2col1col2col3 col13col16col16col16col14col16col16 col27col38col29col28col29col30col31col32col35 col54col50col51col52col53col54col55col54col54
[57]col56col57col58col59col60col61col62col63col64col65col66 b $ b [71]col70col71col72col73col74col75col76col77 $ b [85]col84col85col86col87col88col89col90col91col92col93col94 b [99]col98col99

现在添加第100列:

 > fooFun(tbl,100)
> col1col1col1col1col2col1col2col3 col13col16col16col16col14col16col16 col27col38col29col28col29col30col31col32col35 col54col50col51col52col53col54col55col54col54
[57]col56col57col58col59col60col61col62col63col64col65col66 b $ b [71]col70col71col72col73col74col75col76col77 $ b [85]col84col85col86col87col88col89col90col91col92col93col94 b [99]col98col99

不存在...现在在函数调用之外添加一个列:

 > tbl [,newCol:= 5] 
> col1col1col1col1col2col1col2col3 col13col16col16col16col14col16col16 col27col38col29col28col29col30col31col32col35 col54col50col51col52col53col54col55col54col54
[57]col56col57col58col59col60col61col62col63col64col65col66 b $ b [71]col70col71col72col73col74col75col76col77 $ b [85]col84col85col86col87col88col89col90col91col92col93col94 b [99]col98col99newCol

现在添加第100列:

 > fooFun(tbl,100)
> col1col1col1col1col2col1col2col3 col13col16col16col16col14col16col16 col27col38col29col28col29col30col31col32col35 col54col50col51col52col53col54col55col54col54
[57]col56col57col58col59col60col61col62col63col64col65col66 b $ b [71]col70col71col72col73col74col75col76col77 $ b [85]col84col85col86col87col88col89col90col91col92col93col94 b [99]col98col99newColcol100

现在再添加20个:

 > fooFun(tbl,120)
> col1col1col1col1col2col1col2col3 col13col16col16col16col14col16col16 col27col38col29col28col29col30col31col32col35 col54col50col51col52col53col54col55col54col54
[57]col56col57col58col59col60col61col62col63col64col65col66 b $ b [71]col70col71col72col73col74col75col76col77 $ b [85]col84col85col86col87col88col89col90col91col92col93col94 b [99]col98col99newColcol100col101col102col103col104col105col106col107col108 [113]col111col112col113col114col115col116col117col118col119col120

看起来OK。现在添加一堆:

 > fooFun(tbl,240)
> col1col1col1col1col2col1col2col3 col13col16col16col16col14col16col16 col27col38col29col28col29col30col31col32col35 col54col50col51col52col53col54col55col54col54
[57]col56col57col58col59col60col61col62col63col64col65col66 b $ b [71]col70col71col72col73col74col75col76col77 $ b [85]col84col85col86col87col88col89col90col91col92col93col94 b [99]col98col99newColcol100col101col102col103col104col105col106col107col108 col118col119col128col121col122col123col124
[col126]col112 127]col125col126col127col128col129col130col131col132col133col134col135col136col137col138 ]col139col140col141col142col143col144col145col146col147col148col149col150col151 col153col154col155col156col157col158col159col160col161col162col163col164col165col166
col167col168col169col170col171col172col173col174col175col176col177col178col179col180
col182col183col184col185col186col187col188col189col190col191col192col193 col196col197col198



发生了什么事?




解决方案

@Arun指出这个问题已在邮件列表中解决:#5204 。按照该线程中的建议,我增加了在创建data.table时alloc_的列指针的默认数量:

  options(datatable.alloccol = 900)

在已创建的表上增加列数超过100时分配100列。这可以解决在达到预分配限制时浅复制对象的基本问题,以便为此SO问题中的测试产生预期行为。


I can't figure out what's going on with data.table in this situation:

fooFun <- function(tbl, totCols) {
    tbl[, paste0("col", 1:totCols) := 0]
}

Start with an empty 1-col data table.

> tbl = data.table(initialCol=double())

Then add 99 columns by reference:

> fooFun(tbl, 99)
> colnames(tbl)
  [1] "initialCol" "col1"       "col2"       "col3"       "col4"       "col5"       "col6"       "col7"       "col8"       "col9"       "col10"      "col11"      "col12"      "col13"     
 [15] "col14"      "col15"      "col16"      "col17"      "col18"      "col19"      "col20"      "col21"      "col22"      "col23"      "col24"      "col25"      "col26"      "col27"     
 [29] "col28"      "col29"      "col30"      "col31"      "col32"      "col33"      "col34"      "col35"      "col36"      "col37"      "col38"      "col39"      "col40"      "col41"     
 [43] "col42"      "col43"      "col44"      "col45"      "col46"      "col47"      "col48"      "col49"      "col50"      "col51"      "col52"      "col53"      "col54"      "col55"     
 [57] "col56"      "col57"      "col58"      "col59"      "col60"      "col61"      "col62"      "col63"      "col64"      "col65"      "col66"      "col67"      "col68"      "col69"     
 [71] "col70"      "col71"      "col72"      "col73"      "col74"      "col75"      "col76"      "col77"      "col78"      "col79"      "col80"      "col81"      "col82"      "col83"     
 [85] "col84"      "col85"      "col86"      "col87"      "col88"      "col89"      "col90"      "col91"      "col92"      "col93"      "col94"      "col95"      "col96"      "col97"     
 [99] "col98"      "col99" 

All looks good. Now add the 100th column:

> fooFun(tbl, 100)
> colnames(tbl)
  [1] "initialCol" "col1"       "col2"       "col3"       "col4"       "col5"       "col6"       "col7"       "col8"       "col9"       "col10"      "col11"      "col12"      "col13"     
 [15] "col14"      "col15"      "col16"      "col17"      "col18"      "col19"      "col20"      "col21"      "col22"      "col23"      "col24"      "col25"      "col26"      "col27"     
 [29] "col28"      "col29"      "col30"      "col31"      "col32"      "col33"      "col34"      "col35"      "col36"      "col37"      "col38"      "col39"      "col40"      "col41"     
 [43] "col42"      "col43"      "col44"      "col45"      "col46"      "col47"      "col48"      "col49"      "col50"      "col51"      "col52"      "col53"      "col54"      "col55"     
 [57] "col56"      "col57"      "col58"      "col59"      "col60"      "col61"      "col62"      "col63"      "col64"      "col65"      "col66"      "col67"      "col68"      "col69"     
 [71] "col70"      "col71"      "col72"      "col73"      "col74"      "col75"      "col76"      "col77"      "col78"      "col79"      "col80"      "col81"      "col82"      "col83"     
 [85] "col84"      "col85"      "col86"      "col87"      "col88"      "col89"      "col90"      "col91"      "col92"      "col93"      "col94"      "col95"      "col96"      "col97"     
 [99] "col98"      "col99"  

What? Not there... Now add one column outside of the function call:

> tbl[, newCol := 5]
> colnames(tbl)
  [1] "initialCol" "col1"       "col2"       "col3"       "col4"       "col5"       "col6"       "col7"       "col8"       "col9"       "col10"      "col11"      "col12"      "col13"     
 [15] "col14"      "col15"      "col16"      "col17"      "col18"      "col19"      "col20"      "col21"      "col22"      "col23"      "col24"      "col25"      "col26"      "col27"     
 [29] "col28"      "col29"      "col30"      "col31"      "col32"      "col33"      "col34"      "col35"      "col36"      "col37"      "col38"      "col39"      "col40"      "col41"     
 [43] "col42"      "col43"      "col44"      "col45"      "col46"      "col47"      "col48"      "col49"      "col50"      "col51"      "col52"      "col53"      "col54"      "col55"     
 [57] "col56"      "col57"      "col58"      "col59"      "col60"      "col61"      "col62"      "col63"      "col64"      "col65"      "col66"      "col67"      "col68"      "col69"     
 [71] "col70"      "col71"      "col72"      "col73"      "col74"      "col75"      "col76"      "col77"      "col78"      "col79"      "col80"      "col81"      "col82"      "col83"     
 [85] "col84"      "col85"      "col86"      "col87"      "col88"      "col89"      "col90"      "col91"      "col92"      "col93"      "col94"      "col95"      "col96"      "col97"     
 [99] "col98"      "col99"      "newCol"  

All good. Now add that 100th column:

> fooFun(tbl, 100)
> colnames(tbl)
  [1] "initialCol" "col1"       "col2"       "col3"       "col4"       "col5"       "col6"       "col7"       "col8"       "col9"       "col10"      "col11"      "col12"      "col13"     
 [15] "col14"      "col15"      "col16"      "col17"      "col18"      "col19"      "col20"      "col21"      "col22"      "col23"      "col24"      "col25"      "col26"      "col27"     
 [29] "col28"      "col29"      "col30"      "col31"      "col32"      "col33"      "col34"      "col35"      "col36"      "col37"      "col38"      "col39"      "col40"      "col41"     
 [43] "col42"      "col43"      "col44"      "col45"      "col46"      "col47"      "col48"      "col49"      "col50"      "col51"      "col52"      "col53"      "col54"      "col55"     
 [57] "col56"      "col57"      "col58"      "col59"      "col60"      "col61"      "col62"      "col63"      "col64"      "col65"      "col66"      "col67"      "col68"      "col69"     
 [71] "col70"      "col71"      "col72"      "col73"      "col74"      "col75"      "col76"      "col77"      "col78"      "col79"      "col80"      "col81"      "col82"      "col83"     
 [85] "col84"      "col85"      "col86"      "col87"      "col88"      "col89"      "col90"      "col91"      "col92"      "col93"      "col94"      "col95"      "col96"      "col97"     
 [99] "col98"      "col99"      "newCol"     "col100"    

It's there now. Now add 20 more:

> fooFun(tbl, 120)
> colnames(tbl)
  [1] "initialCol" "col1"       "col2"       "col3"       "col4"       "col5"       "col6"       "col7"       "col8"       "col9"       "col10"      "col11"      "col12"      "col13"     
 [15] "col14"      "col15"      "col16"      "col17"      "col18"      "col19"      "col20"      "col21"      "col22"      "col23"      "col24"      "col25"      "col26"      "col27"     
 [29] "col28"      "col29"      "col30"      "col31"      "col32"      "col33"      "col34"      "col35"      "col36"      "col37"      "col38"      "col39"      "col40"      "col41"     
 [43] "col42"      "col43"      "col44"      "col45"      "col46"      "col47"      "col48"      "col49"      "col50"      "col51"      "col52"      "col53"      "col54"      "col55"     
 [57] "col56"      "col57"      "col58"      "col59"      "col60"      "col61"      "col62"      "col63"      "col64"      "col65"      "col66"      "col67"      "col68"      "col69"     
 [71] "col70"      "col71"      "col72"      "col73"      "col74"      "col75"      "col76"      "col77"      "col78"      "col79"      "col80"      "col81"      "col82"      "col83"     
 [85] "col84"      "col85"      "col86"      "col87"      "col88"      "col89"      "col90"      "col91"      "col92"      "col93"      "col94"      "col95"      "col96"      "col97"     
 [99] "col98"      "col99"      "newCol"     "col100"     "col101"     "col102"     "col103"     "col104"     "col105"     "col106"     "col107"     "col108"     "col109"     "col110"    
[113] "col111"     "col112"     "col113"     "col114"     "col115"     "col116"     "col117"     "col118"     "col119"     "col120"    

Looks OK. Now add a bunch more:

> fooFun(tbl, 240)
> colnames(tbl)
  [1] "initialCol" "col1"       "col2"       "col3"       "col4"       "col5"       "col6"       "col7"       "col8"       "col9"       "col10"      "col11"      "col12"      "col13"     
 [15] "col14"      "col15"      "col16"      "col17"      "col18"      "col19"      "col20"      "col21"      "col22"      "col23"      "col24"      "col25"      "col26"      "col27"     
 [29] "col28"      "col29"      "col30"      "col31"      "col32"      "col33"      "col34"      "col35"      "col36"      "col37"      "col38"      "col39"      "col40"      "col41"     
 [43] "col42"      "col43"      "col44"      "col45"      "col46"      "col47"      "col48"      "col49"      "col50"      "col51"      "col52"      "col53"      "col54"      "col55"     
 [57] "col56"      "col57"      "col58"      "col59"      "col60"      "col61"      "col62"      "col63"      "col64"      "col65"      "col66"      "col67"      "col68"      "col69"     
 [71] "col70"      "col71"      "col72"      "col73"      "col74"      "col75"      "col76"      "col77"      "col78"      "col79"      "col80"      "col81"      "col82"      "col83"     
 [85] "col84"      "col85"      "col86"      "col87"      "col88"      "col89"      "col90"      "col91"      "col92"      "col93"      "col94"      "col95"      "col96"      "col97"     
 [99] "col98"      "col99"      "newCol"     "col100"     "col101"     "col102"     "col103"     "col104"     "col105"     "col106"     "col107"     "col108"     "col109"     "col110"    
[113] "col111"     "col112"     "col113"     "col114"     "col115"     "col116"     "col117"     "col118"     "col119"     "col120"     "col121"     "col122"     "col123"     "col124"    
[127] "col125"     "col126"     "col127"     "col128"     "col129"     "col130"     "col131"     "col132"     "col133"     "col134"     "col135"     "col136"     "col137"     "col138"    
[141] "col139"     "col140"     "col141"     "col142"     "col143"     "col144"     "col145"     "col146"     "col147"     "col148"     "col149"     "col150"     "col151"     "col152"    
[155] "col153"     "col154"     "col155"     "col156"     "col157"     "col158"     "col159"     "col160"     "col161"     "col162"     "col163"     "col164"     "col165"     "col166"    
[169] "col167"     "col168"     "col169"     "col170"     "col171"     "col172"     "col173"     "col174"     "col175"     "col176"     "col177"     "col178"     "col179"     "col180"    
[183] "col181"     "col182"     "col183"     "col184"     "col185"     "col186"     "col187"     "col188"     "col189"     "col190"     "col191"     "col192"     "col193"     "col194"    
[197] "col195"     "col196"     "col197"     "col198" 

No good.

What's going on?

解决方案

@Arun pointed out that this issue has already been addressed on the mailing list: #5204. Following the advice in that thread, I increased the default number of column pointers that are alloc'd when a data.table is created:

options(datatable.alloccol = 900)

This way it won't hit the default pre-allocation of 100 columns when increasing the number of columns over 100 on an already-created table. This works around the underlying issue about shallow copying the object when the pre-allocation limit is reached, so that it produces the expected behavior for the tests in this SO question.

这篇关于通过引用在函数中添加大量列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆