Adding a large number of columns within a function by reference
Problem description
I can't figure out what's going on with data.table in this situation:
library(data.table)

# Add columns col1..col<totCols> to tbl by reference, filling each with 0.
fooFun <- function(tbl, totCols) {
  tbl[, paste0("col", 1:totCols) := 0]
}
Start with an empty 1-col data table.
> tbl = data.table(initialCol=double())
Then add 99 columns by reference:
> fooFun(tbl, 99)
> colnames(tbl)
[1] "initialCol" "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9" "col10" "col11" "col12" "col13"
[15] "col14" "col15" "col16" "col17" "col18" "col19" "col20" "col21" "col22" "col23" "col24" "col25" "col26" "col27"
[29] "col28" "col29" "col30" "col31" "col32" "col33" "col34" "col35" "col36" "col37" "col38" "col39" "col40" "col41"
[43] "col42" "col43" "col44" "col45" "col46" "col47" "col48" "col49" "col50" "col51" "col52" "col53" "col54" "col55"
[57] "col56" "col57" "col58" "col59" "col60" "col61" "col62" "col63" "col64" "col65" "col66" "col67" "col68" "col69"
[71] "col70" "col71" "col72" "col73" "col74" "col75" "col76" "col77" "col78" "col79" "col80" "col81" "col82" "col83"
[85] "col84" "col85" "col86" "col87" "col88" "col89" "col90" "col91" "col92" "col93" "col94" "col95" "col96" "col97"
[99] "col98" "col99"
All looks good. Now add the 100th column:
> fooFun(tbl, 100)
> colnames(tbl)
[1] "initialCol" "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9" "col10" "col11" "col12" "col13"
[15] "col14" "col15" "col16" "col17" "col18" "col19" "col20" "col21" "col22" "col23" "col24" "col25" "col26" "col27"
[29] "col28" "col29" "col30" "col31" "col32" "col33" "col34" "col35" "col36" "col37" "col38" "col39" "col40" "col41"
[43] "col42" "col43" "col44" "col45" "col46" "col47" "col48" "col49" "col50" "col51" "col52" "col53" "col54" "col55"
[57] "col56" "col57" "col58" "col59" "col60" "col61" "col62" "col63" "col64" "col65" "col66" "col67" "col68" "col69"
[71] "col70" "col71" "col72" "col73" "col74" "col75" "col76" "col77" "col78" "col79" "col80" "col81" "col82" "col83"
[85] "col84" "col85" "col86" "col87" "col88" "col89" "col90" "col91" "col92" "col93" "col94" "col95" "col96" "col97"
[99] "col98" "col99"
What? Not there... Now add one column outside of the function call:
> tbl[, newCol := 5]
> colnames(tbl)
[1] "initialCol" "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9" "col10" "col11" "col12" "col13"
[15] "col14" "col15" "col16" "col17" "col18" "col19" "col20" "col21" "col22" "col23" "col24" "col25" "col26" "col27"
[29] "col28" "col29" "col30" "col31" "col32" "col33" "col34" "col35" "col36" "col37" "col38" "col39" "col40" "col41"
[43] "col42" "col43" "col44" "col45" "col46" "col47" "col48" "col49" "col50" "col51" "col52" "col53" "col54" "col55"
[57] "col56" "col57" "col58" "col59" "col60" "col61" "col62" "col63" "col64" "col65" "col66" "col67" "col68" "col69"
[71] "col70" "col71" "col72" "col73" "col74" "col75" "col76" "col77" "col78" "col79" "col80" "col81" "col82" "col83"
[85] "col84" "col85" "col86" "col87" "col88" "col89" "col90" "col91" "col92" "col93" "col94" "col95" "col96" "col97"
[99] "col98" "col99" "newCol"
All good. Now add that 100th column:
> fooFun(tbl, 100)
> colnames(tbl)
[1] "initialCol" "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9" "col10" "col11" "col12" "col13"
[15] "col14" "col15" "col16" "col17" "col18" "col19" "col20" "col21" "col22" "col23" "col24" "col25" "col26" "col27"
[29] "col28" "col29" "col30" "col31" "col32" "col33" "col34" "col35" "col36" "col37" "col38" "col39" "col40" "col41"
[43] "col42" "col43" "col44" "col45" "col46" "col47" "col48" "col49" "col50" "col51" "col52" "col53" "col54" "col55"
[57] "col56" "col57" "col58" "col59" "col60" "col61" "col62" "col63" "col64" "col65" "col66" "col67" "col68" "col69"
[71] "col70" "col71" "col72" "col73" "col74" "col75" "col76" "col77" "col78" "col79" "col80" "col81" "col82" "col83"
[85] "col84" "col85" "col86" "col87" "col88" "col89" "col90" "col91" "col92" "col93" "col94" "col95" "col96" "col97"
[99] "col98" "col99" "newCol" "col100"
It's there now. Now add 20 more:
> fooFun(tbl, 120)
> colnames(tbl)
[1] "initialCol" "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9" "col10" "col11" "col12" "col13"
[15] "col14" "col15" "col16" "col17" "col18" "col19" "col20" "col21" "col22" "col23" "col24" "col25" "col26" "col27"
[29] "col28" "col29" "col30" "col31" "col32" "col33" "col34" "col35" "col36" "col37" "col38" "col39" "col40" "col41"
[43] "col42" "col43" "col44" "col45" "col46" "col47" "col48" "col49" "col50" "col51" "col52" "col53" "col54" "col55"
[57] "col56" "col57" "col58" "col59" "col60" "col61" "col62" "col63" "col64" "col65" "col66" "col67" "col68" "col69"
[71] "col70" "col71" "col72" "col73" "col74" "col75" "col76" "col77" "col78" "col79" "col80" "col81" "col82" "col83"
[85] "col84" "col85" "col86" "col87" "col88" "col89" "col90" "col91" "col92" "col93" "col94" "col95" "col96" "col97"
[99] "col98" "col99" "newCol" "col100" "col101" "col102" "col103" "col104" "col105" "col106" "col107" "col108" "col109" "col110"
[113] "col111" "col112" "col113" "col114" "col115" "col116" "col117" "col118" "col119" "col120"
Looks OK. Now add a bunch more:
> fooFun(tbl, 240)
> colnames(tbl)
[1] "initialCol" "col1" "col2" "col3" "col4" "col5" "col6" "col7" "col8" "col9" "col10" "col11" "col12" "col13"
[15] "col14" "col15" "col16" "col17" "col18" "col19" "col20" "col21" "col22" "col23" "col24" "col25" "col26" "col27"
[29] "col28" "col29" "col30" "col31" "col32" "col33" "col34" "col35" "col36" "col37" "col38" "col39" "col40" "col41"
[43] "col42" "col43" "col44" "col45" "col46" "col47" "col48" "col49" "col50" "col51" "col52" "col53" "col54" "col55"
[57] "col56" "col57" "col58" "col59" "col60" "col61" "col62" "col63" "col64" "col65" "col66" "col67" "col68" "col69"
[71] "col70" "col71" "col72" "col73" "col74" "col75" "col76" "col77" "col78" "col79" "col80" "col81" "col82" "col83"
[85] "col84" "col85" "col86" "col87" "col88" "col89" "col90" "col91" "col92" "col93" "col94" "col95" "col96" "col97"
[99] "col98" "col99" "newCol" "col100" "col101" "col102" "col103" "col104" "col105" "col106" "col107" "col108" "col109" "col110"
[113] "col111" "col112" "col113" "col114" "col115" "col116" "col117" "col118" "col119" "col120" "col121" "col122" "col123" "col124"
[127] "col125" "col126" "col127" "col128" "col129" "col130" "col131" "col132" "col133" "col134" "col135" "col136" "col137" "col138"
[141] "col139" "col140" "col141" "col142" "col143" "col144" "col145" "col146" "col147" "col148" "col149" "col150" "col151" "col152"
[155] "col153" "col154" "col155" "col156" "col157" "col158" "col159" "col160" "col161" "col162" "col163" "col164" "col165" "col166"
[169] "col167" "col168" "col169" "col170" "col171" "col172" "col173" "col174" "col175" "col176" "col177" "col178" "col179" "col180"
[183] "col181" "col182" "col183" "col184" "col185" "col186" "col187" "col188" "col189" "col190" "col191" "col192" "col193" "col194"
[197] "col195" "col196" "col197" "col198"
No good.
What's going on?
Solution
@Arun pointed out that this issue has already been addressed on the mailing list: #5204. Following the advice in that thread, I increased the default number of column pointers that are allocated when a data.table is created:
options(datatable.alloccol = 900)
With this setting, a table no longer hits the default pre-allocation of 100 columns when columns are added beyond 100 on an already-created table. This does not fix the underlying issue, which is that the object is shallow copied once the pre-allocation limit is reached, so the new columns land on a copy that is visible only inside the function. It works around that issue, so the tests in this SO question produce the expected behavior.
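As a rough illustration of the workaround without changing a global option, the sketch below shows two per-table alternatives. It assumes a data.table version that exports truelength() and setalloccol() (older releases expose the latter as alloc.col()). The helper name addCols and the three-row sample tables are hypothetical and not part of the original thread.

```r
library(data.table)

# Hypothetical helper: same idea as fooFun, but it also returns the table so
# the caller can rebind the name if data.table reallocated it internally.
addCols <- function(tbl, totCols) {
  cols <- paste0("col", seq_len(totCols))
  tbl[, (cols) := 0]
  invisible(tbl)
}

# Approach 1: reserve enough column slots in the caller's scope first, so the
# function can add every column in place without a reallocation.
tbl1 <- data.table(initialCol = 1:3)
needed <- length(tbl1) + 240L          # columns in use + columns to add
if (truelength(tbl1) < needed) {
  tbl1 <- setalloccol(tbl1, needed)    # n counts existing columns too
}
addCols(tbl1, 240)
"col240" %in% names(tbl1)

# Approach 2: do not rely on the side effect at all; rebind the result.
tbl2 <- data.table(initialCol = 1:3)
tbl2 <- addCols(tbl2, 240)
"col240" %in% names(tbl2)
```

Comparing length(tbl) with truelength(tbl) before a call is also a quick way to check whether a given table is about to cross its allocation limit.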