data.table R fwrite bug on Red Hat Linux
Question
I have been using data.table (v1.10) and noticed a bug when using fwrite. Some background:
sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)
The machine has multiple cores.
Generate some data
library(data.table)
#Generate some data
rows = 2500000
set.seed(Sys.time())
DF <- data.frame(index = 1:rows,
                 catsA = sample((letters[1:10]),rows,replace=TRUE),
                 catsB = sample((letters[1:10]),rows,replace=TRUE),
                 catsC = sample((letters[1:10]),rows,replace=TRUE),
                 catsD = sample((letters[1:10]),rows,replace=TRUE),
                 catsE = sample((letters[1:10]),rows,replace=TRUE),
                 valueA = round(rnorm(rows),3),
                 valueB = rpois(rows, lambda = 4))
#Convert to data.table
DT <- data.table(DF)
#Create a new column (:= modifies DT by reference)
DT[,valueNew := valueA*valueB]
#Write the same table with both writers
write.csv(DT,file="DT_write_csv.csv",row.names=FALSE)
fwrite(DT, file = "DT_fwrite.csv",row.names=FALSE)
Read back in and join
#Read back in and join
DT_csv <- fread("DT_write_csv.csv")
DT_fwrite <- fread("DT_fwrite.csv")
setkey(DT_csv,"index")
setkey(DT_fwrite,"index")
join_DT <- DT_csv[DT_fwrite]
Compare
nrow(join_DT[valueNew != i.valueNew])
[1] 1
join_DT[valueNew != i.valueNew,.(index,valueNew,i.valueNew)]
index valueNew i.valueNew
1: 67097 2.855 5.71
DT[index==67097,.(valueNew)]
valueNew
1: 2.855
From the comparison, the original DT holds the same value as the write.csv file (2.855), while the file written by fwrite holds a different, corrupted value (5.71). Sometimes more than one row is affected, and in a real-life example the corruption spread across many columns.
Am I doing something wrong with fwrite?
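The check above goes through write.csv and a keyed join. A more direct round-trip check can be sketched as below. `check_roundtrip` is a hypothetical helper written for this illustration, not part of data.table, and the tolerance of 1e-9 is an arbitrary assumption. It writes a table with fwrite, reads it back with fread, and lists the rows where a numeric column no longer matches the in-memory value.

```r
library(data.table)

# Hypothetical helper (not part of data.table): write DT with fwrite, read it
# back with fread, and return the rows where numeric column `col` differs.
check_roundtrip <- function(DT, col, file = tempfile(fileext = ".csv"), tol = 1e-9) {
  fwrite(DT, file = file)
  back <- fread(file)
  stopifnot(nrow(back) == nrow(DT))
  # Positions where the value read back is not (almost) the value in memory
  bad <- which(abs(DT[[col]] - back[[col]]) > tol)
  data.table(row = bad, original = DT[[col]][bad], written = back[[col]][bad])
}

# Small example shaped like the data in the question
set.seed(1)
small <- data.table(index  = 1:1000,
                    valueA = round(rnorm(1000), 3),
                    valueB = rpois(1000, lambda = 4))
small[, valueNew := valueA * valueB]

res <- check_roundtrip(small, "valueNew")
nrow(res)  # number of mismatching rows; none are expected on an unaffected build
```

On an affected build, a table this small may well come back clean, since the problem in the question only showed up with millions of rows on a multi-core machine.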
Recommended answer
Yes, there is a bug in fwrite. It was fixed in dev last week and I'll try to get it to CRAN soon. Please check the NEWS link at the top of the homepage, bug fix item 3:
fwrite() could write floating point values incorrectly, #1968. A thread-local variable was incorrectly thread-global. This variable's usage lifetime is only a few clock cycles, so it needed large data and many threads for several threads to overlap their usage of it and cause the problem. Many thanks to @mgahan and @jmosser for finding and reporting.
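Since the NEWS item attributes the problem to several threads overlapping, one possible stopgap until a fixed version is installed is to make fwrite use a single thread. This is an inference from the description above, not something the answer itself recommends, and it assumes the installed data.table exposes the `nThread` argument of fwrite (current releases do). A minimal sketch with made-up values:

```r
library(data.table)

# Made-up values for illustration only
DT  <- data.table(index = 1:5, valueNew = c(2.855, 5.71, -0.3, 0, 1.25))
out <- tempfile(fileext = ".csv")

# Assumption: with one writer thread there is no second thread to overlap
# with, so the shared variable described in the NEWS item cannot be clobbered.
fwrite(DT, file = out, nThread = 1)

# Read back and compare against the in-memory column
back <- fread(out)
all.equal(DT$valueNew, back$valueNew)
```

Single-threaded writing gives up the speed advantage of fwrite, so upgrading to a fixed build remains the better option.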
Please try from dev by typing the command here. I know that dev is currently failing Travis (for an unrelated reason), which is why the installation command has been set up to install the last-passing commit from dev and should therefore be OK.