write_csv read_csv with scientific notation after 1000th row

Problem description

Writing a data frame that mixes small integer entries (values below 1000) and "large" ones (1000 or more) to a csv file with write_csv() produces a mix of scientific and non-scientific notation. If the first 1000 rows hold small values but a large value follows, read_csv() seems to get confused by this mix and returns NA for the entries in scientific notation:

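library(tidyverse)  # provides tibble(), write_csv(), read_csv() and %>%

# Write a one-column tibble to tib.csv and read it back.
test_write_read <- function(small_value, 
                            n_fills, 
                            position, 
                            large_value) {
    tib             <- tibble(a = rep(small_value, n_fills))
    tib$a[position] <- large_value  # overwrite a single row
    write_csv(tib, "tib.csv")
    read_csv("tib.csv")             # return the re-read tibble
}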

The following calls do not cause any problem:

tib <- test_write_read(small_value = 1, 
                       n_fills     = 1001, 
                       position    = 1000, #position <= 1000
                       large_value = 1000)
tib <- test_write_read(1, 1001, 1001, 999)
tib <- test_write_read(1000, 1001, 1000, 1)

However, the following calls do:

tib <- test_write_read(small_value = 1, 
                       n_fills     = 1001, 
                       position    = 1001, #position > 1000
                       large_value = 1000)
tib <- test_write_read(1, 1002, 1001, 1000)
tib <- test_write_read(999, 1001, 1001, 1000)

Typical output:

problems(tib)
## A tibble: 1 x 5
#  row   col   expected               actual file
#  <int> <chr> <chr>                  <chr>  <chr>
#1 1001  a     no trailing characters e3     'tib.csv'

tib %>% tail(n = 3)
## A tibble: 3 x 1
#      a
#  <int>
#1   999
#2   999
#3    NA

The csv file:

$ tail -n3 tib.csv
#999
#999
#1e3
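The 1000-row threshold matches the default of read_csv()'s guess_max argument, which limits how many rows are inspected when column types are guessed. The sketch below is only an illustration of that connection, not a confirmed diagnosis: it builds the csv by hand with writeLines() (so it does not depend on how write_csv() formats numbers), and the value guess_max = 1001 is an assumption chosen so that guessing reaches the row holding 1e3.

```r
library(readr)

# Hand-written csv: 1000 rows of "1", then one row in scientific notation.
path <- tempfile(fileext = ".csv")
writeLines(c("a", rep("1", 1000), "1e3"), path)

# guess_max = 1001 is an illustrative value; it only has to cover the
# row that contains "1e3" so the guessed type can accommodate it.
tib <- read_csv(path, guess_max = 1001)

tail(tib$a, n = 1)   # last value, parsed from "1e3"
is.double(tib$a)     # column type after guessing
```

With the larger guess_max the column should come back as a double rather than an integer, so the 1e3 entry is no longer rejected by an integer parser.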

I am running:

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

with tidyverse_1.2.1 (loading readr_1.1.1)

Is this a bug that should be reported?

Recommended answer

I just installed the dev version of readr with devtools::install_github("tidyverse/readr"), so now I have readr_1.2.0, and the NA problem went away. But column "a" is now "guessed" by read_csv() as dbl (whether or not there is a large integer in it), whereas it was correctly read as int before, so if I need it as int I still have to do an as.integer() conversion. At least now it does not crash my code.

tib <- test_write_read(1, 1002, 1001, 1000)
tib %>% tail(n = 3)
## A tibble: 3 x 1
#        a
#    <dbl>
#1    1.00
#2 1000
#3    1.00
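If the column is needed as int, one way to make the conversion explicit is to read it as double and convert afterwards. This is a minimal sketch under assumptions: the csv is written by hand with writeLines() so it does not rely on any particular write_csv() behaviour, and the whole-number check before as.integer() is an added safeguard, not something the readr documentation prescribes.

```r
library(readr)

# Hand-written csv with one entry in scientific notation at row 1001.
path <- tempfile(fileext = ".csv")
writeLines(c("a", rep("1", 1000), "1e3", "1"), path)

# Ask for a double column explicitly, so "1e3" is accepted regardless of
# what the first rows look like.
tib <- read_csv(path, col_types = cols(a = col_double()))

# Convert to integer only if every value is a whole number that fits in
# R's integer range; otherwise keep the column as double.
whole <- all(tib$a == round(tib$a), na.rm = TRUE) &&
  all(abs(tib$a) <= .Machine$integer.max, na.rm = TRUE)
if (whole) {
  tib$a <- as.integer(tib$a)
}

tail(tib, n = 3)
```

Specifying col_types also removes the dependence on type guessing, so the result does not change with the position of the large value.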

The large value is still written as 1e3 by write_csv(), though, so in my opinion this is not quite a final solution.

$ tail -n3 tib.csv
#1
#1e3
#1
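A possible workaround on the writing side, sketched here as an assumption rather than an official readr recommendation, is to turn the column into fixed-notation strings with base R's format() before calling write_csv(). The name tib_out is hypothetical and only keeps the original tibble untouched.

```r
library(readr)
library(tibble)

tib <- tibble(a = c(rep(1, 1000), 1000, 1))

# format() with scientific = FALSE produces fixed notation; trim = TRUE
# removes the padding format() would otherwise add to align the values.
tib_out   <- tib
tib_out$a <- format(tib$a, scientific = FALSE, trim = TRUE)

path <- tempfile(fileext = ".csv")
write_csv(tib_out, path)

tail(readLines(path), n = 3)
```

The trade-off is that the column is written as text, so this only suits the export step; the in-memory tibble should keep its numeric column.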
