Why are write.csv and read.csv not consistent?


Problem description


The problem is simple. Consider the following example:

m <- head(iris)
write.csv(m, file = 'm.csv')
m1 <- read.csv('m.csv')

The result is that m1 differs from the original object m: it has a new first column named "X". If I want them to be equal, I have to use additional arguments, as in these two examples:

write.csv(m, file = 'm.csv', row.names = FALSE)
# and then
m1 <- read.csv('m.csv')

or

write.csv(m, file = 'm.csv')
m1 <- read.csv('m.csv', row.names = 1)

The question is, what is the reason for this difference? In particular, if write.csv and read.csv are supposedly intended to stick to the Excel convention, why don't they import the same object that was exported in the first place? To me this is very counterintuitive behavior and highly undesirable.

(Exactly the same thing happens if I use the csv2 variants of these functions.)

Thanks in advance!


These are the data.frames m and m1 if you prefer not to use R to see the example:

> m
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> m1
  X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 1          5.1         3.5          1.4         0.2  setosa
2 2          4.9         3.0          1.4         0.2  setosa
3 3          4.7         3.2          1.3         0.2  setosa
4 4          4.6         3.1          1.5         0.2  setosa
5 5          5.0         3.6          1.4         0.2  setosa
6 6          5.4         3.9          1.7         0.4  setosa
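
The two workarounds from the question can be checked with a short sketch. It assumes a temporary file (the name `csv_path` and the result names are made up for illustration) and compares only column names, because column types may still differ after a CSV round trip: `Species` is a factor in `iris`, but it may come back as character depending on the `stringsAsFactors` setting.

```r
m <- head(iris)
csv_path <- tempfile(fileext = ".csv")  # hypothetical temporary file

# Default round trip: the row names come back as a column named "X".
write.csv(m, file = csv_path)
m_default <- read.csv(csv_path)

# Workaround 1: do not write the row names at all.
write.csv(m, file = csv_path, row.names = FALSE)
m_no_rownames <- read.csv(csv_path)

# Workaround 2: write the row names, then read column 1 back as row names.
write.csv(m, file = csv_path)
m_rownames_1 <- read.csv(csv_path, row.names = 1)

names(m_default)                            # first name is "X"
identical(names(m_no_rownames), names(m))   # column names match
identical(names(m_rownames_1), names(m))    # column names match
```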

Solution

Here's my guess...

write.table writes a data.frame to a file and data.frames always have row names, so not writing row names by default would be throwing away information. (Yes, write.table will also write a matrix and matrices don't have to have row names, but data.frames are probably used much more often than matrices.)

read.table returns a data.frame but CSV files don't have any concept of row names, so someone may argue that it's counter-intuitive to assume, by default, that the first column of a CSV is a row name.
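
To see where the "X" column comes from, it can help to look at the raw file. The sketch below, which assumes a temporary file named `csv_path`, prints the header line that write.csv produces: the row-name column gets an empty header field, and read.csv then treats it as an ordinary unnamed column and gives it a syntactic name.

```r
m <- head(iris)
csv_path <- tempfile(fileext = ".csv")  # hypothetical temporary file
write.csv(m, file = csv_path)

# The header starts with an empty quoted field for the row-name column.
header_line <- readLines(csv_path, n = 1)
header_line

# read.csv has no way to know that field held row names, so it names it.
m1 <- read.csv(csv_path)
names(m1)[1]
```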

Now there may be a way to make these two functions consistent, but I would argue that writing to a text file isn't the best way to output/input data from one R session to another. It's much safer/faster to use save, load, saveRDS, readRDS, etc.
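
As a minimal sketch of that suggestion, the following round trip uses saveRDS and readRDS with a temporary file (the name `rds_path` is an assumption for illustration). Unlike the CSV route, the object that comes back keeps its row names, column types, and factor levels.

```r
m <- head(iris)
rds_path <- tempfile(fileext = ".rds")  # hypothetical temporary file

# Serialize the data frame as an R object and read it back.
saveRDS(m, rds_path)
m2 <- readRDS(rds_path)

# Compare the restored object with the original.
identical(m, m2)
```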
