Read csv with numeric columns containing thousands separator
Problem description
The csv file I'm trying to read has exactly the following format:
Date,x,y
"2015/08/01","71,131","20,390"
"2015/08/02","81,599","23,273"
"2015/08/03","79,435","21,654"
"2015/08/04","80,733","20,924"
The separator is a comma, but each value is also enclosed in quotes because a comma also serves as the thousands separator. I tried read.csv, read_csv from {readr} and fread from {data.table}, and the best I can do is read all values as strings and then use a combination of as.numeric and gsub to transform them into numbers.
I also found this: "Most elegant way to load csv with point as thousands separator in R". It is quite useful, but my data has a lot of columns (not all numeric) and I'd rather not specify column types.
Any ideas, or should I start gsub-ing? On the fun side, Excel reads the file just fine :)
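For reference, the read-as-strings workaround described above can be sketched roughly as follows. This is only a minimal sketch: `parse_thousands` is a hypothetical helper name, the inline `csv_text` stands in for the real file, and it assumes the numeric columns hold whole numbers with comma-separated groups of three digits. Columns that do not match that pattern are left as character, so no column types need to be listed by hand.

```r
# Sample data from the question, supplied inline so the sketch is self-contained
csv_text <- 'Date,x,y
"2015/08/01","71,131","20,390"
"2015/08/02","81,599","23,273"
"2015/08/03","79,435","21,654"
"2015/08/04","80,733","20,924"'

# Read every column as character so the quoted values arrive untouched
df <- read.csv(text = csv_text, colClasses = "character")

# Hypothetical helper: convert a column only if every value looks like
# an integer written with comma-separated groups of three digits
parse_thousands <- function(col) {
  if (all(grepl("^-?[0-9]{1,3}(,[0-9]{3})*$", col))) {
    as.numeric(gsub(",", "", col, fixed = TRUE))
  } else {
    col  # leave non-matching columns (such as Date) as character
  }
}

# Apply the helper to every column while keeping the data frame structure
df[] <- lapply(df, parse_thousands)
str(df)
```

The pattern check is deliberately strict, so a column with even one value outside that format stays as text rather than being silently turned into NA.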
Recommended answer
You should be able to read the data with read.csv. Here is an example:
#write data
write('Date,x,y\n"2015/08/01","71,131","20,390"\n"2015/08/02","81,599","23,273"\n"2015/08/03","79,435","21,654"\n"2015/08/04","80,733","20,924"',"test.csv")
#use "text" rather than "file" in read.csv
#perform regex substitution before using read.csv
#the outer gsub with '(?<=\\d),(\\d{3})(?!\\d)' performs the thousands separator substitution
#the inner gsub replaces all \" with '
read.csv(text = gsub('(?<=\\d),(\\d{3})(?!\\d)',
                     '\\1',
                     gsub("\\\"",
                          "'",
                          paste0(readLines("test.csv"), collapse = "\n")),
                     perl = TRUE),
         header = TRUE,
         quote = "'",
         stringsAsFactors = FALSE)
Result
#         Date     x     y
# 1 2015/08/01 71131 20390
# 2 2015/08/02 81599 23273
# 3 2015/08/03 79435 21654
# 4 2015/08/04 80733 20924
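To see what the two substitutions do, it may help to run them on a single line of the file. The sketch below uses the same regular expression as the answer; the variable names `line`, `step1` and `step2` are only illustrative.

```r
# One raw line from the sample file
line <- '"2015/08/01","71,131","20,390"'

# Step 1: swap double quotes for single quotes, so read.csv can be
# told to use ' as the quote character
step1 <- gsub('"', "'", line, fixed = TRUE)

# Step 2: drop a comma that directly follows a digit and is followed by
# exactly three digits (the lookahead rejects a fourth digit)
step2 <- gsub("(?<=\\d),(\\d{3})(?!\\d)", "\\1", step1, perl = TRUE)

cat(step1, "\n")  # '2015/08/01','71,131','20,390'
cat(step2, "\n")  # '2015/08/01','71131','20390'
```

Because the substitution runs over the raw text before parsing, it cannot tell numeric fields from other content: a text field containing something like 1,234 would also lose its comma, and a field containing an apostrophe would probably clash with quote="'". For files with such fields, converting column by column after reading may be the safer route.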