Read csv with numeric columns containing thousands separator

Question

The csv file I'm trying to read has exactly the following format:

Date,x,y
"2015/08/01","71,131","20,390"
"2015/08/02","81,599","23,273"
"2015/08/03","79,435","21,654"
"2015/08/04","80,733","20,924"

The separator is a comma, but each value is also enclosed in quotes because a comma also serves as the thousands separator. I tried read.csv, read_csv from {readr} and fread from {data.table}, and the best I can do is read all values as strings and then use a combination of as.numeric and gsub to convert them to numbers.
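The strings-then-convert workaround described above can be sketched roughly as follows. This is a minimal base-R version that assumes the names of the numeric columns (here x and y) are known in advance; the temporary file and the loop are illustrative, not the asker's actual code.

```r
# Sample data from the question, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c('Date,x,y',
             '"2015/08/01","71,131","20,390"',
             '"2015/08/02","81,599","23,273"',
             '"2015/08/03","79,435","21,654"',
             '"2015/08/04","80,733","20,924"'), path)

# Read every column as character so "71,131" is kept as one string
dat <- read.csv(path, colClasses = "character")

# Drop the thousands separators and convert only the chosen columns
for (col in c("x", "y")) {
  dat[[col]] <- as.numeric(gsub(",", "", dat[[col]], fixed = TRUE))
}

str(dat)
```

With many columns, the list passed to the loop is exactly the part that has to be maintained by hand, which is what the question is trying to avoid.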

I also found this: "Most elegant way to load csv with point as thousands separator in R". It is quite useful, but my data has a lot of columns (not all numeric) and I'd rather not specify column types.
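For comparison, one commonly used pattern for this situation registers a conversion class with the methods package and then names that class in colClasses. The class name num.with.commas is hypothetical, and this sketch assumes the column layout is known, which is precisely the per-column specification the question would rather skip:

```r
library(methods)  # provides setClass() and setAs()

# Hypothetical marker class: it only gives the conversion a name
setClass("num.with.commas")

# Conversion read.csv applies to columns declared as num.with.commas
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

path <- tempfile(fileext = ".csv")
writeLines(c('Date,x,y',
             '"2015/08/01","71,131","20,390"',
             '"2015/08/02","81,599","23,273"'), path)

# Every column still has to be listed, one class per column
dat <- read.csv(path,
                colClasses = c("character", "num.with.commas", "num.with.commas"))
str(dat)
```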

Any ideas, or should I start gsub-ing? On the fun side, Excel reads the file just fine :)

Recommended answer

You should be able to read the data with read.csv. Here is an example:

# Write the sample data to test.csv
write('Date,x,y\n"2015/08/01","71,131","20,390"\n"2015/08/02","81,599","23,273"\n"2015/08/03","79,435","21,654"\n"2015/08/04","80,733","20,924"',"test.csv")

# Do the regex substitutions first, then pass the result to read.csv
# through "text" rather than "file":
# - the inner gsub replaces every double quote (\") with a single quote (')
# - the outer gsub, using '(?<=\\d),(\\d{3})(?!\\d)', removes each comma that
#   sits between a digit and exactly three more digits, i.e. the thousands separators
read.csv(text=gsub('(?<=\\d),(\\d{3})(?!\\d)',
                   '\\1',
                   gsub("\\\"",
                        "'",
                        paste0(readLines("test.csv"),collapse="\n")),
                   perl=TRUE),
         header=TRUE,
         quote="'",
         stringsAsFactors=FALSE)

Result

#        Date     x     y
#1 2015/08/01 71131 20390
#2 2015/08/02 81599 23273
#3 2015/08/03 79435 21654
#4 2015/08/04 80733 20924
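To see what the regular expression does on its own, it can be applied to a single line. The second half is a possible simplification, an assumption rather than part of the answer above: readLines already returns one string per line and read.csv reads a character vector passed through text, so the paste0 call and the switch to single quotes can be dropped. Keeping the default double-quote handling also avoids trouble with text fields that contain apostrophes. Like the original, it assumes every value with a thousands separator is quoted; an unquoted field ending in a digit followed by a field of exactly three digits (for example 5,123) would be merged into one number.

```r
# Thousands-separator pattern from the answer above
pattern <- "(?<=\\d),(\\d{3})(?!\\d)"

# On one line, only the commas inside the numbers are removed;
# the field separators follow a quote, not a digit, so they stay
row_text <- '"2015/08/01","71,131","20,390"'
print(gsub(pattern, "\\1", row_text, perl = TRUE))

# Whole file: clean each line and keep read.csv's default double-quote handling
path <- tempfile(fileext = ".csv")
writeLines(c('Date,x,y',
             '"2015/08/01","71,131","20,390"',
             '"2015/08/02","81,599","23,273"'), path)
dat <- read.csv(text = gsub(pattern, "\\1", readLines(path), perl = TRUE),
                stringsAsFactors = FALSE)
str(dat)
```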
