How to check if a .csv file has a comma or a semicolon as separator?
Question
As the title says:
I have to read in a lot of .csv files automatically. Some have a comma as the delimiter, in which case I use read.csv().
Others have a semicolon as the delimiter, in which case I use read.csv2().
I want to write a piece of code that recognizes whether a .csv file uses a comma or a semicolon as the delimiter (before I read it), so that I don't have to change the code every time. My approach would be something like this:
try to read.csv("xyz")
if error
read.csv2("xyz")
Is something like that possible? Has somebody done this before? How can I check whether there was an error without actually seeing it?
I hope the question is clear. Sorry for my English.
Thanks in advance.
Recommended answer
Here are a few approaches. They assume that the only difference among the file formats is whether the separator is a semicolon and the decimal mark is a comma, or the separator is a comma and the decimal mark is a point.
1) fread As mentioned in the comments, fread in the data.table package automatically detects the separator, for common separators, and then reads the file using the separator it detected. It can also handle certain other variations in format, such as automatically detecting whether the file has a header.
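As a rough illustration of (1), the sketch below writes two small temporary files, one per format, and reads both with the same fread call. The file contents and variable names are made up for the example, and the values are integers on purpose: fread has a dec argument, and whether a decimal comma is detected automatically may depend on the data.table version, so it might need to be set explicitly (for example dec = ",").

```r
library(data.table)

# Two made-up example files with the same content but different separators.
comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")
writeLines(c("id,count", "1,10", "2,20"), comma_file)
writeLines(c("id;count", "1;10", "2;20"), semi_file)

# The same call is used for both files; fread guesses the separator.
dt_comma <- fread(comma_file)
dt_semi  <- fread(semi_file)

print(dt_comma)
print(dt_semi)
```

Both results should have the two columns id and count, without any separator being passed in.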
2) grepl Look at the first line to see whether it contains a semicolon, and then re-read the file with the matching function:
L <- readLines("myfile", n = 1)
if (grepl(";", L)) read.csv2("myfile") else read.csv("myfile")
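Since the question is about many files, the check in (2) could be wrapped in a small helper and applied to each path. This is only a sketch: the function name read_delim_auto and the temporary example files are hypothetical, and it assumes the first line of every file contains the separator.

```r
# Hypothetical helper: pick read.csv2() or read.csv() based on the first line.
read_delim_auto <- function(path) {
  first_line <- readLines(path, n = 1)
  # fixed = TRUE treats ";" as a literal character rather than a regex.
  if (grepl(";", first_line, fixed = TRUE)) read.csv2(path) else read.csv(path)
}

# Made-up example files, one per format.
comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1.5,2"), comma_file)
writeLines(c("x;y", "1,5;2"), semi_file)

# Read every file with the same call.
tables <- lapply(c(comma_file, semi_file), read_delim_auto)
str(tables)
```

Note that a comma-separated file whose header happens to contain a semicolon would be misclassified by this check.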
3) count.fields If we can assume that each file contains more than one field, then finding only one field with sep = ";" tells us that the semicolon is not the separator.
L <- readLines("myfile", n = 1)
numfields <- count.fields(textConnection(L), sep = ";")
if (numfields == 1) read.csv("myfile") else read.csv2("myfile")
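A variation on (3), offered as a sketch rather than as part of the original answer, counts the fields on the first line for both candidate separators and keeps whichever yields more. The helper names count_first_line and guess_sep are hypothetical, and the text connection is closed explicitly so that it does not stay open.

```r
# Hypothetical helper: number of fields on one line for a given separator.
count_first_line <- function(line, sep) {
  con <- textConnection(line)
  on.exit(close(con))  # close the connection when the function returns
  count.fields(con, sep = sep)
}

# Hypothetical helper: return ";" or "," depending on which splits line 1 more.
guess_sep <- function(path) {
  first_line <- readLines(path, n = 1)
  n_semi  <- count_first_line(first_line, ";")
  n_comma <- count_first_line(first_line, ",")
  if (n_semi > n_comma) ";" else ","
}

# Made-up example files, one per format.
comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")
writeLines(c("x,y,z", "1.5,2,3"), comma_file)
writeLines(c("x;y;z", "1,5;2;3"), semi_file)

sep_comma <- guess_sep(comma_file)
sep_semi  <- guess_sep(semi_file)

# Choose the reader from the guessed separator.
df <- if (sep_semi == ";") read.csv2(semi_file) else read.csv(semi_file)
```

Ties are resolved in favour of the comma here, which is an arbitrary choice for the sketch.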
Update: Added (3) and made improvements to all three.