Dealing with commas in a CSV file in sqldf


Question

I am following up on my earlier question, sqldf returns zero observations, with a reproducible example.

I found that the problem probably comes from the comma in one of the cells ("1,500+"), and I think I have to use a filter as suggested in sqldf, csv, and fields containing commas, but I am not sure how to define my filter. Below is the code:

library(sqldf)

df <- data.frame("a" = c("8600000US01770", "8600000US01937"),
                 "b" = c("1,500+", "-"),
                 "c" = c("***", "**"),
                 "d" = c("(x)", "(x)"),
                 "e" = c("(x)", "(x)"),
                 "f" = c(992, "-"))

write.csv(df, 'df_to_read.csv')  

# 'df_to_read.csv' looks like this:
# "","a","b","c","d","e","f"
# 1,8600000US01770,1,500+,***,(x),(x),992
# 2,8600000US01937,-,**,(x),(x),-

Housing <- file("df_to_read.csv")
Housing_filtered <- sqldf('SELECT * FROM Housing', file.format = list(eol="\n"))

When I run this code, I get the following error:

Error in connection_import_file(conn@ptr, name, value, sep, eol, skip) : RS_sqlite_import: df_to_read.csv line 2 expected 7 columns of data but found 8

Answer

The best way is to clean your file once, so that you don't have to worry about the same issue again later in your analysis. This should get you going:

Housing <- readLines("df_to_read.csv")                            # read the file

n <- 6             # number of separators expected = number of columns expected - 1

library(stringr)
ln_idx <- ifelse(str_count(Housing, pattern = ",") == n, 0 , 1)
which(ln_idx == 1)               # line indices with issue, includes the header row
#[1] 2

Check the specific issues and write back to your file at the same indices, e.g. line 2:

Housing[2]
#[1] "1,8600000US01770,1,500+,***,(x),(x),992"            # hmm.. extra comma

Housing[2] = "1,8600000US01770,1500+,***,(x),(x),992"     # removed the extra comma
writeLines(Housing, "df_to_read.csv")
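When many lines are flagged, the manual edit can be scripted. A minimal sketch, assuming the only embedded commas are thousands separators in values shaped like "1,500+" (a digit, a comma, three digits, a plus sign); the lookaround pattern leaves real field separators alone:

```r
# Lines as read with readLines(); the first one carries the stray comma.
Housing <- c("1,8600000US01770,1,500+,***,(x),(x),992",
             "2,8600000US01937,-,**,(x),(x),-")

# Drop a comma only when it sits between a digit and a three-digit
# group ending in "+", i.e. inside values like "1,500+".
fixed <- gsub("(?<=[0-9]),(?=[0-9]{3}\\+)", "", Housing, perl = TRUE)

fixed[1]
# [1] "1,8600000US01770,1500+,***,(x),(x),992"
```

After the cleanup, writeLines(fixed, "df_to_read.csv") replaces the file as before. The pattern is deliberately narrow; widen it only after inspecting the lines that ln_idx flags.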

Now business is as usual, good to go:

Housing <- file("df_to_read.csv")
Housing_filtered <- sqldf('SELECT * FROM Housing') 

# Housing_filtered 
#               a      b   c   d   e   f
# 1 8600000US01770  1500+ *** (x) (x) 992
# 2 8600000US01937      -  ** (x) (x)   -
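The issue can also be avoided at write time: write.csv quotes character fields by default, and read.csv reads a quoted "1,500+" back as a single field. A minimal sketch (the file name via tempfile is illustrative):

```r
# Sketch: let write.csv quote character fields (its default), so an
# embedded comma survives the round trip as part of one field.
df  <- data.frame(a = "8600000US01770", b = "1,500+", f = 992)
csv <- tempfile(fileext = ".csv")
write.csv(df, csv, row.names = FALSE)   # quote = TRUE is the default

read.csv(csv)$b
# [1] "1,500+"
```

Note that this helps read.csv; the SQLite bulk import that sqldf uses may not honor the quoting, which is why cleaning the file as above can still be required.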

