All lines not being read while executing read.csv in R

Views: 990. Posted: 2017/2/24 19:02:23. Tags: r, csv

Problem description

This is the input file: http://www.yourfilelink.com/get.php?fid=841283

I executed:

options(stringsAsFactors = FALSE)
x <- read.csv("test1.csv", header = FALSE, sep = "'")

The result is this: http://www.yourfilelink.com/get.php?fid=841284

Instead of 135 rows, I get only 7! The number of columns is correct (13), but x[6,10] also contains the content of the rows that follow it, separated by \n within the string.

Please help me with this. I am stuck on this problem! :/

Solution

The described symptom of an extremely long item containing multiple "\n"s suggests you probably need to deal with unmatched quotes. If there is a quote mark in a name or address entry, the parser will wait for the next quote mark before considering the entry complete. Try:

x=read.csv("test1.csv", header = FALSE, sep="'", quote="")
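
To see why a stray quote mark produces this symptom, here is a minimal sketch on made-up data (the four toy lines are an assumption for illustration, not taken from the asker's file). A double quote opens on the first line and the next one only appears on the third line, so read.csv treats everything in between as a single field:

```r
# Hypothetical toy input: four physical lines, two comma-separated fields each.
toy <- paste(
  '1,"a stray quote opens here',
  "2,plain text",
  '3,and the next quote closes it"',
  "4,plain text",
  sep = "\n"
)

# Default quoting: lines 1 to 3 collapse into one record whose second
# field contains embedded newlines.
merged <- read.csv(text = toy, header = FALSE, stringsAsFactors = FALSE)
nrow(merged)                               # 2
grepl("\n", merged$V2[1], fixed = TRUE)    # TRUE

# Quoting disabled: every physical line becomes its own record.
separate <- read.csv(text = toy, header = FALSE, quote = "",
                     stringsAsFactors = FALSE)
nrow(separate)                             # 4
```

With quoting disabled the quote characters simply stay in the data as ordinary text.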

Setting quote = "" did not actually work on the file I downloaded. (Note that read.csv does honor sep; it is a wrapper around read.table with different defaults, such as sep = "," and fill = TRUE.) I first had to use count.fields with that separator to see how many fields each line has, and then read.table with fill = TRUE. The results were still a bit messed up, with several columns populated only with commas, but at least there is something to work with:

table( count.fields("~/Downloads/test1.txt", sep="'", quote=""))

 10  13 
  5 130 
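# Attempt 1: skip the first five lines. This fails because a line after
# the skipped ones still has fewer than 13 fields.
x <- read.table("~/Downloads/test1.txt", header = FALSE, sep = "'", quote = "", stringsAsFactors = FALSE, skip = 5)
#Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#  line 6 did not have 13 elements

# Attempt 2: fill = TRUE pads short lines with blank fields instead of stopping.
x <- read.table("~/Downloads/test1.txt", header = FALSE, sep = "'",
                quote = "", stringsAsFactors = FALSE, fill = TRUE)
str(x)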
 #########################################################
'data.frame':   135 obs. of  13 variables:
 $ V1 : chr  "INSERT INTO message VALUES (52," "INSERT INTO message VALUES (53," "INSERT INTO message VALUES (54," "INSERT INTO message VALUES (55," ...
 $ V2 : chr  "press.release@enron.com" "office.chairman@enron.com" "office.chairman@enron.com" "press.release@enron.com" ...
 $ V3 : chr  "," "," "," "," ...
 $ V4 : chr  "2000-01-21 04:51:00" "2000-01-24 01:37:00" "2000-01-24 02:06:00" "2000-02-02 10:21:00" ...
 $ V5 : chr  "," "," "," "," ...
 $ V6 : chr  "<12435833.1075863606729.JavaMail.evans@thyme>" "<29664079.1075863606676.JavaMail.evans@thyme>" "<15300605.1075863606629.JavaMail.evans@thyme>" "<10522232.1075863606538.JavaMail.evans@thyme>" ...
 $ V7 : chr  "," "," "," "," ...
 $ V8 : chr  "ENRON HOSTS ANNUAL ANALYST CONFERENCE PROVIDES BUSINESS OVERVIEW AND GOALS FOR 2000" "Over $50 -- You made it happen!" "Over $50 -- You made it happen!" "ROAD-SHOW.COM Q4i.COM CHOOSE ENRON TO DELIVER FINANCIAL WEB CONTENT" ...
 $ V9 : chr  "," "," "," "," ...
 $ V10: chr  "HOUSTON - Enron Corp. hosted its annual equity analyst conference today in==20Houston.  Ken Lay, Enron chairman and chief execu"| __truncated__ "On Wall Street, people are talking about Enron.  At Enron, we re talking=20about people...our people.  You are the driving forc"| __truncated__ "On Wall Street, people are talking about Enron.  At Enron, we re talking=20about people...our people.  You are the driving forc"| __truncated__ "HOUSTON =01) Enron Broadband Services (EBS), a wholly owned subsidiary of E=nron=20Corp. and a leader in the delivery of high-b"| __truncated__ ...
 $ V11: chr  "" "," "," "," ...
 $ V12: chr  "" "Robert_Badeer_Aug2000Notes FoldersPress releases" "Robert_Badeer_Aug2000Notes FoldersPress releases" "Robert_Badeer_Aug2000Notes FoldersPress releases" ...
 $ V13: chr  "" ");" ");" ");" ...
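
The count.fields step can be tried on a small stand-in file. This is only a sketch: the table name t, the example.com addresses and the temporary file are hypothetical, chosen to mimic INSERT statements in which one single-quoted value runs over a line break.

```r
# Hypothetical stand-in for the real file: the second statement is split
# across two physical lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "INSERT INTO t VALUES (1,'a@example.com', 'first body');",
  "INSERT INTO t VALUES (2,'b@example.com', 'second",
  "body');",
  "INSERT INTO t VALUES (3,'c@example.com', 'third body');"
), tmp)

# Count fields per physical line, splitting on the single quote and
# with quoting turned off.
n <- count.fields(tmp, sep = "'", quote = "")
n                      # 5 4 2 5
which(n != max(n))     # 2 3: the lines that are too short

# fill = TRUE pads the short lines with empty strings instead of failing.
x <- read.table(tmp, header = FALSE, sep = "'", quote = "",
                stringsAsFactors = FALSE, fill = TRUE)
dim(x)                 # 4 5
```

Tabulating n with table() gives the same kind of summary as shown above for the real file, and which() points at the lines worth inspecting by hand.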

I got better results with a comma as the separator and only the single quote as the quoting character, rather than the default set (both single and double quotes) that read.table uses:

x2 <- read.table("~/Downloads/test1.txt", header = FALSE, sep=",",
                  quote="'", stringsAsFactors=FALSE, fill=TRUE)
 str(x2)
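
Why this combination helps can be sketched on made-up input (the table name t, the addresses and the text values are hypothetical). With sep = "," and quote = "'", commas and line breaks inside single-quoted values stay within one field:

```r
# Hypothetical stand-in: the first statement has a comma inside one quoted
# value and a line break inside another.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "INSERT INTO t VALUES (1,'a@example.com','Hello, world','first line",
  "second line');",
  "INSERT INTO t VALUES (2,'b@example.com','Plain subject','one line');"
), tmp)

x2 <- read.table(tmp, header = FALSE, sep = ",", quote = "'",
                 stringsAsFactors = FALSE, fill = TRUE)

dim(x2)                                # 2 4
x2$V3[1]                               # "Hello, world"
grepl("\n", x2$V4[1], fixed = TRUE)    # TRUE: the line break is kept inside the field
```

The first and last columns still carry SQL debris such as "INSERT INTO t VALUES (1" and a trailing ");", so some cleanup with sub() or similar would still be needed afterwards.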
