All lines not being read while executing read.csv in R

Problem description

This is the input file: http://www.yourfilelink.com/get.php?fid=841283 . I executed

options(stringsAsFactors=FALSE)
x=read.csv("test1.csv", header = FALSE, sep="'")

The result is this: http://www.yourfilelink.com/get.php?fid=841284

Instead of 135 rows, I am getting only 7! The number of columns is correct (13). x[6,10] also contains the content of the rows that follow it, separated only by \n within the string.

Please help me with this. I am stuck on this problem! :/

Solution

The described symptom of an extremely long item with multiple "\n"s suggests you probably need to deal with unmatched quotes. If there is a quote mark in a name or address entry, the parser will wait for the next one before considering the entry complete. Try:

x=read.csv("test1.csv", header = FALSE, sep="'", quote="")
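To see why an unmatched quote produces this symptom, here is a minimal sketch on made-up data (the five records below are hypothetical, not taken from test1.csv). Two fields begin with a stray double quote, so the parser treats everything between the two quotes as one quoted string:

```r
# Hypothetical sample: records 2 and 4 each start a field with a stray
# double quote, so the two quotes pair up across lines.
csv_text <- paste(
  'id,note',
  '1,fine',
  '2,"unclosed',
  '3,fine',
  '4,"also unclosed',
  '5,fine',
  sep = "\n"
)

# Default quoting: the text between the two quotes, newlines included,
# is read as a single field, so rows are lost.
broken <- read.csv(text = csv_text, stringsAsFactors = FALSE)
nrow(broken)

# Quoting disabled: every physical line becomes its own row.
fixed <- read.csv(text = csv_text, quote = "", stringsAsFactors = FALSE)
nrow(fixed)
```

On a small sample like this, quote = "" is enough to restore all five rows; whether it is enough for a real file depends on how its fields are delimited.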

That didn't actually work on the file I downloaded. (Note that read.csv is only read.table with different defaults, including sep="," and fill=TRUE, so it does accept a sep argument.) I needed to first use count.fields with that separator and then use read.table with fill=TRUE. The results were still a bit messed up, with several columns populated only by commas, but at least there is something to work with:

table( count.fields("~/Downloads/test1.txt", sep="'", quote=""))

 10  13 
  5 130 
x <- read.table("~/Downloads/test1.txt", header = FALSE, sep="'", quote="", stringsAsFactors=FALSE, skip=5)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
#   line 6 did not have 13 elements
x <- read.table("~/Downloads/test1.txt", header = FALSE, sep="'",
                quote="", stringsAsFactors=FALSE, fill=TRUE)
str(x)
 #########################################################
'data.frame':   135 obs. of  13 variables:
 $ V1 : chr  "INSERT INTO message VALUES (52," "INSERT INTO message VALUES (53," "INSERT INTO message VALUES (54," "INSERT INTO message VALUES (55," ...
 $ V2 : chr  "press.release@enron.com" "office.chairman@enron.com" "office.chairman@enron.com" "press.release@enron.com" ...
 $ V3 : chr  "," "," "," "," ...
 $ V4 : chr  "2000-01-21 04:51:00" "2000-01-24 01:37:00" "2000-01-24 02:06:00" "2000-02-02 10:21:00" ...
 $ V5 : chr  "," "," "," "," ...
 $ V6 : chr  "<12435833.1075863606729.JavaMail.evans@thyme>" "<29664079.1075863606676.JavaMail.evans@thyme>" "<15300605.1075863606629.JavaMail.evans@thyme>" "<10522232.1075863606538.JavaMail.evans@thyme>" ...
 $ V7 : chr  "," "," "," "," ...
 $ V8 : chr  "ENRON HOSTS ANNUAL ANALYST CONFERENCE PROVIDES BUSINESS OVERVIEW AND GOALS FOR 2000" "Over $50 -- You made it happen!" "Over $50 -- You made it happen!" "ROAD-SHOW.COM Q4i.COM CHOOSE ENRON TO DELIVER FINANCIAL WEB CONTENT" ...
 $ V9 : chr  "," "," "," "," ...
 $ V10: chr  "HOUSTON - Enron Corp. hosted its annual equity analyst conference today in==20Houston.  Ken Lay, Enron chairman and chief execu"| __truncated__ "On Wall Street, people are talking about Enron.  At Enron, we re talking=20about people...our people.  You are the driving forc"| __truncated__ "On Wall Street, people are talking about Enron.  At Enron, we re talking=20about people...our people.  You are the driving forc"| __truncated__ "HOUSTON =01) Enron Broadband Services (EBS), a wholly owned subsidiary of E=nron=20Corp. and a leader in the delivery of high-b"| __truncated__ ...
 $ V11: chr  "" "," "," "," ...
 $ V12: chr  "" "Robert_Badeer_Aug2000Notes FoldersPress releases" "Robert_Badeer_Aug2000Notes FoldersPress releases" "Robert_Badeer_Aug2000Notes FoldersPress releases" ...
 $ V13: chr  "" ");" ");" ");" ...
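The diagnostic step above can be reduced to a small sketch: count.fields reports the number of fields on each line, which shows both how many lines are malformed and where they are. The sample lines and the temporary file are assumptions for illustration only.

```r
# Hypothetical sample standing in for the real file.
sample_lines <- c(
  "1,alice,ok",
  "2,bob",        # short record: only 2 fields
  "3,carol,ok"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Number of fields per line, with quoting disabled.
n_fields <- count.fields(path, sep = ",", quote = "")
table(n_fields)

# Line numbers whose field count differs from the widest line.
bad_lines <- which(n_fields != max(n_fields))
bad_lines

# fill = TRUE pads the short record instead of stopping with an error.
x <- read.table(path, header = FALSE, sep = ",", quote = "",
                stringsAsFactors = FALSE, fill = TRUE)
dim(x)
```

Keep in mind that read.table decides the number of columns from the first five lines of input unless col.names is supplied, so a file whose widest records appear later may need col.names set explicitly.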

I got better results with a comma as the separator and only the single quote as the quoting character, rather than the single or double quote that read.table uses by default:

x2 <- read.table("~/Downloads/test1.txt", header = FALSE, sep=",",
                  quote="'", stringsAsFactors=FALSE, fill=TRUE)
 str(x2)
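
As a rough illustration of why this combination suits SQL-style INSERT lines, the sketch below parses two made-up statements (hypothetical values, not from the actual file). Commas inside single-quoted strings stay within their field. The cleanup with sub() at the end is an assumption about a possible next step, not part of the approach described above.

```r
# Hypothetical INSERT statements shaped like the lines in the question.
sql_lines <- c(
  "INSERT INTO message VALUES (1,'a@example.com','2000-01-01 00:00:00','Hello, world');",
  "INSERT INTO message VALUES (2,'b@example.com','2000-01-02 00:00:00','Bye, all');"
)

# Comma separates fields; only the single quote protects embedded commas.
x2 <- read.table(text = sql_lines, header = FALSE, sep = ",",
                 quote = "'", stringsAsFactors = FALSE, fill = TRUE)
str(x2)

# Optional cleanup (assumption): drop the SQL prefix from the first
# column and any trailing ");" from the last one.
x2$V1 <- as.integer(sub(".*\\(", "", x2$V1))
x2$V4 <- sub("\\);$", "", x2$V4)
```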
