data.table::fread and Unbalanced "


Problem description


When I try to read a CSV file using data.table::fread(fn, sep='\t', header=T), it gives an "Unbalanced " observed on this line" error. The data has 3 integer variables and 1 string variable. The strings in the file are not enclosed in ", and some lines contain " within the string variable where the " characters are not paired.

Is it possible to have fread ignore the unpaired " in the variable and continue reading the data? Thanks.

Here is the sample data (just one record):

N_ID    VISIT_DATE  REQ_URL REQType
175931  2013-3-8 23:40:30   http://aaa.com/rest/api2.do?api=getSetMobileSession&data={"imei":"60893ZTE-CN13cd","appkey":"android_client","content":"Z0JiRA0qPFtWM3BYVltmcx5MWF9ZS0YLdW1ydXoqPycuJS8idXdlY3R0TGBtU   1

Solution

UPDATE: Now implemented in v1.8.11

From NEWS:

fread now accepts quotes (both ' and ") in the middle of fields, whether the field starts with " or not, rather than the 'unbalanced quotes' error, #2694. Thanks to baidao for reporting. It was known and documented at the top of ?fread (text now removed). If a field starts with " it must end with " (necessary if the field separator itself is in the field contents). Embedded quotes can be in column names too. Newlines (\n) still can't be in quoted fields or quoted column names, yet.
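
As a rough illustration of the behaviour described in the NEWS entry, the sketch below reads a one-record, tab-separated sample with an unquoted field containing an odd number of " characters. It assumes a reasonably recent data.table release in which fread has the text and quote arguments (neither is mentioned in the NEWS entry above, and they may not exist in v1.8.11). The sample lines are a shortened, hypothetical stand-in for the record in the question.

```r
library(data.table)

# Hypothetical sample modelled on the question: tab-separated, with an
# unquoted REQ_URL field that contains an odd number of " characters.
sample_lines <- c(
  "N_ID\tVISIT_DATE\tREQ_URL\tREQType",
  "175931\t2013-3-8 23:40:30\thttp://aaa.com/rest/api2.do?data={\"imei\":\"60893ZTE\",\"content\":\"Z0Ji\t1"
)

# quote = "" asks fread to treat " as an ordinary character rather than
# as a field delimiter (assumes a data.table version that has this argument).
dt <- fread(text = sample_lines, sep = "\t", header = TRUE, quote = "")

print(dt)
```

With quoting disabled, a field cannot contain the separator itself, which is the trade-off the NEWS entry alludes to when it says a field starting with " must end with ".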


Yes, as @agstudy said, embedded quotes are a known, documented limitation that is not yet handled, since fread is new. Strictly speaking, I suppose these aren't embedded quotes, because the string in your example doesn't start with a quote.

Anyway, I've filed this as a bug report so it doesn't get forgotten. To be done in the next release. Thanks for highlighting.

#2694 : Strings including quotes but not starting with quote in fread
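
For data.table versions that predate the fix, one possible fallback is base R's read.table with quoting disabled. This is an assumption on the editor's part, not something suggested in the answer above. The sample lines are a shortened, hypothetical stand-in for the record in the question.

```r
# Hypothetical sample modelled on the question (tab-separated, unpaired ").
sample_lines <- c(
  "N_ID\tVISIT_DATE\tREQ_URL\tREQType",
  "175931\t2013-3-8 23:40:30\thttp://aaa.com/rest/api2.do?data={\"imei\":\"60893ZTE\",\"content\":\"Z0Ji\t1"
)

# quote = "" disables quote handling; comment.char = "" keeps any '#'
# in a URL from being treated as the start of a comment.
df <- read.table(
  text = sample_lines,
  sep = "\t",
  header = TRUE,
  quote = "",
  comment.char = "",
  stringsAsFactors = FALSE
)

str(df)
```

The result is a plain data.frame; it could be converted with data.table::as.data.table if a data.table is needed downstream.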
