R's read.csv prepending 1st column name with junk text

Problem description

I have exported data from a result grid in SQL Server Management Studio to a csv file. The csv file looks correct.

But when I read the data into an R dataframe using read.csv, the first column name is prepended with "ï..". How do I get rid of this junk text?

Example:

str(trainData)

'data.frame':   64169 obs. of  20 variables:    
 $ ï..Column1             : int  3232...   
 $ Column2                : int  4242...

The data looks something like this (nothing special) :

Column1,Column2
100116577,100116577
100116698,100116702

Solution

You've got a Unicode UTF-8 BOM at the start of the file:

http://en.wikipedia.org/wiki/Byte_order_mark

A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the characters ï»¿ for this.
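
One way to confirm that a file really starts with a UTF-8 BOM is to look at its first three bytes, which should be EF BB BF. The sketch below is only an illustration: the helper name has_utf8_bom and the temporary file are assumptions made for the example, not part of the original question or answer.

```r
# Build a small CSV that starts with the UTF-8 BOM bytes (EF BB BF),
# standing in for the file exported from SQL Server Management Studio.
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Column1,Column2\n100116577,100116577\n"), con)
close(con)

# Hypothetical helper: TRUE if the file begins with the UTF-8 BOM.
has_utf8_bom <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 3)
  identical(first_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))
}

has_utf8_bom(path)
```

If this returns TRUE for your exported file, the BOM is the likely source of the junk prefix.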

R is giving you the ï and then converting the other two characters (» and ¿) into dots, as they are non-alphanumeric and so not valid in a column name.

Here:

http://r.789695.n4.nabble.com/Writing-Unicode-Text-into-Text-File-from-R-in-Windows-td4684693.html

Duncan Murdoch suggests:

You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input

So try your read.csv with fileEncoding="UTF-8-BOM" or persuade your SQL wotsit to not output a BOM.
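
A minimal sketch of the fileEncoding approach follows. It writes a temporary file with a BOM so the example is self-contained; the file contents reuse the sample data from the question, and the path variable is just for illustration.

```r
# Write a small CSV that starts with the UTF-8 BOM bytes (EF BB BF),
# mimicking the export described in the question.
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Column1,Column2\n100116577,100116577\n100116698,100116702\n"), con)
close(con)

# Declaring the encoding as "UTF-8-BOM" tells R to skip the BOM on input.
trainData <- read.csv(path, fileEncoding = "UTF-8-BOM")
names(trainData)
```

With the BOM skipped, the first column name should come back as plain Column1.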

Otherwise you may as well test if the first name starts with ï.. and strip it with substr (as long as you know you'll never have a column that does start like that genuinely...)
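
The substr fallback could look roughly like this. The function name strip_bom_prefix is hypothetical, and the sketch assumes the junk shows up exactly as "ï.." (which depends on the session's encoding); the data frame is built by hand purely to demonstrate the renaming.

```r
# Hypothetical helper (not from the original answer): remove a leading
# "ï.." from the first column name, if it is present.
strip_bom_prefix <- function(df, prefix = "\u00ef..") {
  first <- names(df)[1]
  if (startsWith(first, prefix)) {
    # Keep everything after the prefix.
    names(df)[1] <- substr(first, nchar(prefix) + 1, nchar(first))
  }
  df
}

# Example data frame carrying the mangled name from the question.
trainData <- data.frame(a = 1:2, b = 3:4)
names(trainData) <- c("\u00ef..Column1", "Column2")

trainData <- strip_bom_prefix(trainData)
names(trainData)
```

This only touches the first column and leaves the data frame unchanged when the prefix is absent, but as noted above it would also rename a column that genuinely starts with those characters.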
