R's read.csv prepending 1st column name with junk text
Problem description
I have exported data from a result grid in SQL Server Management Studio to a csv file. The csv file looks correct.
But when I read the data into an R dataframe using read.csv, the first column name is prepended with "ï..". How do I get rid of this junk text?
Example:
str(trainData)
'data.frame': 64169 obs. of 20 variables:
$ ï..Column1 : int 3232...
$ Column2 : int 4242...
The data looks something like this (nothing special):
Column1,Column2
100116577,100116577
100116698,100116702
Recommended answer
You've got a Unicode UTF-8 BOM at the start of the file:
http://en.wikipedia.org/wiki/Byte_order_mark
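To confirm that a file really starts with a BOM, you can look at its first three bytes. A UTF-8 BOM is always the byte sequence EF BB BF. The helper below is a minimal sketch; `has_utf8_bom` is a hypothetical name, not a base R function, and `"export.csv"` is a placeholder path.

```r
# Hypothetical helper (not part of base R): returns TRUE when the file
# begins with the three UTF-8 BOM bytes EF BB BF.
has_utf8_bom <- function(path) {
  # readBin() accepts a file name and returns at most n raw bytes
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Usage: check the exported file before reading it
# has_utf8_bom("export.csv")
```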
A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the three BOM bytes (EF BB BF) as the characters .
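R is giving you the ï and then converting the other two into dots, as they are non-alphanumeric characters (with read.csv's default check.names = TRUE, the header is passed through make.names(), which replaces invalid characters with dots).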
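The mangling can be reproduced directly from the three BOM bytes. This sketch decodes them as Latin-1 (ISO-8859-1 and CP1252 agree for these three bytes) and then runs the result through `make.names()`. Whether ï itself survives as a letter depends on the locale, so the exact `make.names()` output may vary; in either case the » and ¿ end up as dots.

```r
# The three bytes of a UTF-8 BOM
bom <- as.raw(c(0xEF, 0xBB, 0xBF))

# Misread them as single-byte Latin-1 text, converting to UTF-8 so the
# result prints the same everywhere: the string "ï»¿"
junk <- iconv(rawToChar(bom), from = "latin1", to = "UTF-8")
print(junk)

# make.names() (used by read.csv when check.names = TRUE) turns characters
# that are not valid in R names into dots
print(make.names(paste0(junk, "Column1")))
```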
Here:
http://r.789695.n4.nabble.com/Writing-Unicode-Text-into-Text-File-from-R-in-Windows-td4684693.html
Duncan Murdoch suggests:
You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input
So try your read.csv with fileEncoding="UTF-8-BOM", or persuade your SQL wotsit to not output a BOM.
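A minimal, self-contained sketch of the fileEncoding approach. It writes a temporary CSV that starts with the BOM bytes (standing in for the SSMS export, using the sample data from the question) and reads it back. It assumes a reasonably recent R (the "UTF-8-BOM" encoding has been accepted by file connections for reading since R 3.0.0, as far as I know).

```r
# Build a small CSV that begins with the UTF-8 BOM bytes EF BB BF,
# mimicking the SSMS export from the question
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Column1,Column2\n100116577,100116577\n100116698,100116702\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" makes the connection drop the BOM on input,
# so the first header is read as plain "Column1"
trainData <- read.csv(path, fileEncoding = "UTF-8-BOM")
str(trainData)
```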
Otherwise you may as well test whether the first name starts with ï.. and strip it with substr (as long as you know you'll never have a column that genuinely starts like that...).
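The substr fallback might look like the sketch below. `strip_bom_junk` is a hypothetical helper name, and the prefix is written as `"\u00ef.."` so the script itself does not depend on its own source encoding. As noted above, it will also strip a genuine leading "ï.." if one ever occurs.

```r
# Hypothetical helper (not from the original answer): if the first column
# name starts with "ï..", drop those three characters with substr().
strip_bom_junk <- function(df) {
  junk  <- "\u00ef.."              # "ï..", written with a Unicode escape
  first <- names(df)[1]
  if (startsWith(first, junk)) {
    names(df)[1] <- substr(first, nchar(junk) + 1L, nchar(first))
  }
  df
}

# Usage on a data frame that already carries the mangled first name
trainData <- data.frame(a = c(100116577L, 100116698L),
                        Column2 = c(100116577L, 100116702L))
names(trainData)[1] <- "\u00ef..Column1"
trainData <- strip_bom_junk(trainData)
names(trainData)
# [1] "Column1" "Column2"
```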