R's read.csv prepending 1st column name with junk text
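Question
I have exported data from a result grid in SQL Server Management Studio to a csv file. The csv file looks correct.
But when I read the data into an R data frame using read.csv, the first column name is prepended with "ï..". How do I get rid of this junk text?
Example: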
str(trainData)
'data.frame': 64169 obs. of 20 variables:
$ ï..Column1 : int 3232...
$ Column2 : int 4242...
The data looks something like this (nothing special):
Column1,Column2
100116577,100116577
100116698,100116702
You've got a Unicode UTF-8 BOM (byte order mark) at the start of the file:
http://en.wikipedia.org/wiki/Byte_order_mark
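One way to confirm this is to look at the first three bytes of the file, because a UTF-8 BOM is always the byte sequence EF BB BF. Below is a minimal base-R sketch. The helper name has_utf8_bom() and the file path are hypothetical and were not part of the original answer.

```r
# Hypothetical helper: TRUE if the file at `path` starts with the
# UTF-8 byte order mark (bytes EF BB BF), FALSE otherwise.
has_utf8_bom <- function(path) {
  con <- file(path, "rb")            # binary mode, so no text decoding happens
  on.exit(close(con))
  first <- readBin(con, what = "raw", n = 3)
  identical(first, as.raw(c(0xef, 0xbb, 0xbf)))
}

# Usage (placeholder path):
# has_utf8_bom("trainData.csv")
```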
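A text editor or web browser that interprets the text as ISO-8859-1 or CP1252 will display these bytes as the characters ï»¿.
R is giving you the ï and then converting the other two into dots, as they are non-alphanumeric characters.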
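The sketch below shows where the junk comes from. It decodes the three BOM bytes as ISO-8859-1 (latin1) and passes the result through make.names(), which is what read.csv uses to sanitize column names. This is illustrative only. The exact name you get from read.csv depends on your R version and locale, so the make.names() result is printed, not guaranteed.

```r
# The UTF-8 byte order mark as raw bytes.
bom <- as.raw(c(0xef, 0xbb, 0xbf))

# Decode the bytes as ISO-8859-1 (latin1), converting to UTF-8 for display:
# EF -> "ï", BB -> "»", BF -> "¿".
as_latin1 <- iconv(rawToChar(bom), from = "latin1", to = "UTF-8")
print(as_latin1)

# make.names() keeps "ï" where the locale treats it as a letter, but replaces
# non-alphanumeric characters such as "»" and "¿" with ".".
print(make.names(paste0(as_latin1, "Column1")))
```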
Here:
http://r.789695.n4.nabble.com/Writing-Unicode-Text-into-Text-File-from-R-in-Windows-td4684693.html
Duncan Murdoch suggests:
You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input
So try your read.csv with fileEncoding="UTF-8-BOM", or persuade your SQL wotsit not to output a BOM.
Otherwise you may as well test whether the first name starts with ï.. and strip it with substr (as long as you know you'll never have a column that genuinely starts like that...).
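Both suggestions can be sketched in base R. The first reads the file with fileEncoding = "UTF-8-BOM" so the BOM is dropped on input. The second is the fallback that renames the first column after the fact with substr. The helper names read_sql_export() and strip_bom_prefix() and the file name are hypothetical. The fallback also assumes the mangled prefix is exactly "ï.." as shown in the question.

```r
# Option 1: declare the file as UTF-8 with a BOM so the BOM is removed on input.
read_sql_export <- function(path) {
  read.csv(path, fileEncoding = "UTF-8-BOM")
}

# Option 2 (fallback): strip a leading "ï.." from the first column name.
# Only safe if no genuine column name can start with these characters.
strip_bom_prefix <- function(df) {
  prefix <- "\u00ef.."                 # "ï.." as seen in the question
  first <- names(df)[1]
  if (startsWith(first, prefix)) {
    names(df)[1] <- substr(first, nchar(prefix) + 1, nchar(first))
  }
  df
}

# Usage (placeholder file name):
# trainData <- read_sql_export("trainData.csv")
# trainData <- strip_bom_prefix(trainData)
```

Option 1 is preferable because it fixes the input instead of patching names afterwards. Option 2 leaves any column that does not carry the prefix untouched.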