About the use of the Encoding function

Problem description


I am using the following code for importing special characters in R:

Encoding(self$Data$Skills) <- "UTF-8"

But when I change the name of the column with:

colnames(self$Data) <- 'skills2'

and run again:

Encoding(self$Data$skills2) <- "UTF-8"

I have the following error:

Error in `Encoding<-`(`*tmp*`, value = "UTF-8") : 
a character vector argument expected

I do not understand why this is happening. Any ideas? Additionally, the same thing happens if I want to sample data from this data frame, using:

self$Data <- data.frame(df[sample(nrow(self$Data),dim(self$Data)[1]*samplePersentance),])

the column name changes, and when I apply the Encoding function I get the same error. The data is imported using the read.csv function.

Edit: Head of the data

                         Skills
1                          null
2                           "'"
3                  "'Fin Gaap'"
4 "'Knæ-igennem-hinanden-tr..."
5 "'Mønt-dans-på-knoerne-tr..."
6  "'Necessary knowledge of..."

> typeof(self$Data)
[1] "list"

> class(self$Data)
[1] "data.frame"

And to reproduce the error:

try1 <- structure(list(Skills = c("null", "\"'\"", "\"'Fin Gaap'\"", 
"\"'Knæ-igennem-hinanden-tr...\"", "\"'Mønt-dans-på-knoerne-tr...\"", 
"\"'Necessary knowledge of...\"")), .Names = "Skills", row.names = c(NA, 
6L), class = "data.frame")


Encoding(try1$Skills) <- 'UTF-8'
# the function runs normally
try2 <- data.frame(try1[sample(nrow(try1),floor(dim(try1)[1]*0.5)),])
colnames(try2) <- 'skills2'
Encoding(try2$skills2) <- 'UTF-8'
# the function outputs an error

> typeof(try1$Skills)
[1] "character"
> typeof(try2$skills2)
[1] "integer"

Solution

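The problem is that data.frame, with its default stringsAsFactors = TRUE (the default before R 4.0.0), turns the column into a factor. A factor is stored as integer codes, which is why typeof reports "integer" and Encoding<- refuses it: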

try2 <- data.frame(try1[sample(nrow(try1),floor(dim(try1)[1]*0.5)),])
colnames(try2) <- 'skills2'
str(try2)
#'data.frame':  3 obs. of  1 variable:
#  $ skills2: Factor w/ 3 levels "\"'\"","\"'Fin Gaap'\"",..: 3 1 2

Encoding(try2$skills2) <- 'UTF-8'
#Error in `Encoding<-`(`*tmp*`, value = "UTF-8") : 
#  a character vector argument expected

# convert the factor back to character first
try2$skills2 <- as.character(try2$skills2)
Encoding(try2$skills2) <- 'UTF-8'
#works
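
Instead of converting after the fact, the conversion to a factor can be avoided in the first place by passing stringsAsFactors = FALSE explicitly, which behaves the same before and after R 4.0.0. This is only a sketch: the data below is a shortened, illustrative stand-in for try1, and set.seed is there just to make the sample reproducible.

```r
# Shortened stand-in for try1 (illustrative values, not the full data set)
try1 <- data.frame(
  Skills = c("null", "'Fin Gaap'", "'Knæ-igennem'", "'Mønt-dans'"),
  stringsAsFactors = FALSE
)

set.seed(1)  # only to make the sample reproducible
rows <- sample(nrow(try1), floor(nrow(try1) * 0.5))

# Explicitly keep strings as character when wrapping the sampled vector
try2 <- data.frame(try1[rows, ], stringsAsFactors = FALSE)
colnames(try2) <- "skills2"

Encoding(try2$skills2) <- "UTF-8"  # no error: the column is character
typeof(try2$skills2)
```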

Of course you don't need data.frame in that line at all ...
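
To illustrate that remark: indexing a one-column data frame with [rows, ] drops the result to a plain vector, which is why wrapping it in data.frame() again loses the original column name and, with stringsAsFactors = TRUE, the character type. A minimal sketch, again using a shortened, illustrative stand-in for try1, keeps the data frame intact with drop = FALSE:

```r
# Shortened stand-in for try1 (illustrative values, not the full data set)
try1 <- data.frame(
  Skills = c("null", "'Fin Gaap'", "'Knæ-igennem'", "'Mønt-dans'"),
  stringsAsFactors = FALSE
)

set.seed(1)  # only to make the sample reproducible
rows <- sample(nrow(try1), floor(nrow(try1) * 0.5))

# Default drop = TRUE: a one-column selection collapses to a vector
is.data.frame(try1[rows, ])  # FALSE

# drop = FALSE keeps the data frame, the column name, and the column type
try2 <- try1[rows, , drop = FALSE]
colnames(try2) <- "skills2"

Encoding(try2$skills2) <- "UTF-8"  # works: the column is still character
str(try2)
```

With this form there is no intermediate vector, so neither the column name nor the type changes during sampling.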
