How to correct the encoding of characters on a data.frame

Problem description

I have a data frame like this:

data.names<-data.frame(DATA=c(1:5))
rownames(data.names)<-c("IV\xc1N","JOS\xc9","LUC\xcdA","RAM\xd3N","TO\xd1O")
data.names
#          DATA
# IV\xc1N     1
# JOS\xc9     2
# LUC\xcdA    3
# RAM\xd3N    4
# TO\xd1O     5

I want the incorrect letters replaced by the right ones (Á, É, Í, ...). To be clear, I want to use apply because I read that apply is much more efficient than for. My idea is to write a function that changes these letters:

letters1<-c("\xc1","\xc9","\xcd","\xd3", "\xd1") #Á,É,Í,Ó,Ñ
letters2<-c("Á","É","Í","Ó","Ñ")
change.names <- function(x){sub(letters1[x], letters2[x],rownames(data.names))}

Now, with a for loop I have no problems:

for(i in 1:5) rownames(data.names)<-change.names(i)
data.names
#       DATA
# IVÁN     1
# JOSÉ     2
# LUCÍA    3
# RAMÓN    4
# TOÑO     5

But I don't know how to do it with apply. I tried:

apply(matrix(c(1:5),ncol=5),2,change.names)

The output is a matrix with 5 columns, where each column has only one letter changed, and I don't know how to assign a "mix" of them to rownames(data.names), or anything else that works.
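To see why the apply-style call yields one column per replacement rather than a single corrected vector, here is a minimal sketch. It uses plain ASCII stand-ins (the digits 1, 2, 3 play the role of the mis-encoded bytes) so that it runs the same way in any locale; the names broken, from, to, per.step and chained are hypothetical and not part of the original question. Each call starts again from the original vector, so the fixes never accumulate; Reduce instead passes the result of one step into the next, which is what the for loop does.

```r
# Hypothetical ASCII stand-ins for the mis-encoded names
broken <- c("IV1N", "JOS2", "LUC3A")
from   <- c("1", "2", "3")   # stand-ins for the wrong letters
to     <- c("A", "E", "I")   # stand-ins for the right letters

# Apply-style: every call works on the ORIGINAL vector, so each
# column of the result contains exactly one of the replacements.
per.step <- sapply(seq_along(from),
                   function(i) sub(from[i], to[i], broken, fixed = TRUE))

# Fold: each step receives the vector produced by the previous step,
# so the replacements accumulate, as in the for loop.
chained <- Reduce(function(acc, i) sub(from[i], to[i], acc, fixed = TRUE),
                  seq_along(from),
                  broken)

print(dim(per.step))
print(chained)
```

This only illustrates the shape of the problem; it makes no claim about being faster than the for loop.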

Solution

You don't even need apply: rownames(data.names) is a character vector, and the single value 'latin1' is recycled across all of its elements when you set the encoding:

> Encoding(rownames(data.names)) <- 'latin1'
> data.names
         DATA
IVÁN        1
JOSÉ        2
LUCÍA       3
RAMÓN       4
TOÑO        5
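As a rough check of what the assignment above does, the following sketch (with a hypothetical vector x holding two of the names) suggests that setting Encoding only declares how the existing bytes should be interpreted and leaves the bytes themselves untouched. It uses only base R functions (Encoding, charToRaw, enc2utf8).

```r
# Two of the names from the question, written with latin1 byte escapes
x <- c("IV\xc1N", "JOS\xc9")

bytes.before <- lapply(x, charToRaw)

# Declare that the bytes are latin1; nothing is converted here
Encoding(x) <- "latin1"

bytes.after <- lapply(x, charToRaw)

print(identical(bytes.before, bytes.after))  # the raw bytes are unchanged
print(Encoding(x))                           # the declared encoding

# An actual conversion to UTF-8 happens only on request
x.utf8 <- enc2utf8(x)
print(Encoding(x.utf8))
```

Because the mark is just a declaration, it gives the right letters only if the bytes really are latin1, which is the assumption made here.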

Please read this answer for more details about the encoding.
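If you would rather convert the bytes than mark them, one possible alternative (not part of the original answer) is base R's iconv. The sketch below assumes the names are latin1 and converts them to UTF-8 before they are used as row names; raw.names and fixed.names are hypothetical helper variables.

```r
data.names <- data.frame(DATA = 1:5)

# The names as they arrive, assumed to be latin1 bytes
raw.names <- c("IV\xc1N", "JOS\xc9", "LUC\xcdA", "RAM\xd3N", "TO\xd1O")

# Convert the bytes from latin1 to UTF-8 (vectorized, no loop needed)
fixed.names <- iconv(raw.names, from = "latin1", to = "UTF-8")

rownames(data.names) <- fixed.names
print(data.names)
```

iconv returns NA for any element it cannot convert, so it may be worth checking anyNA(fixed.names) before assigning.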
