How to correct the encoding of characters on a data.frame
Problem description
I have a data frame like this:
    data.names <- data.frame(DATA = c(1:5))
    rownames(data.names) <- c("IV\xc1N", "JOS\xc9", "LUC\xcdA", "RAM\xd3N", "TO\xd1O")
    data.names
    #          DATA
    # IV\xc1N     1
    # JOS\xc9     2
    # LUC\xcdA    3
    # RAM\xd3N    4
    # TO\xd1O     5
I want the incorrect letters replaced by the correct ones (Á, É, Í, ...). To be clear, I want to use apply, because I have read that apply is much more efficient than for. My idea is to write a function that changes these letters:
    letters1 <- c("\xc1", "\xc9", "\xcd", "\xd3", "\xd1") # Á, É, Í, Ó, Ñ
    letters2 <- c("Á", "É", "Í", "Ó", "Ñ")
    change.names <- function(x) {
      sub(letters1[x], letters2[x], rownames(data.names))
    }
Now, with a for loop I don't have any problems:
    for (i in 1:5) rownames(data.names) <- change.names(i)
    data.names
    #       DATA
    # IVÁN     1
    # JOSÉ     2
    # LUCÍA    3
    # RAMÓN    4
    # TOÑO     5
But I don't have much idea how to do it with apply. I've tried:
apply(matrix(c(1:5),ncol=5),2,change.names)
The output is a matrix with 5 columns, each of which changes only one letter, and I don't know how to assign a "mix" of them to rownames(data.names), or anything else that works.
Solution
You don't even need to use apply, because rownames(data.names) is a character vector, and the value 'latin1' is recycled over all of its elements:
    > Encoding(rownames(data.names)) <- 'latin1'
    > data.names
          DATA
    IVÁN     1
    JOSÉ     2
    LUCÍA    3
    RAMÓN    4
    TOÑO     5
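As a minimal, self-contained sketch of what the assignment above does (the names `x` and `before` are only illustrative and not part of the original answer): setting the encoding should only change how the bytes are declared, not the bytes themselves.

```r
# Latin-1 bytes typed with \x escapes, as in the question
x <- c("IV\xc1N", "JOS\xc9", "LUC\xcdA", "RAM\xd3N", "TO\xd1O")

# Keep a copy of the raw bytes so they can be compared afterwards
before <- lapply(x, charToRaw)

# One assignment covers the whole vector; no loop or apply is needed
Encoding(x) <- "latin1"

# Inspect the declared encoding of each element
Encoding(x)

# Check that the underlying bytes were left untouched
identical(before, lapply(x, charToRaw))
```

Because nothing is rewritten, this only helps when the bytes really are Latin-1; if they came from a different encoding, declaring them as 'latin1' would simply display the wrong characters.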
Please read this answer for more details about the encoding.
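If the names also need to be portable to sessions that use a different locale, one possible variation (an assumption on my part, not something the original answer proposes) is to transcode the bytes to UTF-8 with iconv instead of only declaring them as Latin-1. The name `x_utf8` is hypothetical.

```r
# Latin-1 bytes typed with \x escapes, as in the question
x <- c("IV\xc1N", "JOS\xc9", "LUC\xcdA", "RAM\xd3N", "TO\xd1O")

# Transcode from Latin-1 to UTF-8; iconv is vectorized over x
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# Inspect the declared encoding and the number of characters per name
Encoding(x_utf8)
nchar(x_utf8)

# Use the converted names as row names of the data frame from the question
data.names <- data.frame(DATA = 1:5)
rownames(data.names) <- x_utf8
data.names
```

Unlike the Encoding assignment, this does rewrite the stored bytes, so it is the heavier of the two options.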