Convert factor to integer in a data frame


Problem description

I have the following code

anna.table<-data.frame (anna1,anna2)
write.table(anna.table, file="anna.file.txt", sep='\t', quote=FALSE)

my table in the end contains numbers such as the following

chr         start    end      score
chr2      41237927  41238801    151
chr1      36976262  36977889    226
chr8      83023623  83025129    185

and so on......

After that I am trying to keep only the rows that meet some criterion, such as a score less than or equal to a specific value.

So I am doing the following:

anna3<-"data/anna/anna.file.txt"
anna.total<-read.table(anna3,header=TRUE)
significant.anna<-subset(anna.total,score <=0.001)

Error: In Ops.factor(score, 0.001) <= not meaningful for factors

So I guess the problem is that my table has factors and not integers.

I guess that my anna.total$score is a factor and I must make it an integer.

If I read correctly, as.numeric might solve my problem.

I am reading about the as.numeric function but I cannot understand how to use it.

Could you please give me some advice?

thank you in advance

best regards Anna

PS: I tried the following

anna3<-"data/anna/anna.file.txt"
anna.total<-read.table(anna3,header=TRUE)
anna.total$score.new<-as.numeric (as.character(anna.total$score))
write.table(anna.total,file="peak.list.numeric.v3.txt",append = FALSE ,quote = FALSE,col.names =TRUE,row.names=FALSE, sep="\t")

anna.peaks<-subset(anna.total,fdr.new <=0.001)
Warning messages:
1: In Ops.factor(score, 0.001) : <= not meaningful for factors

Again I have the same problem...

Solution

With anna.table (it is a data frame by the way, a table is something else!), the easiest way will be to just do:

anna.table2 <- data.matrix(anna.table)

as data.matrix() will convert factors to their underlying integer codes. This works for a data frame that contains only numeric, integer, factor or other variables that can be coerced to numeric. Be careful with character columns: older versions of R returned a character matrix in that case, whereas the current documentation for data.matrix() says character columns are first converted to factors and then to integers.

If you want anna.table2 to be a data frame, not a matrix, then you can subsequently do:

anna.table2 <- data.frame(anna.table2)
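
As a minimal sketch of what these two calls do, the following uses a small made-up data frame `df` (hypothetical; the values are borrowed from the sample rows in the question, and `chr` is deliberately stored as a factor). It is only meant to illustrate the behaviour described above, not to reproduce the asker's actual file.

```r
## hypothetical stand-in for anna.table; `chr` is a factor, `score` is numeric
df <- data.frame(chr   = factor(c("chr2", "chr1", "chr8")),
                 score = c(151, 226, 185))

## data.matrix() replaces the factor by its integer codes;
## levels sort as chr1 < chr2 < chr8, so chr2 -> 2, chr1 -> 1, chr8 -> 3
m <- data.matrix(df)
print(m)

## back to a data frame; every column is now numeric
df2 <- data.frame(m)
str(df2)
```

Note that the `chr` labels themselves are gone after the conversion; only the codes remain.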

Another option is to coerce every factor variable to its integer codes yourself. Here is an example of that:

## dummy data
set.seed(1)
dat <- data.frame(a = factor(sample(letters[1:3], 10, replace = TRUE)), 
                  b = runif(10))

## sapply over `dat`, converting factor to numeric
dat2 <- sapply(dat, function(x) if(is.factor(x)) {
                                    as.numeric(x)
                                } else {
                                    x
                                })
dat2 <- data.frame(dat2) ## convert to a data frame

Which gives:

> str(dat)
'data.frame':   10 obs. of  2 variables:
 $ a: Factor w/ 3 levels "a","b","c": 1 2 2 3 1 3 3 2 2 1
 $ b: num  0.206 0.177 0.687 0.384 0.77 ...
> str(dat2)
'data.frame':   10 obs. of  2 variables:
 $ a: num  1 2 2 3 1 3 3 2 2 1
 $ b: num  0.206 0.177 0.687 0.384 0.77 ...
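
A possible variation on the sapply() approach, sketched here as an alternative rather than as part of the original answer: sapply() collapses its result into a matrix, which is why the extra data.frame() call is needed. Assigning the result of lapply() back into the factor columns keeps the object a data frame throughout. The helper name `is_fac` is made up for this illustration.

```r
## same dummy data as above
set.seed(1)
dat <- data.frame(a = factor(sample(letters[1:3], 10, replace = TRUE)),
                  b = runif(10))

dat2 <- dat
## logical vector marking which columns are factors
is_fac <- vapply(dat2, is.factor, logical(1))
## replace only the factor columns by their integer codes
dat2[is_fac] <- lapply(dat2[is_fac], as.integer)
str(dat2)
```

With this form the non-factor columns are never touched, so their types are left exactly as they were.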

However, do note that the above will work only if you want the underlying numeric representation. If your factor has essentially numeric levels, then we need to be a bit cleverer in how we convert the factor to a numeric whilst preserving the "numeric" information coded in the levels. Here is an example:

## dummy data
set.seed(1)
dat3 <- data.frame(a = factor(sample(1:3, 10, replace = TRUE), levels = 3:1), 
                   b = runif(10))

## sapply over `dat3`, converting factor to numeric
dat4 <- sapply(dat3, function(x) if(is.factor(x)) {
                                    as.numeric(as.character(x))
                                } else {
                                    x
                                })
dat4 <- data.frame(dat4) ## convert to a data frame

Note how we need to do as.character(x) first before we do as.numeric(). The extra call turns each value into its level label (a string such as "3") before we convert that to numeric. To see why this matters, note what dat3$a is:

> dat3$a
 [1] 1 2 2 3 1 3 3 2 2 1
Levels: 3 2 1

If we just convert that to numeric, we get the wrong data, because R converts the underlying level codes:

> as.numeric(dat3$a)
 [1] 3 2 2 1 3 1 1 2 2 3

If we coerce the factor to a character vector first, and then to a numeric one, we preserve the original information rather than R's internal representation:

> as.numeric(as.character(dat3$a))
 [1] 1 2 2 3 1 3 3 2 2 1

If your data are like this second example, then you can't use the simple data.matrix() trick. It is equivalent to applying as.numeric() directly to the factor, and as this second example shows, that doesn't preserve the original information.
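
Applied to the situation in the question, the conversion might look like the sketch below. It rests on several assumptions: the data frame is built inline as a hypothetical stand-in for the file read with read.table(), `score` is deliberately created as a factor, and the threshold of 200 is purely illustrative. The important detail is that subset() has to refer to the newly created numeric column (`score.new` here), not to the original factor column.

```r
## hypothetical stand-in for the asker's table; `score` is a factor on purpose
anna.total <- data.frame(
  chr   = c("chr2", "chr1", "chr8"),
  start = c(41237927, 36976262, 83023623),
  end   = c(41238801, 36977889, 83025129),
  score = factor(c("151", "226", "185"))
)
str(anna.total)  # check which columns are factors before filtering

## convert via the level labels, not the internal codes
anna.total$score.new <- as.numeric(as.character(anna.total$score))

## equivalent idiom mentioned in ?factor
alt <- as.numeric(levels(anna.total$score))[anna.total$score]

## filter on the numeric column (illustrative threshold)
significant.anna <- subset(anna.total, score.new <= 200)
print(significant.anna)
```

If str() shows a column as a factor when you expected numbers, it is worth looking for non-numeric entries in that column of the input file, since as.numeric(as.character()) will turn those into NA.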
