Melt an array and make numeric values character


Question

I have an array and I want to melt it based on its dimnames. The problem is that the dimension names are large numeric values, so converting them to character produces the wrong IDs. See the example:

test <- array(1:18, dim = c(3,3,2), dimnames = list(c(00901291282245454545454,329293929929292,2929992929922929),
                                                   c("a", "b", "c"),
                                                   c("d", "e")))

library(reshape2)
library(data.table)
test2 <- data.table(melt(test))
test2[, Var1 := as.character(Var1)]

> test2
Var1 Var2 Var3 value
1: 9.01291282245455e+20    a    d     1
2:      329293929929292    a    d     2
3:     2929992929922929    a    d     3
4: 9.01291282245455e+20    b    d     4
5:      329293929929292    b    d     5
6:     2929992929922929    b    d     6
7: 9.01291282245455e+20    c    d     7
8:      329293929929292    c    d     8
9:     2929992929922929    c    d     9
10: 9.01291282245455e+20    a    e    10
11:      329293929929292    a    e    11
12:     2929992929922929    a    e    12
13: 9.01291282245455e+20    b    e    13
14:      329293929929292    b    e    14
15:     2929992929922929    b    e    15
16: 9.01291282245455e+20    c    e    16
17:      329293929929292    c    e    17
18:     2929992929922929    c    e    18

How can I make the first column, which holds the large IDs, a character column? What I currently do is paste a letter onto the dimnames, melt, convert the column to character and then take a substring, which is really inefficient. The solution needs to be efficient because the dataset has millions of rows. There are two problems: first, leading 0's are dropped from the ID, and second, the ID is turned into a character string in e+20 notation.
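Both symptoms come from R parsing the ID as a double before it ever becomes a dimname. A minimal base-R sketch (the variable names id_num, id_chr and m are illustrative only) contrasting a numeric literal with a character literal:

```r
# Parsed as a double: the leading zeros vanish at parse time, and a double
# holds only about 15-17 significant digits, so the 23-character ID cannot survive.
id_num <- 00901291282245454545454
id_chr <- "00901291282245454545454"   # stored verbatim as text

as.character(id_num)   # the ID is lost: no leading zeros, digits rounded
id_chr                 # the original ID

# dimnames are always stored as character, but here the conversion starts
# from the already-rounded double, so the damage is already done.
m <- matrix(1:2, nrow = 2, dimnames = list(c(id_num, 329293929929292), NULL))
rownames(m)
```

In other words, once the IDs have been typed or read as numbers, no later as.character() call can recover them; they have to be character from the start.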

Solution

You need to define your dimnames as character and then slightly modify melt.array, which is the method that gets called when you run melt on your array:

test <- array(1:18, dim = c(3,3,2), dimnames = list(c("00901291282245454545454", "329293929929292", "2929992929922929"),
                                                    c("a", "b", "c"),
                                                    c("d", "e")))
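Character dimnames alone are not enough, because the stock melt.array runs type.convert on every character dimension, which turns numeric-looking strings straight back into doubles. A minimal sketch of that step in isolation (ids and converted are illustrative names):

```r
ids <- c("00901291282245454545454", "329293929929292", "2929992929922929")

# type.convert() sees strings that look like numbers and converts them to double;
# as.is = TRUE only controls whether non-numeric strings become factors.
converted <- type.convert(ids, as.is = TRUE)

class(converted)                 # "numeric"
as.character(converted) == ids   # the first ID no longer matches
```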

Customise melt.array by adding a parameter that lets you decide whether you want the conversion or not:

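melt.array2 <- function (data, varnames = names(dimnames(data)), conv=TRUE, ...) 
{
    # Flatten in column-major order: the first dimension varies fastest
    values <- as.vector(data)
    dn <- dimnames(data)
    # Dimensions without names get their integer positions instead
    if (is.null(dn)) 
        dn <- vector("list", length(dim(data)))
    dn_missing <- sapply(dn, is.null)
    dn[dn_missing] <- lapply(dim(data), function(x) 1:x)[dn_missing]
    if(conv){ # conv is the new parameter to know if conversion needs to be done
        # This is the step the stock melt.array always performs:
        # numeric-looking character names are turned into numbers
        char <- sapply(dn, is.character)
        dn[char] <- lapply(dn[char], type.convert)
    }
    # One row per combination of dimnames, in the same order as values
    indices <- do.call(expand.grid, dn)
    names(indices) <- varnames
    data.frame(indices, value = values)
}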
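If the only goal is to keep the dimnames untouched, the conversion branch can be dropped altogether. The sketch below is a hypothetical helper, melt_chr (not part of reshape2), that does roughly what melt.array2 does with conv = FALSE, but asks expand.grid for character columns (stringsAsFactors = FALSE) instead of factors, so no as.character() step is needed afterwards. It has not been benchmarked here against millions of rows.

```r
# Hypothetical helper: melt an array into a data.frame, keeping every
# dimname exactly as stored (character stays character).
melt_chr <- function(arr) {
  dn <- dimnames(arr)
  if (is.null(dn)) dn <- vector("list", length(dim(arr)))
  # Fill dimensions that have no names with their integer positions
  no_names <- vapply(dn, is.null, logical(1))
  dn[no_names] <- lapply(dim(arr)[no_names], seq_len)
  # expand.grid varies its first argument fastest, matching the column-major
  # order of as.vector(arr); stringsAsFactors = FALSE keeps plain character
  out <- expand.grid(dn, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  out$value <- as.vector(arr)
  out
}

test <- array(1:18, dim = c(3, 3, 2),
              dimnames = list(c("00901291282245454545454", "329293929929292", "2929992929922929"),
                              c("a", "b", "c"),
                              c("d", "e")))
res <- melt_chr(test)
head(res)
```

Because the example's dimnames are unnamed, expand.grid names the columns Var1, Var2 and Var3.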

Try the new function on your example (with conv=FALSE):

head(melt.array2(test, conv=FALSE))
                        # X1 X2 X3 value
# 1  00901291282245454545454  a  d     1
# 2          329293929929292  a  d     2
# 3         2929992929922929  a  d     3
# 4  00901291282245454545454  b  d     4
# 5          329293929929292  b  d     5
# 6         2929992929922929  b  d     6

EDIT

In the development version of reshape2 (devtools::install_github("hadley/reshape")), melt.array is defined differently and you can use the parameter as.is to avoid the conversion:

melt(test, as.is=TRUE)

will give you the same result as above (with Var1 etc instead of X1 etc).
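Plugged into the workflow from the question, this removes the as.character() step. A sketch, assuming a reshape2 release whose melt.array accepts as.is (current CRAN releases do) and an installed data.table:

```r
test <- array(1:18, dim = c(3, 3, 2),
              dimnames = list(c("00901291282245454545454", "329293929929292", "2929992929922929"),
                              c("a", "b", "c"),
                              c("d", "e")))

# Namespace-qualified calls: data.table also exports a melt() generic,
# so this does not depend on which package was attached last.
test2 <- data.table::as.data.table(reshape2::melt(test, as.is = TRUE))

# Var1 is already character with the leading zeros intact
str(test2)
```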
