Convert columns of arbitrary class to the class of matching columns in another data.table


Problem description


Question:

I'm working in R. I want the shared columns of 2 data.tables (shared meaning same column name) to have matching classes. I'm struggling with a way to generically convert an object of unknown class to the unknown class of another object.


More context:

I know how to set the class of a column in a data.table, and I know about the as function. Also, this question isn't entirely data.table specific, but it comes up often when I use data.tables. Further, assume that the desired coercion is possible.

I have 2 data.tables. They share some column names, and those columns are intended to represent the same information. For the column names shared by table A and table B, I want the classes of A to match those in B (or other way around).


Example data.tables:

A <- structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L)), .Names = c("year", "stratum"), row.names = c(NA, -45L), class = c("data.table", "data.frame"))

B <- structure(list(year = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L), bt = c(-9.95187702337873, -9.48946944434626, -9.74178662514147, -5.36167545158338, -4.76405522202426, -5.41964239804882, -0.0807951335119085, 0.520481719699774, 0.0393874225863578, 5.40557402913123, 5.47927931969583, 5.37228402911139, 9.82774396910091, 9.89629694010177, 9.98105260936272, -9.82469892896284, -9.42530210357904, -9.66171049964775, -5.17540952901709, -4.81859082470115, -5.3577146169737, -0.0685310909609001, 0.441383303157166, -0.0105897444321987, 5.24205882775199, 5.65773605162835, 5.40217185632441, 9.90299445851434, 9.78883672575814, 9.98747998379124, -9.69843398105195, -9.31530717395811, -9.77406601252698, -4.83080164375344, -4.89056304189872, -5.3904000267275, -0.121508487954861, 0.493798577602088, -0.118550709142654, 5.23654772583187, 5.87760447006892, 5.22478092346285, 9.90949768116403, 9.85433376398086, 9.91619307289277), yr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("year", "stratum", "bt", "yr"), row.names = c(NA, -45L), class = c("data.table", "data.frame"), sorted = c("year", "stratum"))

Here's what they look like:

> A  
    year stratum
 1:    1       1
 2:    1       2
 3:    1       3
 4:    1       4

> B
    year stratum          bt yr
 1:    1       1 -9.95187702  1
 2:    1       2 -9.48946944  1
 3:    1       3 -9.74178663  1
 4:    1       4 -5.36167545  1

Here are the classes:

> sapply(A, class)
     year   stratum 
"integer" "integer"

> sapply(B, class)
     year   stratum        bt        yr 
"numeric" "integer" "numeric" "numeric"

Manually, I can accomplish the desired task through the following:

A[,year:=as.numeric(year)]

This is easy when there's only 1 column to change, you know that column ahead of time, and you know the desired class ahead of time. If desired, it's also pretty easy to convert arbitrary columns to a given class.
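
For illustration, a minimal sketch of the "arbitrary columns, known class" case described above. It uses a plain data.frame so that it runs without extra packages; the object `df` and the vector `cols` are made up for this example.

```r
# Small stand-in table (values invented for the example)
df <- data.frame(year = c(1L, 2L), stratum = c(1L, 2L), bt = c(-9.9, 0.5))

# Column names are only known at run time; the target class is known
cols <- c("year", "stratum")
df[cols] <- lapply(df[cols], as.numeric)

sapply(df, class)
```

With a data.table the same conversion is usually written by reference as `A[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]`.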


My Failed Attempt:

(EDIT: This actually works; see my answer)

s2c <- function (x, type = "list") 
{
    as.call(lapply(c(type, x), as.symbol))
}

# In this case, I can assume all columns of A can be found in B
# I am also able to assume that the desired conversion is possible
B.class <- sapply(B[,eval(s2c(names(A)))], class) 
for(col in names(A)){
    set(A, j=col, value=as(A[[col]], B.class[col]))
}

But this still returns the year column as "integer", not "numeric":

> sapply(A, class)
     year   stratum 
"integer" "integer" 

The problem in the above example is that class(as(1L, "numeric")) still returns "integer". On the other hand, class(as.numeric(1L)) returns "numeric"; however, I don't know ahead of time that as.numeric is the function I need.


Question, Restated:

How do I make the column classes match, when neither columns nor the to/from classes are known ahead of time?


Additional Thoughts:

In a way, the question is mostly about arbitrary class matching. I run into this issue often with data.table because it's very vocal about class matching. E.g., I run into similar problems when I need to insert an NA of the appropriate type (NA_real_ vs NA_character_, etc.), depending on the class of the column (see the related question/issue in This Question).
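
For the typed-NA case, one possible sketch (not necessarily what the linked question settled on): subsetting an atomic vector with `NA_integer_` yields a length-one missing value of that vector's own type, so the right `NA_*_` constant does not have to be chosen by hand. The helper name `na_like` is hypothetical.

```r
# Hypothetical helper: a length-one NA of the same type as x
na_like <- function(x) x[NA_integer_]

typeof(na_like(c(1.5, 2.5)))   # type of a double column's NA
typeof(na_like(c("a", "b")))   # type of a character column's NA
typeof(na_like(1:3))           # type of an integer column's NA
```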

Again, this question can be seen as a general issue of converting between arbitrary classes that aren't known in advance. In the past, I've written functions using switch to do something like switch(class(x), double = as.numeric(...), character = as.character(...), ...), but that seems a bit ugly. The only reason I'm bringing this up in the context of data.table is that it's where I most often encounter the need for this type of functionality.

Solution

Not very elegant but you may 'build' the as.* call like this:

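for (x in colnames(A)) {
    # Build and evaluate e.g. as.numeric(A[[x]]); [[x]] extracts the column as a
    # vector for both data.frame and data.table, and [1] keeps a single class name
    A[[x]] <- eval(call(paste0("as.", class(B[[x]])[1]), A[[x]]))
}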
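
A variation on the same idea, as a minimal sketch, not a definitive implementation: look up the `as.*` function with `match.fun` instead of building a call, and only touch columns that the two tables share. The helper name `match_classes` is hypothetical, and the sketch assumes that a function named `as.<class>` exists for every target class (true for `numeric`, `integer`, `character`, `factor` and `Date`, among others). Plain data.frames are used to keep the example self-contained.

```r
# Hypothetical helper: coerce the columns of `a` that also appear in `b`
# to the first class of the matching column in `b`.
match_classes <- function(a, b) {
  shared <- intersect(names(a), names(b))
  for (col in shared) {
    target <- class(b[[col]])[1]
    if (!inherits(a[[col]], target)) {
      # Assumes a function named as.<target> exists, e.g. as.numeric
      convert <- match.fun(paste0("as.", target))
      a[[col]] <- convert(a[[col]])
    }
  }
  a
}

# Small stand-ins for the tables in the question (values invented)
a <- data.frame(year = c(1L, 2L, 3L), stratum = c(1L, 2L, 3L))
b <- data.frame(year = c(1, 2, 3), stratum = c(1L, 2L, 3L), bt = c(-9.9, 0.5, 9.8))

a2 <- match_classes(a, b)
sapply(a2, class)
```

For a data.table, the assignment inside the loop could instead be done by reference with `set(A, j = col, value = convert(A[[col]]))`.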
