Create a hash value for each row of a data frame in R


Problem description

I am exploring how to compare two data frames in R more efficiently, and I came up with hashing.

My plan is to create a hash for each row of two data frames that have the same columns, using digest() from the digest package. I assume the hash should be the same for any two identical rows.
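A minimal sketch of that assumption on a toy data frame; df and row_hash are hypothetical names, not part of the question's data. Each row's cells are turned into a character vector and passed to digest(), so two rows holding the same values get the same hash even though their row names differ.

```r
library(digest)

# Hypothetical toy data: rows 1 and 3 hold identical values.
df <- data.frame(id = c(1, 2, 1), name = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

# Hash one row: as.character() turns the one-row data frame into a
# character vector of its cells (row names are not included).
row_hash <- function(i) digest(as.character(df[i, ]))

h <- vapply(seq_len(nrow(df)), row_hash, character(1))
h[1] == h[3]  # TRUE: identical rows give identical hashes
h[1] == h[2]  # FALSE: different rows give different hashes
```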

I tried to give each row a unique hash, using the code below:

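library(digest)
n <- nrow(ssi.10q3.v1)
hashes <- character(n)  # collect hashes first so the new "hash" column is not hashed too
for (loop.ssi in seq_len(n)) {
  hashes[loop.ssi] <- digest(as.character(ssi.10q3.v1[loop.ssi, ]))
  print(paste(loop.ssi, n, sep = "/"))  # progress: current row / total rows
  flush.console()
}
ssi.10q3.v1[, "hash"] <- hashes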

But this is very slow.

Is my approach to comparing data frames correct? If so, are there any suggestions for speeding up the code above? Thanks.

UPDATE

I have updated the code as below:

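library(plyr)
library(digest)
ssi.10q3.v1[, "uid"] <- 1:nrow(ssi.10q3.v1)  # self-generated row id used as the split key

ssi.10q3.v1.hash <- ddply(ssi.10q3.v1,
                          c("uid"),
                          function(df) {
                            df[, "uid"] <- NULL  # drop the id so it does not affect the hash
                            hash <- digest(as.character(df))
                            data.frame(hash = hash)
                          },
                          .progress = "text")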

I generated the uid column myself so that every row has a unique key for ddply to split on.

Solution

If I understand what you want correctly, digest works directly with apply:

library(digest)
# apply() converts the data frame to a matrix and passes each row to digest()
ssi.10q3.v1.hash <- data.frame(uid = 1:nrow(ssi.10q3.v1),
                               hash = apply(ssi.10q3.v1, 1, digest))
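A minimal sketch of using row hashes to actually compare two data frames, under stated assumptions: old, new and row_hashes() are hypothetical names (row_hashes is not a digest function), both frames are assumed to have the same columns in the same order, and the "\r" separator is assumed never to occur in the data. It pastes each row's values into one string instead of calling apply(), because apply() first converts the data frame with as.matrix(), which may format numeric columns (for example, padding them to a common width) differently from one data frame to another.

```r
library(digest)

# Hypothetical helper: one MD5 hash per row. Each column is converted to
# character on its own, so no frame-wide formatting is applied.
# Assumption: "\r" never appears inside the data.
row_hashes <- function(df) {
  key <- do.call(paste, c(unname(lapply(df, as.character)), sep = "\r"))
  vapply(key, digest, character(1), USE.NAMES = FALSE)
}

# Hypothetical example data with the same columns.
old <- data.frame(id = 1:3, val = c("a", "b", "c"), stringsAsFactors = FALSE)
new <- data.frame(id = 2:4, val = c("b", "x", "d"), stringsAsFactors = FALSE)

h_old <- row_hashes(old)
h_new <- row_hashes(new)

only_in_new <- new[!(h_new %in% h_old), ]  # added or changed rows
only_in_old <- old[!(h_old %in% h_new), ]  # removed or changed rows
```

A changed row shows up in both results. The hashing step is not strictly required, since the pasted keys could be compared directly; hashing mainly gives short, fixed-length keys.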
