Create a hash value for each row of data in a data frame in R

Views: 520 | Published: 2017/3/15 22:49:36 | Tags: database, r, hash

Problem description

I am exploring how to compare two data frames in R more efficiently, and I came up with hashing.

My plan is to create a hash for each row of data in two data frames with the same columns, using digest from the digest package. I assume the hash will be the same for any two identical rows of data.

I tried to give each row of data a unique hash, using the code below:

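library(digest)

for (loop.ssi in 1:nrow(ssi.10q3.v1)) {
  # Hash the current row and store the result in a "hash" column
  ssi.10q3.v1[loop.ssi, "hash"] <- digest(as.character(ssi.10q3.v1[loop.ssi, ]))
  # Print progress as "current/total"
  print(paste(loop.ssi, nrow(ssi.10q3.v1), sep = "/"))
  flush.console()
}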

But this is very slow.

Is my approach to comparing data frames correct? If so, any suggestions for speeding up the code above? Thanks.

UPDATE

I have updated the code as follows:

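library(plyr)
library(digest)

ssi.10q3.v1[, "uid"] <- 1:nrow(ssi.10q3.v1)

# One group per uid, i.e. one group per row; uid is dropped before hashing
ssi.10q3.v1.hash <- ddply(ssi.10q3.v1,
                          c("uid"),
                          function(df) {
                            df[, "uid"] <- NULL
                            hash <- digest(as.character(df))
                            data.frame(hash = hash)
                          },
                          .progress = "text")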

I generated the uid column myself so that each row has a unique key to group on.

Solution

If I understand correctly what you want, digest will work directly with apply:

library(digest)
ssi.10q3.v1.hash <- data.frame(uid = 1:nrow(ssi.10q3.v1), hash = apply(ssi.10q3.v1, 1, digest))
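
To use row hashes for the comparison the question is really about, a minimal sketch could look like the one below. It is not part of the original answer. The data frames df_a and df_b, their columns, and the helper row_hashes are hypothetical, and the "\x1f" separator is an assumption that only holds if that character never occurs in the data. The sketch pastes each row into a single string instead of calling apply, because, as far as I know, apply first converts the data frame to a matrix, and with mixed column types numbers may be formatted as padded strings, so identical rows in two different data frames might not hash identically.

```r
library(digest)

# Hypothetical example data; the column names are illustrative only
df_a <- data.frame(id = c(1, 2, 10), name = c("x", "y", "z"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(id = c(2, 3), name = c("y", "w"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: returns one MD5 hash per row of df
row_hashes <- function(df) {
  # Join the values of each row into one string, column by column.
  # The separator is an assumption: it must not appear in the data.
  keys <- do.call(paste, c(unname(as.list(df)), sep = "\x1f"))
  # Hash each string directly (serialize = FALSE skips R serialization)
  vapply(keys, digest, character(1), serialize = FALSE, USE.NAMES = FALSE)
}

hash_a <- row_hashes(df_a)
hash_b <- row_hashes(df_b)

# Rows of df_a that also occur in df_b
common_rows <- df_a[hash_a %in% hash_b, ]
print(common_rows)
```

Here only the row with id 2 and name "y" appears in both data frames, so it is the only row kept in common_rows. Both data frames need the same columns in the same order for the hashes to be comparable.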
