CSV of Distances to Dist Object R

Published: 2017/2/26 15:26:00. Tags: r, csv, matrix

Problem description

Possible duplicate:

Convert a dataframe to an object of class "dist" without actually calculating distances in R

I have a very large CSV file (so a for loop takes too long in R) of similarities between keywords. When I read it into a data.frame, it looks like:

> df   
kwd1 kwd2 similarity  
a  b  1  
b  a  1  
c  a  2  
a  c  2

I would like to convert this to a dist object, like this:

> dObject  
  a b  
b 1    
c 2 0

This works:
Convert a dataframe to an object of class "dist" without actually calculating distances in R

Another idea I had was to create a sparse matrix using Matrix(), but I am unsure how to populate the matrix efficiently because my csv is fairly large - maybe an apply function?

Maybe reshape()?

---- Update ----
This seems to work on the toy dataset above: http://stats.stackexchange.com/questions/6827/efficient-way-to-populate-matrix-in-r

However, that example uses matrix(), whereas for memory reasons I would like to use Matrix(), which is sparse.
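A minimal sketch of the sparse route, assuming the Matrix package is installed; the variable names (`keys`, `lo`, `hi`, `sm`, `d`) are illustrative only. Each unordered keyword pair is stored once in the upper triangle, because `sparseMatrix()` sums repeated (i, j) entries, so the duplicate (a, b)/(b, a) rows would otherwise be added together. The entries are then mirrored to make the matrix symmetric. Pairs absent from the CSV are implicit zeros, which matches the 0 in the expected dObject above. Keep in mind that a dist object is itself dense (it stores all n(n-1)/2 pairs), so the memory saving only lasts while the data stays a sparse Matrix.

```r
library(Matrix)

# Toy data from the question; stringsAsFactors = FALSE keeps plain strings
df <- data.frame(kwd1 = c("a", "b", "c", "a"),
                 kwd2 = c("b", "a", "a", "c"),
                 similarity = c(1, 1, 2, 2),
                 stringsAsFactors = FALSE)

# Map every keyword to an integer index shared by both columns
keys <- sort(unique(c(df$kwd1, df$kwd2)))
r <- match(df$kwd1, keys)
s <- match(df$kwd2, keys)

# Put each unordered pair in the upper triangle, keep only its first
# occurrence (sparseMatrix() would otherwise sum duplicates), drop self-pairs
lo <- pmin(r, s)
hi <- pmax(r, s)
keep <- !duplicated(cbind(lo, hi)) & lo != hi
lo <- lo[keep]
hi <- hi[keep]
x <- df$similarity[keep]

# Mirror the entries so the sparse matrix is symmetric; missing pairs are 0
n <- length(keys)
sm <- sparseMatrix(i = c(lo, hi), j = c(hi, lo), x = c(x, x),
                   dims = c(n, n), dimnames = list(keys, keys))

# as.dist() needs a dense matrix, so convert only at the very end
d <- as.dist(as.matrix(sm))
print(d)
```

For the toy data this gives (a, b) = 1, (a, c) = 2 and (b, c) = 0, the same layout as dObject.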

--- Furthermore ----
There is a similar post from before. However, I don't think its advice works for this case, in which there isn't a link between every pair of elements in the dataset: the CSV doesn't contain the pairwise similarities between all keywords, as it did in the previous post: Convert a dataframe to an object of class "dist" without actually calculating distances in R

Answer

Try this:

# Generate some dummy data (since you didn't provide your data)
df <- data.frame(V1=sample(letters, 10, TRUE),
                 V2=sample(letters, 10, TRUE),
                 V3=sample(200, 10, TRUE))

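df$V1 and df$V2 are now factors (with data.frame()'s default stringsAsFactors = TRUE before R 4.0.0; from R 4.0.0 on they are character vectors), possibly with different levels, so we need to make them comparable, e.g. make sure "a" in V1 maps to the same integer as "a" in V2.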

# Convert letters to integers
my.objects <- sort(unique(c(as.character(df$V1), as.character(df$V2))))
df$V1 <- match(df$V1, my.objects)
df$V2 <- match(df$V2, my.objects)

Create an empty distance matrix and populate it with the values in V3 at the locations specified by V1 and V2. Finally we convert it to a proper dist object.

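# Create an empty n x n distance matrix labelled with the object names
n <- length(my.objects)
dist.mat <- matrix(NA, n, n, dimnames = list(my.objects, my.objects))
# Two-column (row, column) index matrix, one row per record
i <- as.matrix(df[-3])
# Fill both (V1, V2) and (V2, V1) so the matrix is symmetric
dist.mat[i] <- dist.mat[i[, 2:1]] <- df$V3

# as.dist() keeps the lower triangle; pairs absent from the data stay NA
my.dist <- as.dist(dist.mat)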
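On the toy data from the question, this approach leaves pairs that never appear in the CSV (here b and c) as NA, while the expected dObject shows 0. A minimal sketch that wraps the same matrix-indexing steps in a function; the name `pairs_to_dist()` and its `fill` argument are hypothetical, introduced only for illustration:

```r
# Hypothetical helper wrapping the answer's steps. `fill` is the value used
# for pairs that never appear in the data (NA, as in the answer, or 0).
pairs_to_dist <- function(from, to, value, fill = NA) {
  # Shared, sorted set of labels so "a" in `from` matches "a" in `to`
  keys <- sort(unique(c(as.character(from), as.character(to))))
  # Two-column (row, column) index matrix, one row per record
  idx <- cbind(match(from, keys), match(to, keys))
  m <- matrix(fill, length(keys), length(keys), dimnames = list(keys, keys))
  # Write each value at (from, to) and (to, from) so m is symmetric
  m[idx] <- m[idx[, 2:1]] <- value
  # as.dist() keeps the lower triangle and uses the dimnames as labels
  as.dist(m)
}

# Toy data from the question
df <- data.frame(kwd1 = c("a", "b", "c", "a"),
                 kwd2 = c("b", "a", "a", "c"),
                 similarity = c(1, 1, 2, 2),
                 stringsAsFactors = FALSE)

d_na   <- pairs_to_dist(df$kwd1, df$kwd2, df$similarity)            # b-c is NA
d_zero <- pairs_to_dist(df$kwd1, df$kwd2, df$similarity, fill = 0)  # b-c is 0
```

Choose fill = 0 only if a missing pair should really count as zero; NA keeps the gaps explicit.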
