CSV of Distances to Dist Object in R


Question


Possible duplicate:

Convert a dataframe to an object of class "dist" without actually calculating distances in R

I have a very large csv file of similarities between keywords (large enough that a for loop over it takes too long in R). When I read it into a data.frame, it looks like this:

> df   
kwd1 kwd2 similarity  
a  b  1  
b  a  1  
c  a  2  
a  c  2 

I would like to convert this to a dist object, like this:

> dObject  
  a b  
b 1    
c 2 0

This works:
Convert a dataframe to an object of class "dist" without actually calculating distances in R

Another idea I had was to create a sparse matrix using Matrix(), but I am unsure how to populate the matrix efficiently because my csv is fairly large - maybe an apply function?

Maybe reshape()?

---- Update ----
This seems to work on the toy dataset above:
http://stats.stackexchange.com/questions/6827/efficient-way-to-populate-matrix-in-r

However, that example uses a dense matrix(), whereas I would like to use a sparse Matrix() for memory reasons.
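For reference, here is a minimal sketch of what the sparse variant might look like on the toy data above. It assumes the Matrix package is installed and that both directions of a pair (a-b and b-a) always carry the same value; the names toy, kw.levels, pairs and sim are made up for illustration and are not from the original post.

```r
library(Matrix)

# Toy data from the question
toy <- data.frame(kwd1 = c("a", "b", "c", "a"),
                  kwd2 = c("b", "a", "a", "c"),
                  similarity = c(1, 1, 2, 2),
                  stringsAsFactors = FALSE)

# One shared, sorted set of keywords for both columns
kw.levels <- sort(unique(c(toy$kwd1, toy$kwd2)))
i <- match(toy$kwd1, kw.levels)
j <- match(toy$kwd2, kw.levels)

# Put every pair into the lower triangle and drop repeated pairs,
# because sparseMatrix() would otherwise add up duplicated (i, j) entries.
# Assumption: a-b and b-a have identical similarity values.
pairs <- unique(data.frame(i = pmax(i, j), j = pmin(i, j), x = toy$similarity))

# Symmetric sparse matrix; pairs missing from the csv are implicit zeros
sim <- sparseMatrix(i = pairs$i, j = pairs$j, x = pairs$x,
                    dims = c(length(kw.levels), length(kw.levels)),
                    dimnames = list(kw.levels, kw.levels),
                    symmetric = TRUE)
```

Note that a dist object itself stores all n(n-1)/2 pairwise values as a dense vector, so something like as.dist(as.matrix(sim)) would give up the memory saving; the sparse form mainly helps if the downstream code can work with the Matrix object directly.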

---- Furthermore ----
There is a similar earlier post. However, I don't think its advice works for this case, in which there isn't a link between every pair of elements in the dataset: the csv doesn't contain the pairwise similarities between all keywords, as it did in the previous post:
Convert a dataframe to an object of class "dist" without actually calculating distances in R

Recommended answer

Try this:

# Generate some dummy data (since you didn't provide your data)
df <- data.frame(V1=sample(letters, 10, TRUE),
                 V2=sample(letters, 10, TRUE),
                 V3=sample(200, 10, TRUE))

df$V1 and df$V2 are now factors, possibly with different levels (in R 4.0.0 and later they stay character vectors unless stringsAsFactors = TRUE is set, and the code below works either way), so we need to make them comparable, e.g. make sure "a" in V1 is the same as "a" in V2.

# Convert letters to integers
my.objects <- sort(unique(c(as.character(df$V1), as.character(df$V2))))
df$V1 <- match(df$V1, my.objects)
df$V2 <- match(df$V2, my.objects)
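To make the mapping step concrete, here is a small sketch on the toy data from the question. The names toy, kw.levels, row.idx and col.idx are hypothetical; the factor-based variant is just an equivalent alternative, not part of the answer above.

```r
# Toy data from the question
toy <- data.frame(kwd1 = c("a", "b", "c", "a"),
                  kwd2 = c("b", "a", "a", "c"),
                  similarity = c(1, 1, 2, 2),
                  stringsAsFactors = FALSE)

# One shared, sorted set of keywords for both columns
kw.levels <- sort(unique(c(toy$kwd1, toy$kwd2)))

# Position of each keyword within the shared set
row.idx <- match(toy$kwd1, kw.levels)
col.idx <- match(toy$kwd2, kw.levels)

# Equivalent alternative: a factor with explicitly shared levels
row.idx.f <- as.integer(factor(toy$kwd1, levels = kw.levels))
```

Because both columns are matched against the same kw.levels vector, "a" gets index 1 in either column, which is what makes the indices usable as row and column positions in the next step.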

Create an empty distance matrix and populate it with the values in V3 at the locations specified by V1 and V2. Finally we convert it to a proper dist object.

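# Create an empty distance matrix
n <- length(my.objects)
dist.mat <- matrix(NA, n, n)
# Two-column (row, column) index matrix built from V1 and V2
i <- as.matrix(df[-3])
# Fill both (V1, V2) and (V2, V1) so the matrix is symmetric
dist.mat[i] <- dist.mat[i[,2:1]] <- df$V3

# Convert to a dist object (only the lower triangle is kept)
my.dist <- as.dist(dist.mat)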
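As a rough check, the same steps can be run on the toy data from the question. This is only a sketch: the names toy, kw.levels, idx, toy.mat and toy.dist are made up here, and adding dimnames is an extra so that the dist object keeps the keyword labels.

```r
# Toy data from the question
toy <- data.frame(kwd1 = c("a", "b", "c", "a"),
                  kwd2 = c("b", "a", "a", "c"),
                  similarity = c(1, 1, 2, 2),
                  stringsAsFactors = FALSE)

# Shared keyword set and (row, column) index matrix
kw.levels <- sort(unique(c(toy$kwd1, toy$kwd2)))
idx <- cbind(match(toy$kwd1, kw.levels), match(toy$kwd2, kw.levels))

# Empty matrix with keyword labels; pairs missing from the csv stay NA
n <- length(kw.levels)
toy.mat <- matrix(NA_real_, n, n, dimnames = list(kw.levels, kw.levels))

# Fill both orientations so the matrix is symmetric
toy.mat[idx] <- toy$similarity
toy.mat[idx[, 2:1, drop = FALSE]] <- toy$similarity

# as.dist() keeps the lower triangle and takes labels from the row names
toy.dist <- as.dist(toy.mat)
```

The pair b-c never appears in the toy data, so it ends up as NA here rather than the 0 shown in the question; if zeros are wanted, starting from matrix(0, n, n, ...) instead should give that.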
