Expand grid in R with unlist and apply

Question

I am looking to use R's expand.grid to comprehensively enumerate and investigate options for hierarchical clustering analysis. I have a final function acc which will take a matrix and analyse it for performance measures like accuracy, precision, F1 etc., returning a named list (with accuracy, F1, etc.): the ultimate output I'm looking for is a table where all the hyperparameter combinations are listed and, in columns next to them, the different performance measures (accuracy, F1,...).

The table of combinations can be set up for example with

hyperparams =  expand.grid(meths=c("ward.D","ward.D2","single","complete","average","mcquitty","median","centroid"), dists=c("euclidean", "maximum", "manhattan", "canberra", "binary","minkowski"))

Next we would compare to known labels and get the accuracy, wrapping in a number of functions, which I've tried to omit for brevity (like cutree):

t1 = table(df$Group, hclust(dist(df[-1],method="euclidean"), method="complete"))
Res1 = acc(t1)

The goal is to vary the method argument for dist across those listed in my dists, and the method argument for hclust across those listed in my meths. In the final line, recall that I've written acc, which will take a matrix and output a named list of accuracy, precision, F1,... which I'd like each on a column of a final table, whose rows are the hyperparameter combinations in hyperparams.

Now, my first issue is, I'm not sure how to use unlist in a way that will cover all the options above. I'm pretty sure it's the right function but just not sure how to do it. And I also want to create the table without a for-loop, i.e. using apply or something like that (I guess applying along the rows of hyperparams?...), since I know such solutions are generally better in R.

As suggested, the final desired output would be, effectively, hyperparams but as a data-frame with additional columns, the third column containing accuracy, fourth containing precision, etc (the measures listed out in my function acc). Can anyone inform me how to get there?

If you want something to play with for acc, we could use

acc = function(x) {
  first = sum(x)
  second = sum(x^2)
  return(list(First=first,Second=second))
}

and the final output table would be the two hyperparameter columns followed by a column for First (sum of elements in the final confusion matrix, for the hyperparameter combo corresponding to that row) and Second (sum of elements^2 in the final confusion matrix). Just a hypothetical example in case you like to work with given functions.

I'd really prefer solutions in base R! (Or dplyr if absolutely necessary)

Edit: OK, many people are asking for a df. Let's use iris, but of course if we want output we can't avoid some of the intermediate functions, like cutree.

Now, using iris, you can run

contingtab1 = table(iris$Species, cutree(hclust(dist(iris[,1:4],method="euclidean"),method="complete"),3))

That gives a contingency table. Passing this into acc would give one row of the desired output (the row corresponding to euclidean and complete). The desired output would then look like hyperparams with each of the two current columns followed by (say) two more columns, one for each of my two performance measures in acc.
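To make that single row concrete, here is a minimal sketch using the toy acc from above. It shows where unlist fits in: it flattens the named list returned by acc into a named numeric vector, which can then sit next to the two hyperparameter values. The names row1 and one_row are made up for illustration, and k = 3 is simply the number of iris species.

```r
# Toy acc from the question, wrapped as a function
acc <- function(x) list(First = sum(x), Second = sum(x^2))

# Contingency table for one hyperparameter combination
contingtab1 <- table(iris$Species,
                     cutree(hclust(dist(iris[, 1:4], method = "euclidean"),
                                   method = "complete"), 3))

# unlist() turns the named list into a named numeric vector
row1 <- unlist(acc(contingtab1))

# t() makes it a 1 x 2 matrix so each measure becomes its own column
one_row <- data.frame(meths = "complete", dists = "euclidean", t(row1))
one_row
```

The remaining rows would be built the same way, one per row of hyperparams.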

Recommended answer

One option could be purrr:

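library(purrr)
# .x is the dist() method and .y the hclust() method; as.character() guards
# against the factor columns that expand.grid() creates by default
map2(as.character(hyperparams$dists), as.character(hyperparams$meths),
     ~ acc(hclust(dist(df[-1], method = .x), method = .y)))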
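Since the question asks for base R and for a table rather than a list, the following is one possible sketch of the whole pipeline on iris, not the answerer's own code. It assumes the toy acc from the question, k = 3 clusters, and a hypothetical helper named eval_combo; Map walks the two columns of hyperparams in parallel, and the per-row lists are stacked and bound back onto the grid.

```r
# Toy acc from the question (stand-in for the real performance measures)
acc <- function(x) list(First = sum(x), Second = sum(x^2))

# stringsAsFactors = FALSE keeps the method names as plain character strings
hyperparams <- expand.grid(
  meths = c("ward.D", "ward.D2", "single", "complete",
            "average", "mcquitty", "median", "centroid"),
  dists = c("euclidean", "maximum", "manhattan",
            "canberra", "binary", "minkowski"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one hyperparameter combination -> named list of measures
eval_combo <- function(meth, dst, k = 3) {
  hc <- hclust(dist(iris[, 1:4], method = dst), method = meth)
  acc(table(iris$Species, cutree(hc, k)))
}

# Map() calls eval_combo once per row of hyperparams
res <- Map(eval_combo, hyperparams$meths, hyperparams$dists)

# Turn each named list into a one-row data frame and stack them
measures <- do.call(rbind, lapply(unname(res), as.data.frame))

# Hyperparameter columns first, then one column per measure
final <- cbind(hyperparams, measures)
head(final)
```

Swapping in the real acc should only require that it keeps returning a named list with the same names for every combination, so that the rows line up when stacked.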
