Random projection algorithm pseudo code


Problem description

I am trying to apply the Random Projections method on a very sparse dataset. I found papers and tutorials about the Johnson-Lindenstrauss method, but every one of them is full of equations that give me no meaningful explanation. For example, this document on Johnson-Lindenstrauss.

Unfortunately, from this document I can get no idea about the implementation steps of the algorithm. It's a long shot, but is there anyone who can give me a plain-English version or very simple pseudo code of the algorithm? Or where should I start digging into these equations? Any suggestions?

For example, here is what I understand of the algorithm from reading this paper concerning Johnson-Lindenstrauss:

  1. Assume we have an AxB matrix where A is the number of samples and B is the number of dimensions, e.g. 100x5000. I want to reduce its dimensionality to 500, which will produce a 100x500 matrix.

As far as I understand: first, I need to construct a 5000x500 projection matrix (one row per original dimension) and fill its entries randomly with +1 and -1, each with 50% probability.
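A minimal sketch of that dense ±1 variant, assuming NumPy; the variable names and the 1/sqrt(k) scaling (the usual JL normalization) are illustrative assumptions, not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_dims, k = 100, 5000, 500    # A is 100x5000, target E is 100x500

A = rng.random((n_samples, n_dims))      # placeholder data

# Projection matrix: n_dims x k, entries +1 or -1 with probability 1/2 each
R = rng.choice([-1.0, 1.0], size=(n_dims, k))

# Scale by 1/sqrt(k), the usual JL normalization (an assumption here)
E = (A @ R) / np.sqrt(k)
print(E.shape)                           # (100, 500)
```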

Edit:
Okay, I think I'm starting to get it. So we have a matrix A which is mxn. We want to reduce it to E, which is mxk.

What we need to do is construct a matrix R of dimension nxk and fill it with 0, -1, or +1 with probabilities 2/3, 1/6, and 1/6 respectively.

After constructing this R, we'll simply do the matrix multiplication AxR to find our reduced matrix E. But we don't need a full matrix multiplication: if an element of R is 0, we don't need to do any calculation and can simply skip it; if it's 1, we just add the corresponding column of A to the sum; and if it's -1, we just subtract it. So we can use summation rather than multiplication to find E, and that is what makes this method very fast.
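A sketch of that sparse construction, with the summation trick written out as an explicit loop. The names are made up, and the sqrt(3/k) scaling is the one used in Achlioptas-style sparse projections (the scalar multiplication the answer below refers to), so treat it as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 5000, 500
A = rng.random((m, n))                   # placeholder data, m x n

# R: n x k, entries 0, +1, -1 with probabilities 2/3, 1/6, 1/6
R = rng.choice([0.0, 1.0, -1.0], size=(n, k), p=[2 / 3, 1 / 6, 1 / 6])

# Summation instead of multiplication: for each column j of R, add the
# columns of A where R[:, j] == +1, subtract those where R[:, j] == -1,
# and skip the zeros entirely.
E = np.zeros((m, k))
for j in range(k):
    plus = R[:, j] == 1.0
    minus = R[:, j] == -1.0
    E[:, j] = A[:, plus].sum(axis=1) - A[:, minus].sum(axis=1)

E *= np.sqrt(3.0 / k)                    # assumed Achlioptas-style scaling

# The loop is equivalent to the direct product, up to float rounding
assert np.allclose(E, np.sqrt(3.0 / k) * (A @ R))
```

In practice you would store R as a sparse matrix and let a sparse product do the skipping for you; the loop above just makes the add/subtract idea explicit.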

It turns out to be a very neat algorithm, although I felt too stupid to get the idea at first.

Solution

The mapping from the high-dimensional data A to the low-dimensional data E is given in the statement of Theorem 1.1 in the latter paper - it is simply a scalar multiplication followed by a matrix multiplication. The data vectors are the rows of the matrices A and E. As the author points out in Section 7.1, you don't need to use a full matrix multiplication algorithm.
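As a quick sanity check of what the lemma promises, you can compare pairwise distances before and after the projection. This continues the A, E, and k from the sparse sketch above and assumes SciPy is available; the near-1 ratios are the expected behavior, not guaranteed exact numbers:

```python
# Continues A, E, k from the sparse sketch above
from scipy.spatial.distance import pdist

orig = pdist(A)                   # pairwise Euclidean distances in 5000 dims
proj = pdist(E)                   # pairwise Euclidean distances in 500 dims
ratio = proj / orig
print(ratio.min(), ratio.max())   # typically close to 1 for k = 500
```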
