产生“随机"消息.固定元素集上一定等级的矩阵 [英] Generate "random" matrix of certain rank over a fixed set of elements

查看:77
本文介绍了产生“随机"消息.固定元素集上一定等级的矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想生成尺寸为m x n且秩为r的矩阵,其中元素来自指定的有限集,例如{0,1}{1,2,3,4,5}.我希望它们从某种意义上说是随机"的,即我想从算法中获得各种可能的输出,其分布与在指定等级的那组元素上所有矩阵的分布大致相似.

实际上,我实际上并不在乎它是否具有等级r,只是它与等级r的矩阵 close (按Frobenius范数衡量).

当手头的集合是实数时,我一直在进行以下操作,这完全可以满足我的需求:生成大小为m x r且大小为n的矩阵U x r,具有独立于元素的样本,例如普通(0,2).那么U V'是排名为rm x n矩阵(嗯,<= r,但我认为它是r的可能性很高).

如果我只是这样做,然后四舍五入为二进制/1-5,则排名会增加.

也可以通过执行SVD并获取第一个r奇异值来获得矩阵的较低秩近似.但是,这些值不会位于所需的集合中,对它们进行四舍五入将再次提高排名.

此问题是相关的,但已被接受答案不是随机",另一个答案建议使用SVD,如此处所述,它不起作用.

我想到的一种可能性是从集合中生成r线性独立的行或列向量,然后通过线性组合来获取矩阵的其余部分.不过,我不清楚如何获得线性独立的随机"向量,或者之后如何以准随机的方式组合它们.

(不是超级相关,但我正在用numpy进行此操作.)


更新:我已经通过以下简单的方法尝试了EMS在评论中建议的方法:

real = np.dot(np.random.normal(0, 1, (10, 3)), np.random.normal(0, 1, (3, 10)))
bin = (real > .5).astype(int)
rank = np.linalg.matrix_rank(bin)
niter = 0

while rank > des_rank:
    cand_changes = np.zeros((21, 5))
    for n in range(20):
        i, j = random.randrange(5), random.randrange(5)
        v = 1 - bin[i,j]
        x = bin.copy()
        x[i, j] = v
        x_rank = np.linalg.matrix_rank(x)
        cand_changes[n,:] = (i, j, v, x_rank, max((rank + 1e-4) - x_rank, 0))
    cand_changes[-1,:] = (0, 0, bin[0,0], rank, 1e-4)

    cdf = np.cumsum(cand_changes[:,-1])
    cdf /= cdf[-1]
    i, j, v, rank, score = cand_changes[np.searchsorted(cdf, random.random()), :]
    bin[i, j] = v
    niter += 1
    if niter % 1000 == 0:
        print(niter, rank)

它适用于小型矩阵,但对于例如10x10-似乎至少停留在数十万次迭代中,排名为6或7.

使用更好(即不太平坦)的目标函数似乎可以更好地工作,但是我不知道那会是什么.


我还尝试了一种简单的拒绝方法来建立矩阵:

def fill_matrix(m, n, r, vals):
    assert m >= r and n >= r
    trans = False
    if m > n: # more columns than rows I think is better
        m, n = n, m
        trans = True

    get_vec = lambda: np.array([random.choice(vals) for i in range(n)])

    vecs = []
    n_rejects = 0

    # fill in r linearly independent rows
    while len(vecs) < r:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            vecs.append(v)
        else:
            n_rejects += 1
    print("have {} independent ({} rejects)".format(r, n_rejects))

    # fill in the rest of the dependent rows
    while len(vecs) < m:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            n_rejects += 1
            if n_rejects % 1000 == 0:
                print(n_rejects)
        else:
            vecs.append(v)
    print("done ({} total rejects)".format(n_rejects))

    m = np.vstack(vecs)
    return m.T if trans else m

例如,这行得通具有任意等级的10x10二进制矩阵,但不适用于0-4矩阵或具有较低等级的大得多的二进制. (例如,获得20x20等级15的二进制矩阵导致我遭到42,000次拒绝;对于20x20等级10的拒绝,我花费了120万.)

这显然是因为前r行所跨越的空间太小了我要从中采样的空间的一部分,例如{0,1}^10,在这种情况下.

我们希望前r行的跨度与有效值集相交. 因此,我们可以尝试从跨度采样并寻找有效值,但是由于跨度涉及到实数值系数,因此永远不会找到我们有效的向量(即使我们进行归一化处理,例如第一个分量位于有效集中). /p>

也许可以将其表达为整数编程问题,还是其他?

解决方案

这样怎么样?

rank = 30
n1 = 100; n2 = 100
from sklearn.decomposition import NMF
model = NMF(n_components=rank, init='random', random_state=0)
U = model.fit_transform(np.random.randint(1, 5, size=(n1, n2)))
V = model.components_
M = np.around(U) @ np.around(V)

I'd like to generate matrices of size mxn and rank r, with elements coming from a specified finite set, e.g. {0,1} or {1,2,3,4,5}. I want them to be "random" in some very loose sense of that word, i.e. I want to get a variety of possible outputs from the algorithm with distribution vaguely similar to the distribution of all matrices over that set of elements with the specified rank.

In fact, I don't actually care that it has rank r, just that it's close to a matrix of rank r (measured by the Frobenius norm).

When the set at hand is the reals, I've been doing the following, which is perfectly adequate for my needs: generate matrices U of size mxr and V of nxr, with elements independently sampled from e.g. Normal(0, 2). Then U V' is an mxn matrix of rank r (well, <= r, but I think it's r with high probability).

If I just do that and then round to binary / 1-5, though, the rank increases.

It's also possible to get a lower-rank approximation to a matrix by doing an SVD and taking the first r singular values. Those values, though, won't lie in the desired set, and rounding them will again increase the rank.

This question is related, but accepted answer isn't "random," and the other answer suggests SVD, which doesn't work here as noted.

One possibility I've thought of is to make r linearly independent row or column vectors from the set and then get the rest of the matrix by linear combinations of those. I'm not really clear, though, either on how to get "random" linearly independent vectors, or how to combine them in a quasirandom way after that.

(Not that it's super-relevant, but I'm doing this in numpy.)


Update: I've tried the approach suggested by EMS in the comments, with this simple implementation:

real = np.dot(np.random.normal(0, 1, (10, 3)), np.random.normal(0, 1, (3, 10)))
bin = (real > .5).astype(int)
rank = np.linalg.matrix_rank(bin)
niter = 0

while rank > des_rank:
    cand_changes = np.zeros((21, 5))
    for n in range(20):
        i, j = random.randrange(5), random.randrange(5)
        v = 1 - bin[i,j]
        x = bin.copy()
        x[i, j] = v
        x_rank = np.linalg.matrix_rank(x)
        cand_changes[n,:] = (i, j, v, x_rank, max((rank + 1e-4) - x_rank, 0))
    cand_changes[-1,:] = (0, 0, bin[0,0], rank, 1e-4)

    cdf = np.cumsum(cand_changes[:,-1])
    cdf /= cdf[-1]
    i, j, v, rank, score = cand_changes[np.searchsorted(cdf, random.random()), :]
    bin[i, j] = v
    niter += 1
    if niter % 1000 == 0:
        print(niter, rank)

It works quickly for small matrices but falls apart for e.g. 10x10 -- it seems to get stuck at rank 6 or 7, at least for hundreds of thousands of iterations.

It seems like this might work better with a better (ie less-flat) objective function, but I don't know what that would be.


I've also tried a simple rejection method for building up the matrix:

def fill_matrix(m, n, r, vals):
    assert m >= r and n >= r
    trans = False
    if m > n: # more columns than rows I think is better
        m, n = n, m
        trans = True

    get_vec = lambda: np.array([random.choice(vals) for i in range(n)])

    vecs = []
    n_rejects = 0

    # fill in r linearly independent rows
    while len(vecs) < r:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            vecs.append(v)
        else:
            n_rejects += 1
    print("have {} independent ({} rejects)".format(r, n_rejects))

    # fill in the rest of the dependent rows
    while len(vecs) < m:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            n_rejects += 1
            if n_rejects % 1000 == 0:
                print(n_rejects)
        else:
            vecs.append(v)
    print("done ({} total rejects)".format(n_rejects))

    m = np.vstack(vecs)
    return m.T if trans else m

This works okay for e.g. 10x10 binary matrices with any rank, but not for 0-4 matrices or much larger binaries with lower rank. (For example, getting a 20x20 binary matrix of rank 15 took me 42,000 rejections; with 20x20 of rank 10, it took 1.2 million.)

This is clearly because the space spanned by the first r rows is too small a portion of the space I'm sampling from, e.g. {0,1}^10, in these cases.

We want the intersection of the span of the first r rows with the set of valid values. So we could try sampling from the span and looking for valid values, but since the span involves real-valued coefficients that's never going to find us valid vectors (even if we normalize so that e.g. the first component is in the valid set).

Maybe this can be formulated as an integer programming problem, or something?

解决方案

How about like this?

rank = 30
n1 = 100; n2 = 100
from sklearn.decomposition import NMF
model = NMF(n_components=rank, init='random', random_state=0)
U = model.fit_transform(np.random.randint(1, 5, size=(n1, n2)))
V = model.components_
M = np.around(U) @ np.around(V)

这篇关于产生“随机"消息.固定元素集上一定等级的矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆