How to un-shuffle data?


Question

Is there a way to undo the shuffle function from sklearn.utils? To explain the problem better: I use shuffle to randomize the rows of two matrices:

A_s, B_s = shuffle(A, B, random_state = 1)

Next I use both matrices A_s and B_s in some operation and obtain another matrix C_s with the same dimensions, e.g. C_s = f(A_s, B_s). How can I restore C to the original row order of A and B?

I'm thinking of something similar to sklearn.preprocessing.MinMaxScaler((0,+1)), where I can go back afterwards by calling the scaler's inverse_transform() method.

Recommended answer

It will not necessarily be possible, depending on your choice of f. If f is invertible and you keep track of how the rows were shuffled, it is possible, though not necessarily efficient. The sklearn.utils shuffle function does NOT keep track of how the matrix was shuffled, so you may want to roll your own. To generate a random shuffle, generate a random permutation of range(len(A)), then iteratively swap the rows in that order. To retrieve the original matrices, you can just reverse the permutation. This would allow you to recover C for certain choices of f that act on each row independently (e.g. element-wise matrix addition).
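As a rough illustration of the "keep track of the permutation" idea, here is a minimal NumPy sketch. It swaps in fancy indexing for the iterative row swapping described above, and the names rng, perm, inverse and unshuffle are made up for this example. It assumes f works row by row; element-wise addition stands in for f.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen arbitrarily for this example

A = np.arange(8).reshape(4, 2)
B = 10 * A

# perm[i] is the original row index that ends up at shuffled position i.
perm = rng.permutation(len(A))
A_s, B_s = A[perm], B[perm]  # fancy indexing returns shuffled copies

# Stand-in for f: element-wise addition treats each row independently.
C_s = A_s + B_s


def unshuffle(X_s, perm):
    """Hypothetical helper: undo a row shuffle that was done as X[perm]."""
    inverse = np.argsort(perm)  # satisfies perm[inverse] == arange(len(perm))
    return X_s[inverse]


C = unshuffle(C_s, perm)
print(np.array_equal(C, A + B))
```

An equivalent way to undo the shuffle, under the same assumptions, is to allocate an empty array of the same shape and assign C[perm] = C_s.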

(EDIT: the OP requested additional info)

This works for me, but there's probably a more efficient way to do it:

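import numpy as np

def shuffle(A, axis=0, permutation=None):
    # Bring the axis to shuffle to the front. swapaxes returns a view,
    # so the assignments below modify the caller's array in place.
    A = np.swapaxes(A, 0, axis)
    if permutation is None:
        permutation = np.random.permutation(len(A))
    # Rotate the rows along the cycle described by the permutation:
    # row permutation[i] receives the contents of row permutation[i+1].
    temp = np.copy(A[permutation[0]])
    for i in range(len(A) - 1):
        A[permutation[i]] = A[permutation[i + 1]]
    A[permutation[-1]] = temp
    A = np.swapaxes(A, 0, axis)
    return A, permutation

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(A)
B, p = shuffle(A)  # NOTE: shuffle works in place, so B shares its data with A!
print("shuffle A")
print(B)
# Replaying the permutation in reverse order undoes the rotation.
D, _ = shuffle(B, permutation=p[::-1])
print("unshuffle B to get A")
print(D)

B = np.copy(B)  # give B its own data so A and B can be shuffled separately
C = A + B
print("A+B")
print(C)

A_s, p = shuffle(A)
B_s, _ = shuffle(B, permutation=p)  # reuse p so both get the same row order
C_s = A_s + B_s

print("shuffle A and B, then add")
print(C_s)

print("unshuffle that to get the original sum")
CC, _ = shuffle(C_s, permutation=p[::-1])
print(CC)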
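If staying with sklearn.utils.shuffle is preferable, one possible alternative is to pass an index array through the same call, since shuffle applies one consistent ordering to every array it is given. The sketch below is only an illustration of that idea: idx, idx_s and the use of addition as a stand-in for f are assumptions made for the example, and it again presumes f operates row by row.

```python
import numpy as np
from sklearn.utils import shuffle

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
B = 10 * A
idx = np.arange(len(A))  # original row positions 0..n-1

# All three arrays are shuffled with the same row order,
# so idx_s[i] is the original position of shuffled row i.
A_s, B_s, idx_s = shuffle(A, B, idx, random_state=1)

C_s = A_s + B_s  # stand-in for f(A_s, B_s)

# Write each shuffled row back to its original position.
C = np.empty_like(C_s)
C[idx_s] = C_s

print(np.array_equal(C, A + B))
```

This leaves A and B untouched, because sklearn.utils.shuffle returns shuffled copies rather than shuffling in place.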
