Is there a better way to determine cross-mapping indices for numpy arrays


Problem description


I need the cross-mapped indices for numpy union and intersection operations. The code below works fine, but I would like to vectorize it before applying it to large data sets. Or, if there is a better built-in way, what is it?

import numpy as np

# ------- define the arrays and set operations ---------
A = np.array(['a','b','c','e','f','g','h','j'])
B = np.array(['h','i','j','k','m'])
C = np.union1d(A, B)
D = np.intersect1d(A,B)

# ------- get the mapped indices for the union ----
zc = np.empty((len(C),3,))
zc[:]=np.nan
zc[:,0] = range(0,len(C))
for iy in range(0,len(C)):
    for ix in range(0, len(A)):
        if A[ix] == C[iy]:
            zc[iy,1] = ix
    for ix in range(0, len(B)):
        if B[ix] == C[iy]:
            zc[iy,2] = ix

# ------- get the mapped indices for the intersection ----
zd = np.empty((len(D),3,))
zd[:]=np.nan
zd[:,0] = range(0,len(D))
for iy in range(0,len(D)):
    for ix in range(0, len(A)):
        if A[ix] == D[iy]:
            zd[iy,1] = ix
    for ix in range(0, len(B)):
        if B[ix] == D[iy]:
            zd[iy,2] = ix

Solution

For cases like this, it helps to convert the strings to numeric IDs first, since numeric arrays are far more efficient to work with. The outputs are numeric index arrays anyway, so it makes sense to have numeric IDs up front. I have seen people use lambda, among other approaches, for this conversion, but I would go with np.unique, which is quite efficient here. Here is an implementation, starting with the numeric ID conversion:

# ------------------------ Setup work -------------------------------
_,idx1 = np.unique(np.append(A,B),return_inverse=True)
A_ID = idx1[:A.size]
B_ID = idx1[A.size:]

# ------------------------ Union work -------------------------------
# Get the length of zc, which is the largest ID plus 1.
lenC = idx1.max()+1

# Initialize output array zc and fill with NaNs.
zc1 = np.empty((lenC,3,))
zc1[:]=np.nan

# Fill first column with consecutive numbers starting with 0
zc1[:,0] = range(0,lenC)

# Most important part of the code:
# At the rows given by the IDs of A and B, set columns 1 and 2 respectively
# to each element's position (0 .. size-1) in its own array
zc1[A_ID,1] = np.arange(A_ID.size)
zc1[B_ID,2] = np.arange(B_ID.size)

# ------------------------ Intersection work -------------------------------
# Get intersecting indices between A and B
intersect_ID = np.argwhere(A_ID[:,None] == B_ID)

# Initialize output zd based on the number of intersections
lenD = intersect_ID.shape[0]
zd1 = np.empty((lenD,3,))
zd1[:] = np.nan

# Fill first column with consecutive numbers starting with 0
zd1[:,0] = range(0,lenD)
# Fill columns 1 and 2 with the matching (index in A, index in B) pairs
zd1[:,1:] = intersect_ID
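
As for the "built-in" part of the question, here is a minimal sketch of the same idea wrapped in a function. It rests on assumptions that are not in the answer above: the name cross_map is hypothetical, neither A nor B contains repeated values, and the NumPy version is recent enough (1.15 or later) for np.intersect1d to accept return_indices=True. For the intersection it uses those returned indices instead of the pairwise A_ID[:,None] == B_ID comparison, which avoids building a len(A) x len(B) boolean array.

```python
import numpy as np

def cross_map(A, B):
    """Hypothetical helper: index maps for the union and intersection of A and B.

    Assumes A and B are 1-D arrays with no repeated values.
    """
    # Union: sorted unique values, plus each input element's row in that union
    C, inv = np.unique(np.concatenate((A, B)), return_inverse=True)
    a_id, b_id = inv[:A.size], inv[A.size:]

    zc = np.full((C.size, 3), np.nan)   # NaN marks "not present in that array"
    zc[:, 0] = np.arange(C.size)        # row number within the union
    zc[a_id, 1] = np.arange(A.size)     # position of each union value in A
    zc[b_id, 2] = np.arange(B.size)     # position of each union value in B

    # Intersection: ia and ib are the positions of the common values in A and B
    D, ia, ib = np.intersect1d(A, B, return_indices=True)
    zd = np.column_stack((np.arange(D.size), ia, ib)).astype(float)
    return zc, zd

A = np.array(['a', 'b', 'c', 'e', 'f', 'g', 'h', 'j'])
B = np.array(['h', 'i', 'j', 'k', 'm'])
zc, zd = cross_map(A, B)
print(zd)
# [[0. 6. 0.]
#  [1. 7. 2.]]
```

Because np.intersect1d returns the common values in sorted order, the rows of zd follow the order of D, as in the loop version from the question. If the inputs can contain duplicates, both the scatter assignment into zc and the returned indices keep only one position per value, so that case would need separate handling.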
