如何有效地计算两个字典的内积 [英] How to efficiently compute the inner product of two dictionaries

查看：119 发布时间：2020/4/30 12:07:16 python linear-algebra

本文介绍了如何有效地计算两个字典的内积的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

假设我使用字典表示特征向量(为什么?因为我知道特征稀疏，但是稍后会更多).

Suppose I represent a feature vector using a dictionary (why? because I know the features are sparse, but, more on that later).

我应该如何实现两个这样的字典(表示为A，B)的内积

How should I implement the inner product of two such dictionaries (denoted, A, B)

我尝试过幼稚的方法:

for k in A:
  if k in B:
    sum += A[k] * B[k]

但事实证明它很慢.

更多详细信息:

我之所以使用字典来表示要素，是因为

I'm using a dictionary to represent features because

功能键是字符串
大约有20K个密钥
每个向量都是稀疏的(例如，约有1000个非零元素).

我真的很想计算N = 2000个不同字典(即它们的线性核)的成对内积.

I'm really interested in computing the pairwise inner product of N=2000 different dictionaries (that is, their linear kernel).

推荐答案

这是我的答案(根据@ valentin-clement的建议):

Here's my answer (Following on a suggestion by @valentin-clement):

首先，我包装了一个scipy.sparse dok_matrix. 这个想法是给每个可能的特征分配一个索引.

First I wrap a scipy.sparse dok_matrix. The idea is to assign each of the possible features an index.

import scipy.sparse as sps
import numpy as np

class MSK:
    # DD is a dict of dict, whose values are of type float.
    # features - the set of possible features keys
    def __init__(self, DD, features):
        self.features = {k: j for (j, k) in enumerate(features)}
        self.strings = DD.keys()
        n = len(self.strings)
        d = len(self.features)
        self.M = sps.dok_matrix((n, d), dtype=np.float64)
        for (i, s) in enumerate(self.strings):
            v = DD[s]
            for k in v:
                j = self.features[k]
                self.M[i, j] = v[k]

我们使用以下代码进行测试，其中元素数为800，维数也为800，但稀疏度为200(恰好200个元素为非零).

And we test using the following code, where the number of elements is 800, the dimensionality is also 800, but the sparsity is 200 (exactly 200 elements are non-zero).

np.random.seed(1)
N = 800
DD = dict()
R = range(N)
for i in xrange(N):
    DD[i] = dict()
    S = np.random.permutation(R)
    S = S[:N/4]
    for j in S:
        DD[i][j] = np.random.randn(1)[0]

K = MSK(DD, R)
import cProfile
cProfile.runctx("A = K.M * K.M.T", globals(), locals())
print A.todense()

输出为:

2080520 function calls (2080519 primitive calls) in 2.884 seconds

让我们说3秒. 我的幼稚实现(使用@Joowani的if语句)大约需要19秒.

Lets say 3 seconds. My naive implementation (that uses @Joowani's if-statement) takes about 19 seconds.

(MSK代表MatrixSparseKeys)

(MSK stands for MatrixSparseKeys)

这篇关于如何有效地计算两个字典的内积的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何有效地计算两个字典的内积 [英] How to efficiently compute the inner product of two dictionaries

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

如何有效地计算两个字典的内积 [英] How to efficiently compute the inner product of two dictionaries

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭