How to create a frequency tensor out of two tensors in TensorFlow

Problem description

I have a tensor like this, in which the values are frequencies and the rows are the indexes (0 to 6):

tf_docs = tf.constant([[0, 2],
                       [1, 2],
                       [2, 1],
                       [5, 0],
                       [0, 1],
                       [7, 8],
                       [9, 6]])

I also have a constant tensor whose values are indexes:

tf_topics = tf.constant([[1, 2],
                         [1, 3],
                         [1, 0],
                         [2, 3],
                         [2, 0],
                         [3, 0],
                         [3, 4],
                         [3, 2],
                         [3, 1],
                         [4, 2],
                         [4, 1],
                         [2, 1]], shape=(12, 2), dtype=tf.int32)

I need to look up these indexes row-wise in tf_docs. For each pair, the resulting matrix should hold the number of columns of tf_docs in which the values at both indexes are non-zero.

For example, tf_topics contains [1 2]. This means checking the values at row indexes 1 and 2 of tf_docs. In both the first and the second column, the two values are non-zero, which is why the frequency for [1 2] is 2.

On the other hand, [1 3] gets a frequency of 1, because the value in the second column at index 3 is zero.

So the result will be a tensor like this (it is obviously symmetric). The diagonal holds the sum of the frequencies of each index:

[[2,   1, 1, 0, null],
 [1,   3, 2, 1, 1   ],
 [1,   2, 3, 1, 1   ],
 [0,   1, 1, 5, 0   ],
 [null,1, 1, 0, 1   ]]
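The counting rule described above can be pinned down with a small plain-Python reference that does not depend on TensorFlow. This is only a sketch for checking results on the sample data: the helper names `pair_frequency` and `frequency_matrix` are made up for illustration, and `None` stands in for the `null` entries.

```python
# Sample data copied from the question.
docs = [[0, 2], [1, 2], [2, 1], [5, 0], [0, 1], [7, 8], [9, 6]]
topics = [[1, 2], [1, 3], [1, 0], [2, 3], [2, 0], [3, 0],
          [3, 4], [3, 2], [3, 1], [4, 2], [4, 1], [2, 1]]


def pair_frequency(docs, i, j):
    """Count the columns in which rows i and j of docs are both non-zero."""
    return sum(1 for a, b in zip(docs[i], docs[j]) if a != 0 and b != 0)


def frequency_matrix(docs, topics):
    """Build the symmetric matrix described in the question (hypothetical helper)."""
    n = max(max(pair) for pair in topics) + 1
    # None marks pairs that never appear in topics ("null" in the question).
    out = [[None] * n for _ in range(n)]
    for i, j in topics:
        out[i][j] = out[j][i] = pair_frequency(docs, i, j)
    for i in range(n):
        out[i][i] = sum(docs[i])  # diagonal: summed frequencies of index i
    return out


result = frequency_matrix(docs, topics)
for row in result:
    print(row)
```

On the sample data this reproduces the matrix above, with `None` at positions (0, 4) and (4, 0) because that pair never appears in tf_topics.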

What I have done so far:

I decided to use tf.gather and tf.count_nonzero over the two matrices, because I wanted to split the indexes in the topics and see whether they co-occur in tf_docs:

tf.math.count_nonzero(tf.gather(tf_docs, tf_topics, axis=0), axis=1)

However, this does not seem to give me the result I want.
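One likely reason, sketched here under the assumption of TensorFlow 2.x with eager execution: tf.gather with a (12, 2) index tensor returns a tensor of shape (12, 2, 2), and count_nonzero over axis=1 then reports, for each column, how many of the two gathered rows are non-zero (0, 1 or 2). That is not the number of columns in which both rows are non-zero. Requiring both rows to be non-zero first, and only then counting columns, yields one number per pair. The names `gathered`, `attempt`, `both_nonzero` and `pair_counts` are illustrative.

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

tf_docs = tf.constant([[0, 2], [1, 2], [2, 1], [5, 0], [0, 1], [7, 8], [9, 6]])
tf_topics = tf.constant([[1, 2], [1, 3], [1, 0], [2, 3], [2, 0], [3, 0],
                         [3, 4], [3, 2], [3, 1], [4, 2], [4, 1], [2, 1]])

# Axes of the gathered tensor: (pair, member of the pair, column).
gathered = tf.gather(tf_docs, tf_topics, axis=0)
print(gathered.shape)  # (12, 2, 2)

# The attempt from the question: per column, how many of the two rows are non-zero.
attempt = tf.math.count_nonzero(gathered, axis=1)
print(attempt.shape)  # (12, 2)

# A column counts only if BOTH rows of the pair are non-zero in it.
both_nonzero = tf.reduce_all(tf.not_equal(gathered, 0), axis=1)   # (12, 2), bool
pair_counts = tf.reduce_sum(tf.cast(both_nonzero, tf.int32), axis=-1)  # (12,)
print(pair_counts.numpy())  # [2 1 1 1 1 0 0 1 1 1 1 2]
```

These per-pair counts line up with the off-diagonal entries of the expected matrix, for example 2 for [1 2] and 1 for [1 3].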

Recommended answer

Let nonzero_tf_docs be defined as:

zero_tf_docs = tf.cast(tf.equal(tf_docs, tf.zeros_like(tf_docs)), tf.int32)
nonzero_tf_docs = 1 - tf.reduce_max(zero_tf_docs, axis=-1)
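To make this definition concrete, here is a plain-Python sketch of the same flag (1 for a row with no zero, 0 otherwise) evaluated on the sample data. The answer goes on to sum this flag over the two indexes of each pair, so the sketch also compares that sum with the per-column count described in the question. The names `nonzero_rows`, `indicator_sums` and `column_counts` are illustrative, and `other_docs` is a made-up input used only to show that the two quantities are not the same computation in general.

```python
# Sample data copied from the question.
docs = [[0, 2], [1, 2], [2, 1], [5, 0], [0, 1], [7, 8], [9, 6]]
topics = [[1, 2], [1, 3], [1, 0], [2, 3], [2, 0], [3, 0],
          [3, 4], [3, 2], [3, 1], [4, 2], [4, 1], [2, 1]]

# 1 if the row contains no zero, else 0 (mirrors nonzero_tf_docs).
nonzero_rows = [int(all(v != 0 for v in row)) for row in docs]
print(nonzero_rows)  # [0, 1, 1, 0, 0, 1, 1]

# Sum of the two flags for each pair.
indicator_sums = [nonzero_rows[i] + nonzero_rows[j] for i, j in topics]

# Columns in which both rows of the pair are non-zero.
column_counts = [sum(1 for a, b in zip(docs[i], docs[j]) if a != 0 and b != 0)
                 for i, j in topics]

print(indicator_sums == column_counts)  # True on this sample

# Made-up data: two identical rows, each with a zero in the second column.
other_docs = [[1, 0], [1, 0]]
other_flags = [int(all(v != 0 for v in row)) for row in other_docs]
print(other_flags[0] + other_flags[1])  # 0
print(sum(1 for a, b in zip(*other_docs) if a != 0 and b != 0))  # 1
```

On the question's data the two readings agree for every pair in tf_topics, which is consistent with the printed result further down; for other inputs it may be worth checking which of the two is actually wanted.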

The OP is asking to compute the sum nonzero_tf_docs[i] + nonzero_tf_docs[j] for each pair of indices i, j in tf_topics and to display the result in a matrix. This can be achieved as follows:

import tensorflow as tf  # TensorFlow 1.x graph API (tf.Session is used below)


def compute_result(tf_topics_, nonzero_tf_docs, tf_docs):
    # Find matrix lower part: one value per (i, j) pair, -1 where no pair exists
    values = tf.reduce_sum(tf.gather(nonzero_tf_docs, tf_topics_), axis=-1)
    max_index = tf.reduce_max(tf_topics_) + 1
    out_sparse = tf.sparse.SparseTensor(indices=tf_topics_, values=values, dense_shape=[max_index, max_index])
    out_sparse = tf.cast(out_sparse, dtype=tf.int32)
    out_sparse = tf.sparse.reorder(out_sparse)
    out_dense = tf.sparse.to_dense(out_sparse, default_value=-1)
    out_lower = tf.linalg.band_part(out_dense, -1, 0)  # keep the lower triangle only

    # Compute diagonal
    diag_values = tf.reduce_sum(tf_docs, axis=-1)
    diag = tf.slice(diag_values,
                    begin=[0],
                    size=[max_index])

    # Construct output matrix
    out = out_lower + tf.transpose(out_lower)
    mask = tf.eye(max_index, dtype=tf.int32)
    out = (1 - mask) * out + mask * diag

    return out


# Find docs without zeros
zero_tf_docs = tf.cast(tf.equal(tf_docs, tf.zeros_like(tf_docs)), tf.int32)
nonzero_tf_docs = 1 - tf.reduce_max(zero_tf_docs, axis=-1)

# Transform counts into matrix format
tf_topics = tf.cast(tf_topics, dtype=tf.int64)
tf_topics_reversed = tf.reverse(tf_topics, [-1])
tf_topics_ = tf_topics_reversed
out_1 = compute_result(tf_topics_, nonzero_tf_docs, tf_docs)
out_2 = compute_result(tf_topics, nonzero_tf_docs, tf_docs)
out = tf.maximum(out_1, out_2)

with tf.Session() as sess:
    r = sess.run(out)
    print(r)  # prints [[ 2  1  1  0 -1]
              #         [ 1  3  2  1  1]
              #         [ 1  2  3  1  1]
              #         [ 0  1  1  5  0]
              #         [-1  1  1  0  1]]
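The code above targets the TensorFlow 1.x graph API; tf.Session is not part of the top-level TensorFlow 2.x namespace. As a rough alternative, and not the answer's method, the same matrix can be assembled in TensorFlow 2.x eager mode by scattering per-pair counts into a matrix pre-filled with -1. The sketch counts columns directly, as the question describes. `pair_frequency_matrix` and `fill_value` are hypothetical names, and the sketch assumes no pair is listed twice in the same orientation.

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

tf_docs = tf.constant([[0, 2], [1, 2], [2, 1], [5, 0], [0, 1], [7, 8], [9, 6]],
                      dtype=tf.int32)
tf_topics = tf.constant([[1, 2], [1, 3], [1, 0], [2, 3], [2, 0], [3, 0],
                         [3, 4], [3, 2], [3, 1], [4, 2], [4, 1], [2, 1]],
                        dtype=tf.int32)


def pair_frequency_matrix(docs, topics, fill_value=-1):
    """Symmetric pair-frequency matrix; fill_value marks pairs absent from topics."""
    n = int(tf.reduce_max(topics).numpy()) + 1

    # Per pair: number of columns in which both rows are non-zero.
    gathered = tf.gather(docs, topics)                               # (P, 2, C)
    both_nonzero = tf.reduce_all(tf.not_equal(gathered, 0), axis=1)  # (P, C)
    counts = tf.reduce_sum(tf.cast(both_nonzero, tf.int32), axis=-1)  # (P,)

    # Write each count at (i, j) and at (j, i) to keep the matrix symmetric.
    out = tf.fill([n, n], fill_value)
    out = tf.tensor_scatter_nd_update(out, topics, counts)
    out = tf.tensor_scatter_nd_update(out, tf.reverse(topics, axis=[-1]), counts)

    # Diagonal: summed frequencies of each index.
    diag = tf.reduce_sum(docs, axis=-1)[:n]
    return tf.linalg.set_diag(out, diag)


out = pair_frequency_matrix(tf_docs, tf_topics)
print(out.numpy())
```

On the sample data this gives the same 5 x 5 matrix as the session-based code, with -1 at (0, 4) and (4, 0).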
