What is the fastest way to stack numpy arrays in a loop?


Question

I have code that generates two numpy arrays (data_transform) inside a for loop. The first iteration produces an array of shape (40, 2) and the second produces one of shape (175, 2). I want to concatenate these two arrays into a single array of shape (215, 2). I tried np.concatenate and np.append, but both give me an error saying the arrays must be the same size. Here is an example of what my code looks like:

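import numpy as np

result_arr = np.array([])

for label in labels_set:
    # Indices of all documents that carry the current label
    data = [index for index, value in enumerate(labels_list) if value == label]
    # Build the sub-corpus fresh on every iteration
    sub_corpus = [corpus[i] for i in data]
    data_sub_tfidf = vec.fit_transform(sub_corpus)
    data_transform = pca.fit_transform(data_sub_tfidf)
    # Append array (this is where np.concatenate / np.append fail)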

I have also tried np.row_stack, but it only gives me an array of shape (175, 2), which is just the second array I want to concatenate.

Recommended answer

What @hpaulj was trying to say with "Stick with list append when doing loops" is this:

# Use a normal Python list
result_arr = []

for label in labels_set:
    # ... build sub_corpus and data_sub_tfidf as in the question ...
    data_transform = pca.fit_transform(data_sub_tfidf)

    # Append the data_transform array to the list.
    # Note: this is list.append(), not np.append(), which is slow here
    # because it copies the whole array on every call.
    result_arr.append(data_transform)

# Stack once, after the loop. This avoids repeated memory allocation
# inside the loop: the final size is known, so only one large chunk
# of memory is allocated.
result_arr = np.concatenate(result_arr)

# or, equivalently for 2-D arrays (use one of the two, not both):
# result_arr = np.vstack(result_arr)

# Note: np.stack(result_arr, axis=0) does not work here. It joins along
# a new axis and requires all arrays to have the same shape.
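To try the pattern without the TF-IDF and PCA steps, here is a minimal, self-contained sketch. The helper name `stack_chunks` and the random arrays are assumptions made for illustration; they only stand in for the two `data_transform` outputs described in the question.

```python
import numpy as np


def stack_chunks(chunks):
    """Collect per-iteration arrays in a list, then concatenate once.

    `stack_chunks` is a hypothetical helper, not part of NumPy.
    """
    collected = []
    for chunk in chunks:
        # Cheap list append; no array data is copied here
        collected.append(chunk)
    # Single allocation for the final array, joined along the rows
    return np.concatenate(collected, axis=0)


rng = np.random.default_rng(0)
# Stand-ins for the (40, 2) and (175, 2) arrays from the question
chunks = [rng.normal(size=(40, 2)), rng.normal(size=(175, 2))]

result = stack_chunks(chunks)
print(result.shape)  # (215, 2)
```

The rows keep their order: the first 40 rows of `result` come from the first array and the remaining 175 from the second.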

Your arrays don't really have incompatible shapes. They differ along only one axis (40 versus 175 rows), while the other axis is identical (2 columns). In that case you can always concatenate along the axis that differs.
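The shape rule can be checked directly. The sketch below uses placeholder arrays `a` and `b` (assumed names, filled with zeros and ones) to show which joins work. The last check is only a guess at what may have caused the error in the question: starting from an empty 1-D `np.array([])` and concatenating a 2-D array onto it fails because the number of dimensions differs.

```python
import numpy as np

# Placeholder arrays with the shapes from the question
a = np.zeros((40, 2))
b = np.ones((175, 2))

# The shapes differ only along axis 0, so joining along axis 0 works
joined = np.concatenate([a, b], axis=0)
print(joined.shape)  # (215, 2)

# For 2-D inputs, np.vstack produces the same result
same_as_vstack = np.array_equal(np.vstack([a, b]), joined)

# np.stack joins along a NEW axis and needs identical shapes, so it fails
try:
    np.stack([a, b], axis=0)
    stack_failed = False
except ValueError:
    stack_failed = True

# Concatenating an empty 1-D array with a 2-D array fails as well
# (assumed, not confirmed, to be the cause of the original error)
try:
    np.concatenate([np.array([]), a], axis=0)
    empty_start_failed = False
except ValueError:
    empty_start_failed = True

print(same_as_vstack, stack_failed, empty_start_failed)  # True True True
```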
