ValueError: setting an array element with a sequence. on data read from csv


Question

I am trying to load data from a csv file row by row, then create a 2D array out of each row and store it inside an array:

Loading:

import csv

with open('data_more.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))

Parsing:

def getTrainingData():
    label_data = []
    for i in range( 0 , len(data) - 1):
        y = list(data[i][1:41:1])
        y = list(map(lambda x: list(map(lambda z: int(z),x)),y))
        y = create2Darray(y)
        label_data.append(y)
    labelY = np.array(label_data,dtype=float)

The create2Darray function:

def create2Darray( arr ):
    final_arr = []
    index = 0
    while( index < len(arr)):
        temp = arr[index:index+4:1]
        final_arr.append(temp)
        index+=4
    return final_arr

This is a simple task, yet I keep receiving the error:

ValueError: setting an array element with a sequence.

I have read that it is related to the situation where the shapes of the elements are not the same. However, when I print the shape of every element inside labelY, they all have the same shape.

What is causing this problem, then? The problem occurs on this line:

labelY = np.array(label_data,dtype=float)

My csv has the format

number, number, number

Basically, each row contains N numbers separated by ",". Thanks for the help.

Recommended answer

Let's start from the beginning:

  1. You seem to want to iterate through every line of your file to create an array. The iteration should be over range(0, len(data)), not range(0, len(data) - 1): the stop value of a range is exclusive, so you are currently skipping the last line. In fact, you can simply write range(len(data)) or, more Pythonically, iterate over the rows directly:

for y in data:
    y = y[1:41]

  2. Based on what comes later, you want the 40 elements of y starting with the second element. In that case y[1:41] is correct (you don't need the trailing :1). If you didn't mean to skip the first element, use y[0:40], or more Pythonically y[:40]. Remember that indexing is zero-based and the stop index is exclusive.

  3. Each element of your y list is not a number. It is a string, which you read from the file. Normally, you would convert the list of strings to a list of numbers using

    y = [float(x) for x in y]
    

    OR

    y = list(map(float, y))
    

    Your code instead creates a nested list for each element, splitting the string into its individual digits. Is this really what you intend? It certainly does not seem that way from the rest of the question.

  4. create2Darray seems to expect a list of 4n numbers and to break it into a 2D list of size n-by-4. If you want to keep using pure Python at this point, you can shorten the code using range with a step of 4:

    def create2Darray(arr):
        return [arr[i:i + 4] for i in range(0, len(arr), 4)]
    

  5. The result of the 2D operation is appended to a 3D list with label_data.append(y). Currently, because of the digit splitting, label_data is a 4D list with a ragged 4th dimension. Appending in a loop like this is also fairly inefficient. You would do much better to put the statements from the body of your for loop into a small function and use it in a list comprehension.
  6. Finally, you convert your 4D list (which should probably be 3D) into a numpy array. This operation fails because your numbers don't all have the same number of digits, so the innermost lists have different lengths. Once you fix the digit splitting described in step 3, the error will go away. There still remains the question of why you want dtype=float when you explicitly converted everything to int, but that is for you to figure out.
  7. Don't forget to add a return value to getTrainingData!
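
The points above can be illustrated with a minimal sketch. It assumes two made-up rows in place of data_more.csv (with 8 values per row rather than 40), and the helper name parse_row is hypothetical, not from the question. It shows how iterating over each string splits it into digits, that the resulting ragged list makes np.array raise a ValueError, and that converting each field with float gives a regular 3D array.

```python
import numpy as np

# Made-up rows standing in for csv.reader output: every field is a string.
# First column is skipped, followed by 8 values (the question uses 40).
rows = [
    ["a", "1", "22", "3", "4", "5", "6", "77", "8"],
    ["b", "10", "2", "3", "4", "5", "6", "7", "8"],
]

def create2Darray(arr):
    """Break a flat list into consecutive chunks of 4."""
    return [arr[i:i + 4] for i in range(0, len(arr), 4)]

# What the question's nested map does: it walks each *string* character by
# character, so "22" becomes [2, 2] while "1" becomes [1].
split_digits = [[int(z) for z in x] for x in rows[0][1:]]

# Building the full structure this way gives a ragged 4D list,
# which numpy cannot turn into a float array.
ragged = [create2Darray([[int(z) for z in x] for x in row[1:]]) for row in rows]
try:
    np.array(ragged, dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

def parse_row(row):
    """Hypothetical helper: convert one csv row into an n-by-4 nested list."""
    return create2Darray([float(x) for x in row[1:]])

# One float per field, so every row has the same shape.
label_y = np.array([parse_row(row) for row in rows], dtype=float)
```

With these sample rows, label_y is a regular array of two 2-by-4 blocks, whereas the digit-split version cannot be converted at all.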
TL;DR

    The simplest thing you can do, though, is to perform all the transformations after you convert the file to a 2D numpy array. Your program could be rewritten as

    import csv
    import numpy as np

    with open('data_more.csv', newline='') as file:
        reader = csv.reader(file)
        # One list of floats per row, skipping the first column
        data = [[float(x) for x in line[1:]] for line in reader]
    data = np.array(data).reshape(len(data), -1, 4)
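
As a self-contained check of the reshape idea, here is a small sketch that reads an in-memory CSV instead of data_more.csv. The sample text is made up, and the sketch assumes every row has the same number of fields and that the count after the first column is a multiple of 4.

```python
import csv
import io

import numpy as np

# Made-up CSV text standing in for data_more.csv: a leading column to skip,
# followed by 8 numbers per row.
sample_csv = "0,1,2,3,4,5,6,7,8\n1,9,10,11,12,13,14,15,16\n"

with io.StringIO(sample_csv, newline="") as file:
    reader = csv.reader(file)
    # One list of floats per row, skipping the first column.
    data = [[float(x) for x in line[1:]] for line in reader]

data = np.array(data)                      # 2D: one row per csv line
data = data.reshape(data.shape[0], -1, 4)  # 3D: each row becomes n-by-4
```

Passing -1 lets numpy infer the middle dimension from the row length, so the same reshape applies whether a row holds 8 values or 40.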
    
