ValueError: setting an array element with a sequence. on data read from csv
Question
I am trying to load data from a csv file row by row, then create a 2D array out of each row and store it inside an array:
Loading:
import csv
import numpy as np

with open('data_more.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))
Parsing:
def getTrainingData():
    label_data = []
    for i in range( 0 , len(data) - 1):
        y = list(data[i][1:41:1])
        y = list(map(lambda x: list(map(lambda z: int(z),x)),y))
        y = create2Darray(y)
        label_data.append(y)
    labelY = np.array(label_data,dtype=float)
The create2Darray function:
def create2Darray( arr ):
    final_arr = []
    index = 0
    while( index < len(arr)):
        temp = arr[index:index+4:1]
        final_arr.append(temp)
        index+=4
    return final_arr
This is a simple task, yet I keep receiving the error:
ValueError: setting an array element with a sequence.
I have read that this is related to the situation where the elements do not all have the same shape. However, when I print the shape of every element inside labelY, they all have the same shape.
What is causing this problem then? The problem occurs on this line:
labelY = np.array(label_data,dtype=float)
My csv has the format
number, number, number
Basically N numbers per row, separated by ",". Thanks for the help.
Recommended answer
Let's start from the beginning:
You seem to want to iterate through every line of your file to create an array. The iteration should be over range(0, len(data)), not range(0, len(data) - 1): the stop value of a range is exclusive, so you are currently skipping the last line. In fact, you can simply write range(len(data)) or, more Pythonically, iterate over the rows directly:
for y in data:
    y = y[1:41]
Based on what comes later, you want 40 elements of y, starting with the second element. In that case y[1:41] is correct (you don't need the trailing :1). If you didn't mean to skip the first element, use y[0:40], or more Pythonically y[:40]. Remember that indexing is zero-based and the stop index is exclusive.
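The two off-by-one points above (the exclusive stop of range and the exclusive stop of a slice) can be checked on a small example. This is only a sketch: the 3-by-6 `data` list is a hypothetical stand-in for what `csv.reader` returns, and the slice `[1:5]` plays the role of `[1:41]` on a shorter row.

```python
# Hypothetical stand-in for the rows returned by csv.reader: 3 rows, 6 columns.
data = [[str(10 * r + c) for c in range(6)] for r in range(3)]

# The stop value of range is exclusive, so len(data) - 1 drops the last row.
visited_short = list(range(0, len(data) - 1))
visited_all = list(range(len(data)))

# Slicing is zero-based with an exclusive stop: [1:5] skips column 0
# and keeps columns 1, 2, 3 and 4.
row = data[0]
kept = row[1:5]

print(visited_short)  # [0, 1]
print(visited_all)    # [0, 1, 2]
print(kept)           # ['1', '2', '3', '4']
```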
Each element of your y list is not a number. It is a string that you read from the file. Normally, you would convert the list of strings to a list of numbers using
y = [float(x) for x in y]
OR
y = list(map(float, y))
Your code instead creates a nested list for each element, splitting the string into its individual digits (for example, '12' becomes [1, 2]). Is this really what you intend? It certainly does not seem that way from the rest of the question.
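To make the difference concrete, here is a small sketch comparing the nested map from the question with a plain per-field conversion. The three-field list `y` is a made-up sample, not data from the question's file.

```python
# Hypothetical row fragment as csv.reader would return it: strings, not numbers.
y = ['7', '12', '305']

# What the question's code does: iterate over the characters of each string.
digit_split = list(map(lambda x: list(map(lambda z: int(z), x)), y))

# What was most likely intended: one number per field.
converted = [float(x) for x in y]

print(digit_split)  # [[7], [1, 2], [3, 0, 5]]
print(converted)    # [7.0, 12.0, 305.0]

# The inner lists have different lengths, so they cannot form a rectangular array.
inner_lengths = sorted({len(item) for item in digit_split})
print(inner_lengths)  # [1, 2, 3]
```

As soon as the fields differ in their number of digits, the innermost lists differ in length, which is the kind of ragged input NumPy cannot turn into a regular float array.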
create2Darray seems to expect a list of 4n numbers and to break it into a 2D list of size n-by-4. If you want to keep using pure Python at this point, you can shorten the code using range:
def create2Darray(arr):
    return [arr[i:i + 4] for i in range(0, len(arr), 4)]
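A quick usage sketch of the shortened function, with made-up inputs, shows the n-by-4 result and what happens when the length is not a multiple of 4:

```python
def create2Darray(arr):
    # Step through the list four items at a time; each slice becomes one row.
    return [arr[i:i + 4] for i in range(0, len(arr), 4)]

values = [float(v) for v in range(8)]  # hypothetical input: 8 numbers
grid = create2Darray(values)
print(grid)  # [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]

# If the length is not a multiple of 4, the last row is simply shorter.
leftover = create2Darray([1, 2, 3, 4, 5])
print(leftover)  # [[1, 2, 3, 4], [5]]
```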
label_data.append(y) appends the result of the 2D operation to a 3D list. Currently, because of the digit splitting, label_data is a 4D list with a ragged 4th dimension. It is pretty inefficient to build a list by appending that way. You would do much better to have a small function containing the statements in the body of your for loop, and use that in a list comprehension.
Finally, I am not sure why you want dtype=float when you explicitly converted everything to an int, but that is for you to figure out.
Don't forget to add a return value to getTrainingData!
TL;DR
The simplest thing you can do, though, is to perform all the transformations after you convert the file to a 2D numpy array. Your program could be rewritten as
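import csv
import numpy as np

with open('data_more.csv', newline='') as file:
    # One list of floats per row, skipping the first column.
    data = [[float(x) for x in line[1:]] for line in csv.reader(file)]
# Rows stay on axis 0; the remaining values are grouped four at a time.
data = np.array(data).reshape(len(data), -1, 4)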
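The same idea can be tried without a file on disk. The sketch below assumes a made-up two-row CSV held in a string (the `sample` text and the names `rows` and `arr` are illustrative only) and assumes that every row has the same number of values after the first column, and that this number is a multiple of 4. Otherwise either np.array or reshape will raise an error.

```python
import csv
import io

import numpy as np

# Hypothetical in-memory CSV: 2 rows, a leading column to skip, then 8 numbers.
sample = "a,1,2,3,4,5,6,7,8\nb,9,10,11,12,13,14,15,16\n"

reader = csv.reader(io.StringIO(sample))
rows = [[float(x) for x in line[1:]] for line in reader]

arr = np.array(rows)                     # 2D array, one row per CSV line
arr = arr.reshape(arr.shape[0], -1, 4)   # each line becomes an n-by-4 block

print(arr.shape)           # (2, 2, 4)
print(arr[1, 0].tolist())  # [9.0, 10.0, 11.0, 12.0]
```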