Problems with reading in csv values into 2d numpy array
Problem description
I am having issues reading in values from a saved csv file. This is part of the csv file that I have:
000000216739.jpg, 224, [ 0. 0. 0. 0. 36. 44. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 9. 14.
8. 0. 0. 0. 0. 0. 0. 0. 0. 0. 7. 0. 3. 0. 0. 0. 0. 0.
0. 0. 0. 0. 3. 1. 2. 0. 0. 0. 0. 0. 1. 0. 0. 1. 2. 0.
3. 0. 0. 0. 0. 0.],
[ 0. 0. 0. 0. 35. 33. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 9. 36. ...]
(formatted according to the csv file that I have)
Here's the image of the datafile:
The problem is that I'm not sure how to read each comma-separated value separately. When I run:
```python
import csv

with open(CSVFilepath) as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)
        print(row[0])
        print(row[1])
        print(row[2])
```
it prints:
['000000216739.jpg', '224', '[ 0. 0. 0. 0. 36. 44. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 9. 14.']
000000216739.jpg
224
[ 0. 0. 0. 0. 36. 44. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 9. 14. ]
The value 224 is actually the number of [ ] arrays (rows) for image 000000216739.jpg.
What I'm trying to read in is a 2D numpy array of shape (224, 60), where the 60 is fixed for all images.
For example, for image 123.jpg I want everything in one array of shape (224, 60):
[[ 0. 0. 0. 0. 36. 44. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 9. 14.
8. 0. 0. 0. 0. 0. 0. 0. 0. 0. 7. 0. 3. 0. 0. 0. 0. 0.
0. 0. 0. 0. 3. 1. 2. 0. 0. 0. 0. 0. 1. 0. 0. 1. 2. 0.
3. 0. 0. 0. 0. 0.],
... (more np arrays)...
[ 6. 0. 0. 35. 64. 0. 0. 0. 0. 0. 0. 0. 20. 11. 27. 23. 5. 0.
0. 0. 0. 0. 0. 0. 5. 0. 10. 1. 0. 0. 0. 0. 0. 0. 0. 0.
6. 2. 3. 0. 0. 0. 0. 0. 0. 0. 0. 0. 2. 2. 1. 0. 0. 0.
0. 0. 0. 0. 0. 0.]]
What should I do? Also, the file is quite large, so I need an efficient way to read it. Any help would be greatly appreciated, thanks!
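Because the width is fixed at 60 and the file stores the row count (224) next to each filename, the requested shape can be built from a flat run of parsed numbers with a size check and a reshape. This is a minimal sketch; `to_image_array` is a hypothetical helper (not part of NumPy), and the dummy zeros stand in for real parsed values.

```python
import numpy as np

def to_image_array(values, n_rows, n_cols=60):
    """Hypothetical helper: arrange a flat sequence of numbers as (n_rows, n_cols).

    n_rows is the per-image count stored in the file (224 for 000000216739.jpg);
    n_cols defaults to the fixed width of 60 described in the question.
    """
    arr = np.asarray(values, dtype=float)
    # Refuse to reshape if the number of values does not match the header count.
    if arr.size != n_rows * n_cols:
        raise ValueError(f"expected {n_rows * n_cols} values, got {arr.size}")
    return arr.reshape(n_rows, n_cols)

# Dummy data: 224 * 60 zeros stand in for one image's parsed values.
image = to_image_array([0.0] * (224 * 60), n_rows=224)
print(image.shape)  # (224, 60)
```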
Answer
Your file is not a valid CSV file, so you should not read it as one.
In a CSV file a line break marks a new row, but in your file line breaks clearly don't mean that: you want to read the numbers inside [ and ], and those aren't delimited properly.
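To see this concretely, here is a minimal sketch that feeds a shortened, made-up sample in the same layout to `csv.reader`. Each physical line comes back as its own row, so a bracketed array is cut wherever the file wraps.

```python
import csv
import io

# Made-up sample in the question's layout: one bracketed array
# wraps onto a second physical line.
sample = "img.jpg, 1, [ 0. 36. 44.\n 9. 14.]\n"

# csv.reader treats every unquoted line break as the end of a row.
rows = list(csv.reader(io.StringIO(sample), delimiter=","))
print(rows)  # [['img.jpg', ' 1', ' [ 0. 36. 44.'], [' 9. 14.]']]
```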
One way to dissect this file would be:
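```python
import numpy as np

with open(file, 'r') as fin:  # file is the path to your data file
    f = fin.read().replace('\n', ' ')  # remove newlines

listrows = f.split('[')[1:]  # skip the "filename, count," text before the first '['
listrows = [l.split(']')[0] for l in listrows]  # Get string between '[' and ']'
matrix = [row.split() for row in listrows]  # split on whitespace: a 2D list of strings
# Values look like "36.", so parse them as floats (int("36.") would fail)
final = np.array([[float(e) for e in row] for row in matrix])  # Here goes your final matrix
```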
I used list comprehensions extensively so this doesn't take 30 lines. Note that it treats the whole file as one image. Try running it.
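The snippet above reads the whole file into memory and merges the rows of every image into one list. Since the file is large and holds many images, here is a sketch of a streaming alternative. It assumes (not confirmed by the question) that every record is `name, count,` followed by exactly `count` bracketed rows of 60 numbers and that brackets are never nested; `iter_records` is a hypothetical helper name.

```python
import io
import re

import numpy as np

def iter_records(lines, n_cols=60):
    """Hypothetical helper: yield (name, array) pairs from an iterable of lines.

    Assumes each record is 'name, count,' followed by `count` bracketed rows
    of n_cols numbers, that brackets are never nested, and that each line
    ends with a newline (as when iterating over a text file).
    """
    name, count, rows = None, 0, []
    outside, inside, in_row = "", "", False

    def finished():
        # Build the image array and check it against the header's row count.
        arr = np.array(rows, dtype=float)
        if arr.ndim != 2 or arr.shape != (count, n_cols):
            raise ValueError(f"{name}: expected shape {(count, n_cols)}, got {arr.shape}")
        return name, arr

    for line in lines:
        # The capturing group keeps '[' and ']' as separate tokens.
        for tok in re.split(r"([\[\]])", line):
            if tok == "[":
                header = outside.replace(",", " ").split()
                if header:  # text between ']' and '[' means a new record starts
                    if name is not None:
                        yield finished()
                    name, count, rows = header[0], int(header[1]), []
                outside, inside, in_row = "", "", True
            elif tok == "]":
                rows.append([float(v) for v in inside.split()])
                in_row = False
            elif in_row:
                inside += tok
            else:
                outside += tok
    if name is not None:
        yield finished()

# Usage with a small made-up sample (3 values per row instead of 60):
sample = io.StringIO("a.jpg, 2, [ 0. 1.\n 2.],\n[ 3. 4. 5.]\nb.jpg, 1, [ 6. 7. 8.]\n")
records = dict(iter_records(sample, n_cols=3))
print({k: v.shape for k, v in records.items()})  # {'a.jpg': (2, 3), 'b.jpg': (1, 3)}
```

For the real file, passing the open file object (for example `for name, image in iter_records(f):` inside `with open(CSVFilepath) as f:`) should yield one (224, 60) array per image while holding only the current image in memory, provided the layout assumptions above hold throughout the file.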