Problems with reading in csv values into 2d numpy array


Question

I am having issues reading in values from a saved csv file. This is part of the csv file that I have:

000000216739.jpg, 224, [ 0.  0.  0.  0. 36. 44.  4.  0.  0.  0.  0.  0.  0.  0.  0.  0.  9. 14. 
8.  0.  0.  0.  0.  0.  0.  0.  0.  0.  7.  0.  3.  0.  0.  0.  0.  0.  
0.  0.  0.  0.  3.  1.  2.  0.  0.  0.  0.  0.  1.  0.  0.  1.  2.  0.  
3.  0.  0.  0.  0.  0.], 
[ 0.  0.  0.  0. 35. 33.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  9. 36.  ...]

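(formatted as it appears in my csv file)

[Image: screenshot of the data file]

The problem is, I'm not sure how to read each comma-separated value separately. When I run: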

import csv

with open(CSVFilepath) as f:
    reader = csv.reader(f, delimiter=',')

    for row in reader:
        print(row)
        print(row[0])
        print(row[1])
        print(row[2])

it returns:

['000000216739.jpg', '224', '[ 0.  0.  0.  0. 36. 44.  4.  0.  0.  0.  0.  0.  0.  0.  0.  0.  9. 14.']  
000000216739.jpg  
224   
[ 0.  0.  0.  0. 36. 44.  4.  0.  0.  0.  0.  0.  0.  0.  0.  0.  9. 14.   ]

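The value 224 is actually the number of [ ] blocks (rows) for image 000000216739.jpg. What I'm trying to read in is a 2D numpy array of shape (224, 60), where the 60 is fixed for all images.

So what I'm trying to read in is, e.g. for image 123.jpg (everything in one array of shape (224, 60)):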

[[ 0.  0.  0.  0. 36. 44.  4.  0.  0.  0.  0.  0.  0.  0.  0.  0.  9. 14.
  8.  0.  0.  0.  0.  0.  0.  0.  0.  0.  7.  0.  3.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  3.  1.  2.  0.  0.  0.  0.  0.  1.  0.  0.  1.  2.  0.
  3.  0.  0.  0.  0.  0.],  
...  (more np arrays)...  
[ 6.  0.  0. 35. 64.  0.  0.  0.  0.  0.  0.  0. 20. 11. 27. 23.  5.  0.
  0.  0.  0.  0.  0.  0.  5.  0. 10.  1.  0.  0.  0.  0.  0.  0.  0.  0.
  6.  2.  3.  0.  0.  0.  0.  0.  0.  0.  0.  0.  2.  2.  1.  0.  0.  0.
  0.  0.  0.  0.  0.  0.]]

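May I ask what I should do? Also, this file is quite large, so I need a way to read it efficiently. Any help would be greatly appreciated, thanks!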

Answer

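Your file is not a valid csv file, so you should not read it like one.

In a csv file, a line break marks a new row, but in your file it clearly doesn't: the numbers you want sit between [ and ], and each bracketed array wraps across several lines, so the rows aren't delimited properly.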
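To make the point concrete, here is a small sketch of what csv.reader does with a wrapped array. The `sample` string is a made-up, shortened excerpt in the same layout as the question's file, not the real data.

```python
import csv
import io

# Hypothetical excerpt: one bracketed array wrapped over two physical lines,
# mimicking the layout shown in the question.
sample = (
    "000000216739.jpg, 224, [ 0.  0. 36. 44.\n"
    " 8.  0.  7.  0.], \n"
)

rows = list(csv.reader(io.StringIO(sample), delimiter=','))

# csv.reader treats every physical line as its own record, so the single
# array ends up split across two records.
print(len(rows))
print(rows[0])  # filename, row count, and only the first part of the array
print(rows[1])  # the rest of the array, detached from its filename
```

This is why row[2] in the question only ever contains the first line of each array.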

One way to dissect this file would be:

import numpy as np

with open(file, 'r') as fin:
    text = fin.read().replace('\n', ' ')  # newlines inside [...] are not row breaks
chunks = text.split('[')[1:]  # drop the filename/count text before the first '['
listrows = [chunk.split(']')[0] for chunk in chunks]  # string between '[' and ']'
matrix = [row.split() for row in listrows]  # split each row on whitespace
# Entries look like '36.', so parse them as floats; this is the final matrix
final = np.array([[float(e) for e in row] for row in matrix])

I used list comprehensions extensively, so this stays well under 30 lines. Try running it.
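If the file holds several images and you want one array per image, the same idea can be extended with regular expressions. The following is only a sketch under assumptions: it supposes every record starts with "<name>.jpg, <row count>," followed by that many bracketed rows, and the function name `parse_arrays` and the sample text are hypothetical.

```python
import re
import numpy as np

def parse_arrays(text):
    """Return {filename: 2D float array} for text laid out like the question's file."""
    # Assumed record header: "<name>.jpg, <row count>,"
    header = re.compile(r'([^\s,\[\]]+\.jpg)\s*,\s*(\d+)\s*,')
    # A row is everything between '[' and ']', line breaks included.
    row = re.compile(r'\[([^\]]*)\]')

    headers = list(header.finditer(text))
    result = {}
    for i, match in enumerate(headers):
        # A record's body runs until the next header, or to the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[match.end():end]
        rows = [[float(x) for x in r.split()] for r in row.findall(body)]
        if len(rows) != int(match.group(2)):
            raise ValueError(f"{match.group(1)}: row count does not match header")
        result[match.group(1)] = np.array(rows, dtype=float)
    return result

# Made-up sample with two tiny "images" (3 values per row instead of 60).
sample = (
    "a.jpg, 2, [ 0.  1.\n 2.], \n[ 3.  4.  5.]\n"
    "b.jpg, 1, [ 6.  7.  8.]\n"
)
arrays = parse_arrays(sample)
print(arrays["a.jpg"].shape)
print(arrays["b.jpg"])
```

Checking the parsed row count against the number in the header catches truncated or malformed records early. This version reads the whole text into memory, so for a very large file you may need to process it in pieces instead.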

