Accessing column data from a CSV file in Python


Question

I have a CSV file with columns A, B, C, D and N rows. The problem is that the values in these columns are not all of the same length, i.e. some are 4.5 and some are 4.52.

My question has two parts:

How do I access these columns from the CSV file? I've used this code to print the contents of the CSV file and to read them into an array

    import csv

    # newline='' is what the csv module expects for text-mode files in Python 3
    with open('file.csv', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

to print the rows in the CSV file, and I replaced

    print(row)

with

    z = row
    z.append(z)

to save the data into an array.

But z is a 1-D array, and the data is of type string. When I try performing operations such as np.median(z), it gives me an error. Also, I cannot do

    z.append(float(z))

This gives me a TypeError.

Also, is there any way to truncate the values and set them to a certain precision while importing them from the CSV file? For example, if the file has values like 4.3, 4.56, 4.299, ..., I want to constrain what I finally import to just one decimal place.

This SE question is the closest to answering my second question - Python - CSV: Large file with rows of different lengths - but I do not understand it. If any of you can help me with this, I'd be thankful.

EDIT 1: @Richie: here's a sample data set - http://goo.gl/io8Az. It links to a Google doc. Regarding your comment, this was the outcome when I ran your code on my CSV file -

     ValueError: could not convert string to float: plate

@Pieters: z = row, z.append(z) created this - ['3836', '55302', '402', '22.945717', '22.771544', '23.081865', '22.428421', '21.78294', '164.40663689', '-1.25641627', '1.780485', '1237674648848106129', [...]].

I should have mentioned that I've just started using Python and I'm learning things on a need-to-know basis! I'm improvising with bits and pieces of code I find on the web.

EDIT 2: I've heard about pandas. I guess I should start using it.

@Khalid - I've run your code and I'm able to retrieve the column I want. Instead of printing the whole row out, can I access the column instead, as a static array?

EDIT 3: @richie: the first time I ran your code, this showed up -

    Traceback (most recent call last):
      File "", line 4, in
    ValueError: could not convert string to float: plate

Well, I realized that the first row, which contains the column names, was the cause. So I removed the first row, saved the result as a new file, ran the code on that file, and it worked perfectly fine.

But if I remove the first line, which contains the column identifiers, I cannot use the method mentioned by khalid below. I am looking at pandas in the meantime.

Thanks to everyone :)

EDIT 4: Lesson learnt. Pandas is awesome. Job done :)...

Recommended answer

Try this:

    import csv
    import numpy as np

    class onefloat(float):
        """A float that displays with one decimal place."""
        def __repr__(self):
            return "%0.1f" % self

    # Assumes every cell is numeric; a header row of column names raises ValueError.
    with open('file.csv', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            # your issue of 1 decimal point is taken care of here
            print([onefloat(x) for x in row])
            # in case you want the row median with 1 decimal point too
            print('{:.1f}'.format(np.median([float(x) for x in row])))
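The follow-up questions (keeping the header row, and getting a whole column back instead of printing rows) could be handled with the standard library alone. The following is only a minimal sketch under some assumptions: the sample data and the column names `mag_u` and `mag_g` are made up for illustration (only `plate` appears in the question), `read_columns` is a hypothetical helper, and `round` rounds to the nearest value rather than truncating.

```python
import csv
import io
from statistics import median

# Hypothetical sample standing in for file.csv; only 'plate' comes from the question.
SAMPLE = """plate,mag_u,mag_g
3836,22.945717,22.771544
3837,21.3,20.56
3838,19.299,18.95
"""

def read_columns(f, ndigits=1):
    """Return a dict mapping each column name to a list of rounded floats."""
    reader = csv.DictReader(f)  # the first row supplies the column names
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # convert the string to float, then round (not truncate) to ndigits
            columns[name].append(round(float(value), ndigits))
    return columns

# With a real file: open('file.csv', newline='') instead of io.StringIO(SAMPLE)
columns = read_columns(io.StringIO(SAMPLE))
print(columns['mag_u'])          # one column as a plain list of floats
print(median(columns['mag_u']))  # column-wise median
```

Because `csv.DictReader` consumes the header itself, the header never reaches `float()`, so the "could not convert string to float: plate" error does not occur and the columns stay addressable by name.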

And this is how it is done using pandas:

    import pandas as pd

    # read_csv treats the first row as column names by default
    data = pd.read_csv('richards_quasar_outliers.csv')
    print(data['plate'].median())
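For the one-decimal requirement and the "column as an array" follow-up, pandas might be used roughly as below. This is a sketch, not the answerer's code: the in-memory sample and the column names `mag_u` and `mag_g` are assumptions made for illustration, and `DataFrame.round` rounds to the nearest value instead of truncating.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the CSV file; only 'plate' comes from the question.
SAMPLE = """plate,mag_u,mag_g
3836,22.945717,22.771544
3837,21.3,20.56
3838,19.299,18.95
"""

data = pd.read_csv(io.StringIO(SAMPLE))  # the header row becomes the column labels
rounded = data.round(1)                  # round every numeric column to 1 decimal place
mag_u = rounded['mag_u'].to_numpy()      # one column as a NumPy array

print(mag_u)
print(data['plate'].median())
```

Rounding after loading keeps the full-precision values available in `data`, which may matter if the median should be computed before any precision is dropped.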
