Read specific columns from a csv file with csv module?

Question

I'm trying to parse through a csv file and extract the data from only specific columns.

Sample csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

I'm trying to capture only specific columns, say ID, Name, Zip and Phone.

Code I've looked at has led me to believe I can call a specific column by its corresponding number, i.e. Name would correspond to 2, and iterating through each row using row[2] would produce all the items in column 2. Only it doesn't.

Here's what I've done so far:

import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',
 fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)
    csvfile.seek(0)

    reader = csv.reader(csvfile, delimiter=' ')
        included_cols = [1, 2, 6, 7]

    for row in reader:
            content = list(row[i] for i in included_cols)
            print content

I'm expecting this to print only the specific columns I want for each row, but it doesn't: I get the last column only.

Recommended answer

The only way you would be getting the last column from this code is if you don't include your print statement in your for loop.

This is most likely the end of your code:

for row in reader:
    content = list(row[i] for i in included_cols)
print content

You want it to be this:

for row in reader:
    content = list(row[i] for i in included_cols)
    print content
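
For reference, here is a minimal Python 3 sketch of the same loop. It is not the asker's exact script: it assumes a comma-delimited file (the posted code splits on commas, although the sample is shown with pipes), and it uses made-up in-memory data in place of the real file. The helper name select_columns is hypothetical. Note that list indices are zero-based, so ID, Name, Zip and Phone are columns 0, 1, 5 and 6 rather than 1, 2, 6 and 7.

```python
import csv
import io

# Hypothetical sample data standing in for the real file (assumed comma-delimited).
sample = io.StringIO(
    "ID,Name,Address,City,State,Zip,Phone,OPEID,IPEDS\n"
    "10,Example College,1 Example St,Sometown,AL,00000,000-0000,01023,10063\n"
)

# Indices are zero-based: ID=0, Name=1, Zip=5, Phone=6.
included_cols = [0, 1, 5, 6]


def select_columns(lines, cols, delimiter=","):
    """Yield only the requested columns from each parsed row."""
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        # Building the list inside the loop keeps one result per row.
        yield [row[i] for i in cols]


rows = list(select_columns(sample, included_cols))
for content in rows:
    print(content)
```

With a real file, the io.StringIO object would be replaced by something like `with open(csv_file, newline="") as csvfile:`, passing csvfile to the helper.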

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']

So if you wanted to save all of the info in your column Names into a variable, this is all you need to do:

names = df.Names
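
If you would rather stay with the standard library, selecting columns by header name is also possible with csv.DictReader. The sketch below is only an illustration: it assumes a comma-delimited file whose first row holds the headers, and the sample rows and the wanted list are made up for the example.

```python
import csv
import io

# Hypothetical sample data; the first row is assumed to be the header row.
sample = io.StringIO(
    "ID,Name,Address,Zip,Phone\n"
    "10,Example College,1 Example St,00000,000-0000\n"
    "11,Another College,2 Example Ave,11111,111-1111\n"
)

# Column names to keep, matched against the header row.
wanted = ["ID", "Name", "Zip", "Phone"]

reader = csv.DictReader(sample)

# Each row is a dict keyed by header name; keep only the wanted keys.
selected = [{name: row[name] for name in wanted} for row in reader]

# A single column as a plain list, comparable to saving one column in pandas.
names = [row["Name"] for row in selected]
print(names)
```

Selecting by name avoids the off-by-one risk of counting column positions by hand, at the cost of requiring a header row.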

It's a great module and I suggest you look into it. If your print statement was in fact inside the for loop and it was still only printing the last column, which shouldn't happen, let me know, because then my assumption was wrong. Your posted code has a lot of indentation errors, so it was hard to tell what was supposed to be where. Hope this was helpful!
