Read specific columns from a csv file with csv module?

Question

I'm trying to parse through a csv file and extract the data from only specific columns.

Sample csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

I'm trying to capture only specific columns, say ID, Name, Zip and Phone.

Code I've looked at has led me to believe I can call a specific column by its corresponding number; e.g., Name would correspond to 2, and iterating through each row using row[2] would produce all the items in column 2. Only it doesn't.
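As a quick check of that assumption, here is a minimal Python 3 sketch showing how csv.reader numbers the fields of a row. The sample rows are hypothetical stand-ins for the real file, and the code assumes the file is comma-delimited. Because Python lists are zero-indexed, ID is row[0] and Name is row[1], not row[2].

```python
import csv
import io

# Hypothetical comma-delimited data standing in for the real file.
SAMPLE = (
    "ID,Name,Address,City,State,Zip,Phone,OPEID,IPEDS\n"
    "10,College A,130 W Main St,Montgomery,AL,36104,334-555-0100,01023,10063\n"
)

def column_positions(header):
    """Map each column name to its zero-based index within a csv.reader row."""
    return {name: index for index, name in enumerate(header)}

# The first row produced by csv.reader is the header row.
header = next(csv.reader(io.StringIO(SAMPLE)))
positions = column_positions(header)

for name in ["ID", "Name", "Zip", "Phone"]:
    print(name, positions[name])
```

With this layout, the indices for ID, Name, Zip and Phone would be [0, 1, 5, 6].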

Here's what I've done so far:

import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
 fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)
    csvfile.seek(0)

    reader = csv.reader(csvfile, delimiter=' ')
        included_cols = [1, 2, 6, 7]

    for row in reader:
            content = list(row[i] for i in included_cols)
            print content

I expect this to print only the specific columns I want for each row, but it doesn't; I get only the last column.

Answer

The only way you would be getting only the last column from this code is if your print statement is outside your for loop.

This is most likely the end of your code:

for row in reader:
    content = list(row[i] for i in included_cols)
print content


b $ b

You want it to be this:

for row in reader:
    content = list(row[i] for i in included_cols)
    print content
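For reference, here is a minimal Python 3 sketch of the corrected loop (the code above is Python 2). It assumes the file is comma-delimited, so it relies on csv.reader's default delimiter rather than delimiter=' '; the sample data and the select_columns helper are hypothetical. In Python 3, a real file should be opened in text mode with newline='' instead of 'rb'.

```python
import csv
import io

# Hypothetical comma-delimited data standing in for the real file.
SAMPLE = (
    "ID,Name,Address,City,State,Zip,Phone,OPEID,IPEDS\n"
    "10,College A,130 W Main St,Montgomery,AL,36104,334-555-0100,01023,10063\n"
    "11,College B,42 Oak Ave,Mobile,AL,36602,251-555-0199,01024,10064\n"
)

# Zero-based positions of ID, Name, Zip and Phone.
included_cols = [0, 1, 5, 6]

def select_columns(lines, cols):
    """Return only the fields at the given positions, for every row."""
    selected = []
    for row in csv.reader(lines):  # default delimiter is ','
        content = [row[i] for i in cols]
        print(content)  # indented inside the loop, so it runs once per row
        selected.append(content)
    return selected

rows = select_columns(io.StringIO(SAMPLE), included_cols)
```

With a real file the call would look like: with open(csv_file, newline='') as f: rows = select_columns(f, included_cols)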

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
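If you would rather stay in the standard library, csv.DictReader gives a similar select-by-name style: it uses the header row as keys, so columns can be picked by name instead of by position. A minimal sketch, with hypothetical sample data and a hypothetical read_named_columns helper:

```python
import csv
import io

# Hypothetical comma-delimited data standing in for the real file.
SAMPLE = (
    "ID,Name,Address,City,State,Zip,Phone,OPEID,IPEDS\n"
    "10,College A,130 W Main St,Montgomery,AL,36104,334-555-0100,01023,10063\n"
    "11,College B,42 Oak Ave,Mobile,AL,36602,251-555-0199,01024,10064\n"
)

def read_named_columns(lines, wanted):
    """Return one dict per data row holding only the wanted columns."""
    # DictReader consumes the header row and keys each value by its header name.
    return [{name: record[name] for name in wanted} for record in csv.DictReader(lines)]

records = read_named_columns(io.StringIO(SAMPLE), ["ID", "Name", "Zip", "Phone"])
for record in records:
    print(record)
```

In pandas itself, pd.read_csv(csv_file, usecols=['ID', 'Name', 'Zip', 'Phone']) loads only those columns.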

So if you wanted to save all of the info in your Name column into a variable, this is all you need to do:

names = df.Name
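Without pandas, the same "whole column into one variable" idea can be sketched with csv.DictReader and a list comprehension. The read_column helper and the sample data are hypothetical:

```python
import csv
import io

# Hypothetical comma-delimited data standing in for the real file.
SAMPLE = (
    "ID,Name,Address,City,State,Zip,Phone,OPEID,IPEDS\n"
    "10,College A,130 W Main St,Montgomery,AL,36104,334-555-0100,01023,10063\n"
    "11,College B,42 Oak Ave,Mobile,AL,36602,251-555-0199,01024,10064\n"
)

def read_column(lines, column_name):
    """Collect every value of one named column (header excluded) into a list."""
    return [record[column_name] for record in csv.DictReader(lines)]

names = read_column(io.StringIO(SAMPLE), "Name")
print(names)  # ['College A', 'College B']
```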

It's a great module and I suggest you look into it. If your print statement was in fact inside the for loop and it still printed only the last column (which shouldn't happen), let me know and I'll revisit my assumption. Your posted code has a lot of indentation errors, so it was hard to tell what was supposed to go where. Hope this was helpful!