2D list in Python: accessing columns by name


Question

I'm parsing two files that contain data as shown below.

File 1:

    UID       A        B            C           D   
    ------ ---------- ---------- ---------- ---------- 
    456          536         1       148       304 
    1071         908         1       128       243 
    1118           4         8        52       162 
    249            4         8        68       154 
    1072         296       416        68       114 
    118          180       528        68        67 

File 2:

    UID       X         Y            A           Z         B   
    ------ ---------- ---------- ---------- ---------- ---------
    456          536         1       148       304        234
    1071         908         1       128       243        12
    1118           4         8        52       162        123
    249            4         8        68       154        987
    1072         296       416        68       114         45
    118          180       528        68        67          6

I will be comparing two such files; however, the number of columns and the column names may vary. For every unique UID, I need to match the column names, compare the values, and find the differences.

Questions:

1. Is there a way to access columns by column name instead of by index?
2. Can the column names be assigned dynamically, based on the file data?

I'm able to load the file into a list and compare using indexes, but that's not a proper solution.

Thanks.

Recommended answer

You might consider using csv.DictReader. It lets you address columns by name, and it handles a different list of columns for each file you open. Consider removing the ------ line that separates the header from the actual data, since it would otherwise be read as a data row.

Example:

import csv
with open('File1', 'r', newline='') as f:
    # If you don't pass field names
    # they are taken from the first row.
    reader = csv.DictReader(f)
    for line in reader:
        # `line` is a dict {'UID': val, 'A': val, ... }
        print(line)
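
To illustrate how the column names are picked up from the data itself, here is a minimal sketch that reads from an in-memory string instead of a file. The sample data is made up for illustration and assumes comma-separated input; `fieldnames` is the `DictReader` attribute that holds the header row.

```python
import csv
import io

# Hypothetical comma-separated sample standing in for a real file.
sample = "UID,A,B\n456,536,1\n1071,908,1\n"

reader = csv.DictReader(io.StringIO(sample))
# fieldnames is filled in from the first row when no names are passed in.
columns = reader.fieldnames
rows = list(reader)

print(columns)
print(rows[0]["A"])
```

Note that every value comes back as a string, so convert with `int()` where numeric comparison matters.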

If your input format has no clear delimiter (for example, columns separated by runs of spaces), you can wrap the file in a generator that compresses continuous whitespace into, e.g., a comma:

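import csv
import re

# One or more consecutive spaces.
r = re.compile(r'[ ]+')


def trim_whitespaces(f):
    for line in f:
        # Strip first so leading/trailing spaces don't turn into empty fields.
        yield r.sub(',', line.strip())

with open('test.txt', 'r', newline='') as f:
    reader = csv.DictReader(trim_whitespaces(f))
    for line in reader:
        print(line)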
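
Putting the pieces together for the comparison described in the question, one possible sketch is shown below. It is only an illustration under stated assumptions: the helper names (`normalize`, `load_by_uid`, `compare`) are hypothetical, the separator line is assumed to consist only of dashes and spaces, the tables are shortened in-memory copies of the sample files, and values are compared as strings.

```python
import csv
import io
import re

WHITESPACE = re.compile(r"\s+")
SEPARATOR = re.compile(r"^[-\s]+$")  # a line made only of dashes and spaces


def normalize(lines):
    """Yield comma-separated lines, skipping blank and dashed separator lines."""
    for line in lines:
        line = line.strip()
        if not line or SEPARATOR.match(line):
            continue
        yield WHITESPACE.sub(",", line)


def load_by_uid(lines):
    """Return {uid: row_dict} for a whitespace-delimited table with a UID column."""
    reader = csv.DictReader(normalize(lines))
    return {row["UID"]: row for row in reader}


def compare(rows1, rows2):
    """Return {uid: {column: (value1, value2)}} for shared columns that differ."""
    diffs = {}
    for uid in rows1.keys() & rows2.keys():
        r1, r2 = rows1[uid], rows2[uid]
        shared = (r1.keys() & r2.keys()) - {"UID"}
        changed = {c: (r1[c], r2[c]) for c in shared if r1[c] != r2[c]}
        if changed:
            diffs[uid] = changed
    return diffs


# Shortened copies of the two sample files, held in memory for illustration.
file1 = io.StringIO("""\
UID       A        B            C           D
------ ---------- ---------- ---------- ----------
456          536         1       148       304
1071         908         1       128       243
""")
file2 = io.StringIO("""\
UID       X         Y            A           Z         B
------ ---------- ---------- ---------- ---------- ---------
456          536         1       148       304        234
1071         908         1       128       243        12
""")

differences = compare(load_by_uid(file1), load_by_uid(file2))
for uid, changed in sorted(differences.items()):
    print(uid, changed)
```

Only columns present in both files (here A and B) are compared, and only UIDs present in both files are reported. With real files, pass the open file objects to `load_by_uid` in place of the `io.StringIO` stand-ins.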
