Extracting columns containing a certain name

Problem description

I'm trying to use Python to manipulate data in large txt-files.

I have a txt-file with more than 2000 columns, and about a third of these have a title which contains the word 'Net'. I want to extract only these columns and write them to a new txt file. Any suggestion on how I can do that?

I have searched around a bit but haven't been able to find something that helps me. Apologies if similar questions have been asked and solved before.

EDIT 1: Thank you all! At the time of writing, three users have suggested solutions, and they all work really well. I honestly didn't think people would answer, so I didn't check for a day or two and was happily surprised. I'm very impressed.

EDIT 2: I've added a picture that shows what a part of the original txt-file can look like, in case it will help anyone in the future:

Recommended answer

One way of doing this without installing third-party modules like numpy/pandas is as follows. Given an input file called "input.csv" that looks like this:

a,b,c_net,d,e_net
0,0,1,0,1
0,0,1,0,1

The following code does what you want.

import csv


input_filename = 'input.csv'
output_filename = 'output.csv'

# Instantiate a CSV reader, check if you have the appropriate delimiter
reader = csv.reader(open(input_filename), delimiter=',')

# Get the first row (assuming this row contains the header).
# next(reader) works on Python 2.6+ and 3; reader.next() is Python 2 only.
input_header = next(reader)

# Select the columns that you want to keep by storing their
# indices
columns_to_keep = []
for i, name in enumerate(input_header):
    if 'net' in name:
        columns_to_keep.append(i)

# Create a CSV writer to store the columns you want to keep
writer = csv.writer(open(output_filename, 'w'), delimiter=',')

# Construct the header of the output file
output_header = []
for column_index in columns_to_keep:
    output_header.append(input_header[column_index])

# Write the header to the output file
writer.writerow(output_header)

# Iterate over the remainder of the input file, construct a row
# with columns you want to keep and write this row to the output file
for row in reader:
    new_row = []
    for column_index in columns_to_keep:
        new_row.append(row[column_index])
    writer.writerow(new_row)
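
On Python 3, the same steps can be sketched as a small function. This is a minimal sketch and not the answer's original code: the function name `extract_columns`, its parameters, and the case-insensitive match (so that both 'Net' and 'net' headers are caught) are assumptions made for illustration. Like the code above, it does no error handling.

```python
import csv


def extract_columns(input_filename, output_filename, keyword, delimiter=','):
    """Copy the columns whose header contains `keyword` (case-insensitive).

    Hypothetical helper; returns the number of columns that were kept.
    """
    # newline='' lets the csv module handle line endings itself
    with open(input_filename, newline='') as src, \
            open(output_filename, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)

        # The first row is assumed to be the header
        header = next(reader)
        keep = [i for i, name in enumerate(header)
                if keyword.lower() in name.lower()]

        # Write the filtered header, then the filtered data rows
        writer.writerow([header[i] for i in keep])
        for row in reader:
            writer.writerow([row[i] for i in keep])
    return len(keep)
```

For a tab-separated txt file, the call might look like `extract_columns('input.txt', 'output.txt', 'Net', delimiter='\t')`, where both file names are placeholders.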

Note that there is no error handling. At least two cases should be handled. The first is checking that the input file exists (hint: look at the functionality provided by the os and os.path modules). The second is handling blank lines or lines with an inconsistent number of columns.
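
Both checks could be added along the following lines. This is only a sketch for Python 3 under stated assumptions: the function name `extract_columns_checked` is hypothetical, and skipping malformed rows while counting them is one possible policy (raising an error or padding short rows would be equally valid choices).

```python
import csv
import os


def extract_columns_checked(input_filename, output_filename, keyword,
                            delimiter=','):
    """Extract matching columns, skipping malformed rows.

    Hypothetical helper; returns the number of rows that were skipped.
    """
    # Check 1: the input file must exist
    if not os.path.isfile(input_filename):
        raise FileNotFoundError(input_filename)

    skipped = 0
    with open(input_filename, newline='') as src, \
            open(output_filename, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)

        # An empty file has no header row; next() returns None in that case
        header = next(reader, None)
        if header is None:
            return skipped

        keep = [i for i, name in enumerate(header) if keyword in name]
        writer.writerow([header[i] for i in keep])

        for row in reader:
            # Check 2: csv.reader yields [] for a blank line, and a
            # malformed row has a different length than the header
            if len(row) != len(header):
                skipped += 1
                continue
            writer.writerow([row[i] for i in keep])
    return skipped
```

Returning the count of skipped rows lets the caller decide whether the input was clean enough to trust the output.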
