Extracting columns containing a certain name

Views: 61 | Posted: 2020/11/2 21:48:14 | Tags: python, text-files, extraction


Problem description

I'm trying to use Python to manipulate data in large txt files.

I have a txt file with more than 2000 columns, and about a third of them have a title that contains the word 'Net'. I want to extract only these columns and write them to a new txt file. Any suggestions on how I can do that?

I have searched around a bit but haven't been able to find anything that helps me. Apologies if similar questions have been asked and solved before.

EDIT 1: Thank you all! At the time of writing, three users have suggested solutions, and they all work really well. I honestly didn't think people would answer, so I didn't check for a day or two and was happily surprised by this. I'm very impressed.

EDIT 2: I've added a picture that shows what part of the original txt file can look like, in case it helps anyone in the future:

Recommended answer

One way of doing this without installing third-party modules such as numpy or pandas is as follows. Given an input file called "input.csv" like this:

a,b,c_net,d,e_net
0,0,1,0,1

The following code does what you want.

import csv


input_filename = 'input.csv'
output_filename = 'output.csv'

# newline='' lets the csv module handle line endings itself (Python 3)
with open(input_filename, newline='') as input_file, \
        open(output_filename, 'w', newline='') as output_file:
    # Instantiate a CSV reader, check if you have the appropriate delimiter
    reader = csv.reader(input_file, delimiter=',')

    # Get the first row (assuming this row contains the header)
    input_header = next(reader)

    # Filter out the columns that you want to keep by storing the column
    # index; lower() makes the match case-insensitive ('Net' and 'net')
    columns_to_keep = []
    for i, name in enumerate(input_header):
        if 'net' in name.lower():
            columns_to_keep.append(i)

    # Create a CSV writer to store the columns you want to keep
    writer = csv.writer(output_file, delimiter=',')

    # Construct the header of the output file
    output_header = []
    for column_index in columns_to_keep:
        output_header.append(input_header[column_index])

    # Write the header to the output file
    writer.writerow(output_header)

    # Iterate over the remainder of the input file, construct a row
    # with the columns you want to keep and write this row to the output file
    for row in reader:
        new_row = []
        for column_index in columns_to_keep:
            new_row.append(row[column_index])
        writer.writerow(new_row)
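
Since the question is about a txt file, the delimiter may well be a tab rather than a comma. As a rough sketch of the same index-filtering idea, the logic could be wrapped in a function that takes the delimiter as a parameter. The function name `extract_columns` and its parameters are hypothetical (they are not part of the original answer), and in-memory `io.StringIO` objects stand in for real files here.

```python
import csv
import io


def extract_columns(infile, outfile, keyword, delimiter=','):
    """Copy only the columns whose header contains `keyword`.

    Hypothetical helper for illustration; the match is case-insensitive.
    Returns the number of columns that were kept.
    """
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter, lineterminator='\n')

    # The first row is assumed to be the header
    header = next(reader)
    keep = [i for i, name in enumerate(header)
            if keyword.lower() in name.lower()]

    # Write the filtered header, then the filtered data rows
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])
    return len(keep)


# Tab-separated sample data standing in for a real txt file
source = io.StringIO("a\tb\tc_Net\td\te_net\n0\t0\t1\t0\t1\n")
target = io.StringIO()
kept = extract_columns(source, target, 'net', delimiter='\t')
result = target.getvalue()
```

With real files, the two `io.StringIO` objects would be replaced by file objects opened with `newline=''`, which is what the `csv` module documentation recommends.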

Note that there is no error handling. At least two cases should be handled. The first is checking that the input file exists (hint: look at the functionality provided by the os and os.path modules). The second is handling blank lines or lines with an inconsistent number of columns.
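
One possible way to cover both cases is sketched below. It assumes that rows whose length differs from the header should simply be skipped and counted, which is only one of several reasonable policies. The function name `extract_net_columns` and the temporary demo files are hypothetical and not part of the original answer.

```python
import csv
import os
import tempfile


def extract_net_columns(input_filename, output_filename, keyword='net'):
    """Hypothetical helper: copy matching columns, return the number of skipped rows."""
    # Case 1: fail early with a clear message if the input file is missing
    if not os.path.isfile(input_filename):
        raise FileNotFoundError(f"Input file not found: {input_filename}")

    skipped = 0
    with open(input_filename, newline='') as infile, \
            open(output_filename, 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile, lineterminator='\n')

        header = next(reader, None)
        if header is None:
            return skipped  # empty input file: nothing to write

        keep = [i for i, name in enumerate(header)
                if keyword.lower() in name.lower()]
        writer.writerow([header[i] for i in keep])

        for row in reader:
            # Case 2: blank lines are read as [] and malformed rows have
            # a different length than the header, so both are skipped
            if len(row) != len(header):
                skipped += 1
                continue
            writer.writerow([row[i] for i in keep])
    return skipped


# Demo on a temporary file with one blank line and one short row
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.csv')
    dst = os.path.join(tmp, 'output.csv')
    with open(src, 'w', newline='') as f:
        f.write("a,b,c_net,d,e_net\n0,0,1,0,1\n\n0,0,1\n2,2,3,2,3\n")
    skipped_rows = extract_net_columns(src, dst)
    with open(dst, newline='') as f:
        output_text = f.read()
```

Whether to skip, pad, or abort on a malformed row depends on the data; returning the skipped count at least makes silent data loss visible to the caller.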
