Python csv; get max length of all columns then lengthen all other columns to that length

Problem description

I have a directory full of data files in the following format:

4 2 5 7
1 4 9 8
8 7 7 1
  4 1 4
    1 5
    2 0
    1 0
    0 0
    0 0

The columns are separated by tabs. The third and fourth columns contain useful information until they reach zero, at which point they are arbitrarily filled with zeroes until the end of the file.

I want to get the length of the longest column, not counting the zero values at the bottom. In this case, the longest column is column 3, with a length of 7, because we disregard the zeros at the bottom. Then I want to pad all the other columns with zeroes until their length equals that of the third column (except column 4, because it is already filled with zeroes). Finally, I want to drop all the zeros beyond that maximum length in every column. So my desired file output is as follows:

4 2 5 7
1 4 9 8
8 7 7 1
0 4 1 4
0 0 1 5
0 0 2 0
0 0 1 0
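The transformation described above can be sketched per column. This is a minimal illustration, not a definitive implementation: `pad_and_trim` is a hypothetical helper name, rows are assumed to be already split on tabs, and a cell counts as zero only if it is blank or exactly the string '0'.

```python
def pad_and_trim(rows):
    """Fill blank cells with '0' and drop rows past the longest column.

    Assumption: a cell is "zero" if it is blank or exactly '0'.
    Returns the trimmed grid and the effective length of each column.
    """
    width = max((len(row) for row in rows), default=0)
    # Normalise: strip whitespace, treat blank cells as '0', pad short rows.
    grid = [
        [(row[i].strip() if i < len(row) else '') or '0' for i in range(width)]
        for row in rows
    ]
    # Effective length of a column = position of its last non-zero cell.
    lengths = [
        max((r + 1 for r, row in enumerate(grid) if row[c] != '0'), default=0)
        for c in range(width)
    ]
    longest = max(lengths, default=0)
    return grid[:longest], lengths
```

Note that the longest effective column length is the same as the position of the last row that is not entirely zero, so trimming trailing all-zero rows gives the same result without tracking each column separately.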

These files contain about 100,000 rows each on average, so processing them takes a while, and I can't find an efficient way of doing this. Because files are read line by line, am I right in assuming that finding the length of a column requires processing N rows in the worst case, where N is the length of the entire file? When I ran a script that just printed out all the rows, it took about 10 seconds per file. Also, I'd like to modify the file in place (overwrite it).

Recommended answer

Here are two ways to do it:

# Approach 1: read the whole file into memory.
# Read in the lines and fill in the zeroes
with open('input.txt') as input_file:
    data = [[item.strip() or '0'
             for item in line.split('\t')]
            for line in input_file]

# Delete lines near the end that are only zeroes
# (the `data and` guard avoids an IndexError if every line is zeroes)
while data and set(data[-1]) == {'0'}:
    del data[-1]

# Write out the lines
with open('output.txt', 'wt') as output_file:
    output_file.writelines('\t'.join(line) + '\n' for line in data)

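# Approach 2: stream line by line and stop at the first all-zero line.
with open('input.txt') as input_file:
    with open('output.txt', 'wt') as output_file:
        for line in input_file:
            # Fill blank fields with zeroes
            line = [item.strip() or '0' for item in line.split('\t')]
            # Everything from the first all-zero line onward is treated as padding
            if all(item == '0' for item in line):
                break
            output_file.write('\t'.join(line) + '\n')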
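The question also asks to overwrite each file in place, which neither snippet above does directly. One possible sketch, under stated assumptions: `rewrite_in_place` is a hypothetical helper, the result is written to a temporary file in the same directory and then swapped in with `os.replace`, and all-zero rows are buffered so that only the trailing ones are dropped (unlike the streaming loop above, which stops at the first all-zero line).

```python
import os
import tempfile


def rewrite_in_place(path):
    """Overwrite `path` with its zero-filled, trimmed version."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that os.replace
    # stays on one filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as dst:
        pending = []  # all-zero rows seen since the last non-zero row
        for line in src:
            row = [item.strip() or '0' for item in line.split('\t')]
            if all(item == '0' for item in row):
                pending.append(row)
                continue
            # A non-zero row follows, so the buffered zero rows are real data.
            for kept in pending + [row]:
                dst.write('\t'.join(kept) + '\n')
            pending.clear()
        tmp_name = dst.name
    # Swap the finished file in only after both files are closed.
    os.replace(tmp_name, path)
```

Calling `rewrite_in_place(name)` for each file in the directory would process it in a single pass; the original is only replaced once the new contents are fully written.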
