Fix numbering on CSV files that have deleted lines


Problem description

I have a bunch of CSV files that I edited to remove every line containing 'DIF'. I realized later that the count stored in each file is unchanged after the edit. Here is an example of the CSV before I edit it.

Name    bunch of stuff                          
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
Count   11                           
NUMBER,ITEM
N1,Shoe
N2,Heel
N3,Tee
N4,Polo
N5,Sneaker
N6,DIF
N7,DIF
N8,DIF
N9,DIF
N10,Heel
N11,Tee

This is how the output CSV looks. I want the number next to 'Count' to equal the number of rows now left in the 'ITEM' column, and I want the values in the 'NUMBER' column to be sequential.

Name    bunch of stuff                          
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
Count   11                           
NUMBER,ITEM
N1,Shoe
N2,Heel
N3,Tee
N4,Polo
N5,Sneaker
N10,Heel
N11,Tee

Here is my current code. It removes the lines I want removed, but it leaves the count and the numbering wrong, as described above.

import csv
import glob
import os

fns = glob.glob('*.csv')  # every CSV file in the current directory

for fn in fns:
    # newline='' is what the csv module expects in Python 3
    with open(fn, newline='') as src, \
         open(os.path.join('out', fn), 'w', newline='') as dst:
        reader = csv.reader(src)
        w = csv.writer(dst)
        for row in reader:
            if ' DIF' not in row:  # skip rows that have a ' DIF' cell
                w.writerow(row)

I've tried a few small things to fix it, but I am fairly new to programming and nothing I try seems to do much. Any help would be appreciated.

Thanks

Recommended answer

If you need to update the count, you have to read each file twice: first to count the rows you are keeping, then to write them out. While writing the kept rows, keep a separate counter and use it to rewrite the first column:

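import csv
import glob
import os
import re

# Matches first-column values such as "N1" or "N10"
numbered = re.compile(r'N\d+').match

def is_dif(row):
    # True when any cell equals 'DIF', ignoring surrounding whitespace
    return any(cell.strip() == 'DIF' for cell in row)

fns = glob.glob('*.csv')

for fn in fns:
    # first pass: count the numbered rows that will be kept
    with open(fn, newline='') as f:
        count = sum(1 for row in csv.reader(f)
                    if row and not is_dif(row) and numbered(row[0]))

    # second pass: filter, update the Count line and renumber
    with open(fn, newline='') as src, \
         open(os.path.join('out', fn), 'w', newline='') as dst:
        counter = 0
        w = csv.writer(dst)
        for row in csv.reader(src):
            if is_dif(row):
                continue  # remove DIF
            if row and 'Count' in row[0].strip():
                row = ['Count', count]
            elif row and numbered(row[0]):
                counter += 1
                row[0] = 'N%d' % counter
            w.writerow(row)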
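
As a variation on the approach above, and assuming each file is small enough to hold in memory, the rows can be read once into a list so that the new count is known before anything is written. The sketch below is illustrative only: the function name `renumber` is hypothetical, and it assumes the count line starts with 'Count' and that numbered rows look like `N<digits>` in the first column.

```python
import re

# Matches first-column values such as "N1" or "N10"
numbered = re.compile(r'N\d+').match


def renumber(rows):
    """Drop DIF rows, renumber the N-rows and fix the Count row.

    `rows` is a list of lists, as produced by csv.reader.
    The input rows are copied, not modified in place.
    """
    # Keep every row that has no cell equal to 'DIF' (ignoring whitespace)
    kept = [list(row) for row in rows
            if not any(cell.strip() == 'DIF' for cell in row)]
    # Count the surviving numbered rows before rewriting anything
    count = sum(1 for row in kept if row and numbered(row[0]))

    result = []
    counter = 0
    for row in kept:
        if row and row[0].strip().startswith('Count'):
            row = ['Count', str(count)]
        elif row and numbered(row[0]):
            counter += 1
            row[0] = 'N%d' % counter
        result.append(row)
    return result


sample = [
    ['Count', '11'],
    ['NUMBER', 'ITEM'],
    ['N1', 'Shoe'],
    ['N2', 'DIF'],
    ['N3', 'Tee'],
]
print(renumber(sample))
# [['Count', '2'], ['NUMBER', 'ITEM'], ['N1', 'Shoe'], ['N2', 'Tee']]
```

To apply it to a file, the rows could come from `list(csv.reader(f))` and the result could be written back with `csv.writer(out).writerows(...)`. The trade-off is memory use in exchange for opening each file only once.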
