How to delete and replace columns in a csv file by comparing it to other csv files in python?

Views: 121 · Posted: 2020/7/11 22:15:25 · Tags: python csv

Question

I am writing Python code to search, delete, and replace columns in a CSV file. I have 3 files.

Input.csv:

aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
mmmmmmmm,nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx

delete.csv:

aaaaaaaa
eeeeeeee
uuuuuuuu

replace.csv:

iiiiiiii,11111111,22222222
mmmmmmmm,33333333,44444444

This is my code:

input_file='input.csv'
new_array=[]
for line in open(input_file):
    data=line.split(',')
    a==data[0]
    b=data[1]
    c=data[2]
    d=data[3]
    for line2 in open(delete):
        if (name in line2)==True:
            break
        else:
            for line1 in open(replace):
                data1=line1.split(',')
                aa=data1[0]
                replaced_a=data1[1]
                repalced_b=data1[2]


            if (data[0]==data1[0]):

                data[0]=data1[1]
                data[2]=data1[2]
                new_array=data
                print(new_array)

            else:   
                new_array=data

My logic is:

1) Open input.csv and read it line by line.
2) Load the elements into an array.
3) Compare the first element with the entire delete.csv.
4) If it is found in delete.csv, do nothing and take the next line.
5) If it is not found in delete.csv, compare it with replace.csv.
6) If the first element is found in the first column of replace.csv, replace the first element with the corresponding second column of replace.csv and the second element with the corresponding third column of replace.csv.
7) Load this array into a bigger 10-element array.

So my desired output is:

11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt

So right now I am facing the following problems:
1) Lines that are not present in either replace.csv or delete.csv don't get printed.
2) My input.csv may contain newlines within one entry, so reading line by line is a problem. However, data spread over several lines is always enclosed in quotes. For example:

aaaaa,bbbb,ccccc,"ddddddddddd
ddddddd"
11111,2222,3333,4444

Any help in bringing the code and my logic together is appreciated.

Recommended answer

I would suggest changing this up a bit:

1) Read the things you want to replace into a dictionary: set the key to what is in your data's 0th spot, and the value to what should replace the 0th and 1st spots of your data.
2) Read the things you want to delete into a set: if your data row starts with one of them, skip the row; otherwise add it to the output.
3) Loop over your data and use both lookups to "do the right thing".

I changed your data a bit to incorporate the mentioned "escaped" data including newlines:

File creation:

with open("i.csv","w") as f: 
    f.write("""
aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
"mmmm
mmmm",nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx""")

with open ("d.csv","w") as f: 
    f.write("""
aaaaaaaa
eeeeeeee
uuuuuuuu""")

with open ("r.csv","w") as f: 
    f.write("""
iiiiiiii,11111111,22222222
"mmmm
mmmm",33333333,44444444""")

Program:

import csv

def read_file(fn):
    rows = []
    # newline="" lets the csv module handle newlines inside quoted fields itself
    with open(fn, newline="") as f:
        reader = csv.reader(f, quotechar='"', delimiter=",")
        for row in reader:
            if row:                     # eliminate empty rows from data read
                rows.append(row)
    return rows

# create a dict for the replace stuff
replace = {x[0]: x[1:] for x in read_file("r.csv")}

# create a set for the delete stuff
delete = {row[0] for row in read_file("d.csv")}

# collect what we need to write back
result = []

# https://docs.python.org/3/library/csv.html
with open("i.csv", newline="") as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[0] in delete:
                continue                                   # skip data row
            elif row[0] in replace:
                # replace with mapping, add rest of row
                result.append(replace[row[0]] + row[2:])   # replace data
            else:
                result.append(row)                         # use as is

# write result back into file
with open ("done.csv", "w", newline="") as f:
    w = csv.writer(f,quotechar='"', delimiter= ",")
    w.writerows(result)

Check the result:

with open ("done.csv") as f:
    print(f.read())

Output:

11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
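
If you want to try the same logic without touching any files, it can be wrapped in a function that works on in-memory text. This is only a sketch: the name `apply_rules` and its parameters are made up for illustration, and slicing with `row[len(new):]` is an assumption that generalizes the `row[2:]` used above to replacement rows of any length. It is fed the original data from the question, so it also shows that rows found in neither lookup are kept.

```python
import csv
import io


def apply_rules(input_text, delete_text, replace_text):
    """Return the input rows after applying the delete and replace lookups."""

    def parse(text):
        # newline="" keeps newlines inside quoted fields untouched for csv.reader
        return [row for row in csv.reader(io.StringIO(text, newline="")) if row]

    delete = {row[0] for row in parse(delete_text)}            # keys to drop
    replace = {row[0]: row[1:] for row in parse(replace_text)}  # key -> new leading values

    result = []
    for row in parse(input_text):
        key = row[0]
        if key in delete:
            continue                      # skip data row
        if key in replace:
            new = replace[key]
            row = new + row[len(new):]    # overwrite the leading columns
        result.append(row)                # rows in neither lookup are kept as is
    return result


input_text = (
    "aaaaaaaa,bbbbbb,cccccc,ddddddd\n"
    "eeeeeeee,ffffff,gggggg,hhhhhhh\n"
    "iiiiiiii,jjjjjj,kkkkkk,lllllll\n"
    "mmmmmmmm,nnnnnn,oooooo,ppppppp\n"
    "qqqqqqqq,rrrrrr,ssssss,ttttttt\n"
    "uuuuuuuu,vvvvvv,wwwwww,xxxxxxx\n"
)
delete_text = "aaaaaaaa\neeeeeeee\nuuuuuuuu\n"
replace_text = "iiiiiiii,11111111,22222222\nmmmmmmmm,33333333,44444444\n"

rows = apply_rules(input_text, delete_text, replace_text)
for row in rows:
    print(",".join(row))
```

Using a set and a dict means each input row needs only two constant-time lookups, instead of re-reading delete.csv and replace.csv for every line as in the question's nested loops.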

Documentation:

csv.writer/csv.reader
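
As a small illustration of why `csv.reader` is used instead of splitting lines by hand, here is a sketch that parses the quoted-newline example from the question both ways. The variable names are only for illustration, and `io.StringIO` stands in for a real file.

```python
import csv
import io

# The example from the question: the fourth field contains a newline inside quotes.
text = 'aaaaa,bbbb,ccccc,"ddddddddddd\nddddddd"\n11111,2222,3333,4444\n'

# Naive approach: splitting on physical lines cuts the quoted field in two.
naive = [line.split(",") for line in text.splitlines()]

# csv.reader tracks quoting, so the quoted newline stays inside one field.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive))         # number of physical lines
print(len(parsed))        # number of logical CSV rows
print(repr(parsed[0][3]))  # the multi-line field, quotes removed
```

The naive split sees three lines, while `csv.reader` returns two rows, which is what the question's second problem needs.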
