Python: Append column to CSV from a different csv file

Question

I currently have a script that I want to use to combine CSV data files. For example, I have two files called process.csv and file.csv, but when I try to append one to the other in a new file called 'all_files.csv', the data lands in the correct column but does not start from the top of the file.

What happens at the moment:

                process/sec
08/03/16 11:19  0
08/03/16 11:34  0.1
08/03/16 11:49  0
08/03/16 12:03  0
08/03/16 12:13  0
08/03/16 12:23  0
                                file/sec
                                0
                                43.3
                                0
                                0
                                0
                                0
                                0

What I want:

                process/sec     file/sec
08/03/16 11:19  0               0
08/03/16 11:34  0.1             43.3
08/03/16 11:49  0               0
08/03/16 12:03  0               0
08/03/16 12:13  0               0
08/03/16 12:23  0               0

Here is my code (note that I removed all the excess code relating to the algorithm I use for the per_second value, and use a static value in this example):

def all_data(data_name,input_file_name,idx):
    #Create file if first set of data
    if data_name == 'first_set_of_data':
            all_per_second_file = open("all_data.csv", 'wb')
    #Append to file for all other data
    else:
        all_per_second_file = open("all_data.csv", 'a')

        row_position=''
        #For loop with index number to position rows after one another 
        #So not to rewrite new data to the same columns in all_data.csv
        for number in range(0,idx):
            row_position=row_position+','

    with open(input_file_name, 'rb') as csvfile:

        # get number of columns
        for line in csvfile.readlines():
            array = line.split(',')
            first_item = array[0]

        num_columns = len(array)
        csvfile.seek(0)

        reader = csv.reader(csvfile, delimiter=',')
        #Columns to include Date and desired data
        included_cols = [0, 3]
        count =0

        #Test value for example purposes
        per_second=12

        for row in reader:          
            #Create header
            if count==1:
                all_per_second_file.write(row_position+','+event_name+"\n")

            #Initialise date column with first set of data
            #first entry rate must be 0
            if count ==2:
                if event_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+",0\n")
                else:
                    all_per_second_file.write(row_position+",0\n")

            #If data after the first row =0 value should reset so data/sec should be 0, not a minus number
            if count>2 and row[3]=='0':
                if event_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+",0\n")
                else:
                    all_per_second_file.write(row_position+",0\n")

            #Otherwise calculate rate
            elif count >=3:
                if event_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+","+str("%.1f" % per_second)+"\n")
                else:
                    all_per_second_file.write(row_position+","+str("%.1f" % per_second)+"\n")

            count = count+1

    all_per_second_file.close()

Updated code:

I have changed my script to the following, which seems to work correctly:

import pandas as pd

# per_second_address (a path prefix) is defined elsewhere in the script
def all_data(input_file_name):
    a = pd.read_csv(per_second_address+input_file_name[0])
    b = pd.read_csv(per_second_address+input_file_name[1])
    c = pd.read_csv(per_second_address+input_file_name[2])
    d = pd.read_csv(per_second_address+input_file_name[3])

    b = b.dropna(axis=1)
    c = c.dropna(axis=1)
    d = d.dropna(axis=1)

    merged = a.merge(b, on='Date')
    merged = merged.merge(c, on='Date')
    merged = merged.merge(d, on='Date')    

    merged.to_csv(per_second_address+"all_event_per_second.csv", index=False)
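
The four-file merge above repeats the same step for each file, so it can be generalized to any number of files. The following is only a minimal sketch, assuming every frame shares a 'Date' column; the helper name merge_on_key and the sample values (including the 'event/sec' column) are made up for illustration and are not part of the original script.

```python
from functools import reduce

import pandas as pd


def merge_on_key(frames, key="Date"):
    """Merge DataFrames left to right on a shared key column.

    merge_on_key is a hypothetical helper, not part of pandas.
    """
    # reduce folds the list pairwise: ((a merge b) merge c) merge d ...
    return reduce(lambda left, right: left.merge(right, on=key), frames)


# Small in-memory frames standing in for the CSV files (sample values only)
dates = ["08/03/16 11:19", "08/03/16 11:34"]
a = pd.DataFrame({"Date": dates, "process/sec": [0.0, 0.1]})
b = pd.DataFrame({"Date": dates, "file/sec": [0.0, 43.3]})
c = pd.DataFrame({"Date": dates, "event/sec": [1.0, 2.0]})

merged = merge_on_key([a, b, c])
print(merged.columns.tolist())
```

With real files, the list could be built with something like `[pd.read_csv(path) for path in paths]` before calling the helper.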

Recommended answer

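CSV files are read and written row by row. Appending to an existing file can therefore only add new rows at the end; it cannot place a new column next to rows that are already written. To add a column, read both files and write each combined row to the output.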

Here is an approach that uses only modules from the Python standard library:

process.csv contains:

time,process/sec
8/3/2016 11:19,0
8/3/2016 11:34,0
8/3/2016 11:49,1
8/3/2016 12:03,1
8/3/2016 12:13,0
8/3/2016 12:23,0

files.csv contains:

time,files/sec
8/3/2016 11:19,0
8/3/2016 11:34,2
8/3/2016 11:49,3
8/3/2016 12:03,4
8/3/2016 12:13,1
8/3/2016 12:23,0

The following Python code creates "combine.csv":

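import csv

# Read both files into lists of rows.
# In Python 3, CSV files are opened in text mode with newline=''.
with open('process.csv', newline='') as a:
    process_csv = list(csv.reader(a, delimiter=','))

with open('files.csv', newline='') as b:
    data_csv = list(csv.reader(b, delimiter=','))

# Write into combine.csv only if both files have the same number of rows.
# Mode 'w' replaces any earlier output; append mode would duplicate the
# rows every time the script is run again.
if len(process_csv) == len(data_csv):
    with open('combine.csv', 'w', newline='') as f:
        writer = csv.writer(f, delimiter=',')
        for process_row, data_row in zip(process_csv, data_csv):
            # Keep the whole row from process.csv and add the second
            # column (files/sec) from files.csv.
            writer.writerow(process_row + [data_row[1]])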

combine.csv has:

time,process/sec,files/sec
8/3/2016 11:19,0,0
8/3/2016 11:34,0,2
8/3/2016 11:49,1,3
8/3/2016 12:03,1,4
8/3/2016 12:13,0,1
8/3/2016 12:23,0,0
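
The code above pairs rows by position, so it relies on both files listing the same timestamps in the same order. If that cannot be guaranteed, the rows can be matched on the time column instead. This is a rough sketch under that assumption, not the answer's method; the function name join_by_first_column is hypothetical, and in-memory strings stand in for the two files.

```python
import csv
import io


def join_by_first_column(left_rows, right_rows):
    """Append the second column of right_rows to the left row with the same key.

    join_by_first_column is a hypothetical helper written for this example.
    """
    # Map each key (first column) to the value in the second column.
    lookup = {row[0]: row[1] for row in right_rows if row}
    combined = []
    for row in left_rows:
        if not row:
            continue  # skip blank lines
        # Use an empty string when right_rows has no entry for this key.
        combined.append(row + [lookup.get(row[0], "")])
    return combined


# In-memory stand-ins for process.csv and files.csv (shortened sample data)
process_text = "time,process/sec\n8/3/2016 11:19,0\n8/3/2016 11:34,0\n8/3/2016 11:49,1\n"
files_text = "time,files/sec\n8/3/2016 11:34,2\n8/3/2016 11:19,0\n"

process_rows = list(csv.reader(io.StringIO(process_text)))
files_rows = list(csv.reader(io.StringIO(files_text)))

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(join_by_first_column(process_rows, files_rows))
print(out.getvalue())
```

Because the header row also has "time" as its first field, it is matched like any other row and picks up the "files/sec" heading. Timestamps that are missing from the second file get an empty field instead of being dropped.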


Code with the pandas module:

import pandas as pd

a = pd.read_csv("process.csv")
b = pd.read_csv("files.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='time')
merged.to_csv("combine2.csv", index=False)
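
By default, merge keeps only the timestamps that appear in both files. If the files might not cover exactly the same timestamps, passing how='outer' keeps every timestamp and leaves the missing values empty. The snippet below is a small illustration with made-up, shortened data held in memory rather than read from disk.

```python
import io

import pandas as pd

# In-memory stand-ins for process.csv and files.csv (sample data only);
# each file has one timestamp that the other lacks.
process_text = "time,process/sec\n8/3/2016 11:19,0\n8/3/2016 11:34,0\n8/3/2016 11:49,1\n"
files_text = "time,files/sec\n8/3/2016 11:19,0\n8/3/2016 11:34,2\n8/3/2016 12:03,4\n"

a = pd.read_csv(io.StringIO(process_text))
b = pd.read_csv(io.StringIO(files_text))

# Default (inner) merge: only timestamps present in both frames
inner = a.merge(b, on="time")

# Outer merge: every timestamp from either frame, NaN where a value is missing
outer = a.merge(b, on="time", how="outer")

print(len(inner), len(outer))
print(outer)
```

When the result is written with to_csv, the NaN values appear as empty fields by default.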

More information is available in the pandas documentation.
