Python: Append column to CSV from a different CSV file
Question
I currently have a script that I want to use to combine CSV data files. For example, I have one file called process.csv and another called file.csv, but when I try to append one to the other in a new file called 'all_files.csv', the data is appended in the correct column but not starting from the top of the file.
What happens at the moment:
process/sec
08/03/16 11:19 0
08/03/16 11:34 0.1
08/03/16 11:49 0
08/03/16 12:03 0
08/03/16 12:13 0
08/03/16 12:23 0
file/sec
0
43.3
0
0
0
0
0
What I want:
process/sec file/sec
08/03/16 11:19 0 0
08/03/16 11:34 0.1 43.3
08/03/16 11:49 0 0
08/03/16 12:03 0 0
08/03/16 12:13 0 0
08/03/16 12:23 0 0
Here is my code (note that I removed all the excess code relating to the algorithm I use for the per_second value, and use a static value in this example):
def all_data(data_name,input_file_name,idx):
    #Create file if first set of data
    if data_name == 'first_set_of_data':
        all_per_second_file = open("all_data.csv", 'wb')
    #Append to file for all other data
    else:
        all_per_second_file = open("all_data.csv", 'a')
    row_position=''
    #For loop with index number to position rows after one another
    #So not to rewrite new data to the same columns in all_data.csv
    for number in range(0,idx):
        row_position=row_position+','
    with open(input_file_name, 'rb') as csvfile:
        # get number of columns
        for line in csvfile.readlines():
            array = line.split(',')
            first_item = array[0]
        num_columns = len(array)
        csvfile.seek(0)
        reader = csv.reader(csvfile, delimiter=',')
        #Columns to include Date and desired data
        included_cols = [0, 3]
        count =0
        #Test value for example purposes
        per_second=12
        for row in reader:
            #Create header
            if count==1:
                all_per_second_file.write(row_position+','+data_name+"\n")
            #Initialise date column with first set of data
            #first entry rate must be 0
            if count ==2:
                if data_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+",0\n")
                else:
                    all_per_second_file.write(row_position+",0\n")
            #If data after the first row =0 value should reset so data/sec should be 0, not a minus number
            if count>2 and row[3]=='0':
                if data_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+",0\n")
                else:
                    all_per_second_file.write(row_position+",0\n")
            #Otherwise calculate rate
            elif count >=3:
                if data_name == 'first_set_of_data':
                    all_per_second_file.write(row_position+row[0]+","+str("%.1f" % per_second)+"\n")
                else:
                    all_per_second_file.write(row_position+","+str("%.1f" % per_second)+"\n")
            count = count+1
    all_per_second_file.close()
Updated code:
I have changed my script to the following, which seems to work correctly:
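import pandas as pd

def all_data(input_file_name):
    # input_file_name is a list of four file names; per_second_address is defined elsewhere in the script
    a = pd.read_csv(per_second_address+input_file_name[0])
    b = pd.read_csv(per_second_address+input_file_name[1])
    c = pd.read_csv(per_second_address+input_file_name[2])
    d = pd.read_csv(per_second_address+input_file_name[3])
    # Drop columns that contain missing values before merging
    b = b.dropna(axis=1)
    c = c.dropna(axis=1)
    d = d.dropna(axis=1)
    # Join the files one after another on their shared 'Date' column
    merged = a.merge(b, on='Date')
    merged = merged.merge(c, on='Date')
    merged = merged.merge(d, on='Date')
    merged.to_csv(per_second_address+"all_event_per_second.csv", index=False)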
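Recommended answer
CSV files are read and written line by line. Appending to an existing file therefore adds new lines at the bottom; to add a column, each row has to be read, extended, and written out again.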
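As a minimal sketch of that point, the snippet below uses an in-memory file with made-up sample values (an assumption for illustration, not the asker's data). Writing at the end of the file, which is where append mode starts, puts the new cell on a new line below the existing rows, while reading the rows and extending each one produces a real extra column.

```python
import csv
import io

sample = "time,process/sec\n08/03/16 11:19,0\n"

# Hypothetical in-memory stand-in for a CSV file that already has two lines.
existing = io.StringIO(sample)
existing.seek(0, io.SEEK_END)  # the position an append-mode open would start at
csv.writer(existing, lineterminator="\n").writerow(["", "", "file/sec"])
appended = existing.getvalue()  # the new cell sits on a third line, below the data

# To add a column, read every row, extend it, and write the rows back out.
rows = list(csv.reader(io.StringIO(sample)))
new_column = ["file/sec", "0"]
widened = [row + [cell] for row, cell in zip(rows, new_column)]
```

The first half mirrors what the script in the question does with its leading commas; the second half is the row-by-row pattern the rest of this answer relies on.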
The code below uses only modules from the Python standard library.
process.csv contains:
time,process/sec
8/3/2016 11:19,0
8/3/2016 11:34,0
8/3/2016 11:49,1
8/3/2016 12:03,1
8/3/2016 12:13,0
8/3/2016 12:23,0
files.csv contains:
time,files/sec
8/3/2016 11:19,0
8/3/2016 11:34,2
8/3/2016 11:49,3
8/3/2016 12:03,4
8/3/2016 12:13,1
8/3/2016 12:23,0
This Python code creates "combine.csv":
import csv

# Read both files (text mode with newline='' is what the csv module expects on Python 3)
with open('process.csv', newline='') as a:
    process_csv = list(csv.reader(a, delimiter=","))
with open('files.csv', newline='') as b:
    data_csv = list(csv.reader(b, delimiter=","))

# Write into combine.csv, pairing the rows of both files by position
if len(process_csv) == len(data_csv):
    with open('combine.csv', 'w', newline='') as f:
        writer = csv.writer(f, delimiter=",")
        for process_row, data_row in zip(process_csv, data_csv):
            # Keep every column of process.csv and add the value column of files.csv
            writer.writerow(process_row + [data_row[1]])
combine.csv contains:
time,process/sec,files/sec
8/3/2016 11:19,0,0
8/3/2016 11:34,0,2
8/3/2016 11:49,1,3
8/3/2016 12:03,1,4
8/3/2016 12:13,0,1
8/3/2016 12:23,0,0
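The code above pairs rows by position and writes nothing when the two files differ in length. If the files might not line up exactly, one possible variation is to match rows on the time column instead. The sketch below is an illustration under assumptions: the helper name combine_on_time and the in-memory sample data are hypothetical, and it assumes the first column is the timestamp and the second column holds the value.

```python
import csv
import io

def combine_on_time(first, second, out):
    """Join two CSV streams on their first column and write the result to out."""
    first_rows = list(csv.reader(first))
    second_rows = list(csv.reader(second))
    # Map each timestamp in the second file to its value (second column).
    lookup = {row[0]: row[1] for row in second_rows[1:]}
    writer = csv.writer(out, lineterminator="\n")
    # Header: all columns of the first file plus the value column name of the second.
    writer.writerow(first_rows[0] + [second_rows[0][1]])
    for row in first_rows[1:]:
        # Leave the cell empty when the second file has no entry for this timestamp.
        writer.writerow(row + [lookup.get(row[0], "")])

# Made-up sample data: the second file is out of order and misses one timestamp.
process = io.StringIO(
    "time,process/sec\n8/3/2016 11:19,0\n8/3/2016 11:34,0\n8/3/2016 11:49,1\n"
)
files = io.StringIO("time,files/sec\n8/3/2016 11:49,3\n8/3/2016 11:19,0\n")
combined = io.StringIO()
combine_on_time(process, files, combined)
result = combined.getvalue()
```

With real files, the same function can be called with objects returned by open(..., newline=''). Rows keep the order of the first file, and timestamps that appear only in the second file are dropped.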
The pandas module can merge process.csv and files.csv as well:
import pandas as pd
a = pd.read_csv("process.csv")
b = pd.read_csv("files.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='time')
merged.to_csv("combine2.csv", index=False)
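Note that merge defaults to an inner join, so timestamps missing from either file are dropped. When more than two files are involved, or when unmatched timestamps should be kept, something along these lines may help. This is a sketch under assumptions: the helper merge_on and the in-memory sample data are hypothetical, and the key column is assumed to be called time in every file.

```python
from functools import reduce
import io

import pandas as pd

def merge_on(frames, key="time", how="outer"):
    """Merge any number of DataFrames on a shared key column, left to right."""
    return reduce(lambda left, right: left.merge(right, on=key, how=how), frames)

# Made-up sample data: each file has one timestamp the other lacks.
process = pd.read_csv(io.StringIO(
    "time,process/sec\n8/3/2016 11:19,0\n8/3/2016 11:34,0\n"
))
files = pd.read_csv(io.StringIO(
    "time,files/sec\n8/3/2016 11:19,0\n8/3/2016 11:49,3\n"
))

# how="outer" keeps every timestamp; cells without a match become NaN.
merged = merge_on([process, files])
```

Passing how="inner" reproduces the behaviour of the plain merge call above, and the list can hold as many DataFrames as there are input files.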