Loop through rows of one csv file to find corresponding data in another
Question
I have an interesting problem:
file1.csv has a few hundred rows like:
Code,DTime
1,2010-12-26 17:01
2,2010-12-26 17:07
2,2010-12-26 17:15
file2.csv has about 11 million rows like:
id,D,Sym,DateTime,Bid,Ask
1375022797,D,USD,2010-12-26 17:00:15,1.311400,1.311700
1375022965,D,USD,2010-12-26 17:00:56,1.311200,1.311500
1375022984,D,USD,2010-12-26 17:00:56,1.311300,1.311600
1375023013,D,USD,2010-12-26 17:01:01,1.311200,1.311500
1375023039,D,USD,2010-12-26 17:01:02,1.311100,1.311400
1375023055,D,USD,2010-12-26 17:01:03,1.311200,1.311500
1375023063,D,USD,2010-12-26 17:01:03,1.311300,1.311600
What I'm trying to do is write a script that takes each DTime value in file1.csv, finds the first partial match in the DateTime column of file2.csv, and outputs DateTime, Bid, Ask for that row. The partial match is on the first 16 characters.
Both files are sorted from oldest to newest, so if "2010-12-26 17:01" from file1.csv matched 4 entries in file2.csv, I only need to extract the first one: "2010-12-26 17:01:01"
Not sure how to proceed. I tried a dictionary, but the order of values is important, so I'm not sure that would work. Maybe bring file1's DTime column into a list and, for each entry in that list, search DateTime in file2?
Thanks, mate.
Recommended answer
If you don't have duplicate DTime values, this should work:
import csv

# Python 3; newline="" is the csv module's recommended way to open files
with open("file1.csv", newline="") as f1, open("file2.csv", newline="") as f2:
    file1reader = csv.reader(f1, delimiter=",")
    file2reader = csv.reader(f2, delimiter=",")
    header1 = next(file1reader)  # header
    header2 = next(file2reader)  # header
    for Code, DTime in file1reader:
        # file2reader is never rewound, so each search resumes after the last match
        for id_, D, Sym, DateTime, Bid, Ask in file2reader:
            if DateTime.startswith(DTime):  # found it
                print(DateTime, Bid, Ask)  # output data
                break  # break and continue where we left off next time
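The inner loop above does not restart from the top of file2.csv for every DTime: a csv.reader is a one-pass iterator, so after a break the next search resumes at the row following the last match. Here is a minimal sketch of that behaviour, using in-memory sample rows in place of the real files. The helper name first_matches and the sample data are illustrative assumptions, not part of the original answer.

```python
import csv
import io

# In-memory stand-ins for file1.csv and file2.csv (illustrative sample rows).
FILE1 = "Code,DTime\n1,2010-12-26 17:00\n2,2010-12-26 17:01\n"
FILE2 = (
    "id,D,Sym,DateTime,Bid,Ask\n"
    "1375022797,D,USD,2010-12-26 17:00:15,1.311400,1.311700\n"
    "1375022965,D,USD,2010-12-26 17:00:56,1.311200,1.311500\n"
    "1375023013,D,USD,2010-12-26 17:01:01,1.311200,1.311500\n"
    "1375023039,D,USD,2010-12-26 17:01:02,1.311100,1.311400\n"
)


def first_matches(file1_text, file2_text):
    """Return the first file2 row whose DateTime starts with each file1 DTime."""
    reader1 = csv.reader(io.StringIO(file1_text))
    reader2 = csv.reader(io.StringIO(file2_text))
    next(reader1)  # skip header
    next(reader2)  # skip header
    found = []
    for _code, dtime in reader1:
        # reader2 is consumed as we go, so this loop resumes after the last match.
        for _id, _d, _sym, date_time, bid, ask in reader2:
            if date_time.startswith(dtime):
                found.append((date_time, bid, ask))
                break
    return found


for row in first_matches(FILE1, FILE2):
    print(*row)
```

One consequence of the shared iterator: if some DTime has no matching row at all, the inner loop reads file2 to the end while looking for it, and every later DTime then finds nothing.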
Edit: a version that parses the timestamps and takes the first file2 row at or after each DTime:
import csv
from datetime import datetime

# Python 3; newline="" is the csv module's recommended way to open files
with open("file1.csv", newline="") as f1, open("file2.csv", newline="") as f2:
    file1reader = csv.reader(f1, delimiter=",")
    file2reader = csv.reader(f2, delimiter=",")
    header1 = next(file1reader)  # header
    header2 = next(file2reader)  # header
    for Code, DTime in file1reader:
        DTime = datetime.strptime(DTime, "%Y-%m-%d %H:%M")
        for id_, D, Sym, DateTime, Bid, Ask in file2reader:
            DateTime = datetime.strptime(DateTime, "%Y-%m-%d %H:%M:%S")
            if DateTime >= DTime:  # found it
                print(DateTime, Bid, Ask)  # output data
                break  # break and continue where we left off next time
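If file1.csv might contain duplicate DTime values, or minutes with no rows in file2.csv, one alternative is a single pass over file2.csv that records the first row seen for each wanted minute. This is only a sketch under stated assumptions, not the answerer's method: the function name first_row_per_minute and the in-memory sample data are made up for illustration. It relies on file2 being sorted oldest to newest, and on dictionaries preserving insertion order (guaranteed from Python 3.7), which addresses the ordering worry raised in the question.

```python
import csv
import io


def first_row_per_minute(file1_lines, file2_lines):
    """Map each file1 DTime to the first file2 (DateTime, Bid, Ask) in that minute.

    Both arguments are iterables of CSV text lines; open file objects work too.
    A DTime with no matching row keeps the value None.
    """
    # The dict keeps file1's order; duplicate DTime values collapse into one key.
    wanted = {row["DTime"]: None for row in csv.DictReader(file1_lines)}
    remaining = len(wanted)
    for row in csv.DictReader(file2_lines):
        key = row["DateTime"][:16]  # "YYYY-MM-DD HH:MM" prefix
        if key in wanted and wanted[key] is None:
            wanted[key] = (row["DateTime"], row["Bid"], row["Ask"])
            remaining -= 1
            if remaining == 0:
                break  # every minute resolved, stop reading file2 early
    return wanted


# Illustrative sample: 17:07 has no rows in file2, so it stays None.
FILE1 = "Code,DTime\n1,2010-12-26 17:01\n2,2010-12-26 17:07\n"
FILE2 = (
    "id,D,Sym,DateTime,Bid,Ask\n"
    "1375022797,D,USD,2010-12-26 17:00:15,1.311400,1.311700\n"
    "1375023013,D,USD,2010-12-26 17:01:01,1.311200,1.311500\n"
    "1375023039,D,USD,2010-12-26 17:01:02,1.311100,1.311400\n"
)

result = first_row_per_minute(io.StringIO(FILE1), io.StringIO(FILE2))
for dtime, hit in result.items():
    print(dtime, hit)
```

With the real files, pass the objects returned by open("file1.csv", newline="") and open("file2.csv", newline="") instead of the StringIO wrappers. Memory use stays small because only the few hundred file1 keys are held, while the 11 million file2 rows are streamed one at a time.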