Finding matches in two files and outputting them


Problem description

I want to compare a field x[#] from the first file with a field x[#] from the second file. If the two values match, I want to output them, along with several other x[#] values from the same line of the second file.

The files are in the following format (there are millions of lines, and I want to find the pairs across the two files, because they should all match up):

  line 1  data,data,data,data
  line 2  data,data,data,data

Data in file 1:

 (N'068D556A1A665123A6DD2073A36C1CAF', N'A76EEAF6D310D4FD2F0BD610FAC02C04DFE6EB67',    
N'D7C970DFE09687F1732C568AE1CFF9235B2CBB3673EA98DAA8E4507CC8B9A881');

Data in file 2:

00000040f2213a27ff74019b8bf3cfd1|index.docbook|Redhat 7.3 (32bit)|Linux
00000040f69413a27ff7401b8bf3cfd1|index.docbook|Redhat 8.0 (32bit)|Linux
00000965b3f00c92a18b2b31e75d702c|Localizable.strings|Mac OS X 10.4|OSX
0000162d57845b6512e87db4473c58ea|SYSTEM|Windows 7 Home Premium (32bit)|Windows
000011b20f3cefd491dbc4eff949cf45|totem.devhelp|Linux Ubuntu Desktop 9.10 (32bit)|Linux
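
The two samples above use different layouts, so each needs its own parsing step before any comparison. The following is a minimal sketch, assuming every hash in file 1 is a hex string wrapped as N'...' and every line in file 2 is pipe-delimited with the hash first. The helper names `parse_file1_record` and `parse_file2_line` are hypothetical, and lower-casing the file 1 hashes is an assumption made so they can be compared with the lower-case hashes in file 2.

```python
import re

# Assumed pattern: each hash in file 1 is a hex string wrapped as N'...'
HASH_RE = re.compile(r"N'([0-9A-Fa-f]+)'")


def parse_file1_record(text):
    """Return the lower-cased hex hashes found in one file 1 record."""
    return [h.lower() for h in HASH_RE.findall(text)]


def parse_file2_line(line):
    """Split one pipe-delimited file 2 line into stripped fields."""
    return [field.strip() for field in line.split("|")]


record = """ (N'068D556A1A665123A6DD2073A36C1CAF', N'A76EEAF6D310D4FD2F0BD610FAC02C04DFE6EB67',
N'D7C970DFE09687F1732C568AE1CFF9235B2CBB3673EA98DAA8E4507CC8B9A881');"""

hashes = parse_file1_record(record)
fields = parse_file2_line(
    "00000965b3f00c92a18b2b31e75d702c|Localizable.strings|Mac OS X 10.4|OSX"
)
```

Here `hashes` holds the three values from the record and `fields[0]` is the hash that would be compared against them.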

The files are sorted alphanumerically, and I want to use a slider method. By that I mean: compare file1[x] with file2[x] and move the slider down or up depending on which value is greater, until a match is found. When a match is found, print it along with the other values that identify that hash.
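
The slider idea described here is essentially a merge over two sorted sequences. The following is a rough sketch, assuming both inputs are sorted ascending under Python's string ordering, use the same letter case, and that the hash is the first field of each file 2 row. The function name `merge_matches` is hypothetical. The file 2 sample shown earlier is not strictly sorted, so the example sorts it first.

```python
def merge_matches(keys, rows):
    """Yield every row whose first field equals one of the keys.

    Assumes `keys` is sorted ascending and `rows` is sorted ascending
    by row[0], both with the same ordering and letter case.
    """
    keys = iter(keys)
    rows = iter(rows)
    key = next(keys, None)
    row = next(rows, None)
    while key is not None and row is not None:
        if key < row[0]:
            key = next(keys, None)   # key is behind: slide file 1 forward
        elif key > row[0]:
            row = next(rows, None)   # row is behind: slide file 2 forward
        else:
            yield row
            row = next(rows, None)   # keep the key: several rows may share it


file2_lines = [
    "00000040f2213a27ff74019b8bf3cfd1|index.docbook|Redhat 7.3 (32bit)|Linux",
    "00000040f69413a27ff7401b8bf3cfd1|index.docbook|Redhat 8.0 (32bit)|Linux",
    "00000965b3f00c92a18b2b31e75d702c|Localizable.strings|Mac OS X 10.4|OSX",
    "0000162d57845b6512e87db4473c58ea|SYSTEM|Windows 7 Home Premium (32bit)|Windows",
    "000011b20f3cefd491dbc4eff949cf45|totem.devhelp|Linux Ubuntu Desktop 9.10 (32bit)|Linux",
]
rows = sorted(line.split("|") for line in file2_lines)

# Hypothetical keys: one taken from the file 2 sample, one from file 1 (lower-cased)
keys = sorted([
    "00000965b3f00c92a18b2b31e75d702c",
    "068d556a1a665123a6dd2073a36c1caf",
])

matches = list(merge_matches(keys, rows))
```

Because the function only pulls items with `next`, it can also be fed generators that read the real files line by line, so neither file has to fit in memory. This only works if the on-disk sort order agrees with Python's string comparison.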

The result I want is:

file1[x] and its corresponding match in file2[x], written to a file, together with other file1[x] values, where x can be any index in the line, and other values selected by index.

Recommended answer

Here is a starting point; add your own salt and pepper. It's far from optimal and should use executemany etc., but that's for you to decide.

from io import StringIO
import sqlite3 as sq3
from operator import itemgetter
from itertools import groupby

data1 = """068D556A1A665123A6DD2073A36C1CAF
A76EEAF6D310D4FD2F0BD610FAC02C04DFE6EB67
D7C970DFE09687F1732C568AE1CFF9235B2CBB3673EA98DAA8E4507CC8B9A881"""

data2 = """00000040f2213a27ff74019b8bf3cfd1|index.docbook|Redhat 7.3 (32bit)|Linux
00000040f69413a27ff7401b8bf3cfd1|index.docbook|Redhat 8.0 (32bit)|Linux
00000965b3f00c92a18b2b31e75d702c|Localizable.strings|Mac OS X 10.4|OSX
0000162d57845b6512e87db4473c58ea|SYSTEM|Windows 7 Home Premium (32bit)|Windows
000011b20f3cefd491dbc4eff949cf45|totem.devhelp|Linux Ubuntu Desktop 9.10 (32bit)|Linux"""

# In-memory stand-ins for the two real files
file1 = StringIO(data1)
file2 = StringIO(data2)

db = sq3.connect(':memory:')
db.execute('create table keys (key)')
db.execute('create table details (key, f1, f2, f3)')

# Load the hashes from file 1, one per line
for f1data in file1:
    db.execute('insert into keys values(?)', (f1data.strip(),))

# Load the pipe-delimited rows from file 2
for f2data in file2:
    row = [field.strip() for field in f2data.split('|')]
    db.execute('insert into details values (?,?,?,?)', row)

# The natural join matches rows on the shared "key" column.
# The comparison is exact and case-sensitive, so normalise case before
# inserting if the files differ (file 1 is upper case here, file 2 lower
# case, and none of the sample keys above appear in data2).
results = db.execute('select * from keys natural join details')

# groupby collects consecutive result rows that share the same key
for key, val in groupby(results, itemgetter(0)):
    print(key, list(val))
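
The answer mentions executemany as an improvement. The following sketch shows one way that might look, using `sqlite3`'s `executemany` with generators so rows are inserted in bulk. The first key in `file1` is a hypothetical value borrowed from the file 2 sample so that the join returns something, and lower-casing the keys is an assumption made because the two files use different letter case.

```python
import sqlite3
from io import StringIO

# Stand-ins for the real files; the first key is hypothetical (taken from file 2)
file1 = StringIO(
    "00000965b3f00c92a18b2b31e75d702c\n"
    "068D556A1A665123A6DD2073A36C1CAF\n"
)
file2 = StringIO(
    "00000965b3f00c92a18b2b31e75d702c|Localizable.strings|Mac OS X 10.4|OSX\n"
    "0000162d57845b6512e87db4473c58ea|SYSTEM|Windows 7 Home Premium (32bit)|Windows\n"
)

db = sqlite3.connect(":memory:")
db.execute("create table keys (key)")
db.execute("create table details (key, f1, f2, f3)")

# executemany consumes an iterable of parameter sequences, one per row
db.executemany(
    "insert into keys values (?)",
    ((line.strip().lower(),) for line in file1),
)
db.executemany(
    "insert into details values (?,?,?,?)",
    ([field.strip() for field in line.split("|")] for line in file2),
)
db.commit()

matches = db.execute("select * from keys natural join details").fetchall()
for match in matches:
    print(match)
```

With millions of rows, an index on the key columns and a database file on disk instead of ':memory:' may also be worth considering, depending on available memory.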
