Match common IDs between two huge CSV files
Question
I need to compare two huge CSV files, each with thousands of entries like the ones below:
id;val
1;a
2;b
3;c
The second file has the following structure:
id1;entry
1;002
2;x90
5;d07
The desired result is to match and combine the lines that have the same value for id/id1, and to create a third CSV file containing only the matched entries, as shown below:
idR;valR;entryR
1;a;002
2;b;x90
To accomplish this, I can load each file into a separate database table and run a select like this to retrieve all matched values:
select tb1.id, tb1.val, tb2.entry
from tb1, tb2
where tb1.id = tb2.id1
With this approach I can retrieve all the desired values at once.
But suppose these files could be sorted. Would it then be possible to use awk to print the entries that have the same value for id and id1? The best I can come up with is to create two associative arrays, one per file, and perform a binary search using awk and sed/cut.
Is it possible to load these two files and combine them in one pass to produce a final CSV file with the results?
Or can I do this in Perl with only the standard library?
Recommended answer
You can do this with the standard join utility.
file1.txt
1 a
2 b
3 c
file2.txt
1 002
2 x90
5 d07
Join example
join -1 1 -2 1 -o 1.1,1.2,2.2 file1.txt file2.txt
Here join matches field 1 of file1.txt against field 1 of file2.txt (-1 1 -2 1) and outputs the fields specified with the -o flag. Both input files must already be sorted on the join field.
Output
1 a 002
2 b x90
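The sample files in the question are semicolon-delimited and start with a header line, so the command above needs a few adjustments before it applies to them. The following is a minimal sketch, not a definitive solution. The file names tb1.csv, tb2.csv, result_join.csv and result_awk.csv are assumptions made for illustration, and the sample data is recreated here only so the script is self-contained. It first shows join with -t ';' on sorted, header-less copies, then a single-pass awk variant (the associative-array idea from the question) that needs no sorting but keeps the whole first file in memory.

```shell
#!/bin/sh
# Use one collation for both sort and join so they agree on ordering.
export LC_ALL=C

# Recreate the question's sample data (file names are hypothetical).
printf 'id;val\n1;a\n2;b\n3;c\n' > tb1.csv
printf 'id1;entry\n1;002\n2;x90\n5;d07\n' > tb2.csv

# Variant 1: join. Strip the header line and sort each file on the key field.
tail -n +2 tb1.csv | sort -t ';' -k1,1 > tb1.sorted
tail -n +2 tb2.csv | sort -t ';' -k1,1 > tb2.sorted
{
  echo 'idR;valR;entryR'                      # header of the result file
  join -t ';' -1 1 -2 1 -o 1.1,1.2,2.2 tb1.sorted tb2.sorted
} > result_join.csv

# Variant 2: awk. Store val by id from the first file, then look up
# each id1 of the second file in that array.
awk -F ';' -v OFS=';' '
  NR == FNR { if (FNR > 1) val[$1] = $2; next }       # first file, skip header
  FNR == 1  { print "idR", "valR", "entryR"; next }   # second file header
  $1 in val { print $1, val[$1], $2 }                 # matched ids only
' tb1.csv tb2.csv > result_awk.csv

cat result_join.csv
```

Note that join expects its input in lexical order, not numeric order, which is why plain sort is used rather than sort -n. The awk variant emits rows in the order of the second file and is the simpler choice when the first file fits in memory.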