Conditional Awk hashmap match lookup
Problem description
I have 2 tabular files. One, called lookup_file.txt, contains only a mapping of 50 key/value pairs. The other, data.txt, holds the actual tabular data, with 30 columns and millions of rows. I would like to replace the id column of the second file with the values from lookup_file.txt.
How can I do this? I would prefer using awk in a bash script. Also, is there a hashmap data structure I can use in bash to store the 50 key/value pairs, rather than another file?
Recommended answer
Assuming your files have comma-separated fields and the "id column" is field 3:
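awk '
BEGIN { FS = OFS = "," }            # comma-separated input and output
NR == FNR { map[$1] = $2; next }    # first file: build the key -> value map
{ $3 = map[$3]; print }             # second file: replace field 3, then print
' lookup_file.txt data.txt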
If any of those assumptions are wrong and the fix isn't obvious, clue us in...
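One thing to watch with `$3 = map[$3]`: an id that has no entry in the lookup file is replaced by an empty string, because awk returns an empty value for a missing array element. If unmatched ids should be left as they are, a guarded variant along these lines might be closer to what is wanted. This is only a sketch under the same assumptions as above (comma-separated fields, id in field 3); the function name `replace_ids` is made up for illustration.

```shell
# Hypothetical helper: replace_ids LOOKUP_FILE DATA_FILE
# Assumes comma-separated fields with the id in field 3.
replace_ids() {
    awk '
    BEGIN { FS = OFS = "," }
    NR == FNR { map[$1] = $2; next }   # first file: build the map
    ($3 in map) { $3 = map[$3] }       # only replace ids that have a mapping
    { print }                          # print every data line, changed or not
    ' "$1" "$2"
}
```

It would be called as `replace_ids lookup_file.txt data.txt > data_mapped.txt`, where `data_mapped.txt` is just an example output name. The `in` test also avoids creating empty entries in `map` for every unknown id.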
EDIT: if you want to avoid the (IMHO negligible) performance impact of the NR==FNR test, this would be one of those very rare cases where use of getline is appropriate:
awk '
BEGIN {
    FS = OFS = ","
    # load the lookup table once, before any data is read
    while ( (getline line < "lookup_file.txt") > 0 ) {
        split(line, f)          # split on FS, i.e. on commas
        map[f[1]] = f[2]
    }
}
{ $3 = map[$3]; print }
' data.txt
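As for keeping the 50 key/value pairs out of a separate file: awk arrays are already associative, so one possibility is to pass the pairs in as a string and build the map in the BEGIN block. The following is a rough sketch, not a tested recipe for your data; the function name `replace_ids_inline` and the `key=value;key=value` format are assumptions chosen for illustration, and it would break if keys or values contain `=`, `;` or backslashes.

```shell
# Hypothetical helper: replace_ids_inline 'k1=v1;k2=v2' DATA_FILE
# Assumes comma-separated data with the id in field 3.
replace_ids_inline() {
    awk -v pairs="$1" '
    BEGIN {
        FS = OFS = ","
        n = split(pairs, kv, ";")          # one element per key=value pair
        for (i = 1; i <= n; i++) {
            split(kv[i], f, "=")
            map[f[1]] = f[2]
        }
    }
    ($3 in map) { $3 = map[$3] }           # leave unknown ids untouched
    { print }
    ' "$2"
}
```

For example, `replace_ids_inline 'a1=Alpha;b2=Beta' data.txt`. Bash itself has associative arrays (`declare -A`) from version 4 onward, but since the replacement happens inside awk, the map has to end up in an awk array either way.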