Write information from a lookup file into another file


Problem description


There is a directory of files that have the following content:

doc1.tsv

<http://uri.gbv.de/terminology/bk/86.56> 
<http://uri.gbv.de/terminology/bk/58.28>

doc2.tsv

<http://uri.gbv.de/terminology/bk/44.43> 
<http://uri.gbv.de/terminology/bk/58.28> 
<http://uri.gbv.de/terminology/bk/44.38>

There is also a lookup file, vocab.tsv, which maps each numeric code to its class name:

<http://uri.gbv.de/terminology/bk/44.38>        Pharmakologie
<http://uri.gbv.de/terminology/bk/44.43>        Medizinische Mikrobiologie
<http://uri.gbv.de/terminology/bk/58.28>        Pharmazeutische Technologie
<http://uri.gbv.de/terminology/bk/86.56>        Gesundheitsrecht. Lebensmittelrecht

(The delimiter is supposed to be a tab, but this is not guaranteed.)
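Because the delimiter is not guaranteed, it can help to check vocab.tsv before relying on tabs. A minimal sketch, assuming a well-formed line consists of exactly one URI and one class name separated by a single tab; the function name check_tabs is hypothetical. It reports every line that does not split into exactly two tab-separated fields:

```shell
# check_tabs FILE (hypothetical helper): print every line of FILE that does
# not split into exactly two tab-separated fields (URI and class name).
check_tabs() {
    awk -F'\t' 'NF != 2 { printf "%s:%d: not tab-separated: %s\n", FILENAME, FNR, $0 }' "$1"
}
```

Run it as `check_tabs vocab.tsv`; no output means every line uses a single tab between the URI and the class name.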

How can the files above be extended with their respective class names?

The result should look like this:

doc1.tsv

<http://uri.gbv.de/terminology/bk/86.56>        Gesundheitsrecht. Lebensmittelrecht 
<http://uri.gbv.de/terminology/bk/58.28>        Pharmazeutische Technologie

doc2.tsv

<http://uri.gbv.de/terminology/bk/44.43>        Medizinische Mikrobiologie 
<http://uri.gbv.de/terminology/bk/58.28>        Pharmazeutische Technologie 
<http://uri.gbv.de/terminology/bk/44.38>        Pharmakologie

The inelegant approach so far:

for tsv in *.tsv ; do

    while IFS='' read -r LINE || [ -n "${LINE}" ]; do
        
        newLine=$(grep "${LINE}" vocab.tsv)

        sed -i 's/${LINE}/$newLine/g' $tsv
    done < $tsv

done

but the result is utter nonsense:

<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/44.43> > 
<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/58.28> > 
<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/44.38> > 
<http://uri.gbv.de/terminology/bk/44.43> 
<http://uri.gbv.de/terminology/bk/58.28> 
<http://uri.gbv.de/terminology/bk/44.38>

For starters: the grep command, which works perfectly at the bash prompt, cuts off the class names when run in the script.

Any ideas?
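One concrete problem in the attempt above is the quoting of the sed expression: inside single quotes the shell expands nothing, so sed receives the literal text `s/${LINE}/$newLine/g` instead of the current line and its lookup result. Switching to double quotes is not enough on its own either, because the URIs contain `/`, which is also the delimiter of sed's `s` command. A minimal illustration with hypothetical values standing in for one pass through the loop:

```shell
# Hypothetical values standing in for one iteration of the while loop.
LINE='<http://uri.gbv.de/terminology/bk/58.28>'
newLine=$(printf '%s\t%s' "$LINE" 'Pharmazeutische Technologie')

# Single quotes: nothing is expanded, so sed would receive the literal
# text ${LINE} and $newLine rather than the URI and its class name.
single='s/${LINE}/$newLine/g'

# Double quotes: the variables are expanded, but the slashes inside the
# URI now collide with the "/" delimiter of sed's s command.
double="s/${LINE}/$newLine/g"

printf '%s\n' "$single" "$double"
```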

Solution

Part of the answer was given by Raman Sailopal:

awk 'FNR==NR{ urls[$1]=$2 } FNR!=NR { print $1"\t"urls[$1] }' vocab.tsv doc1.tsv
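The one-liner relies on the common `FNR==NR` idiom: while awk reads the first file (vocab.tsv), the per-file line counter FNR equals the global counter NR, so the lookup array is filled; for every later file the two differ, so each URI is printed with its class name. With awk's default whitespace splitting, however, `$2` is only the first word of a class name, so "Medizinische Mikrobiologie" comes out as "Medizinische". A sketch that keeps the whole name, assuming the URI is always the first field and everything after the whitespace that follows it (tab or spaces) is the class name; the function name annotate is hypothetical:

```shell
# annotate VOCAB DOC (hypothetical helper): print each URI from DOC
# followed by a tab and its full class name from VOCAB.
annotate() {
    awk '
        FNR == NR {
            key = $1
            # Strip the URI and the whitespace after it; what remains of
            # the line (possibly several words) is the class name.
            sub(/^[^ \t]+[ \t]+/, "")
            names[key] = $0
            next
        }
        # Skip blank lines in the document; $1 also drops trailing spaces.
        NF { print $1 "\t" names[$1] }
    ' "$1" "$2"
}
```

For example, `annotate vocab.tsv doc2.tsv` prints the three URIs of doc2.tsv with "Medizinische Mikrobiologie", "Pharmazeutische Technologie" and "Pharmakologie" intact.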

In order to do this for all files in the directory:

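for tsv in *.tsv ; do

    # vocab.tsv also matches *.tsv; skip the lookup file itself
    [ "$tsv" = vocab.tsv ] && continue

    tsv2=${tsv%.tsv}.tsv2

    # ">" rather than ">>", so rerunning the loop does not duplicate lines
    awk 'FNR==NR{ urls[$1]=$2 } FNR!=NR { print $1"\t"urls[$1] }' vocab.tsv "$tsv" > "$tsv2"

done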

Of course, it would be more elegant to write the results back into the original files without detouring through .tsv2 files.
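One way to avoid the .tsv2 files is to write each result to a temporary file and copy it back over the original. A minimal sketch, assuming vocab.tsv is tab-separated (as the question says it is supposed to be) and sits in the same directory as the documents; splitting the vocabulary lines on the tab keeps multi-word class names whole. The function name update_in_place is hypothetical:

```shell
# update_in_place (hypothetical helper): add class names to every *.tsv
# file in the current directory, overwriting each file in place.
# Assumptions: vocab.tsv is tab-separated and lives in this directory.
update_in_place() {
    for tsv in *.tsv; do
        # The lookup file matches *.tsv too; leave it untouched.
        [ "$tsv" = vocab.tsv ] && continue
        tmp=$(mktemp) || return 1
        if awk '
            FNR == NR { split($0, f, "\t"); names[f[1]] = f[2]; next }
            NF { print $1 "\t" names[$1] }
        ' vocab.tsv "$tsv" > "$tmp"; then
            # Copy the result back; writing through ">" keeps the
            # original file and its permissions.
            cat "$tmp" > "$tsv"
        fi
        rm -f "$tmp"
    done
}
```

Run it from the directory that holds the documents. Because only the first field of each document line is looked up, running it a second time leaves the files unchanged.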

