Extracting column range from text file via bash tool

Problem description

Assume a text file (file1) that contains multiple lines of alphabetic strings, each preceded by a short alphanumeric string that acts as a barcode. The alphabetic strings are all identical in length; the preceding alphanumeric strings are not. The barcode and the alphabetic string are separated by a single space on each line.

$ cat file1
a1 abcdefghijklmnopqrstuvwxyz
b27 abcdefghijklmnopqrstuvwxyz
c4 abcdefghijklmnopqrstuvwxyz

Assume a second file (file2) that contains information on a column range. This range is always shorter than the alphabetic string.

$ cat file2
2-13

I am trying to develop bash code that extracts the column range specified in file2 from the alphabetic strings in file1, while keeping the barcodes.

$ sought_command file1 file2
a1 bcdefghijklm
b27 bcdefghijklm
c4 bcdefghijklm

I am not sure which command-line tool is best suited to this, but I presume that awk can do it.

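Note: I am aware that Python is probably the easiest way to write this, and that is what I did first. However, my Python implementation turned out to be too slow, because the alphabetic strings to be processed are many thousands of characters long. That is why I am specifically looking for a solution based on bash tools.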

Recommended answer

If the second field in file2 is treated as a length:

$ awk 'NR==FNR{start=$1;lgth=$2;next} {print $1, substr($2,start,lgth)}' FS='-' file2 FS=' ' file1
a1 bcdefghijklmn
b27 bcdefghijklmn
c4 bcdefghijklmn

Or, if the second field is the end position rather than the length:

$ awk 'NR==FNR{start=$1;lgth=$2-$1+1;next} {print $1, substr($2,start,lgth)}' FS='-' file2 FS=' ' file1
a1 bcdefghijklm
b27 bcdefghijklm
c4 bcdefghijklm
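
The one-liner relies on two awk behaviours: a `var=value` argument on the command line is applied just before the file that follows it is read, and `NR==FNR` is true only while the first file is being read. A minimal sketch of the same idea wrapped in a shell function is shown below. It assumes the end-position reading of the range and a single range line in file2; the function name `extract_range` and the temporary directory are illustrative and not part of the original answer.

```shell
#!/bin/sh
# extract_range is a hypothetical helper name used for illustration only.
# Usage: extract_range DATA_FILE RANGE_FILE
extract_range() {
    awk '
        # Range file (read first, split on "-"): remember start and length.
        NR == FNR { start = $1; len = $2 - $1 + 1; next }
        # Data file (split on spaces): keep the barcode, cut the string.
        { print $1, substr($2, start, len) }
    ' FS='-' "$2" FS=' ' "$1"
}

# Demo on throwaway copies of the sample data from the question.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'a1 abcdefghijklmnopqrstuvwxyz' \
    'b27 abcdefghijklmnopqrstuvwxyz' \
    'c4 abcdefghijklmnopqrstuvwxyz' > "$tmpdir/file1"
printf '2-13\n' > "$tmpdir/file2"

extract_range "$tmpdir/file1" "$tmpdir/file2"
```

Run on the sample data, this should print the same three lines as the second command above. Because `substr` works directly on the field, awk never has to split the long string into individual characters, which is likely why this approach copes well with strings that are thousands of characters long.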
