Remove all fields from the 2nd column that are not 5 consecutive numerical digits
Problem description
Record | RegistrationID
41-1|10551
1-105|5569
4-7|10043
78-3|2176
3-1|19826
12-1|1981
The output file has to be:
Record | RegistrationID
1-1|10551
3-1|19826
5-7|10043
My file is pipe-delimited.
Any number in the 2nd column that is shorter or longer than 5 digits must be removed, i.e. only records that have 5 consecutive digits must remain. I've been searching Google for an hour trying to sort this out; any advice would be highly appreciated. Thanks in advance.
I tried this: grep -E ' [0-9]{5}$|$' filename -> not getting any results (thanks to Cyrus).
Recommended answer
If this doesn't do what you want:
$ awk '(NR==1) || ($NF~/^[0-9]{5}$/)' file
Acno | Zip
high | 12345
tyty | 19812
then your real input file simply does not match the format you provided in your example. You'd have to follow up on that yourself to figure out the difference, and post more truly representative sample input if you want more help.
Given your updated input file with no spaces around the |s:
$ awk -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)' file
Acno | Zip
45775-1|10551
2734455-7|10043
167115-1|19826
If you REALLY have leading white space in your input that you want removed from the output, that's easily done, but I'm going to assume for now that you don't actually have that situation and that it's just more mistakes in your posted sample input file.
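In case the leading white space does turn out to be real, here is a minimal sketch of one way to strip it while filtering. The function name `keep_five_digit_ids` and the file `padded.txt` are hypothetical, the sample rows are made up for illustration, and the five-digit test is spelled out as five bracket expressions so the sketch doesn't depend on interval support in your awk:

```shell
# Hypothetical helper (not from the original answer): keep the header and
# every row whose last |-separated field is exactly five digits, and strip
# leading blanks/tabs from each line that is printed.
keep_five_digit_ids() {
    awk -F'|' '
        NR == 1 || $NF ~ /^[0-9][0-9][0-9][0-9][0-9]$/ {
            sub(/^[ \t]+/, "")   # no target given, so this edits the whole record
            print
        }
    ' "$1"
}

# Made-up sample with stray leading blanks on some records.
printf '%s\n' \
    'Record | RegistrationID' \
    '  41-1|10551' \
    '1-105|5569' \
    ' 4-7|10043' > padded.txt

keep_five_digit_ids padded.txt
```

Only the leading white space is touched here; blanks around the `|` or at the end of a line would still make the last field fail the five-digit test.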
With gawk 3.1.7, which the OP has (see comments below), interval expressions such as {5} have to be enabled explicitly:
awk --re-interval -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)' file
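If your awk has neither interval support nor the `--re-interval` option, the same condition can probably be expressed without `{5}` at all. The sketch below is a swapped-in alternative, not the command from the answer: it checks that the last field is five characters long and contains no non-digit. The function name `five_digit_rows` and the file name `sample.txt` are assumptions for illustration; the data is the sample from the question.

```shell
# Hypothetical helper: same filter as above, written without an interval
# expression so it does not rely on {5} being understood by the awk in use.
five_digit_rows() {
    awk -F'|' '
        # header line, or a last field of length 5 with no non-digit in it
        NR == 1 || (length($NF) == 5 && $NF !~ /[^0-9]/)
    ' "$1"
}

# Sample input copied from the question.
printf '%s\n' \
    'Record | RegistrationID' \
    '41-1|10551' \
    '1-105|5569' \
    '4-7|10043' \
    '78-3|2176' \
    '3-1|19826' \
    '12-1|1981' > sample.txt

five_digit_rows sample.txt
# Prints:
# Record | RegistrationID
# 41-1|10551
# 4-7|10043
# 3-1|19826
```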