Remove all records whose 2nd column is not exactly 5 consecutive numerical digits
Problem description
Record | RegistrationID
41-1|10551
1-105|5569
4-7|10043
78-3|2176
3-1|19826
12-1|1981
The output file must be:
Record | RegistrationID
1-1|10551
3-1|19826
5-7|10043
My file is pipe-delimited.
Any record whose value in the 2nd column is shorter or longer than 5 digits must be removed, i.e. only records that have exactly 5 consecutive digits must remain. I've been searching Google for an hour trying to fix this; any advice would be highly appreciated. Thanks in advance.
Tried this: grep -E ' [0-9]{5}$|$' filename -> not getting any results. (Thanks to cyrus.)
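For comparison, a minimal grep-based sketch (not part of the original thread). The attempted pattern expects a space before the five digits, which the data lines don't have, and its second alternative ($ on its own) matches at the end of every line, so it cannot filter anything out. Anchoring the digits to a literal pipe instead keeps only lines whose last field is exactly five digits. The function name keep_five_digit_ids is hypothetical, and the sketch assumes the header is always line 1 and each data line has exactly two pipe-separated fields.

```shell
#!/bin/sh
# Hypothetical helper: print the header line, then only the data lines
# that end in "|" followed by exactly five digits.
keep_five_digit_ids() {
  head -n 1 "$1"                            # header (assumed to be line 1)
  tail -n +2 "$1" | grep -E '[|][0-9]{5}$'  # "[|]" is a literal pipe, not alternation
}
```

Usage: keep_five_digit_ids filename > filtered.txt. Note that grep exits with status 1 when no data line matches, so the function does too in that case.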
Recommended answer
If this doesn't do what you want:
$ awk '(NR==1) || ($NF~/^[0-9]{5}$/)' file
Acno | Zip
high | 12345
tyty | 19812
then your real input file does not match the format you provided in your example. You'll need to work out the difference yourself and, if you want more help, post sample input that is truly representative.
Given your updated input file, with no spaces around the |s:
$ awk -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)' file
Acno | Zip
45775-1|10551
2734455-7|10043
167115-1|19826
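The awk command above works because -F'|' makes awk split each line on the pipe, $NF is the last field (here the 2nd column), NR==1 keeps the header, and a pattern with no action prints the line whenever it is true. Below is a sketch wrapping the same logic in a hypothetical shell function, filter_ids. Two assumptions of mine: the header is line 1, and the five digits are spelled out as [0-9][0-9][0-9][0-9][0-9] rather than [0-9]{5}, so the sketch does not depend on the awk supporting interval expressions.

```shell
#!/bin/sh
# Hypothetical helper: same test as awk -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)'
filter_ids() {
  # -F'|'      split each line on the pipe character
  # NR == 1    always keep the header line
  # $NF ~ ...  keep lines whose last field is exactly five digits
  # A pattern with no { action } prints the whole line when it is true.
  awk -F'|' 'NR == 1 || $NF ~ /^[0-9][0-9][0-9][0-9][0-9]$/' "$1"
}
```

Usage: filter_ids file > file.filtered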
If you REALLY have leading white space in your input that you want removed from the output, that's easily done, but for now I'll assume you don't actually have that situation and it's just another mistake in your posted sample input file.
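For the case set aside above, a minimal sketch, assuming "leading white space" means blanks or tabs at the start of a line. Calling sub() without a target edits $0, and awk then re-splits the fields, so $NF is tested on the cleaned line. Also stripping trailing blanks and a carriage return (as left by Windows line endings) is an extra assumption of mine, not something the thread confirms. The function name filter_ids_trimmed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: trim each line before applying the 5-digit test.
filter_ids_trimmed() {
  awk -F'|' '
    { sub(/^[ \t]+/, "") }      # drop leading blanks/tabs (edits $0, fields are re-split)
    { sub(/[ \t\r]+$/, "") }    # assumption: also drop trailing blanks and a DOS carriage return
    NR == 1 || $NF ~ /^[0-9][0-9][0-9][0-9][0-9]$/
  ' "$1"
}
```

Because the trimming rules run before the test, the header and every kept record are printed without the surrounding white space.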
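With gawk 3.1.7, which the OP is using (per the comments on the question), interval expressions such as {5} are not recognized by default and have to be enabled with --re-interval: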
$ awk --re-interval -F'|' '(NR==1) || ($NF~/^[0-9]{5}$/)' file
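If you can't pass --re-interval (for example, with an awk other than gawk whose regex engine lacks interval expressions), one alternative that needs no interval syntax at all is to check the field's length and reject any non-digit character. This is a swapped-in technique, not the answerer's, and the function name filter_ids_by_length is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: "exactly five digits" without any {n} interval.
filter_ids_by_length() {
  # length($NF) == 5   the last field is five characters long, and
  # $NF !~ /[^0-9]/    none of those characters is a non-digit
  awk -F'|' 'NR == 1 || (length($NF) == 5 && $NF !~ /[^0-9]/)' "$1"
}
```

Usage: filter_ids_by_length file. An empty last field has length 0, so it is rejected like any other non-match.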