How to list all non-ASCII bytes with awk?
Question
Here is the test file on Google Drive.
I want to use awk to list every non-ASCII byte in the test file, that is, every byte outside the range \x00-\x7f.
There are 12 such bytes.
Here is my attempt:
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)print i,$i}' test
146 "
148 "
181 "
184 "
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %x \n", i,$i)}' test
146 0
148 0
181 0
184 0
That failed. How can I list all 12 bytes in the file in the format below?
146 e2
147 80
148 9c
150 e2
151 80
152 9d
185 e2
186 80
187 9c
190 e2
191 80
192 9d
export LC_ALL=C
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %c\n",i,$i)}' test
146
147 �
148 �
150
151 �
152 �
185
186 �
187 �
190
191 �
192 �
How to fix my code?
Solution
I'm in a UTF-8 shell:
$ locale
LANG=en_US.UTF-8
...
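The locale matters because of how awk counts characters. In a UTF-8 locale, gawk treats a multibyte sequence such as e2 80 9c (a left curly quote) as a single character, which is presumably why the first attempt in the question reported only 4 positions instead of 12 bytes. A minimal sketch, assuming gawk or mawk (splitting on an empty FS is an extension, not POSIX) and a made-up one-character input rather than the original test file:

```shell
# e2 80 9c is the UTF-8 encoding of U+201C; printf emits it via octal escapes.
# LC_ALL=C asks awk to treat every byte as one character.
nfields=$(printf '\342\200\234\n' | LC_ALL=C awk 'BEGIN { FS = "" } { print NF }')
echo "$nfields"
```

Under LC_ALL=C this should report 3 fields, one per byte.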
So first:
$ export LC_ALL=C
Then:
$ awk -F '' ' # split record in fields
BEGIN { for(n=0;n<256;n++) # iterate all values
ord[sprintf("%c",n)]=n } # make a hash ord[char]=n
{ for(i=1;i<=NF;i++) # iterate all fields
if(ord[$i]>127) # beyond 7f
print ord[$i] } # print n (value)
' test
Outputs:
226
128
156
226
128
157
226
128
156
226
128
157
which in hex would be:
e2
80
9c
...
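To get the position-plus-hex format asked for in the question, the same lookup table can feed printf directly. The `%x` attempt in the question printed 0 because `$i` is a non-numeric string, which awk converts to 0 in numeric context; the table supplies the byte value instead. A sketch under the same assumptions (gawk or mawk, C locale), using a made-up sample line rather than the original test file:

```shell
# Hypothetical sample: the letters a and b, each followed by a curly quote
# (bytes e2 80 9c and e2 80 9d), written with octal escapes.
out=$(printf 'a\342\200\234b\342\200\235\n' | LC_ALL=C awk '
BEGIN {
    FS = ""                           # one field per byte in the C locale
    for (n = 128; n < 256; n++)       # only non-ASCII values are needed
        ord[sprintf("%c", n)] = n     # map byte -> numeric value
}
{
    for (i = 1; i <= NF; i++)         # i is the position within the line
        if ($i in ord)                # true only for bytes above 7f
            printf "%d %02x\n", i, ord[$i]
}')
printf '%s\n' "$out"
```

Note that `i` is the position of the byte within its line, not an offset into the whole file; printing `NR` alongside it would disambiguate multi-line input.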