How to list all non-ASCII bytes with awk?


Problem description


Here is the test file on Google Drive.

Sample: test file

I want to use awk to list every non-ASCII byte in the test file, that is, every byte outside the range \x00-\x7f.
There are 12 such bytes.

Here is my attempt:

awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)print i,$i}'  test
146 “
148 ”
181 “
184 ”

awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %x \n", i,$i)}'  test
146 0 
148 0 
181 0 
184 0

That failed. How can I list all 12 bytes in the file in the format below?

146  e2
147  80
148  9c
150  e2
151  80
152  9d
185  e2
186  80
187  9c
190  e2
191  80
192  9d

export LC_ALL=C
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %c\n",i,$i)}'  test
146 
147 �
148 �
150 
151 �
152 �
185 
186 �
187 �
190 
191 �
192 �

How can I fix my code?

Solution

I'm in a UTF-8 shell:

$ locale
LANG=en_US.UTF-8
...

so first:

$ export LC_ALL=C
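
The likely reason this step matters: in a UTF-8 locale gawk counts and splits by character, so the three bytes of a curly quote form a single field, whereas in the C locale every byte is its own character. A quick check, assuming gawk (or another byte-oriented awk such as mawk) and a hypothetical one-character input made of the bytes e2 80 9c:

```shell
# Count what awk sees in one curly quote (bytes e2 80 9c).
# Under LC_ALL=C each byte counts separately, so gawk reports 3.
printf '\342\200\234\n' | LC_ALL=C awk '{ print length($0) }'
```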

Then:

$ awk -F '' '                         # split record in fields
BEGIN { for(n=0;n<256;n++)            # iterate all values
            ord[sprintf("%c",n)]=n }  # make a hash ord[char]=n
      { for(i=1;i<=NF;i++)            # iterate all fields
            if(ord[$i]>127)           # beyond 7f
                print ord[$i] }       # print n (value)
' test

Outputs:

226
128
156
226
128
157
226
128
156
226
128
157

which in hex would be:

e2
80
9c
...
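
The question asked for the position and the hex value on each line, while the script above prints decimal values only. Below is a minimal sketch of one way to get that format, assuming gawk or another awk that treats input as bytes under LC_ALL=C. The function name list_non_ascii and the sample file are hypothetical, and substr() is swapped in for the empty field separator because splitting on an empty FS is not specified by POSIX.

```shell
# Hypothetical helper: print "position hex" for every byte above 0x7f.
# Positions are counted per line, as in the question.
list_non_ascii() {
    LC_ALL=C awk '
    BEGIN { for (n = 0; n < 256; n++)        # lookup table: byte -> numeric value
                ord[sprintf("%c", n)] = n }
          { len = length($0)
            for (i = 1; i <= len; i++) {     # walk the record one byte at a time
                c = substr($0, i, 1)
                if (ord[c] > 127)            # keep only bytes beyond 0x7f
                    printf "%d %02x\n", i, ord[c]
            } }
    ' "$@"
}

# Hypothetical sample: "a", a left curly quote, "b", a right curly quote.
printf 'a\342\200\234b\342\200\235\n' > sample.txt
list_non_ascii sample.txt
```

For this sample the helper prints six lines: 2 e2, 3 80, 4 9c, 6 e2, 7 80, 8 9d. Run against the test file from the question, it should produce the position and hex pairs in the requested format, provided the non-ASCII bytes all sit on one line.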
