An odd text stops awk command from working


Problem description


I use an awk command to count lines with the same beginning...

For instance, try1.txt contains:

b : c
b : c

When I launch the following command in a terminal:

awk -F ' : ' '$1=="b"{a[$2]++} END{for (i in a) print "  ", i,a[i]}' try1.txt

it returns c 2 which is good, because b : c appears twice in try1.txt.

The output of my tool is a huge output.txt, much more complicated than try1.txt. Some parts of output.txt contain the following characters:

^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^137

It is systematically written by the system when a process is killed. I am OK with that. However, I realized that it stops awk from working properly. For example, with try2.txt as follows:

b : c
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^137
b : c

The command awk -F ' : ' '$1=="b"{a[$2]++} END{for (i in a) print " ", i,a[i]}' try2.txt returns c 1. That is, awk stops when it meets the odd line ^@^@^@^@^@.

I don't know how to keep the system from writing the odd line ^@^@^@^@^@, so does anyone know how to amend the awk command to work around it?

Edit: It seems that the ^@ I found in my output.txt is not the ordinary characters ^ and @. The following is part of a screenshot of output.txt displayed in Emacs, which has trouble with it:

Edit: As suggested, I ran xxd try2.txt, which gave:

0000000: 6220 3a20 630a 0000 0000 0000 0000 0000  b : c...........
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 0000 0000 0000 0000 0031 3337 0a62 203a  .........137.b :
0000030: 2063 0a  

Solution

^@ is likely a representation of a binary 0 / NUL character:

$ head -c10 /dev/zero > 10zero
$ cat -v 10zero 
^@^@^@^@^@^@^@^@^@^@$ 

Some text-oriented utilities may treat this as an end of file.

So since your input file is a binary file, you should have more luck extracting the text strings from it first and just operating on those:

$ strings try1.txt | awk -F ' : ' '$1=="b"{a[$2]++} END{for (i in a) print "  ", i,a[i]}'
   c 2
$

The strings command man page. (btw watch out when you google "man strings" - you might get some images you might not have bargained for ;-) )
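
If strings is not available, a different workaround (not the approach used above) is to delete the NUL bytes with tr before awk sees them. The following is only a sketch under assumptions: the sample file is a shortened stand-in for try2.txt built with printf, the variable names tmpfile and result are made up for illustration, and the leading "  " in the print statement is dropped to keep the output easy to compare.

```shell
# Build a small stand-in for try2.txt: a run of NUL bytes followed by "137"
# sits between two "b : c" lines (shortened to four NULs for illustration).
tmpfile=$(mktemp)
printf 'b : c\n\000\000\000\000137\nb : c\n' > "$tmpfile"

# Delete every NUL byte, then count lines by their second field as before.
result=$(tr -d '\000' < "$tmpfile" |
  awk -F ' : ' '$1=="b"{a[$2]++} END{for (i in a) print i, a[i]}')

echo "$result"   # prints: c 2
rm -f "$tmpfile"
```

Unlike strings, tr -d '\000' keeps every other byte, so the "137" fragment survives as its own line; here it is harmless because its first field is not "b".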


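Note for the curious - I recreated the OP's file from the xxd dump in the question exactly on my machine (saved as try1.txt) thus:

  • Capture the xxd output from the question into a text file called try1.xxd
  • xxd -r try1.xxd > try1.txt to reverse the normal xxd operation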
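
For readers without xxd, the same bytes can probably be rebuilt with printf and head. This is a sketch based on my reading of the dump in the question (6 bytes of "b : c" plus newline, 35 NUL bytes, "137" plus newline, then "b : c" plus newline, 51 bytes in total); the variable names sample, size and nuls are hypothetical.

```shell
# Rebuild the 51 bytes shown in the xxd dump (0x00 through 0x32).
sample=$(mktemp)
{
  printf 'b : c\n'            # offsets 0x00-0x05
  head -c 35 /dev/zero        # offsets 0x06-0x28: the run of NUL bytes
  printf '137\nb : c\n'       # offsets 0x29-0x32
} > "$sample"

# Sanity checks: total size and number of NUL bytes.
size=$(wc -c < "$sample" | tr -d ' ')
nuls=$(tr -cd '\000' < "$sample" | wc -c | tr -d ' ')
echo "size=$size nuls=$nuls"   # prints: size=51 nuls=35
rm -f "$sample"
```

Counting the NUL bytes with tr -cd '\000' | wc -c is also a quick way to check whether a large output file contains any of them before feeding it to awk.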
