Unix - How to convert octal escape sequences via pipe
Question
I'm pulling data from a file (in this case an exim mail log), and it often stores characters as escaped octal sequences such as \NNN, where each 'N' is an octal digit 0-7. This mainly happens when the subject is written in non-Latin characters (Arabic, for example).
My goal is to find the cleanest way to convert these octal sequences so that they display correctly in my UTF-8 enabled terminal, specifically in 'less', since there can be a lot of output.
The best approach I have found so far is as follows:
arbitrary_stream | { while read -r temp; do printf %b "$temp\n"; done } | less
This seems to work pretty well; however, I would assume that there is some translator tool, or maybe even a flag built into 'less', to handle this. I also found that if you use something like sed to inject a 0 after each \, you can store the result in a variable and then use 'echo -e $data', but this was messier than the previous solution.
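For reference, the sed idea can be sketched as a pipe filter without the intermediate variable. This is only an illustration: the function name decode_with_printf is made up here, and it assumes a printf whose %b accepts the \0NNN form. Note that %b also expands other backslash escapes (\n, \t, \\ and so on), so it is not a pure octal decoder.

```shell
# Hypothetical helper: rewrite \NNN as \0NNN, then let printf %b expand it.
decode_with_printf() {
    # Insert a 0 after the backslash of every three-digit octal escape.
    sed 's/\\\([0-7][0-7][0-7]\)/\\0\1/g' |
    while IFS= read -r line; do
        # %b expands the \0NNN escapes; "\n" in the format restores the newline.
        printf '%b\n' "$line"
    done
}

# Example: printf '%s\n' is used instead of echo so the shell does not
# touch the backslashes before the filter sees them.
printf '%s\n' 'test 123 \342\202\254' | decode_with_printf
```

Piping the result into less works the same way as with the loop above.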
Test case:
octalvar="\342\202\254"
Expected output in less:
€
I'm looking for something cleaner, more complete, or just better than my solution above, in the form of either:
echo $octalvar | do_something | less
or
echo $octalvar | less --some_magic_flag
Any suggestions? Or is my solution about as clean as I can expect?
Recommended answer
Conversion in GNU awk (needed for strtonum). It proved to be a hassle, so the code is a mess and could probably be streamlined; feel free to advise:
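awk '{
    out = ""                                   # decoded text collected so far
    while (match($0, /\\[0-7]{3}/)) {          # search for the next \NNN
        o = "0" substr($0, RSTART + 1, 3)      # swap the \ for a 0 so strtonum reads octal
        c = sprintf("%c", strtonum(o))         # convert to a character
        out = out substr($0, 1, RSTART - 1) c  # keep the text before it, then the character
        $0 = substr($0, RSTART + RLENGTH)      # continue with the rest of the line
    }
    $0 = out $0                                # reassemble the line for printing
}1' foo > bar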
Or paste the code between the single quotes into a file above_program.awk and run it like awk -f above_program.awk foo > bar. Test file foo:
test 123 \342\202\254
Run it in a non-UTF-8 locale; I used the C locale:
$ locale
...
LC_ALL=C
$ awk -f above_program.awk foo
test 123 €
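If you run it in a UTF-8 locale, an unwanted conversion happens: sprintf("%c") encodes each value as a UTF-8 character of its own instead of emitting a raw byte, so the output is garbled: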
$ locale
...
LC_ALL=en_US.utf8
$ awk -f above_program.awk foo
test 123 â¬
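To avoid depending on the session locale, the C locale can be set for the awk call alone, which also gives the pipe form asked for in the question. The following is a minimal sketch, not the answer's exact program: the function name decode_octal is made up, the octal value is computed by hand so that strtonum (and therefore GNU awk) is not required, and it assumes an awk whose %c emits a single raw byte in the C locale, as the gawk run above does.

```shell
# Hypothetical helper: decode \NNN octal escapes on stdin into raw bytes.
decode_octal() {
    # LC_ALL=C applies to this awk invocation only, not to the whole session.
    LC_ALL=C awk '{
        rest = $0; out = ""
        while (match(rest, /\\[0-7][0-7][0-7]/)) {
            # Octal digits to a number: d1*64 + d2*8 + d3.
            n = substr(rest, RSTART + 1, 1) * 64 \
              + substr(rest, RSTART + 2, 1) * 8 \
              + substr(rest, RSTART + 3, 1)
            # Append the text before the escape, then the decoded byte.
            out = out substr(rest, 1, RSTART - 1) sprintf("%c", n)
            # Continue scanning after the escape.
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

# Example with the test line from above.
printf '%s\n' 'test 123 \342\202\254' | decode_octal
```

Used as a filter it becomes arbitrary_stream | decode_octal | less. Because the decoded text is built by concatenation rather than by sub(), a decoded & or \ is not reinterpreted.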