Unix - How to convert octal escape sequences via pipe

Question

I'm pulling data from a file (in this case an exim mail log) and often it saves characters in an escaped octal sequence like \NNN where 'N' represents an octal value 0-7. This mainly happens when the subject is written in non-Latin characters (Arabic for example).

My goal is to find the cleanest way to convert these octal characters to display correctly in my utf-8 enabled terminal, specifically in 'less' as there is the potential for lots of output.

The best approach I have found so far is as follows:

arbitrary_stream | { while read -r temp; do printf %b "$temp\n"; done } | less

This seems to work pretty well; however, I would assume that there is some translator tool, or maybe even a flag built into 'less', to handle this. I also found that if you use something like sed to inject a 0 after each \, you can store the result in a variable and then use 'echo -e $data', but this was messier than the previous solution.
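The sed idea mentioned above can be sketched without the intermediate variable. This is only an illustration under stated assumptions: the helper name decode_octal is hypothetical, and it uses printf %b instead of echo -e, because POSIX specifies \0NNN as the octal form that %b understands.

```shell
# Hypothetical helper: decode \NNN octal escapes read from stdin.
decode_octal() {
    # Rewrite \NNN as \0NNN, the octal form POSIX defines for printf %b.
    sed 's/\\\([0-7]\{3\}\)/\\0\1/g' |
    while IFS= read -r line; do
        # %b expands the escapes; the newline is added by the format string.
        printf '%b\n' "$line"
    done
}

# Example input taken from the test case in this question.
printf '%s\n' 'test 123 \342\202\254' | decode_octal
```

Note that %b also expands other backslash escapes such as \n or \t that happen to appear in the data, so this is not a strict octal-only decoder.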

Test case:

octalvar="\342\202\254"

Expected output in less:

€

I'm looking for something cleaner, more complete or just better than my above solution in the form of either:

echo $octalvar | do_something | less

echo $octalvar | less --some_magic_flag

Any suggestions? Or is my solution about as clean as I can expect?

Recommended answer

Conversion in GNU awk (needed for strtonum). It proved to be a hassle, so the code is a mess and could probably be streamlined; feel free to advise:

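awk '{
    out = ""
    while (match($0, /\\[0-7]{3}/)) {                 # search for the next \NNN
        o = "0" substr($0, RSTART + 1, 3)             # leading 0 so strtonum reads octal
        c = sprintf("%c", strtonum(o))                # convert to a character
        out = out substr($0, 1, RSTART - 1) c         # append literally, so "&" and "\" stay intact
        $0 = substr($0, RSTART + RLENGTH)             # continue after the match
    }
    $0 = out $0                                       # decoded part plus the unmatched tail
}1' foo > bar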

Or paste the code between the single quotes into a file above_program.awk and run it like awk -f above_program.awk foo > bar. Test file foo:

test 123 \342\202\254

Run it in a non-UTF-8 locale; I used the C locale:

$ locale 
...
LC_ALL=C
$ awk -f above_program.awk foo
test 123 €

If you run it in a UTF-8 locale, gawk treats each decoded value as a character code and re-encodes it as UTF-8, so the output is garbled:

$ locale
...
LC_ALL=en_US.utf8
$ awk -f above_program.awk foo
test 123 â¬
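One way to sidestep both the locale problem and the strtonum dependency is sketched below. It is an assumption-laden variant, not the program above: the helper name decode_octal_awk is hypothetical, the octal value is computed arithmetically so that any POSIX awk should do, and LC_ALL=C is set for the awk process only, so the terminal can stay in a UTF-8 locale.

```shell
# Hypothetical helper: decode \NNN octal escapes read from stdin.
decode_octal_awk() {
    # LC_ALL=C applies to this awk process only, so %c emits raw bytes.
    LC_ALL=C awk '{
        out = ""
        while (match($0, /\\[0-7][0-7][0-7]/)) {
            # Three octal digits to a byte value: d1*64 + d2*8 + d3
            n = substr($0, RSTART + 1, 1) * 64 + substr($0, RSTART + 2, 1) * 8 + substr($0, RSTART + 3, 1)
            # Append the text before the match and the decoded byte
            out = out substr($0, 1, RSTART - 1) sprintf("%c", n)
            # Keep scanning only the part after the match
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}

# Example input taken from the test file foo; append "| less" for paging.
printf '%s\n' 'test 123 \342\202\254' | decode_octal_awk
```

Because the decoded bytes are appended to a separate output string instead of being substituted back into the line, a decoded "&" or "\" is never reinterpreted or rescanned.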
