Unix sort utility: use hexadecimal byte value as delimiter

Question

I'm wondering if I can use a hexadecimal value as delimiter of the Unix sort utility. Basically I want to do something like:

sort -t '\x00' <input

But it doesn't work if I do it in the way above.

Recommended answer

If you read the GNU sort manual, you will find:

-t separator, --field-separator=separator

Use character separator as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-blank character and a blank character. By default a blank is a space or a tab, but the LC_CTYPE locale can change this. That is, given the input line foo bar, sort breaks it into fields foo and bar. The field separator is not considered to be part of either the field preceding or the field following, so with sort -t " " the same input line has three fields: an empty field, 'foo', and 'bar'. However, fields that extend to the end of the line, as -k 2, or fields consisting of a range, as -k 2,3, retain the field separators present between the endpoints of the range. To specify ASCII nul as the field separator, use the two-character string \0, e.g., sort -t '\0'.

This worked with old (GNU CoreUtils 5.97) sort.
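As a rough sketch of the form the manual describes, the snippet below sorts two records whose fields are separated by a NUL byte. It assumes a GNU sort that honors the documented two-character `\0` argument; the sample data and the variable name `sorted_nul` are made up for illustration, and other sort implementations may reject this form.

```shell
# Two records, each with two fields separated by a NUL byte (written as \0 by printf).
# sort receives the literal two-character string \0, not a NUL byte.
# tr turns the NUL bytes into ':' afterwards so the result is printable.
sorted_nul=$(printf 'x\0b\ny\0a\n' | LC_ALL=C sort -t '\0' -k2,2 | tr '\000' ':')
printf '%s\n' "$sorted_nul"
```

If the separator is honored, the record whose second field is `a` sorts first even though its first field is `y`.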

There does not seem to be a way to do it on Linux. I've tried a number of tricks to get a NUL (0x00) byte into the delimiter, and the sort command complains:

sort: empty tab

You can't do it with Control-V @ as you are typing the command line; the shell (bash) does not like that.

I have a program genchar that writes bytes to output, so I tried:

sort -t "$(genchar 0)" ...

And that did not work either; I got the error from sort.

$ genchar 0 | od -c
0000000  \0  \n
0000002
$

If you were able to use control-A instead, then there'd be no problem.

Note that sort does not expand hex escape sequences in the '-t' option argument; you have to supply the actual byte you want to use. You probably can't use newline as a field delimiter, either; if you did, what would the record delimiter be?
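One way to supply the actual byte, sketched here for Control-A (0x01) rather than NUL, is to let printf produce it and pass the result to `-t`. The variable names `sep` and `sorted` and the sample records are illustrative assumptions, not part of the original answer.

```shell
# Build the one-byte separator (Control-A, octal 001) with printf.
sep=$(printf '\001')

# Three records with two fields each, joined by the separator byte.
# Sort numerically on the second field, then show the separator as ':' for readability.
sorted=$(printf '%s\n' "x${sep}2" "y${sep}1" "z${sep}3" \
  | LC_ALL=C sort -t "$sep" -k2,2n \
  | tr '\001' ':')
printf '%s\n' "$sorted"
```

This approach cannot be used for NUL itself, because a shell variable or command-line argument cannot contain a 0x00 byte, which is consistent with the "empty tab" error above.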

GNU 'sort' (from CoreUtils 5.97, at any rate; the current version is 8.12 - as of 2011-04-26) does support a -z option:

  • -z, --zero-terminated end lines with 0 byte, not newline

This is not, sadly, what you are looking for.
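For contrast, here is a small sketch of what `-z` does do: it treats NUL as the record terminator, so whole NUL-terminated records are sorted and newlines are no longer special. The sample input and the variable name `sorted_records` are illustrative.

```shell
# Three NUL-terminated records; sort -z orders them and keeps NUL as the terminator.
# tr converts the terminators to newlines only so the result can be displayed.
sorted_records=$(printf 'b\0a\0c\0' | LC_ALL=C sort -z | tr '\000' '\n')
printf '%s\n' "$sorted_records"
```

The option changes how records are delimited, not how fields within a record are split, which is why it does not answer the question.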
