Unix sort utility: use hexadecimal byte value as delimiter
Question
I'm wondering if I can use a hexadecimal value as the delimiter for the Unix sort utility.
Basically, I want to do something like:
sort -t '\x00' <input
But it doesn't work when I do it that way.
Recommended answer
If you read the GNU sort manual, you will find:
-t separator, --field-separator=separator
Use character separator as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-blank character and a blank character. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.
That is, given the input line ' foo bar', sort breaks it into fields ' foo' and ' bar'. The field separator is not considered to be part of either the field preceding or the field following, so with sort -t " " the same input line has three fields: an empty field, 'foo', and 'bar'. However, fields that extend to the end of the line, as -k 2, or fields consisting of a range, as -k 2,3, retain the field separators present between the endpoints of the range.
To specify ASCII NUL as the field separator, use the two-character string \0, e.g., sort -t '\0'.
This worked with an old sort (GNU CoreUtils 5.97).
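As a rough illustration of the manual's '\0' form, here is a minimal sketch. It assumes a GNU sort that honors the documented two-character string; the sample records are made up for the example. Each record has two fields separated by a NUL byte, and the sort key is the second field. The NUL bytes are turned into colons only so the result can be printed.

```shell
# Hypothetical input: two records, each with two NUL-separated fields.
# Sort on the second field, then replace NUL with ':' for display.
sorted="$(printf 'x\000b\ny\000a\n' | LC_ALL=C sort -t '\0' -k2,2 | tr '\000' ':')"
printf '%s\n' "$sorted"
```

Note that the argument is the literal backslash and zero characters inside single quotes; sort itself interprets them, so the shell never has to carry a real NUL byte.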
There does not seem to be a way to do it on Linux. I've tried a number of tricks to get a NUL (0x00) byte into the delimiter, and the sort command complains:
sort: empty tab
You can't do it with Control-V @ as you are typing the command line; the shell (bash) does not like that.
I have a program genchar that writes bytes to output, so I tried:
sort -t "$(genchar 0)" ...
And that did not work either; I got the same error from sort.
$ genchar 0 | od -c
0000000 \0 \n
0000002
$
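One likely reason the command substitution attempt fails: a shell variable cannot hold a NUL byte, and command-line arguments are NUL-terminated strings, so the byte is dropped before sort ever sees it. The sketch below uses printf as a stand-in for genchar (which is my own program and not generally available); the variable names are just for illustration.

```shell
# printf emits a single NUL byte, but command substitution drops it.
# bash 4.4 and later may print a warning about the ignored null byte,
# so stderr is silenced for this one assignment.
{ delim="$(printf '\000')"; } 2>/dev/null
echo "length of delimiter: ${#delim}"

# sort therefore receives an empty -t argument; capture whatever it reports.
msg="$(sort -t "$delim" </dev/null 2>&1)" || true
printf '%s\n' "$msg"
```

With GNU sort, the captured message should be the same "empty tab" complaint shown earlier.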
If you were able to use Control-A instead, there would be no problem.
Note that sort does not expand hex escape sequences in the -t option argument; you have to supply the actual byte you want to use. You probably can't use newline as a field delimiter, either; if you did, what would the record delimiter be?
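To make the "supply the actual byte" point concrete, here is a small sketch using Control-A (byte 0x01) as the delimiter. The sample data and variable names are invented for the example, and it assumes a sort that rejects multi-character -t arguments, as GNU sort does. printf is given the octal escape \001 because that form is portable; some shells also accept \x01 in printf, but not all.

```shell
# Hypothetical sample data: two records with fields separated by byte 0x01.
data="$(printf 'x\001b\ny\001a\n')"

# Passing the escape text literally hands sort four characters (\, x, 0, 1).
if ! printf '%s\n' "$data" | sort -t '\x01' -k2,2 >/dev/null 2>&1; then
    echo "literal escape rejected"
fi

# Let printf produce the real byte first (octal 001 is hex 0x01),
# then pass that byte to sort and show the result with ':' as a visible stand-in.
delim="$(printf '\001')"
sorted="$(printf '%s\n' "$data" | LC_ALL=C sort -t "$delim" -k2,2 | tr '\001' ':')"
printf '%s\n' "$sorted"
```

This works for any non-NUL byte, since the shell can store it in a variable and pass it as an argument.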
GNU sort (from CoreUtils 5.97, at any rate; the current version is 8.12 as of 2011-04-26) does support a -z option:
-z, --zero-terminated
end lines with 0 byte, not newline
Sadly, this is not what you are looking for.
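To see why -z is a different feature, here is a brief sketch with made-up records. It assumes a sort that supports -z, as GNU sort does. The NUL byte ends each record; it does not separate fields within a record.

```shell
# Three NUL-terminated records; -z changes the record terminator only.
# tr turns the NUL terminators back into newlines for display.
sorted="$(printf 'pear\000apple\000fig\000' | LC_ALL=C sort -z | tr '\000' '\n')"
printf '%s\n' "$sorted"
```

This is handy for input such as the output of find -print0, but it does nothing for splitting a line into fields on a NUL byte.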