How to use a Unix sort command to sort by human-readable numeric file size in a column?


Problem description


This question is now answered; scroll to the end of this post for the solution.

Apologies if the answer is already here, but all the answers I have found so far suggest either the -h flag or the -n flag, and neither of those is working for me...

I have some output from a curl command that is giving me several columns of data. One of those columns is a human-readable file size ("1.6mb", "4.3gb" etc).

I am using the Unix sort command to sort by the relevant column, but it appears to be sorting alphabetically instead of numerically. I have tried both the -n and the -h flags, and although they do change the order, in neither case is the order numerically correct.

I am on a CentOS Linux box, version 7.2.1511. The version of sort I have is "sort (GNU coreutils) 8.22".

I have tried using the -h flag in these different formats:

curl localhost:9200/_cat/indices | sort -k9,9h | head -n5
curl localhost:9200/_cat/indices | sort -k9 -h | head -n5
curl localhost:9200/_cat/indices | sort -k 9 -h | head -n5
curl localhost:9200/_cat/indices | sort -k9h | head -n5

I always get these results:

green open indexA            5 1        0       0   1.5kb    800b
green open indexB            5 1  9823178 2268791 152.9gb  76.4gb
green open indexC            5 1    35998    7106 364.9mb 182.4mb
green open indexD            5 1      108      11 387.1kb 193.5kb
green open indexE            5 1        0       0   1.5kb    800b

I have tried using the -n flag in the same formats as above:

curl localhost:9200/_cat/indices | sort -k9,9n | head -n5
curl localhost:9200/_cat/indices | sort -k9 -n | head -n5
curl localhost:9200/_cat/indices | sort -k 9 -n | head -n5
curl localhost:9200/_cat/indices | sort -k9n | head -n5

I always get these results:

green open index1      5 1     1021       0   3.2mb   1.6mb
green open index2      5 1     8833       0   4.1mb     2mb
green open index3      5 1     4500       0     5mb   2.5mb
green open index4      1 0        3       0   3.9kb   3.9kb
green open index5      3 1  2516794       0   8.6gb   4.3gb

Edit: It turned out there were two problems:

1) sort expects single capital letters as units (M, K and G) rather than mb, kb and gb; for plain bytes you can leave the unit off.

2) sort includes leading spaces in the key unless you explicitly exclude them, which messes with the ordering.

The solution is to replace lower case with upper case and use the -b flag to make sort ignore leading spaces (I've based this answer on @Vinicius' solution below, because it's easier to read if you don't know regex):

curl localhost:9200/_cat/indices | tr '[kmg]b' '[KMG] ' | sort -k9hb
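One thing to keep in mind with the tr approach: tr maps single characters, not strings, so it rewrites every k, m, g and b on the line, not only the size units ("green" becomes "Green", and a "b" inside an index name becomes a space, which shifts the field count). A narrower alternative can be sketched as follows. It assumes GNU sed and that sizes always look like a number followed directly by kb, mb or gb; the function name normalize_sizes and the sample lines are made up for illustration and stand in for the curl output.

```shell
# Hypothetical helper: uppercase the unit letter and drop the trailing "b",
# but only where the unit directly follows a digit and ends the word.
# Relies on GNU sed extensions (\U and \b).
normalize_sizes() {
  sed -E 's/([0-9])([kmg])b\b/\1\U\2/g'
}

# Sample lines in place of: curl localhost:9200/_cat/indices
printf '%s\n' \
  'green open index-b 5 1 35998 7106 364.9mb 182.4mb' \
  'green open index-a 5 1 0 0 1.5kb 800b' \
  'green open index-c 5 1 9823178 2268791 152.9gb 76.4gb' |
  normalize_sizes | LC_ALL=C sort -k9,9hb
```

Plain byte values such as 800b are left alone here, since sort -h treats the unknown "b" suffix as no unit.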

Solution

Your 'm' and 'g' units should be uppercase. The GNU sort manual reads:

-h --human-numeric-sort --sort=human-numeric

Sort numerically, first by numeric sign (negative, zero, or positive); then by SI suffix (either empty, or ‘k’ or ‘K’, or one of ‘MGTPEZY’, in that order; see Block size); and finally by numeric value.
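To see the rule from the manual in isolation, here is a small sketch on made-up values (not taken from the curl output). The function name human_sort_demo is hypothetical, and LC_ALL=C is set only to keep the decimal point unambiguous.

```shell
# Suffix rank is compared first, the numeric value second:
# no suffix < K < M < G, regardless of the number in front.
human_sort_demo() {
  printf '%s\n' 2M 1G 800 3K 1.5K | LC_ALL=C sort -h
}
human_sort_demo
# prints 800, 1.5K, 3K, 2M, 1G (one per line)

# A lowercase "m" is not a recognised suffix, so 2m is ranked as a plain 2
# and sorts before 1K.
printf '%s\n' 2m 1K | LC_ALL=C sort -h
# prints 2m, then 1K
```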

You can change the output of curl with GNU sed like this:

curl localhost:9200/_cat/indices \
| sed 's/[0-9][mgtpezy]/\U&/g' \
| sort -k9,9h \
| head -n5

Yields:

green open index4      1 0        3       0   3.9kb   3.9kb
green open index1      5 1     1021       0   3.2Mb   1.6Mb
green open index2      5 1     8833       0   4.1Mb     2Mb
green open index3      5 1     4500       0     5Mb   2.5Mb
green open index5      3 1  2516794       0   8.6Gb   4.3Gb

Other letters like "b" will be treated as "no unit":

green open indexA            5 1        0       0   1.5kb    800b
green open indexE            5 1        0       0   1.5kb    800b
green open indexD            5 1      108      11 387.1kb 193.5kb
green open indexC            5 1    35998    7106 364.9Mb 182.4Mb
green open indexB            5 1  9823178 2268791 152.9Gb  76.4Gb

If so desired, you can change the units in the sorted output back to lowercase by piping to sed 's/[0-9][MGTPEZY]/\L&/g'
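Putting the three steps together (uppercase the units, sort, lowercase them again) might look like the sketch below. It assumes GNU sed, wraps the pipeline in a hypothetical function named sort_by_size, and uses sample lines from the question in place of the curl output. Note that the pattern matches any digit followed by one of these letters, so an index name such as "index5gamma" would be touched as well.

```shell
# Hypothetical wrapper: read lines on stdin, sort by the size in column 9,
# and give the units back in their original lowercase form.
sort_by_size() {
  sed 's/[0-9][mgtpezy]/\U&/g' |
    LC_ALL=C sort -k9,9h |
    sed 's/[0-9][MGTPEZY]/\L&/g'
}

# Sample lines in place of: curl localhost:9200/_cat/indices
printf '%s\n' \
  'green open index5      3 1  2516794       0   8.6gb   4.3gb' \
  'green open index4      1 0        3       0   3.9kb   3.9kb' \
  'green open index3      5 1     4500       0     5mb   2.5mb' |
  sort_by_size
```

With these sample lines the order comes out as index4, index3, index5.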
