Command line utility to print statistics of numbers in Linux
Question
I often find myself with a file that has one number per line. I end up importing it into Excel to view things like the median, standard deviation and so forth.
Is there a command line utility in Linux to do the same? I usually need to find the average, median, min, max and standard deviation.
Recommended answer
This is a breeze with R. For a file that looks like this:
1
2
3
4
5
6
7
8
9
10
Use this:
R -q -e "x <- read.csv('nums.txt', header = FALSE); summary(x); sd(x[ , 1])"
To get this:
       V1
 Min.   : 1.00
 1st Qu.: 3.25
 Median : 5.50
 Mean   : 5.50
 3rd Qu.: 7.75
 Max.   :10.00
[1] 3.02765
- The -q flag squelches R's startup licensing and help output.
- The -e flag tells R you'll be passing an expression from the terminal.
- x is a data.frame (a table, basically). It's a structure that accommodates multiple vectors/columns of data, which is a little peculiar if you're just reading in a single vector. This has an impact on which functions you can use.
- Some functions, like summary(), naturally accommodate data.frames. If x had multiple fields, summary() would provide the above descriptive stats for each.
- But sd() can only take one vector at a time, which is why I index x for that command (x[ , 1] returns the first column of x). You could use apply(x, MARGIN = 2, FUN = sd) to get the SDs for all columns.
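To make the last two points concrete, here is a minimal sketch of the multi-column case, plus a variant that reads the single-column file straight into a vector. The two-column data.frame and the temporary file are made up for illustration (the temporary file stands in for nums.txt), and using scan() is a swapped-in alternative to the read.csv() call above, not part of the original answer.

```r
# Hypothetical two-column data, built inline so the sketch is self-contained
x <- data.frame(V1 = 1:10, V2 = seq(10, 100, by = 10))

# summary() handles the whole data.frame: one block of stats per column
print(summary(x))

# sd() wants a single vector, so apply it column by column (MARGIN = 2)
col_sd <- apply(x, MARGIN = 2, FUN = sd)
print(col_sd)

# For a one-number-per-line file, scan() reads directly into a numeric
# vector, so no column indexing is needed before calling sd()
nums_file <- tempfile(fileext = ".txt")  # stand-in for nums.txt
writeLines(as.character(1:10), nums_file)
v <- scan(nums_file, quiet = TRUE)
print(summary(v))
print(sd(v))
```

With a plain vector, summary(v) prints its statistics in a single row rather than the per-column layout shown for the data.frame, so the output looks slightly different from the listing above.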