Command line utility to print statistics of numbers in Linux
Question
I often find myself with a file that has one number per line. I end up importing it into Excel to view things like the median, standard deviation, and so forth.
Is there a command-line utility in Linux that does the same? I usually need to find the average, median, min, max, and standard deviation.
Recommended answer
For a file (nums.txt) that looks like this:
1
2
3
4
5
6
7
8
9
10
Use this:
R -q -e "x <- read.csv('nums.txt', header = F); summary(x); sd(x[ , 1])"
To get this:
V1
Min. : 1.00
1st Qu.: 3.25
Median : 5.50
Mean : 5.50
3rd Qu.: 7.75
Max. :10.00
[1] 3.02765
Edit to add a couple of clarifying comments (because I came back to this and didn't remember some of the rationale):
- The -q flag squelches R's startup licensing and help output.
- The -e flag tells R you'll be passing an expression from the terminal.
- x is a data.frame - a table, basically. It's a structure that accommodates multiple vectors/columns of data, which is a little peculiar if you're just reading in a single vector. This has an impact on which functions you can use.
- Some functions, like summary(), naturally accommodate data.frames. If x had multiple fields, summary() would provide the above descriptive stats for each.
- But sd() can only take one vector at a time, which is why I index x for that command (x[ , 1] returns the first column of x). You could use apply(x, MARGIN = 2, FUN = sd) to get the SDs for all columns.
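If R is not installed, most of the same figures can be reproduced with standard tools. The following is only a minimal sketch and not part of the original answer: it swaps R for sort and awk, the function name numstats is made up for illustration, and it assumes one number per line on standard input. It reports min, max, mean, median, and the sample standard deviation (n - 1 denominator, like R's sd()), but not the quartiles that summary() prints.

```shell
#!/bin/sh
# numstats: hypothetical helper (not a standard utility).
# Reads one number per line on stdin and prints basic statistics.
numstats() {
  # Sort numerically so min, max, and median can be read off by position,
  # then accumulate the sum and sum of squares in a single awk pass.
  sort -n | awk '
    NF { n++; v[n] = $1; sum += $1; sumsq += $1 * $1 }   # skip blank lines
    END {
      if (n == 0) { print "no data"; exit 1 }
      mean = sum / n
      # Median: middle value, or the mean of the two middle values for even n.
      if (n % 2) median = v[(n + 1) / 2]
      else       median = (v[n / 2] + v[n / 2 + 1]) / 2
      # Sample variance with the n - 1 denominator; reported as 0 for a
      # single value, which is an assumption of this sketch.
      var = (n > 1) ? (sumsq - n * mean * mean) / (n - 1) : 0
      if (var < 0) var = 0   # guard against rounding just below zero
      printf "min=%g max=%g mean=%g median=%g sd=%g\n", v[1], v[n], mean, median, sqrt(var)
    }'
}

# Recreate the sample file from the answer (the integers 1 to 10) and run it.
seq 1 10 > nums.txt
numstats < nums.txt
```

For the ten-line sample file this prints min=1 max=10 mean=5.5 median=5.5 sd=3.02765, which agrees with the R output above. The sum-of-squares formula is short but can lose precision on large values with small spread, so the R command remains the safer choice for real data.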