How to group boxplot outliers in gnuplot

Question

I have a large set of data points. I am trying to plot them as a boxplot, but some of the outliers have exactly the same value, so they are drawn side by side on a horizontal line. I found the question "How to set the horizontal distance between outliers in gnuplot boxplot", but it doesn't help much, since changing that distance is apparently not possible.

Is it possible to group identical outliers together, draw a single point, and print a number in brackets beside it to indicate how many points it represents? I think this would make the graph more readable.

For information: I have three boxplots per x value, and six x values in one graph. I am using gnuplot 5 and have already played around with the pointsize, but reducing it further no longer shrinks the distance. I hope you can help!

Edit:

set terminal pdf
set output 'dat.pdf'
file0 = 'dat1.dat'
file1 = 'dat2.dat'
file2 = 'dat3.dat'
set pointsize 0.2
set notitle
set xlabel 'X'
set ylabel 'Y'
header = system('head -1 '.file0);
N = words(header)

set xtics ('' 1)
set for [i=1:N] xtics add (word(header, i) i)

set style data boxplot
# A single plot command, so that all boxplots end up in the same graph
plot file0 using (1-0.25):1:(0.2) with boxplot lw 2 lc rgb '#8B0000' fs pattern 16 title 'A', \
     file1 using (1):1:(0.2) with boxplot lw 2 lc rgb '#00008B' fs pattern 4 title 'B', \
     file2 using (1+0.25):1:(0.2) with boxplot lw 2 lc rgb '#006400' fs pattern 5 title 'C', \
     for [i=2:N] file0 using (i-0.25):i:(0.2) with boxplot lw 2 lc rgb '#8B0000' fs pattern 16 notitle, \
     for [i=2:N] file1 using (i):i:(0.2) with boxplot lw 2 lc rgb '#00008B' fs pattern 4 notitle, \
     for [i=2:N] file2 using (i+0.25):i:(0.2) with boxplot lw 2 lc rgb '#006400' fs pattern 5 notitle

What is the best way to implement this with the code I already have in place?

Recommended answer

There is no option to have this done automatically. The steps required to do it manually in gnuplot are:

(In the following, I assume that the data file data.dat has only a single column.)


  1. Use stats to determine the limits beyond which a point counts as an outlier:

stats 'data.dat' using 1
range = 1.5 # (this is the default value of the `set style boxplot range` value)
lower_limit = STATS_lo_quartile - range*(STATS_up_quartile - STATS_lo_quartile)
upper_limit = STATS_up_quartile + range*(STATS_up_quartile - STATS_lo_quartile)


  2. Count only the outliers and write the counts to a temporary file:

    set table 'tmp.dat'
    plot 'data.dat' using 1:($1 > upper_limit || $1 < lower_limit ? 1 : 0) smooth frequency
    unset table
    


  3. Plot the boxplot without its outliers, and draw the counted outliers with the labels plotting style:

    set style boxplot nooutliers
    plot 'data.dat' using (1):1 with boxplot,\
         'tmp.dat' using (1):($2 > 0 ? $1 : 1/0):(sprintf('(%d)', int($2))) with labels offset 1,0 left point pt 7
    


These steps need to be repeated for every single boxplot.

Disclaimer: This procedure should work in principle, but without example data I could not test it.
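
Repeating the three steps for every boxplot could be sketched as a loop over columns. This is an untested illustration, not part of the original answer. It assumes that data.dat holds N purely numeric columns (N = 3 is a made-up value), and the temporary file names outliers_1.dat, outliers_2.dat, ... as well as the variable names lo and hi are hypothetical.

```gnuplot
# Sketch under the assumptions stated above: one boxplot per column of data.dat.
file  = 'data.dat'
N     = 3        # assumed number of columns
range = 1.5      # default value of `set style boxplot range`

do for [i=1:N] {
    # Step 1: outlier limits for column i
    stats file using i nooutput
    lo = STATS_lo_quartile - range*(STATS_up_quartile - STATS_lo_quartile)
    hi = STATS_up_quartile + range*(STATS_up_quartile - STATS_lo_quartile)

    # Step 2: count how often each outlier value occurs in column i
    outfile = sprintf('outliers_%d.dat', i)   # hypothetical file name
    set table outfile
    plot file using i:(column(i) > hi || column(i) < lo ? 1 : 0) smooth frequency
    unset table
}

# Step 3: boxplots without outliers, plus one labelled point per distinct outlier value
set style boxplot nooutliers
plot for [i=1:N] file using (i):i with boxplot notitle, \
     for [i=1:N] sprintf('outliers_%d.dat', i) using (i):($2 > 0 ? $1 : 1/0):(sprintf('(%d)', int($2))) with labels offset 1,0 left point pt 7 notitle
```

For the layout in the question (three files, each shifted by 0.25), the same loop would have to run once per file with its own set of temporary files, and the x positions (i) would become (i-0.25), (i) and (i+0.25). A column without any outliers yields a temporary file with no valid label points, for which gnuplot will probably print a warning.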
