Find the probability in 2nd column for a selection in 1st column

Problem description

I have two columns as follows:

ifile.dat
1   10
3   34
1   4
3   32
5   3
2   2
4   20
3   13
4   50
1   40
2   20
5   2

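I would like to calculate the probability of the 2nd-column values for a selection of values in the 1st column, that is, the number of rows whose 1st-column value falls in a given range divided by the total number of rows.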

ofile.dat
1-2   0.417 #Here 1-2 means all values in 1st column ranging from 1 to 2; 
            #0.417 is the probability of corresponding values in 2nd column 
            # i.e. count(10,4,2,40,20)/total = 5/12 
3-4   0.417 #count(34,32,20,13,50)/total = 5/12
5-6   0.167 #count(3,2)/total = 2/12

Similarly, if I choose a selection range of 3 numbers, the desired output will be:

ofile.dat
1-3  0.667
4-6  0.333

RavinderSingh13 and James Brown gave nice scripts (see the answers), but they do not work for values larger than 10 in the 1st column.

ifile2.txt
10   10
30   34
10   4
30   32
50   3
20   2
40   20
30   13
40   50
10   40
20   20
50   2

Recommended answer

Considering the OP's edited samples, try the following. I tested it with both the OP's 1st sample and the latest edited sample, and it worked fine with both.

One more thing: this solution also handles the corner case where the last range is incomplete, that is, where the maximum value in the 1st column is not a multiple of range. In the OP's 1st sample, range=2 but the maximum value is 5, so the value 5 is not dropped; it is printed as a final, shorter range.

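sort -n Input_file |
awk -v range="2" '
  # Remember each distinct 1st-column value (c is not used in the END block).
  !b[$1]++{
    c[++count]=$1
  }
  {
    d[$1]=(d[$1]?d[$1] OFS:"")$2   # collect the 2nd-column values per key
    tot_element++                   # total number of input lines
    till=$1                         # input is sorted, so this ends as the maximum key
  }
  END{
    for(i=1;i<=till;i++){
       num+=split(d[i],array," ")   # number of values seen for key i
       if(++j==range){              # a full range of keys has been covered
          start=start?start:1
          printf("%s-%s %.02f\n",start,i,num/tot_element)
          start=i+1
          j=num=""
          delete array
       }
       if(j!="" && i==till){        # trailing partial range, e.g. 5-5 when range=2
          printf("%s-%s %.02f\n",(start?start:1),i,num/tot_element)
       }
    }
  }
'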

With range="10" and the OP's second sample (ifile2.txt) as Input_file, the output is as follows.

1-10 0.25
11-20 0.17
21-30 0.25
31-40 0.17
41-50 0.17
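
The same binning can also be done arithmetically, without sorting and without looping over every integer up to the maximum. The following is only a minimal sketch, not the answer author's method: the helper name bin_prob is hypothetical, and it assumes the 1st column holds positive integers (1 or larger) and that ranges start at 1. Each row is assigned to a bin with int(($1-1)/range), and each bin count is divided by the total number of rows.

```shell
# bin_prob RANGE FILE   (hypothetical helper, assumed for illustration)
# Prints "lo-hi probability" for every bin of width RANGE, starting at 1.
# Assumes the 1st column contains positive integers.
bin_prob() {
  awk -v range="$1" '
    NF {                              # skip blank lines
      bin = int(($1 - 1) / range)     # 0-based bin index: keys 1..range map to 0
      count[bin]++                    # rows falling in this bin
      total++                         # total number of rows
      if (bin > maxbin) maxbin = bin  # remember the highest bin seen
    }
    END {
      if (!total) exit                # empty input: print nothing
      for (b = 0; b <= maxbin; b++)   # empty bins are printed as 0.000
        printf "%d-%d %.3f\n", b * range + 1, (b + 1) * range, count[b] / total
    }
  ' "$2"
}
```

For example, bin_prob 2 ifile.dat prints the three lines 1-2 0.417, 3-4 0.417 and 5-6 0.167, and bin_prob 3 ifile.dat prints 1-3 0.667 and 4-6 0.333, matching the desired ofile.dat in the question. Unlike the script above, the last bin is labelled with its full nominal upper bound (5-6 rather than 5-5).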



In case your Input_file doesn't have a 2nd column, try the following.

sort -k1 Input_file |
awk -v range="1" '
  !b[$1]++{
    c[++count]=$1
  }
  {
    d[$1]=(d[$1]?d[$1] OFS:"")$0
    tot_element++
    till=$1
  }
  END{
    for(i=1;i<=till;i+=(range+1)){
       for(j=i;j<=i+range;j++){
          num=split(d[c[j]],array," ")
          total+=num
       }
       print i"-"i+range,tot_element?total/tot_element:0
       total=num=""
    }
  }
'



Try the following, written and tested with the shown samples.

sort -k1 Input_file |
awk -v range="1" '
  !b[$1]++{
    c[++count]=$1
  }
  {
    d[$1]=(d[$1]?d[$1] OFS:"")$2
    tot_element++
    till=$1
  }
  END{
    for(i=1;i<=till;i+=(range+1)){
       for(j=i;j<=i+range;j++){
          num=split(d[c[j]],array," ")
          total+=num
       }
       print i"-"i+range,tot_element?total/tot_element:0
       total=num=""
    }
  }
'



In case you don't have to include any 0 value, try the following.

sort -k1 Input_file |
awk -v range="1" '
  !b[$1]++{
    c[++count]=$1
  }
  {
    d[$1]=(d[$1]!=0?d[$1] OFS:"")$2
    tot_element++
    till=$1
  }
  END{
    for(i=1;i<=till;i+=(range+1)){
       for(j=i;j<=i+range;j++){
          num=split(d[c[j]],array," ")
          total+=num
       }
       print i"-"i+range,tot_element?total/tot_element:0
       total=num=""
    }
  }
'
