Extracting data from columns in Unix


Problem description

I have files with multiple columns and I want to read values from a particular column. I can read a column using awk '{print $column_number}'.

Each file has a different number of rows: some might be 1000 entries long, others just 2, and so on. The entries themselves range from 1 digit up to a maximum of 5 digits. This is the same for all the files.

I want to find which ranges contain the most values. For example, if the column reads:

5
93
201
2002
20003
20005
20087
31450
31451
31452
31458
52400
52428

then I want to store 31,400 as the most repeated value, then 20,000 and 52,400 as the second and third most repeated values, and so on. In other words, I am rounding each value down to a multiple of 100 and then looking for the most frequent results. So basically the code should look something like this:

for f in path-to-the-files/*
do
    while read -r i
    do
        <do the operation to sort and store the values>
    done < "$f"
done

I'd appreciate help with this!

Solution

This might work for you:

sed 's/.\?.$//;s/^$/0/;s/.$/,&00/;s/^,/0,/' file | sort | uniq -c | sort -nr 
4 31,400
3 20,000
2 52,400
2 0,000
1 2,000
1 0,200
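
To see what each substitution does, the same command can be split into one expression per step. This is only a sketch: the function name `bucket_commas` is hypothetical, it reads one value per line on standard input, and it assumes GNU sed, since `\?` is a GNU extension.

```shell
# Hypothetical helper: round each input value down to a multiple of 100
# and print it in the 0,000 style shown above. Assumes GNU sed.
#
# Step 1: s/.\?.$//     drop the last two digits (or the only digit)
# Step 2: s/^$/0/       a value below 100 is now empty, so make it 0
# Step 3: s/.$/,&00/    put a comma before the hundreds digit, append 00
# Step 4: s/^,/0,/      values below 1000 start with a comma, so prefix 0
bucket_commas() {
    sed -e 's/.\?.$//' \
        -e 's/^$/0/' \
        -e 's/.$/,&00/' \
        -e 's/^,/0,/'
}

# Example: prints 0,000 then 0,200 then 31,400, one per line.
printf '93\n201\n31450\n' | bucket_commas
```

Piping the result through sort | uniq -c | sort -nr, as above, then counts how often each bucket occurs.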

If you don't need the comma-separated format, use:

sed 's/.\?.$//;s/$/00/;s/^00$/0/' file | sort | uniq -c | sort -nr
  4 31400
  3 20000
  2 52400
  2 0
  1 2000
  1 200
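
The question also asks how to apply this to one column across many files. The following is a minimal sketch under stated assumptions, not the answer's method: it replaces the sed substitutions with integer arithmetic in awk, and the function name `top_buckets`, the choice of column 1, and the whitespace-separated file layout are all hypothetical.

```shell
# Hypothetical helper: print "count bucket" pairs for one column of one file,
# most frequent bucket first. Usage: top_buckets FILE COLUMN
top_buckets() {
    awk -v col="$2" '
        # Only count fields that are plain non-negative integers.
        $col ~ /^[0-9]+$/ {
            bucket = int($col / 100) * 100   # round down to a multiple of 100
            count[bucket]++
        }
        END {
            for (b in count) print count[b], b
        }
    ' "$1" | sort -k1,1nr -k2,2nr
}

# Assumed layout: each file under path-to-the-files/ is whitespace-separated.
for f in path-to-the-files/*
do
    [ -f "$f" ] || continue      # skip directories and an unmatched glob
    printf '== %s ==\n' "$f"
    top_buckets "$f" 1           # column 1 is an assumption
done
```

For the sample column in the question this prints the same counts as the second sed command: 4 31400, 3 20000, 2 52400, 2 0, 1 2000, 1 200. To keep only the most frequent bucket per file, the output can be piped through head -n 1.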
