Extracting data from columns in Unix
I have files with multiple columns, and I want to read values from a particular column. I can read a column using awk '{print $column_number}'.
Each file has a different column length: some are 1000 entries long, others just 2, and so on. The entries range from 1 digit up to a maximum of 5 digits, and this holds for all the files.
I want to count which ranges of values occur most often. For example, if the column reads:
5
93
201
2002
20003
20005
20087
31450
31451
31452
31458
52400
52428
then I want to store 31,400 as the most repeated value, then 20,000 and 52,400 as the second and third most repeated values, and so on. In other words, I am rounding each value down to a multiple of 100 and then looking for the most repeated results, if that makes sense. So basically the code should look something like this:
for f in path-to-the-files/*
do
    while read -r i
    do
        : # <do the operation to sort and store the values>
    done < "$f"
done
I'd appreciate help with this!
This might work for you:
sed 's/.\?.$//;s/^$/0/;s/.$/,&00/;s/^,/0,/' file | sort | uniq -c | sort -nr
4 31,400
3 20,000
2 52,400
2 0,000
1 2,000
1 0,200
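To make the one-liner easier to follow, here is the same pipeline with each substitution on its own line and a comment on what it does. This is only a sketch: bucket_commas is a hypothetical helper name, it expects one number per line on stdin, and the `\?` quantifier assumes GNU sed.

```shell
# bucket_commas: hypothetical wrapper around the sed pipeline above.
# Reads one number per line on stdin; assumes GNU sed (for `\?`).
bucket_commas() {
    sed '
        # drop the last two digits (or the only digit): 31450 -> 314, 5 -> empty
        s/.\?.$//
        # values below 100 are now empty; treat them as 0
        s/^$/0/
        # comma before the last digit, then restore two zeros: 314 -> 31,400
        s/.$/,&00/
        # values below 1,000 get a leading 0: ,200 -> 0,200
        s/^,/0,/
    ' | sort | uniq -c | sort -nr
}

# The sample column from the question
printf '%s\n' 5 93 201 2002 20003 20005 20087 31450 31451 31452 31458 52400 52428 |
    bucket_commas
```

Because the digits are removed textually rather than arithmetically, every value is rounded down, never up.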
If you're not interested in the comma format, use:
sed 's/.\?.$//;s/$/00/;s/^00$/0/' file | sort | uniq -c | sort -nr
4 31400
3 20000
2 52400
2 0
1 2000
1 200
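Since the question starts from awk and a specific column, the same bucketing can also be done arithmetically and wrapped in the loop over files. The following is a sketch under assumptions, not part of the original answer: bucket_column is a hypothetical helper, column 1 is chosen only for illustration, and non-numeric fields are skipped.

```shell
# bucket_column: hypothetical helper. $1 = column number, $2 = file.
# Rounds each value in that column down to a multiple of 100 and counts repeats.
bucket_column() {
    awk -v col="$1" '$col ~ /^[0-9]+$/ { print int($col / 100) * 100 }' "$2" |
        sort -n | uniq -c | sort -nr
}

# Loop over the files from the question; the glob itself lists them, no ls needed.
for f in path-to-the-files/*
do
    [ -f "$f" ] || continue     # skip directories and an unmatched glob
    echo "== $f"
    bucket_column 1 "$f"        # column 1 is an assumption; use your own column
done
```

The first line printed for each file is the most repeated bucket, the second line the next most repeated, and so on.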