Script to get the max from a column based on other column values

Views: 187  Published: 2016/8/4 9:15:58  Tags: python perl bash csv max

Problem description

I need a script to read in a CSV file (orig.csv) and output a reformatted CSV file (format.csv).

The original CSV file will look like this:

Time,Label,frame,slot,SSN,Board,BT,SRN,LabelFrame,SRNAME,LabelID,Integrity,MAX_val
2014-03-17,lableA,1,8,0,,SPUB,1,NNN,NNN,1,100%,60
2014-03-17,lableA,2,22,0,,GOUC,2,NNN02,NNN02,1,100%,
2014-03-17,lableB,2,8,0,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableB,1,2,4,,CCCB,1,NNN,NNN,1,100%,48
2014-03-17,lableB,1,0,6,,CCCB,1,NNN,NNN,1,100%,59
2014-03-17,lableC,2,6,0,,SCUA,2,NNN02,NNN02,1,100%,55
2014-03-17,lableD,2,4,1,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableD,0,2,7,,CCCB,0,MPS,MPS,1,100%,46
2014-03-17,lableD,1,4,3,,CCCA,1,NNN,NNN,1,100%,43
2014-03-17,lableE,2,2,7,,CCCB,2,NNN02,NNN02,1,100%,58

The reformatting should go through the original CSV file, collect every unique name from column 2 (Label), and keep the row with the corresponding maximum of the values in column 13 (MAX_val); see the example below. (E.g. lableA to lableE are of interest, and for lableB the max of [59,48,59] is of interest.) I would also like it to cater for a dynamic orig.csv file where possible.

The reported CSV file will look like this:

Time,Label,frame,slot,SSN,Board,BT,SRN,LabelFrame,SRNAME,LabelID,Integrity,MAX_val
2014-03-17,lableA,1,8,0,,CCCB,1,NNN,NNN,1,100%,60
2014-03-17,lableB,2,8,0,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableC,2,6,0,,SCUA,2,NNN02,NNN02,1,100%,55
2014-03-17,lableD,2,4,1,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableE,2,2,7,,CCCB,2,NNN02,NNN02,1,100%,58

Note: I am new to scripting, so I am not sure which language is best for this. I was thinking along the lines of bash, shell, or Perl, but I am open to others.

EDIT: this is how I would look to pull in my CSV data:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>D3: Loading data from a CSV file</title>
        <script type="text/javascript" src="d3/d3.v3.js"></script>
    </head>
    <body>
        <script type="text/javascript">

            d3.csv("XPU max load_format1(XPU load).csv", function(data) {
                console.log(data);
            });

        </script>
    </body>
</html>

Solution

This is a Perl solution to your problem. It keeps a hash %data that holds, for each label, the row with the highest value of MAX_val. It also keeps a list @labels that records new labels as they are encountered, so that the output stays in the same order as the input.

As I said in my comment, there is a line in your data that has an empty column 13. I have added code to treat this as zero, which is unnecessary if that is an error in your post.

use strict;
use warnings;

open my $orig_fh,   '<', 'orig.csv'   or die $!;
open my $format_fh, '>', 'format.csv' or die $!;

print $format_fh scalar <$orig_fh>; # Copy header line

my %data;
my @labels;

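while (<$orig_fh>) {
  chomp;
  # A limit of -1 keeps trailing empty fields, so an empty MAX_val is still field 13
  my @fields = split /,/, $_, -1;
  my ($label, $max_val) = @fields[1,12];
  if ( exists $data{$label} ) {
    # Treat an empty stored MAX_val as zero
    my $prev_max_val = $data{$label}[12] || 0;
    # Replace the stored row only if this one is strictly larger
    $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
  }
  else {
    # First time this label is seen: store the row and record its position
    $data{$label} = \@fields;
    push @labels, $label;
  }
}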

for my $label (@labels) {
  print $format_fh join(',', @{ $data{$label} }), "\n";
}

Output:

Time,Label,frame,slot,SSN,Board,BT,SRN,LabelFrame,SRNAME,LabelID,Integrity,MAX_val
2014-03-17,lableA,1,8,0,,SPUB,1,NNN,NNN,1,100%,60
2014-03-17,lableB,2,8,0,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableC,2,6,0,,SCUA,2,NNN02,NNN02,1,100%,55
2014-03-17,lableD,2,4,1,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableE,2,2,7,,CCCB,2,NNN02,NNN02,1,100%,58
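
Since the question is open to other languages, the same approach could be sketched in Python roughly as follows. This is not part of the original answer; it is a minimal sketch that assumes Python 3.7 or later (so that a plain dict keeps insertion order, playing the role of @labels). The function name max_rows_per_label and its parameters are made up for illustration. It looks the two columns up by header name instead of by position, which may help with files whose column order changes, and it treats an empty MAX_val as zero, as the Perl code does.

```python
import csv
import io


def max_rows_per_label(lines, label_col="Label", value_col="MAX_val"):
    """Return (header, rows): for each label, the first row with its largest value.

    Hypothetical helper for illustration; `lines` is any iterable of CSV lines.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Resolve the columns by name so a reordered file still works.
    label_idx = header.index(label_col)
    value_idx = header.index(value_col)

    def as_number(text):
        # An empty cell counts as zero, mirroring the Perl answer.
        return float(text) if text.strip() else 0.0

    best = {}  # label -> row; insertion order is the order labels first appear
    for row in reader:
        if not row:
            continue  # skip blank lines
        label = row[label_idx]
        # Strict ">" keeps the earliest row when two rows tie.
        if label not in best or (
            as_number(row[value_idx]) > as_number(best[label][value_idx])
        ):
            best[label] = row
    return header, list(best.values())


# A few rows taken from the question, held in memory so the sketch runs as-is.
sample = """\
Time,Label,frame,slot,SSN,Board,BT,SRN,LabelFrame,SRNAME,LabelID,Integrity,MAX_val
2014-03-17,lableA,1,8,0,,SPUB,1,NNN,NNN,1,100%,60
2014-03-17,lableA,2,22,0,,GOUC,2,NNN02,NNN02,1,100%,
2014-03-17,lableB,2,8,0,,CCCB,2,NNN02,NNN02,1,100%,59
2014-03-17,lableB,1,2,4,,CCCB,1,NNN,NNN,1,100%,48
2014-03-17,lableB,1,0,6,,CCCB,1,NNN,NNN,1,100%,59
"""

header, rows = max_rows_per_label(io.StringIO(sample))

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue(), end="")
```

Run as written, this prints the header followed by one row per label. To work on real files, the io.StringIO objects could be replaced with open("orig.csv", newline="") for reading and open("format.csv", "w", newline="") for writing.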
