Groovy - 从复杂的嵌套地图汇总和建模数据 [英] Groovy - Aggregating and Modelling Data from Complex Nested Maps

查看:122
本文介绍了Groovy - 从复杂的嵌套地图汇总和建模数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

  def productAvailability = [
[id: 1,startDate:2014-12-22,endDate:2015-01-02,storeId:1,productId:1,categoryId:1],
[id:4,startDate:2014-12 -24,endDate:2015-01-08,storeId:2,productId:1,categoryId:1],
[id:8,startDate:2014-12-25,endDate:2015 -01-01,storeId:2,productId:3,categoryId:1],
[id:9,startDate:2014-12-22,endDate:2015-01-02,storeId: 1,productId:3,categoryId:1],
[id:10,startDate:2015-01-10,endDate:2015-01-21,storeId:1,productId:1,categoryId: 1]
];

目标是获得像这样的结果:

产品统计数据

 产品ID:1 |可用性指数:15 + 11 + 11 = 37. 
最长可用产品(按先前开始日期排序*先*,然后是商店编号):
1.2014-12-24至2015- (15天)
2.2014-12-22到2015-01-02商店ID 1.(11天)
3. (11天)
产品编号:3 |2015-01-10到2015-01-21可用性指数:7 + 11 = 18.
最长可用产品(按过去开始日期排序*先*然后商店标识):
1.2014-12-22至2015-01- (11天)
2.商店编号2中的2014-12-25到2015-01-01(7天)

商店统计信息

 商店Id:1 |可用性指数:11 + 11 + 11 = 33. 
Most Available Product(按大多数可用产品排序,然后按产品ID排序):
1.产品编号:3 on [2014-12-22 到2015-01-02](11天)
2.产品编号:1 on [2014-12-22至2015-01-02,2015-01-10至2015-01-21](11天)
商店Id:2 |可用性指数:15 + 7 = 22.
Most Available Product(按大多数可用产品排序,然后按产品ID排序):
1.产品编号:1 on [2014-12-24至2015-01-08](15天)
2. [2014-12-25至2015-01-01](7天)的产品编号:3

总可用性指数:37 + 18或33 + 22 = 55.

以上是产品统计和商店统计。
我会寻求优化,高效且易于理解的解决方案来打印上面的结果。



我试图从上述数据中获得结果:

  // productAvailability =>请参阅问题开头的上面的声明变量! 
列表aggregateDates = productAvailability.collect({[
storeId:it.storeId,
productId:it.productId,
availabilityIndex:Date.parse(YYYY-MM-dd ,it.endDate) - Date.parse(YYYY-MM-dd,it.startDate)
]});
printlnTotal Availability Index:+ aggregateDates.clone()。sum({it.availabilityIndex});
printlnTotal Products:+ aggregateDates.clone()。unique({it.productId})。count({it.productId});
printlnTotal Stores:+ aggregateDates.clone()。unique({it.storeId})。count({it.storeId});
printlnAverage Availability Index:+ aggregateDates.clone()。sum({it.availabilityIndex})/ aggregateDates.size();

正如您在上面的代码片段中看到的那样,我可以非常轻松地获得SUM,AVG和COUNT productAvailability 数据中有多少PRODUCT和STORE。然而,使用日期范围来获取基于PRODUCT和STORE的可用性是很困难的,以达到上述目的。



使用日期范围查看我的代码。 p>

  def dailyDatesAvailability = [:] as Map< Date,Integer>; 
def dailyStoresAvailability = [:]。withDefault {0} as Map< Integer,Integer>;
def dailyProductsAvailability = [:]。withDefault {0} as Map< Integer,Integer>;
(Date.parse(YYYY-MM-dd,2014-12-01))。upto((Date.parse(YYYY-MM-dd,2015-01-30)) )){日期runningDate - >
dailyDatesAvailability [runningDate] = 0;
productAvailability.each({_availability - >
def _startDate = Date.parse(YYYY-MM-dd,_availability.startDate);
def _endDate = Date.parse(YYYY如果(_startDate< = runningDate&& _endDate> = runningDate){
dailyDatesAvailability [runningDate] ++;
dailyProductsAvailability [ _availability.productId] ++;
dailyStoresAvailability [_availability.storeId] ++;
}

//在此处执行某些操作以获取具有日期范围的商店中最受欢迎的PRODUCT
});
///或者在这里做点什么....?
}

打印上述目标的最佳方式使用Groovy?请分享代码片段以便测试。

解决方案

对此深感兴趣,并提出:

 列表< Range>简化(List< Range>范围){
ranges.drop(1).inject(ranges.take(1)){r,curr - >
//找到重叠范围
def ov = r.find {curr.from< = it.to&& curr.to> = it.from}
if(ov){
ov.from = [curr.from,ov.from] .min()
ov.to = [curr .to,ov.to] .max()
simplified(r)
}
else {
r<< curr
}
}
}

def操作(数据,主要,次要){
data.groupBy {it。$ primary}
.collect {id,vals - >
def joined = vals.collect {it - >
[id:it.id,
range:Date.parse('yyyy-MM-dd',it.startDate).. Date.parse('yyyy-MM-dd',it.endDate ),
key:secondary,
value:it。$ secondary]
} .groupBy {it.value}
.collectMany {sid,ran - >简化(ran.range).collect {[key:secondary,value:sid,range:it,days:(it.to - it.from)]}}
.sort {a,b - > b.days< => a.days?:a.value - b.value}
[name:primary,id:id,data:joined]
}
}

def dump (数据){
data.collect {a - >
def sum = a.data.days.sum()
println$ a.name:$ a.id |可用性索引$ {a.data.days.join('+')} = $ {sum}
a.data.eachWithIndex {row,idx - >
println$ {idx + 1}。$ {row.range.from.format('yyyy-MM-dd')} to $ {row.range.to.format('yyyy-MM-dd' )} $ row.key $ row.value($ row.days days)

sum
}
}

def productAvailability = [
[id:1,startDate:2014-12-22,endDate:2015-01-02,storeId:1,productId:1,categoryId:1],
[id: 4,startDate:2014-12-24,endDate:2015-01-08,storeId:2,productId:1,categoryId:1],
[id:8,startDate:2014-12 -25,endDate:2015-01-01,storeId:2,productId:3,categoryId:1],
[id:9,startDate:2014-12-22,endDate:2015 -01-02,storeId:1,productId:3,categoryId:1],
[id:10,startDate:2015-01-10,endDate:2015-01-21,storeId: 1,productId:1,categoryId:1]
];
$ b $ def p = dump(操作(productAvailability,'productId','storeId'))
println''
def s = dump(操作(productAvailability,'storeId', 'productId'))
println''
println总可用性索引:$ {p.join('+')}或$ {s.join('+')} = $ {[p .sum(),s.sum()] .max()}

打印输出:

  productId:1 |可用性指数15 + 11 + 11 = 37 
1. 2014-12-24至2015-01-08 in storeId 2(15天)
2. 2014-12-22至2015-01-02 in storeId 1(11 days)
3. 2015-01-10至2015-01-21 in storeId 1(11 days)
productId:3 |可用性指数11 + 7 = 18
1. 2014-12-22至2015-01-02 in storeId 1(11天)
2. 2014-12-25至2015-01-01 in storeId 2(7天)

storeId:1 |可用性指数11 + 11 + 11 = 33
1. 2014-12-22至2015-01-02 in productId 1(11天)
2. 2015-01-10至2015-01-21 in productId 1(11 days)
3. 2014-12-22至2015-01-02 in productId 3(11 days)
storeId:2 |可用性指数15 + 7 = 22
1. 2014-12-24至2015-01-08 in productId 1(15天)
2. 2014-12-25至2015-01-01 in productId 3(7天)

可用总指数:37 + 18或33 + 22 = 55


I have data in groovy presented in the code snippet below:

def productAvailability = [
  [id: 1, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 1, categoryId: 1],
  [id: 4, startDate: "2014-12-24", endDate: "2015-01-08", storeId: 2, productId: 1, categoryId: 1],
  [id: 8, startDate: "2014-12-25", endDate: "2015-01-01", storeId: 2, productId: 3, categoryId: 1],
  [id: 9, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 3, categoryId: 1],
  [id: 10, startDate: "2015-01-10", endDate: "2015-01-21", storeId: 1, productId: 1, categoryId: 1]
];

The objective is to get the result like this:

PRODUCT STATISTICS

Product Id: 1 | Availability Index: 15 + 11 + 11 = 37.
   Longest Available Products (Sort By Past Start Date *first* then, Store Id): 
       1. "2014-12-24" to "2015-01-08" in store id 2. (15 days)
       2. "2014-12-22" to "2015-01-02" in store id 1. (11 days)
       3. "2015-01-10" to "2015-01-21" in store id 1. (11 days)
Product Id: 3 | Availability Index: 7 + 11 = 18.
   Longest Available Products (Sort By Past Start Date *first* then, Store Id): 
       1. "2014-12-22" to "2015-01-02" in store id 1. (11 days)
       2. "2014-12-25" to "2015-01-01" in store id 2. (7 days)

STORE STATISTICS

Store Id: 1 | Availability Index: 11 + 11 + 11 = 33.
   Most Available Product (sort by most available product, then sort by product id):
       1. Product Id: 3 on ["2014-12-22" to "2015-01-02"] (11 days)
       2. Product Id: 1 on ["2014-12-22" to "2015-01-02", "2015-01-10" to "2015-01-21"] (11 days)
Store Id: 2 | Availability Index: 15 + 7 = 22.
   Most Available Product (sort by most available product, then sort by product id):
       1. Product Id: 1 on ["2014-12-24" to "2015-01-08"] (15 days)
       2. Product Id: 3 on ["2014-12-25" to "2015-01-01"] (7 days)

Total Availability Index: 37 + 18 or 33 + 22 = 55.

Here with the printed result above are the product statistics and store statistics. I would seek for the optimized, efficient, and easy to understand solution to print the result above.

My attempt to achieve the results from the above data:

// productAvailability => see the declaration variable above in the beginning of question!
List aggregateDates = productAvailability.collect({[
    storeId: it.storeId,
    productId: it.productId,
    availabilityIndex: Date.parse("YYYY-MM-dd", it.endDate) - Date.parse("YYYY-MM-dd", it.startDate) 
]});
println "Total Availability Index: " + aggregateDates.clone().sum({ it.availabilityIndex });
println "Total Products: " +  aggregateDates.clone().unique({ it.productId }).count({ it.productId });
println "Total Stores: " + aggregateDates.clone().unique({ it.storeId }).count({ it.storeId });
println "Average Availability Index: " + aggregateDates.clone().sum({ it.availabilityIndex }) / aggregateDates.size();

As you can see in the snippet above, I can get very easily the aggregate SUM, AVG, and COUNT how many PRODUCT and STORE are inside the productAvailability data. However, this is difficult for me to get the availability based on PRODUCT and STORE using date ranges to achieve the objective above.

See my code below using date ranges.

def dailyDatesAvailability = [:] as Map<Date, Integer>;
def dailyStoresAvailability = [:].withDefault {0} as Map<Integer, Integer>;
def dailyProductsAvailability  = [:].withDefault {0} as Map<Integer, Integer>;
(Date.parse("YYYY-MM-dd", "2014-12-01")).upto((Date.parse("YYYY-MM-dd", "2015-01-30"))) { Date runningDate ->
        dailyDatesAvailability[runningDate] = 0;
        productAvailability.each({ _availability ->
            def _startDate = Date.parse("YYYY-MM-dd", _availability.startDate);
            def _endDate = Date.parse("YYYY-MM-dd", _availability.endDate);
            if (_startDate <= runningDate && _endDate >= runningDate) {
                dailyDatesAvailability[runningDate]++;
                dailyProductsAvailability[_availability.productId]++;
                dailyStoresAvailability[_availability.storeId]++;
            }

         // Do something here to get the MOST available PRODUCT in a STORE with date ranges
        });
       /// or do something here....?
    }

What is the best way to print the objectives above using Groovy? Please share the code snippet to be able to test.

解决方案

Got intrigued by this, and came up with:

List<Range> simplify( List<Range> ranges ) {
  ranges.drop( 1 ).inject( ranges.take( 1 ) ) { r, curr ->
    // Find an overlapping range
    def ov = r.find { curr.from <= it.to && curr.to >= it.from }
    if( ov ) {
      ov.from = [ curr.from, ov.from ].min()
      ov.to   = [ curr.to, ov.to ].max()
      simplify( r )
    }
    else {
      r << curr
    }
  }
}

def manipulate(data, primary, secondary) {
    data.groupBy { it."$primary" }
        .collect { id, vals ->
            def joined = vals.collect { it ->
                [ id: it.id,
                  range: Date.parse('yyyy-MM-dd', it.startDate)..Date.parse('yyyy-MM-dd', it.endDate),
                  key: secondary,
                  value: it."$secondary" ]
            }.groupBy { it.value }
             .collectMany { sid, ran -> simplify(ran.range).collect { [key: secondary, value: sid, range:it, days:(it.to - it.from)] } }
             .sort { a, b -> b.days <=> a.days ?: a.value - b.value }
            [name:primary, id:id, data:joined]
        }
}

def dump(data) {
    data.collect { a ->
        def sum = a.data.days.sum()
        println "$a.name: $a.id | availability index ${a.data.days.join(' + ')} = ${sum}"
        a.data.eachWithIndex { row, idx ->
            println "    ${idx+1}. ${row.range.from.format('yyyy-MM-dd')} to ${row.range.to.format('yyyy-MM-dd')} in $row.key $row.value ($row.days days)"
        }
        sum
    }
}

def productAvailability = [
  [id: 1, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 1, categoryId: 1],
  [id: 4, startDate: "2014-12-24", endDate: "2015-01-08", storeId: 2, productId: 1, categoryId: 1],
  [id: 8, startDate: "2014-12-25", endDate: "2015-01-01", storeId: 2, productId: 3, categoryId: 1],
  [id: 9, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 3, categoryId: 1],
  [id: 10, startDate: "2015-01-10", endDate: "2015-01-21", storeId: 1, productId: 1, categoryId: 1]
];

def p = dump(manipulate(productAvailability, 'productId', 'storeId'))
println ''
def s = dump(manipulate(productAvailability, 'storeId', 'productId'))
println ''
println "Total Availability Index: ${p.join(' + ')} or ${s.join(' + ')} = ${[p.sum(), s.sum()].max()}"

Which prints out:

productId: 1 | availability index 15 + 11 + 11 = 37
    1. 2014-12-24 to 2015-01-08 in storeId 2 (15 days)
    2. 2014-12-22 to 2015-01-02 in storeId 1 (11 days)
    3. 2015-01-10 to 2015-01-21 in storeId 1 (11 days)
productId: 3 | availability index 11 + 7 = 18
    1. 2014-12-22 to 2015-01-02 in storeId 1 (11 days)
    2. 2014-12-25 to 2015-01-01 in storeId 2 (7 days)

storeId: 1 | availability index 11 + 11 + 11 = 33
    1. 2014-12-22 to 2015-01-02 in productId 1 (11 days)
    2. 2015-01-10 to 2015-01-21 in productId 1 (11 days)
    3. 2014-12-22 to 2015-01-02 in productId 3 (11 days)
storeId: 2 | availability index 15 + 7 = 22
    1. 2014-12-24 to 2015-01-08 in productId 1 (15 days)
    2. 2014-12-25 to 2015-01-01 in productId 3 (7 days)

Total Availability Index: 37 + 18 or 33 + 22 = 55

这篇关于Groovy - 从复杂的嵌套地图汇总和建模数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆