将sql逻辑移动到后端-bash [英] moving sql logic to backend - bash
问题描述
一种sql逻辑正在移到后端,我需要使用shell脚本生成报告.为了便于理解,我将其简化如下.
One of the sql logic is moving to backend and I need to generate a report using shell scripting. For understanding, I'm making it simple as follows.
我的输入文件-sales.txt(编号,价格,月份)
My input file - sales.txt (id, price, month)
101,50,2019-10
101,80,2020-08
101,80,2020-10
201,100,2020-09
201,350,2020-10
每个ID的输出应为6个月的窗口,例如t1 = 2020-07和t2 = 2020-12
The output should be for 6 months window for each id e.g t1=2020-07 and t2=2020-12
101,50,2020-07
101,80,2020-08
101,80,2020-09
101,80,2020-10
101,80,2020-11
101,80,2020-12
201,100,2020-09
201,350,2020-10
201,350,2020-11
201,350,2020-12
对于ID为 101
的ID,尽管没有2020-07的条目,但它应取自销售文件中可用的上个月的前一个值.因此,将2019-10年度的价格= 50用于2020-07年度.
For id 101
, though there is no entry for 2020-07, it should take from the immediate previous month value that is available in the sales file.
So the price=50 from 2019-10 is used for 2020-07.
对于 201
,第一个条目本身是从2020-09开始的,因此2020-08和2020-07不适用.只要有差距,就应该传播前一个月的值.
For 201
, the first entry itself is from 2020-09, so 2020-08 and 2020-07 are not applicable.
Wherever there are gaps the immediate previous month value should be propagated.
我正在尝试使用awk解决此问题,我正在创建一个可重复使用的脚本util.awk,如下所示要生成缺少的值,请将其传递给sort命令,然后再次使用util.awk作为最终输出.
I'm trying to use awk to solve this problem, I'm creating a reusable script util.awk like below to generate the missing values, pipe it to sort command and then again use the util.awk for final output.
util.awk
function get_month(a,b,t1) { return strftime("%Y%m",mktime(a " " b t1)) }
BEGIN { ss=" 0 0 0 "; ts1=" 1 " ss; ts2=" 35 " ss ; OFS="," ; x=1 }
{
tsc=get_month($3,$4,ts1);
if ( NR>1 && $1==idp )
{
if( tsc == tsp) { print $1,$2,get_month($3,$4,ts1); x=0 }
else { for(i=tsp; i < tsc; i=get_month(j1,j2,i) )
{
j1=substr(i,1,4); j2=substr(i,5,2);
print $1,tpr,i;
}
}
}
tsp=get_month($3,$4,ts2);
idp=$1;
tpr=$2;
if(x!=0) print $1,$2,tsc
x=1;
}
但是它无限地运行 awk -F"[,-]";-f utils.awk sales.txt
尽管我在awk中尝试过,但我也欢迎在bash环境中也可以使用的其他答案.
Though I tried in awk, I welcome other answers as well that would work in bash environment.
推荐答案
总体计划:
- 假设:
sales.txt
已按第一列(数字方式)进行排序 - 用户提供要显示的最小->最大日期范围(
awk
变量mindt
和maxdt
) - 对于一个独特的
id
值,我们会将所有价格和日期加载到一个数组中(prices []
) - 日期将用作关联数组的索引以存储价格(
prices [YYYY-MM]
) - 一旦我们读取了给定
id
... 的所有记录, - 按索引对
prices []
数组进行排序(即,按YYYY-MM
进行排序) - 找到最大日期的价格小于
mindt
(另存为prevprice
) - 对于
mindt
和maxdt
(含)之间的每个日期,如果我们有价格,则将其显示(并另存为prevprice
)... - 如果我们没有价格,但是我们有
prevprice
,则使用此prevprice
作为当前日期的price
(即,用之前的价格填补空白)
- assumption:
sales.txt
is already sorted (numerically) by the first column - user provides the min->max date range to be displayed (
awk
variablesmindt
andmaxdt
) - for a distinct
id
value we'll load all prices and dates into an array (prices[]
) - dates will be used as the indices of an associative array to store prices (
prices[YYYY-MM]
) - once we've read all records for a given
id
... - sort the
prices[]
array by the indices (ie, sort byYYYY-MM
) - find the price for the max date less than
mindt
(save asprevprice
) - for each date between
mindt
andmaxdt
(inclusive), if we have a price then display it (and save asprevprice
) else ... - if we don't have a price but we do have a
prevprice
then use thisprevprice
as the current date'sprice
(ie, fill the gap with the previous price)
一个(GNU) awk
想法:
mindate='2020-07'
maxdate='2020-12'
awk -v mindt="${mindate}" -v maxdt="${maxdate}" -v OFS=',' -F',' '
# function to add "months" (number) to "indate" (YYYY-MM)
function add_month(indate,months) {
dhms="1 0 0 0" # default day/hr/min/secs
split(indate,arr,"-")
yr=arr[1]
mn=arr[2]
return strftime("%Y-%m", mktime(arr[1]" "(arr[2]+months)" "dhms))
}
# function to print the list of prices for a given "id"
function print_id(id) {
if ( length(prices) == 0 ) # if prices array is empty then do nothing (ie, return)
return
PROCINFO["sorted_in"]="@ind_str_asc" # sort prices[] array by index in ascending order
for ( i in prices ) # loop through indices (YYYY-MM)
{ if ( i < mindt ) # as long as less than mindt
prevprice=prices[i] # save the price
else
break # no more pre-mindt indices to process
}
for ( i=mindt ; i<=maxdt ; i=add_month(i,1) ) # for our mindt - maxdt range
{ if ( !(i in prices) && prevprice ) # if no entry in prices[], but we have a prevprice, then ...
prices[i]=prevprice # set prices[] to prevprice (ie, fill the gap)
if ( i in prices ) # if we have an entry in prices[] then ...
{ prevprice=prices[i] # update prevprice (for filling future gap) and ...
print id,prices[i],i # print our data to stdout
}
}
}
BEGIN { split("",prices) } # pre-declare prices as an array
previd != $1 { print_id(previd) # when id changes print the prices[] array, then ...
previd=$1 # reset some variables for processing of the next id and ...
prevprice=""
delete prices # delete the prices[] array
}
{ prices[$3]=$2 } # for the current record create an entry in prices[]
END { print_id(previd) } # flush the last set of prices[] to stdout
' sales.txt
注意:这假设 sales.txt
是按第一个字段(数字方式)排序的;如果不正确,则最后一行应更改为'<(sort -n sales.txt)
NOTE: This assumes sales.txt
is sorted (numerically) by the first field; if this is not true then the last line should be changed to ' <(sort -n sales.txt)
这将生成:
101,50,2020-07
101,80,2020-08
101,80,2020-09
101,80,2020-10
101,80,2020-11
101,80,2020-12
201,100,2020-09
201,350,2020-10
201,350,2020-11
201,350,2020-12
这篇关于将sql逻辑移动到后端-bash的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!