awk two regex conditions - structure convoluted complex transactions list csv
Problem description
My original input file is a booking transaction list. I am interested in the lines in two of its sections: a) transactions and b) refunds. These are always at the bottom of the CSV and are structured.
I can skip all lines above the transactions section with the regex condition /transaction/ {print}.
I would like to add a column containing the string "transaction" or "refund", depending on the section of the CSV, so that I know whether a row is a transaction or a refund. Something like:
IF ($2 = "transaction" || " " != "refunds"){$7=="transaction"};
IF ($2 = "refunds" || " " != "transaction"){$7=="refunds"}
I have shared the CSV and script.awk on my Google Drive, and hope this is acceptable: convoluted transaction list to be structured
transaction date via Details payment fee
28-02-2015 invoice txn1 44.1 0.19
28-02-2015 invoice txn2 27.7 0.19
07-03-2015 invoice txn3 43.1 0.19
09-03-2015 invoice txn4 36.8 0.19
12-03-2015 invoice txn5 26 0.19
13-03-2015 invoice txn6 43.7 0.19
13-03-2015 invoice txn7 25.6 0.19
15-03-2015 creditcard txn8 70.8 0.19
Sum 317.8 1.52
refunds Datum via Details payment 1.52
18-12-2014 invoice txn0 16
Sum 16
My intended outcome is this:
date via Details payment fee type
28-02-2015 invoice txn1 44.1 0.19 transaction
28-02-2015 invoice txn2 27.7 0.19 transaction
07-03-2015 invoice txn3 43.1 0.19 transaction
09-03-2015 invoice txn4 36.8 0.19 transaction
12-03-2015 invoice txn5 26 0.19 transaction
13-03-2015 invoice txn6 43.7 0.19 transaction
13-03-2015 invoice txn7 25.6 0.19 transaction
15-03-2015 creditcard txn8 70.8 0.19 transaction
18-12-2014 invoice txn0 16 refund
My code snippet at the moment:
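BEGIN {
    OFS = FS = ";"
    # header row for the restructured output
    print "date", "payment option", "details", "payment", "fee", "type"
}
# only look at lines from the transactions header down to the next empty line
/^transactions/,/^$/ {
    if ($3 == "via") { next }   # skip the section header row
    if ($6 == "Sum") { next }   # skip the totals row
    print $2, $3, $4, $5, $6, $7
}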
Recommended answer
awk '
NR == 1 {
$1 = ""
print $0, "type"
type = "transaction"
next
}
$1 == "refunds" {
print ""
type = "- refund"
}
/^ / && NF > 3 {
print $0, type
}' input.txt |column -t
Output:
date via Details payment fee type
28-02-2015 invoice txn1 44.1 0.19 transaction
28-02-2015 invoice txn2 27.7 0.19 transaction
07-03-2015 invoice txn3 43.1 0.19 transaction
09-03-2015 invoice txn4 36.8 0.19 transaction
12-03-2015 invoice txn5 26 0.19 transaction
13-03-2015 invoice txn6 43.7 0.19 transaction
13-03-2015 invoice txn7 25.6 0.19 transaction
15-03-2015 creditcard txn8 70.8 0.19 transaction
18-12-2014 invoice txn0 16 - refund
I'm running this through column -t to line up the columns, though that removes the blank line added before the refund. Another difference is the dash in the refund's "fee" position, which is necessary for column -t to align that row correctly.
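To see why the dash matters: column -t splits each row on runs of whitespace by default, just as awk does, so a refund row without a fee has one field fewer and its "refund" label would slide into the fee column. The sketch below only counts fields with awk to make that visible; the three sample rows are taken from the output above, and the variable names are illustrative.

```shell
# Three sample rows: a transaction with a fee, a refund without any
# placeholder, and the same refund with "-" standing in for the fee.
rows='28-02-2015 invoice txn1 44.1 0.19 transaction
18-12-2014 invoice txn0 16 refund
18-12-2014 invoice txn0 16 - refund'

# Count whitespace-separated fields per row (default awk field splitting).
field_counts=$(printf '%s\n' "$rows" | awk '{ print NF }')
printf '%s\n' "$field_counts"
```

The middle row comes out one field short, which is the row column -t would misalign; with the dash, every row has the same number of fields.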
In the awk code, if the record number (line number, NR) is 1, we blank out the first field, print the rest plus "type", set the type to "transaction", and move on to the next line. If a line's first field is "refunds", we print a blank line and change the type to "- refund" (there is no fee, so the dash stands in for it). Finally, if the line starts with a space and has more than three fields (NF), we print the line plus the type.
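The same section-tracking idea can also be written with two explicit regex conditions, one per section header. This is a minimal sketch and not the answer's exact method: it assumes whitespace-separated fields, assumes every data row begins with a dd-mm-yyyy date, and fills a missing fee with "-". The abridged sample input and the variable names are made up for illustration.

```shell
# Abridged sample input in the layout shown in the question
# (assumption: whitespace-separated, data rows indented by one space).
input=$(mktemp)
cat > "$input" <<'EOF'
transaction date via Details payment fee
 28-02-2015 invoice txn1 44.1 0.19
 15-03-2015 creditcard txn8 70.8 0.19
 Sum 317.8 1.52
refunds Datum via Details payment 1.52
 18-12-2014 invoice txn0 16
 Sum 16
EOF

result=$(awk '
    /^transaction/ { type = "transaction" }   # entering the transactions section
    /^refunds/     { type = "refund" }        # entering the refunds section
    # Data rows only: the first field looks like dd-mm-yyyy,
    # which also skips the section headers and the Sum rows.
    $1 ~ /^[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]$/ {
        fee = (NF >= 5) ? $5 : "-"            # placeholder when no fee is present
        print $1, $2, $3, $4, fee, type
    }
' "$input")
printf '%s\n' "$result"
rm -f "$input"
```

Matching on the date rather than on leading whitespace keeps the script working if the indentation is lost, at the cost of assuming a fixed date format.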
The awk code can all go on one line if you put semicolons between the commands inside each action.
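For illustration, here is the answer's script collapsed onto a single line in that way. The abridged sample file is an assumption made for this sketch: it indents the data rows by one space, which the /^ / condition relies on, and the variable names are hypothetical.

```shell
# Abridged sample input (assumption: data rows start with a space).
input=$(mktemp)
cat > "$input" <<'EOF'
transaction date via Details payment fee
 28-02-2015 invoice txn1 44.1 0.19
 Sum 317.8 1.52
refunds Datum via Details payment 1.52
 18-12-2014 invoice txn0 16
 Sum 16
EOF

# Same three pattern-action pairs as the multi-line script; semicolons separate the commands.
one_line_out=$(awk 'NR == 1 { $1 = ""; print $0, "type"; type = "transaction"; next } $1 == "refunds" { print ""; type = "- refund" } /^ / && NF > 3 { print $0, type }' "$input")
printf '%s\n' "$one_line_out"
rm -f "$input"
```

The output can be piped through column -t as before if aligned columns are wanted.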