In awk, how can I use a file containing multiple format strings with printf?
Problem description
I have a case where I want to use input from a file as the format for printf() in awk. My formatting works when I set it in a string within the code, but it doesn't work when I load it from input.
Here's a tiny example of the problem:
$ # putting the format in a variable works just fine:
$ echo "" | awk -vs="hello:\t%s\n\tfoo" '{printf(s "bar\n", "world");}'
hello: world
foobar
$ # But getting the format from an input file does not.
$ echo "hello:\t%s\n\tfoo" | awk '{s=$0; printf(s "bar\n", "world");}'
hello:\tworld\n\tfoobar
$
So ... format substitutions work ("%s"), but special characters like tab and newline do not. Any idea why this is happening? And is there a way to "do something" to input data to make it usable as a format string?
UPDATE #1:
As a further example, consider the following, using a bash here-string:
[me@here ~]$ awk -vs="hello: %s\nworld: %s\n" '{printf(s, "foo", "bar");}' <<<""
hello: foo
world: bar
[me@here ~]$ awk '{s=$0; printf(s, "foo", "bar");}' <<<"hello: %s\nworld: %s\n"
hello: foo\nworld: bar\n[me@here ~]$
As far as I can see, the same thing happens with multiple different awk interpreters, and I haven't been able to locate any documentation that explains why.
UPDATE #2:
The code I'm trying to replace currently looks something like this, with nested loops in shell. At present, awk is only being used for its printf, and could be replaced with a shell-based printf:
#!/bin/sh
while read -r fmtid fmt; do
    while read cid name addy; do
        awk -vfmt="$fmt" -vcid="$cid" -vname="$name" -vaddy="$addy" \
            'BEGIN{printf(fmt,cid,name,addy)}' > /path/$fmtid/$cid
    done < /path/to/sampledata
done < /path/to/fmtstrings
Sample input:
## fmtstrings:
1 ID:%04d Name:%s\nAddress: %s\n\n
2 CustomerID:\t%-4d\t\tName: %s\n\t\t\t\tAddress: %s\n
3 Customer: %d / %s (%s)\n
## sampledata:
5 Companyname 123 Somewhere Street
12 Othercompany 234 Elsewhere
My hope was that I'd be able to construct something like this to do the entire thing with a single call to awk, instead of having nested loops in shell:
awk '
  NR==FNR { fmts[$1]=$2; next; }
  {
    for(fmtid in fmts) {
      outputfile=sprintf("/path/%d/%d", fmtid, custid);
      printf(fmts[fmtid], $1, $2) > outputfile;
    }
  }
' /path/to/fmtstrings /path/to/sampledata
Obviously, this doesn't work, both because of the actual topic of this question and because I haven't yet figured out how to elegantly make awk join $2..$n into a single variable. (But that's the topic of a possible future question.)
FWIW, I'm using FreeBSD 9.2 with its built-in awk, but I'm open to using gawk if a solution can be found with that.
Recommended answer
Why so lengthy and complicated an example? This demonstrates the problem:
$ echo "" | awk '{s="a\t%s"; printf s"\n","b"}'
a b
$ echo "a\t%s" | awk '{s=$0; printf s"\n","b"}'
a\tb
In the first case, the string "a\t%s" is a string literal and so is interpreted twice: once when awk reads the script and again when the printf executes. The \t is expanded on the first pass, so at execution awk has a literal tab char in the formatting string.
In the second case the string comes from input and never goes through that first pass, so awk still has the two characters backslash and t in the formatting string. Hence the different behavior.
You need something to interpret those escaped chars, and one way to do that is to call the shell's printf and read the results (corrected per @EtanReiser's excellent observation that I was using double quotes where I should have had single quotes, implemented here by \047, to avoid shell expansion):
$ echo 'a\t%s' | awk '{"printf \047" $0 "\047 " "b" | getline s; print s}'
a b
If you don't need the result in a variable, you can just call system().
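As a rough sketch of that system() variant: the helper name expand_with_system and the fixed argument world are made up for illustration, and the sketch assumes the format lines contain no single quotes (one would end the quoting and let the shell interpret the rest of the line).

```shell
# expand_with_system: hypothetical helper; reads one format string per line on stdin
expand_with_system() {
    awk '{
        # \047 is a single quote; quoting the format keeps the shell from expanding it
        cmd = "printf \047" $0 "\047 world"
        system(cmd)    # the shell printf writes straight to stdout
    }'
}

printf '%s\n' 'hello:\t%s\n' | expand_with_system
```

Because the shell's printf writes directly to stdout, nothing comes back into an awk variable; use the getline form when the expanded text is needed inside awk.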
If you just want the escape chars expanded, so that you don't need to provide the %s args in the shell printf call, you'd just need to escape every % in the string (watching out for any % that is already escaped).
You could call awk instead of the shell printf if you prefer.
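One way that might look, as a sketch only: the input line is pasted into a child awk program as a string literal, so the child awk expands the escapes when it parses that program. The helper name expand_with_awk and the argument "b" are illustrative, and the sketch assumes the format contains neither single nor double quotes.

```shell
# expand_with_awk: hypothetical helper; reads one format string per line on stdin
expand_with_awk() {
    awk '{
        # Builds the command: awk \047BEGIN{printf "<input line>", "b"}\047
        cmd = "awk \047BEGIN{printf \"" $0 "\", \"b\"}\047"
        s = ""
        # The expanded text may span several lines, so read them all back.
        while ((cmd | getline line) > 0)
            s = s line "\n"
        close(cmd)
        printf "%s", s
    }'
}

printf '%s\n' 'a\t%s\n' | expand_with_awk
```

The same caveat applies as with the shell printf: the input becomes part of a command line, so it is only as safe as the quoting around it.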
Note that this approach, while clumsy, is much safer than calling an eval, which might just execute an input line like rm -rf /*.*!
With help from Arnold Robbins (the creator of gawk) and Manuel Collado (another noted awk expert), here is a script which will expand single-character escape sequences:
$ cat tst2.awk
function expandEscapes(old,   segs, segNr, escs, idx, new) {
    split(old,segs,/\\./,escs)
    for (segNr=1; segNr in segs; segNr++) {
        if ( idx = index( "abfnrtv", substr(escs[segNr],2,1) ) )
            escs[segNr] = substr("\a\b\f\n\r\t\v", idx, 1)
        new = new segs[segNr] escs[segNr]
    }
    return new
}
{
    s = expandEscapes($0)
    printf s, "foo", "bar"
}
$ awk -f tst2.awk <<<"hello: %s\nworld: %s\n"
hello: foo
world: bar
Alternatively, this should be functionally equivalent but not gawk-specific:
function expandEscapes(tail,   head, esc, idx) {
    head = ""
    while ( match(tail, /\\./) ) {
        esc = substr( tail, RSTART + 1, 1 )
        head = head substr( tail, 1, RSTART-1 )
        tail = substr( tail, RSTART + 2 )
        idx = index( "abfnrtv", esc )
        if ( idx )
            esc = substr( "\a\b\f\n\r\t\v", idx, 1 )
        head = head esc
    }
    return (head tail)
}
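To tie this back to the question, here is a hedged sketch of the single-awk-call version built on the portable function. The scratch directory, the out-<fmtid>-<cid> file naming, and the sub() calls used to take "everything after the first field(s)" are assumptions of this sketch, not part of the original answer; only two of the question's format strings are used.

```shell
# Scratch copies of two format strings and the sample data from the question;
# the directory and file names are made up for this sketch.
dir=$(mktemp -d)
printf '%s\n' '1 ID:%04d Name:%s\nAddress: %s\n\n' \
              '3 Customer: %d / %s (%s)\n' > "$dir/fmtstrings"
printf '%s\n' '5 Companyname 123 Somewhere Street' \
              '12 Othercompany 234 Elsewhere' > "$dir/sampledata"

awk -v outdir="$dir" '
function expandEscapes(tail,   head, esc, idx) {
    head = ""
    while ( match(tail, /\\./) ) {
        esc  = substr( tail, RSTART + 1, 1 )
        head = head substr( tail, 1, RSTART-1 )
        tail = substr( tail, RSTART + 2 )
        idx  = index( "abfnrtv", esc )
        if ( idx )
            esc = substr( "\a\b\f\n\r\t\v", idx, 1 )
        head = head esc
    }
    return (head tail)
}
# First file: store each format, with escapes expanded, under its id.
NR == FNR {
    fmt = $0
    sub(/^[^ ]+ /, "", fmt)          # drop the id and one separating space
    fmts[$1] = expandEscapes(fmt)
    next
}
# Second file: field 1 is the id, field 2 the name, the rest the address.
{
    addy = $0
    sub(/^[^ ]+ +[^ ]+ +/, "", addy)
    for (fmtid in fmts) {
        out = outdir "/out-" fmtid "-" $1
        printf fmts[fmtid], $1, $2, addy > out
        close(out)
    }
}
' "$dir/fmtstrings" "$dir/sampledata"

cat "$dir/out-3-12"
```

Expanding each format once, while the first file is read, avoids redoing the work for every customer record. Closing each output file after writing keeps the number of open files small when there are many customers.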
If you care to, you can expand the concept to octal and hex escape sequences by changing the split() RE to
/\\(x[0-9a-fA-F]*|[0-7]{1,3}|.)/
and for a hex value after the \\:
c = sprintf("%c", strtonum("0x" rest_of_str))
and for an octal value:
c = sprintf("%c", strtonum("0" rest_of_str))
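strtonum() and the four-argument split() are gawk extensions. For an awk without them, the same idea can be sketched on top of the match()-based function by converting the digits by hand. This swaps a manual digit loop in for strtonum(); the wrapper name expand_escapes and the two-digit cap on hex escapes are choices made for this sketch, not something the answer specifies.

```shell
# expand_escapes: hypothetical wrapper; reads lines on stdin, prints them expanded
expand_escapes() {
    awk '
    function expandEscapes(tail,   head, esc, idx, val, n) {
        head = ""
        while ( match(tail, /\\./) ) {
            esc  = substr( tail, RSTART + 1, 1 )
            head = head substr( tail, 1, RSTART - 1 )
            tail = substr( tail, RSTART + 2 )
            if ( esc ~ /[0-7]/ ) {
                # octal: this digit plus up to two more
                val = esc + 0
                for ( n = 1; n < 3 && tail ~ /^[0-7]/; n++ ) {
                    val  = val * 8 + substr( tail, 1, 1 )
                    tail = substr( tail, 2 )
                }
                esc = sprintf( "%c", val )
            } else if ( esc == "x" && tail ~ /^[0-9a-fA-F]/ ) {
                # hex: one or two digits (the cap is a choice made here)
                val = 0
                for ( n = 0; n < 2 && tail ~ /^[0-9a-fA-F]/; n++ ) {
                    val  = val * 16 + index( "0123456789abcdef", tolower( substr( tail, 1, 1 ) ) ) - 1
                    tail = substr( tail, 2 )
                }
                esc = sprintf( "%c", val )
            } else if ( idx = index( "abfnrtv", esc ) ) {
                esc = substr( "\a\b\f\n\r\t\v", idx, 1 )
            }
            head = head esc
        }
        return (head tail)
    }
    { print expandEscapes($0) }
    '
}

printf '%s\n' 'a\101\x42\tc' | expand_escapes
```

Any other escaped character, including a backslash, is passed through as the character itself, which matches the behavior of the single-character version.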