In awk, how can I use a file containing multiple format strings with printf?

Problem description

I have a case where I want to use input from a file as the format for printf() in awk. My formatting works when I set it in a string within the code, but it doesn't work when I load it from input.

Here's a tiny example of the problem:

$ # putting the format in a variable works just fine:
$ echo "" | awk -vs="hello:\t%s\n\tfoo" '{printf(s "bar\n", "world");}'
hello:  world
        foobar
$ # But getting the format from an input file does not.
$ echo "hello:\t%s\n\tfoo" | awk '{s=$0; printf(s "bar\n", "world");}'
hello:\tworld\n\tfoobar
$ 

So ... format substitutions work ("%s"), but not special characters like tab and newline. Any idea why this is happening? And is there a way to "do something" to input data to make it usable as a format string?

Update #1:

As a further example, consider the following using a bash here-string:

[me@here ~]$ awk -vs="hello: %s\nworld: %s\n" '{printf(s, "foo", "bar");}' <<<""
hello: foo
world: bar
[me@here ~]$ awk '{s=$0; printf(s, "foo", "bar");}' <<<"hello: %s\nworld: %s\n"
hello: foo\nworld: bar\n[me@here ~]$

As far as I can see, the same thing happens with multiple different awk interpreters, and I haven't been able to locate any documentation that explains why.

Update #2:

The code I'm trying to replace currently looks something like this, with nested loops in shell. At present, awk is only being used for its printf, and could be replaced with a shell-based printf:

#!/bin/sh

while read -r fmtid fmt; do
  while read cid name addy; do
    awk -vfmt="$fmt" -vcid="$cid" -vname="$name" -vaddy="$addy" \
      'BEGIN{printf(fmt,cid,name,addy)}' > /path/$fmtid/$cid
  done < /path/to/sampledata
done < /path/to/fmtstrings

Sample input:

## fmtstrings:
1 ID:%04d Name:%s\nAddress: %s\n\n
2 CustomerID:\t%-4d\t\tName: %s\n\t\t\t\tAddress: %s\n
3 Customer: %d / %s (%s)\n

## sampledata:
5 Companyname 123 Somewhere Street
12 Othercompany 234 Elsewhere

My hope was that I'd be able to construct something like this to do the entire thing with a single call to awk, instead of having nested loops in shell:

awk '

  NR==FNR { fmts[$1]=$2; next; }

  {
    for(fmtid in fmts) {
      outputfile=sprintf("/path/%d/%d", fmtid, custid);
      printf(fmts[fmtid], $1, $2) > outputfile;
    }
  }

' /path/to/fmtstrings /path/to/sampledata

Obviously, this doesn't work, both because of the actual topic of this question and because I haven't yet figured out how to elegantly make awk join $2..$n into a single variable. (But that's the topic of a possible future question.)

FWIW, I'm using FreeBSD 9.2 with its built-in awk, but I'm open to using gawk if a solution can be found with that.

Recommended answer

Why so lengthy and complicated an example? This demonstrates the problem:

$ echo "" | awk '{s="a\t%s"; printf s"\n","b"}'
a       b

$ echo "a\t%s" | awk '{s=$0; printf s"\n","b"}'
a\tb

In the first case, the string "a\t%s" is a string literal and so is interpreted twice: once when awk reads the script, and again when printf executes. The \t is expanded on the first pass, so at execution awk already has a literal tab character in the format string.

In the second case awk still has the characters backslash and t in the formatting string - hence the different behavior.
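
A quick way to see the difference is to compare string lengths. This is only an illustrative sketch; it feeds the data with printf '%s\n' rather than echo, because some shells' echo expands backslash escapes on its own.

```shell
# Literal in the awk source: awk turns \t into one tab character while parsing.
awk 'BEGIN { print length("a\tb") }'                  # prints 3

# Same text arriving as data: backslash and t remain two separate characters.
printf '%s\n' 'a\tb' | awk '{ print length($0) }'     # prints 4
```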

You need something to interpret those escaped chars and one way to do that is to call the shell's printf and read the results (corrected per @EtanReiser's excellent observation that I was using double quotes where I should have had single quotes, implemented here by \047, to avoid shell expansion):

$ echo 'a\t%s' | awk '{"printf \047" $0 "\047 " "b" | getline s; print s}'
a       b

If you don't need the result in a variable, you can just call system().
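
A minimal sketch of the system() variant, assuming each input line is trusted and contains no single quote (either would break the quoting of the generated command). The trailing \n and the argument b are chosen here purely for illustration.

```shell
# awk builds the command  printf 'a\t%s\n' b  and hands it to the shell,
# whose printf expands both the backslash escapes and the %s.
printf '%s\n' 'a\t%s' |
awk '{ system("printf \047" $0 "\\n\047 b") }'
```

The output goes straight to standard output, so nothing is captured in an awk variable.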

If you just wanted the escape chars expanded, so that you don't need to provide the %s args in the shell printf call, you'd just need to escape every % in the string (watching out for % characters that are already escaped).

You could call awk instead of the shell printf if you prefer.

Note that this approach, while clumsy, is much safer than calling an eval which might just execute an input line like rm -rf /*.*!

With help from Arnold Robbins (the creator of gawk), and Manuel Collado (another noted awk expert), here is a script which will expand single-character escape sequences:

$ cat tst2.awk
function expandEscapes(old,     segs, segNr, escs, idx, new) {
    split(old,segs,/\\./,escs)
    for (segNr=1; segNr in segs; segNr++) {
        if ( idx = index( "abfnrtv", substr(escs[segNr],2,1) ) )
            escs[segNr] = substr("\a\b\f\n\r\t\v", idx, 1)
        new = new segs[segNr] escs[segNr]
    }
    return new
}

{
    s = expandEscapes($0)
    printf s, "foo", "bar"
}

$ awk -f tst2.awk <<<"hello: %s\nworld: %s\n"
hello: foo
world: bar

Alternatively, this should be functionally equivalent but not gawk-specific (one difference: for an unrecognized sequence such as \q it drops the backslash, whereas the gawk version above keeps it):

function expandEscapes(tail,   head, esc, idx) {
    head = ""
    while ( match(tail, /\\./) ) {
        esc  = substr( tail, RSTART + 1, 1 )
        head = head substr( tail, 1, RSTART-1 )
        tail = substr( tail, RSTART + 2 )
        idx  = index( "abfnrtv", esc )
        if ( idx )
             esc = substr( "\a\b\f\n\r\t\v", idx, 1 )
        head = head esc
    }

    return (head tail)
} 
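
As a rough sketch of how the portable function could serve the single-awk-call goal from the question: the wrapper name render_all, the flat "<fmtid>-<cid>.txt" output naming (instead of the question's /path/$fmtid/$cid directories), and the temporary directory are all assumptions made for illustration. The address is taken as everything after the second field.

```shell
#!/bin/sh
# render_all OUTDIR FMTFILE DATAFILE  (hypothetical helper, not from the question)
render_all() {
    awk -v outdir="$1" '
    # Portable escape expansion, same logic as the function above.
    function expandEscapes(tail,   head, esc, idx) {
        head = ""
        while ( match(tail, /\\./) ) {
            esc  = substr( tail, RSTART + 1, 1 )
            head = head substr( tail, 1, RSTART-1 )
            tail = substr( tail, RSTART + 2 )
            idx  = index( "abfnrtv", esc )
            if ( idx )
                esc = substr( "\a\b\f\n\r\t\v", idx, 1 )
            head = head esc
        }
        return (head tail)
    }

    # First file, "<id> <format>": store the format with escapes expanded.
    NR == FNR {
        fmt = $0
        sub(/^[^ ]+ /, "", fmt)
        fmts[$1] = expandEscapes(fmt)
        next
    }

    # Second file, "<cid> <name> <address...>": address is the rest of the line.
    {
        addy = $0
        sub(/^[^ ]+ [^ ]+ /, "", addy)
        for (id in fmts) {
            out = outdir "/" id "-" $1 ".txt"
            printf(fmts[id], $1, $2, addy) > out
            close(out)
        }
    }
    ' "$2" "$3"
}

# Demo with data from the question, in a throwaway directory.
dir=$(mktemp -d) || exit 1
printf '%s\n' '1 ID:%04d Name:%s\nAddress: %s\n\n' \
              '3 Customer: %d / %s (%s)\n' > "$dir/fmtstrings"
printf '%s\n' '5 Companyname 123 Somewhere Street' \
              '12 Othercompany 234 Elsewhere' > "$dir/sampledata"
render_all "$dir" "$dir/fmtstrings" "$dir/sampledata"
cat "$dir/3-12.txt"    # Customer: 12 / Othercompany (234 Elsewhere)
rm -rf "$dir"
```

Expanding each format once while reading the first file keeps the per-record work down to a plain printf.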

If you care to, you can expand the concept to octal and hex escape sequences by changing the split() RE to

/\\(x[0-9a-fA-F]*|[0-7]{1,3}|.)/

and for a hex value after the \\:

c = sprintf("%c", strtonum("0x" rest_of_str))

and for an octal value:

c = sprintf("%c", strtonum("0" rest_of_str))
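
Putting those pieces together might look like the sketch below. It is gawk-only (four-argument split() and strtonum()); the wrapper name expand_all and the variable body are made up for this illustration, and a \x with no hex digits after it is simply left untouched.

```shell
# expand_all: hypothetical wrapper; reads lines on stdin, prints them with
# single-character, hex and octal escapes expanded. Requires gawk.
expand_all() {
    gawk '
    function expandEscapes(old,     segs, segNr, escs, idx, new, body) {
        split(old, segs, /\\(x[0-9a-fA-F]*|[0-7]{1,3}|.)/, escs)
        for (segNr = 1; segNr in segs; segNr++) {
            body = substr(escs[segNr], 2)      # matched escape without its backslash
            if (body ~ /^x[0-9a-fA-F]/)        # hex, e.g. \x41
                escs[segNr] = sprintf("%c", strtonum("0x" substr(body, 2)))
            else if (body ~ /^[0-7]/)          # octal, e.g. \101
                escs[segNr] = sprintf("%c", strtonum("0" body))
            else if (body != "" && (idx = index("abfnrtv", body)))
                escs[segNr] = substr("\a\b\f\n\r\t\v", idx, 1)
            new = new segs[segNr] escs[segNr]
        }
        return new
    }
    { print expandEscapes($0) }
    '
}

printf '%s\n' '\x41-\101-\t!' | expand_all    # A-A-<TAB>!
```

Because the hex pattern is greedy, a hex escape directly followed by another hex digit (for example \x41b) is read as one longer number, so such input needs a separator.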
