Tee with process substitution misunderstanding
I am trying to write a pretty printer for LDAP entries which only fetches the root LDAP record once and then pipes the output into tee, which invokes the pretty printer for each section.
For illustration's sake, say my group_entry function returns the LDIF of a specific LDAP DN. The details aren't important, so let's say it always returns:
dn: cn=foo,dc=example,dc=com
cn: foo
owner: uid=foo,dc=example,dc=com
owner: uid=bar,dc=example,dc=com
member: uid=foo,dc=example,dc=com
member: uid=baz,dc=example,dc=com
member: uid=quux,dc=example,dc=com
custom: abc123
I can easily extract the owners and members separately with a bit of grep'ing and cut'ing. I can then pipe those secondary DNs into another LDAP search query to get their real names. For the sake of example, let's say I have a pretty_print function, parametrised on the LDAP attribute name, which does all that I just mentioned and then formats everything nicely with AWK:
$ group_entry | pretty_print owner
Owners:
foo Mr Foo
bar Dr Bar
$ group_entry | pretty_print member
Members:
foo Mr Foo
baz Bazzy McBazFace
quux The Artist Formerly Known as Quux
These work fine individually, but when I try to tee them together, nothing happens:
$ group_entry | tee >(pretty_print owner) | pretty_print member
Members:
[Sits there waiting for Ctrl+C]
Obviously I have some misunderstanding about how this is supposed to work, but it escapes me. What am I doing wrong?
EDIT: For the sake of completeness, here's my full script:
#!/usr/bin/env bash

set -eu -o pipefail

LDAPSEARCH="ldapsearch -xLLL"

group_entry() {
    local group="$1"
    ${LDAPSEARCH} "(&(objectClass=posixGroup)(cn=${group}))"
}

get_attribute() {
    local attr="$1"
    grep "${attr}:" | cut -d" " -f2
}

get_names() {
    # We strip blank lines out of the LDIF entry, then we always have "dn"
    # followed by "cn" records; we strip off the attribute name and
    # concatenate those lines, then sort. So we get a sorted list of:
    # {{distinguished_name}} {{real_name}}
    xargs -n1 -J% ${LDAPSEARCH} -s base -b % cn \
    | grep -v "^$" \
    | cut -d" " -f2- \
    | paste - - \
    | sort
}

pretty_print() {
    local attr="$1"
    local -A pretty=([member]="Members" [owner]="Owners")

    get_attribute "${attr}" \
    | get_names \
    | gawk -F'\t' -v title="${pretty[${attr}]}:" '
        BEGIN { print title }
        { print "-", gensub(/^uid=([^,]+),.*$/, "\\1", "g", $1), "\t", $2 }
    '
}

# FIXME I don't know why tee with process substitution doesn't work here
group_entry "$1" | pretty_print owner
group_entry "$1" | pretty_print member
The behavior you describe looks very much like a situation that can arise in a C program that forks and exec's another program (as the shell and xargs both certainly do) without properly handling all the open file descriptors. You can be left in a situation where a process p1 does not terminate because it's waiting to observe EOF on its standard input, but it never will do because another process p2 holds an open file descriptor for the write end of the pipe that provides p1's standard input, and p2 is itself waiting for p1 to terminate or perform some other action.
Nevertheless, I don't see anything inherently wrong with your pipeline in that regard, and I do not reproduce the hang with this simpler model ...
echo "foo" | tee >(cat) | cat
... in version 4.2.46 of bash. It may be that there is nevertheless a related bug in your version of bash (even if it's the same one) or in xargs, but that's speculative. I do not think that your pipeline should hang as you say it does, but I'm not prepared to start pointing fingers.
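The general mechanism, a reader that cannot see EOF while some other process still holds the pipe's write end, can be observed in isolation. The following is only a sketch of that mechanism and is not a diagnosis of the script in the question; slow_eof and fast_eof are hypothetical names made up for the illustration.

```bash
#!/usr/bin/env bash

# Prints how many whole seconds cat needed to see EOF on its standard input.
# The backgrounded sleep inherits the pipe's write end, so cat keeps waiting
# until sleep exits, even though echo finished at once.
slow_eof() {
    local start=$SECONDS
    { echo "data"; sleep 2 & } | cat >/dev/null
    echo $(( SECONDS - start ))
}

# Same pipeline, but sleep's stdout is pointed away from the pipe, so the
# write end is closed as soon as the brace group finishes.
fast_eof() {
    local start=$SECONDS
    { echo "data"; sleep 2 >/dev/null & } | cat >/dev/null
    echo $(( SECONDS - start ))
}

echo "descriptor left open: $(slow_eof)s"
echo "descriptor closed:    $(fast_eof)s"
```

The first function should report about two seconds and the second close to zero, which is the difference between a stray writer keeping the pipe open and the pipe being closed promptly.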
In any event, even if your pipeline did not hang, it does not have the semantics you want, as @chepner pointed out in comments. pretty_print member will receive the output of tee on its standard input, and because the process substitution inherits tee's standard output, that will include both the output of group_entry and the output of pretty_print owner. You could consider implementing it differently: since tee can multiplex input more than two ways, you may kill two birds with one stone by doing this:
group_entry "$1" | tee >(pretty_print owner) >(pretty_print member)
But that leaves open the possibility that the output of the two pretty_print executions will be intermingled, and it also echoes the group_entry output. You could conceivably filter out the group_entry output, but to avoid the intermingling, you need to ensure that the two pretty_print commands run sequentially. That presents a problem for a tee-based approach, because if any of tee's outputs block then the whole pipeline can stall.
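Both effects can be seen without any LDAP server. In the sketch below, group_entry is a stub that prints two fixed lines and tag is a hypothetical helper that merely prefixes each line it reads; neither is the real function from the question. The output is sorted only to make the result stable, since the relative order of the branches is not guaranteed.

```bash
#!/usr/bin/env bash

# Stub standing in for the real LDAP lookup (assumption: fixed sample data).
group_entry() {
    printf '%s\n' 'owner: uid=foo' 'member: uid=baz'
}

# Hypothetical helper: prefixes every input line so its origin is visible.
tag() {
    sed "s/^/[$1] /"
}

# The substituted process inherits tee's stdout, which here is the pipe into
# "tag downstream". The downstream command therefore sees the raw entry
# AND the branch's output.
demo_mixed() {
    group_entry | tee >(tag owner-branch) | tag downstream | LC_ALL=C sort
}

# Fan-out to two branches. tee's own copy is discarded with >/dev/null, but
# the branches were started before that redirection took effect, so their
# output still reaches the original stdout, in no guaranteed order.
demo_fanout() {
    group_entry | tee >(tag owner-branch) >(tag member-branch) >/dev/null | LC_ALL=C sort
}

demo_mixed
echo "---"
demo_fanout
```

In demo_mixed every line carries the downstream prefix, and half of them carry the branch prefix as well, which is the doubled input described above.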
One solution would be to redirect the output of one or both pretty_print commands to a file. Alternatively, if it is essential that both outputs go to stdout, then I see no good alternative but to capture the group_entry output, and feed it separately to each pretty_print job. You could capture it to a file, but that's unnecessary, and a bit messy. Consider this instead:
entry_lines=$(group_entry "$1")
pretty_print owner <<<"$entry_lines"
pretty_print member <<<"$entry_lines"
That uses command substitution to capture the output of group_entry in a shell variable (including newlines), and uses a here string to replay it into each pretty_print process.
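For anyone who wants to try the capture-and-replay pattern without an LDAP server, here is a self-contained sketch. It assumes a stubbed group_entry that prints the sample LDIF from the question and a deliberately simplified pretty_print that skips the name lookup and prints only the uid; report is a hypothetical wrapper name.

```bash
#!/usr/bin/env bash

# Stub: always returns the sample LDIF from the question (no LDAP involved).
group_entry() {
    printf '%s\n' \
        'dn: cn=foo,dc=example,dc=com' \
        'cn: foo' \
        'owner: uid=foo,dc=example,dc=com' \
        'owner: uid=bar,dc=example,dc=com' \
        'member: uid=foo,dc=example,dc=com' \
        'member: uid=baz,dc=example,dc=com' \
        'member: uid=quux,dc=example,dc=com' \
        'custom: abc123'
}

# Simplified stand-in for the real pretty_print: prints a title, then the
# uid of every matching attribute line. Real names are NOT looked up here.
pretty_print() {
    local attr="$1" title
    case "$attr" in
        owner)  title="Owners" ;;
        member) title="Members" ;;
        *)      title="$attr" ;;
    esac
    printf '%s:\n' "$title"
    grep "^${attr}:" | sed -E 's/^[^:]+: uid=([^,]+),.*$/- \1/'
}

# Fetch the entry once, then replay it to each printer in sequence, so the
# two sections can never interleave.
report() {
    local entry_lines
    entry_lines=$(group_entry)
    pretty_print owner <<<"$entry_lines"
    pretty_print member <<<"$entry_lines"
}

report
```

Because the two pretty_print calls run one after the other, the owners section is always complete before the members section starts, and the raw entry never reaches stdout.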