Modify gupdatedb (GNU updatedb command) to insert a parallel command
Question
I am working on macOS 10.15 with the tools glocate and gupdatedb from the findutils package, installed with brew.
I would like to integrate the shell command parallel into the gupdatedb script in order to build the database faster.
In the original version of the gupdatedb script, I have:
: ${find:=${BINDIR}/gfind}
1) I tried to insert the parallel command into the line above.
Usually, with gfind, we can use the parallel command like this:
parallel --lb -j32 gfind ::: /*
The '/*' argument is used to find all files under the root directory and all its subdirectories.
So I tried the following in the gupdatedb script:
: ${find:=/usr/local/bin/parallel -j32 ${BINDIR}/gfind}
But at execution, I get the following error, which I can't explain:
updatedb needs to be able to execute -j32, but cannot.
2) I also tried passing it through variables:
num_threads=-j32
${parallel:=${BINDIR}/parallel --lb $num_threads}
: ${find:=${parallel} ${BINDIR}/gfind \{\} ::: }
: ${frcode:=${LIBEXECDIR}/gfrcode}
But the script hangs and the database is not generated.
How can I overcome this issue so that gfind runs on multiple threads (here 8 threads)?
PS1: in this post, I refer to another question, "parallel with find", which explains how to combine the find and parallel commands.
PS2: the gupdatedb script is relatively long, so I give below what I believe are the relevant sections (I stopped the hanging program with CMD+C):
# The database file to build.
: ${LOCATE_DB=/usr/local/var/locate/locatedb}
# Directory to hold intermediate files.
if test -z "$TMPDIR"; then
    if test -d /var/tmp; then
        : ${TMPDIR=/var/tmp}
    elif test -d /usr/tmp; then
        : ${TMPDIR=/usr/tmp}
    else
        : ${TMPDIR=/tmp}
    fi
fi
export TMPDIR
# The user to search network directories as.
: ${NETUSER=daemon}
# The directory containing the subprograms.
if test -n "$LIBEXECDIR" ; then
    : LIBEXECDIR already set, do nothing
else
    : ${LIBEXECDIR=/usr/local/Cellar/findutils/4.7.0/libexec}
fi
# The directory containing find.
if test -n "$BINDIR" ; then
    : BINDIR already set, do nothing
else
    : ${BINDIR=/usr/local/bin}
fi
# DEV : parallel prefix command
num_threads=-j32
${parallel:=${BINDIR}/parallel --lb $num_threads}
# The names of the utilities to run to build the database.
: ${find:=${parallel} ${BINDIR}/gfind \{\} ::: }
: ${frcode:=${LIBEXECDIR}/gfrcode}
UPDATE 1: If I comment out the checkbinary $binary line and apply my second method (see 2) above), I get the following output (I have enabled set -x for debugging):
+ version='
updatedb (GNU findutils) 4.7.0
Copyright (C) 1994-2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Eric B. Decker, James Youngman, and Kevin Dalley.
'
+ LC_ALL=C
+ export LC_ALL
+ usage='Usage: /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb [--findoptions='\''-option1 -option2...'\'']
[--localpaths='\''dir1 dir2...'\''] [--netpaths='\''dir1 dir2...'\'']
[--prunepaths='\''dir1 dir2...'\''] [--prunefs='\''fs1 fs2...'\'']
[--output=dbfile] [--netuser=user] [--localuser=user]
[--dbformat] [--version] [--help]
Please see also the documentation at http://www.gnu.org/software/findutils/.
Report (and track progress on fixing) bugs in the updatedb
program via the GNU findutils bug-reporting page at
https://savannah.gnu.org/bugs/?group=findutils or, if
you have no web access, by sending email to <bug-findutils@gnu.org>.
'
+ changeto=/
+ frcode_options=
+ case "$dbformat" in
+ true
+ sort='/usr/bin/sort -z'
+ print_option=-print0
+ frcode_options=' -0'
+ :
+ : /usr/local/bin/zsh
+ : /
+ :
+ : '
/afs
/amd
/proc
/sfs
/tmp
/usr/tmp
/var/tmp
'
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ test -z ''
++ echo /afs /amd /proc /sfs /tmp /usr/tmp /var/tmp
++ sed -e 's,^,\\(^,' -e 's, ,$\\)\\|\\(^,g' -e 's,$,$\\),'
+ PRUNEREGEX='\(^/afs$\)\|\(^/amd$\)\|\(^/proc$\)\|\(^/sfs$\)\|\(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)'
+ : /usr/local/var/locate/locatedb
+ test -z ''
+ test -d /var/tmp
+ : /var/tmp
+ export TMPDIR
+ : daemon
+ test -n ''
+ : /usr/local/Cellar/findutils/4.7.0/libexec
+ test -n ''
+ : /usr/local/bin
+ num_threads=-j32
+ /usr/local/bin/parallel --lb -j32
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:
Tange, O. (2020, July 22). GNU Parallel 20200722 ('Privacy Shield').
Zenodo. https://doi.org/10.5281/zenodo.3956817
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice
To silence this citation notice: run 'parallel --citation' once.
Come on: You have run parallel 15 times. Isn't it about time
you run 'parallel --citation' once to silence the citation notice?
parallel: Warning: Input is read from the terminal. You are either an expert
parallel: Warning: (in which case: YOU ARE AWESOME!) or maybe you forgot
parallel: Warning: ::: or :::: or -a or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
^C+ : /usr/local/bin/parallel --lb -j32 /usr/local/bin/gfind '{}' :::
+ : /usr/local/Cellar/findutils/4.7.0/libexec/gfrcode
+ : '
9P
NFS
afs
autofs
cifs
coda
devfs
devpts
ftpfs
iso9660
mfs
ncpfs
nfs
nfs4
proc
shfs
smbfs
sysfs
'
+ test -n '
9P
NFS
afs
autofs
cifs
coda
devfs
devpts
ftpfs
iso9660
mfs
ncpfs
nfs
nfs4
proc
shfs
smbfs
sysfs
'
++ echo 9P NFS afs autofs cifs coda devfs devpts ftpfs iso9660 mfs ncpfs nfs nfs4 proc shfs smbfs sysfs
++ sed -e 's/\([^ ][^ ]*\)/-o -fstype \1/g' -e 's/-o //' -e 's/$/ -o/'
+ prunefs_exp='-fstype 9P -o -fstype NFS -o -fstype afs -o -fstype autofs -o -fstype cifs -o -fstype coda -o -fstype devfs -o -fstype devpts -o -fstype ftpfs -o -fstype iso9660 -o -fstype mfs -o -fstype ncpfs -o -fstype nfs -o -fstype nfs4 -o -fstype proc -o -fstype shfs -o -fstype smbfs -o -fstype sysfs -o'
+ rm -f /usr/local/var/locate/locatedb.n
+ trap 'rm -f $LOCATE_DB.n; exit' HUP TERM
+ cd /
+ test -n /
+ '[' '' '!=' '' ']'
+ /usr/bin/sort -z
+ /usr/local/Cellar/findutils/4.7.0/libexec/gfrcode -0
+ : OK so far
+ true
+ test -s /usr/local/var/locate/locatedb.n
+ chmod 644 /usr/local/var/locate/locatedb.n
+ mv /usr/local/var/locate/locatedb.n /usr/local/var/locate/locatedb
+ exit 0
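A note on this trace: it shows /usr/local/bin/parallel --lb -j32 being executed on its own and waiting for input from the terminal. That is consistent with the ${parallel:=...} line lacking the leading ':' that the other assignments in the script use. Without it, the shell expands the parameter and then runs the resulting words as a command. A minimal sketch of the difference, using a hypothetical variable name tool and the harmless command echo in place of parallel:

```shell
#!/bin/sh
# 'tool' is a hypothetical variable used only for this illustration.
unset tool

# With the leading ':' (a no-op builtin), the expansion only assigns the default.
: ${tool:=echo assigned-only}
with_colon=$tool

# Without ':', the expanded words are run as a command.
# The command substitution captures what that command prints.
unset tool
without_colon=$(${tool:=echo executed})

printf 'with colon:    tool=%s\n' "$with_colon"
printf 'without colon: the command printed "%s"\n' "$without_colon"
```

Under that reading, prefixing the parallel assignment with ': ' like its neighbours would stop the script from launching parallel at that point.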
UPDATE 2:
@MarkSetchell. I simply run sudo gupdatedb in a directory.
Could you please give the full command to use? You suggested parallel -j 32 --lb gfind {} $FINDOPTIONS ... ::: BUNCH_OF_PATHS, but this doesn't seem to work.
What I have tried is: parallel -j32 --lb find {} $FINDOPTIONS * ::: */*
but after a while I get the following error:
gfind: failed to read file names from file system at or below '/': No such file or directory
I would like to index all files from the main root /, but / and /System/Volume/Data/ are duplicated.
UPDATE 3: If the number of subdirectories is lower than the number of threads I request with parallel -j32 ..., is there a way to tell the parallel command to descend into all the sub-subdirectories, and so on?
It seems that make -j32 has this kind of behavior (maybe I am wrong). It would be very useful not to have a single process stuck on one subdirectory when that subdirectory may itself contain many sub-subdirectories to explore, and instead to benefit from all 32 processes launched by parallel -j32 .... This would avoid losing time by not parallelizing over all these sub-subdirectories, or even deeper levels.
UPDATE 4: I don't know what to put in the command suggested by @MarkSetchell; for example, if I have 3 subdirectories in the current directory:
# : A2
parallel -j 32 --lb gfind {} $FINDOPTIONS ... ::: BUNCH_OF_PATHS
In particular, what should I put for BUNCH_OF_PATHS?
Do I have to use the option --localpaths dir1/ dir2/ dir3/ in place of BUNCH_OF_PATHS? And what about the $FINDOPTIONS ... term with the three dots?
Recommended answer
Updated answer
The problem is on the line after the one containing A2 in the file /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb. Currently, it is of the form:
# : A2
$find $SEARCHPATHS $FINDOPTIONS \( $prunefs_exp -type d -regex "$PRUNEREGEX" \) -prune -o $print_option
whereas you want it to be of the form:
# : A2
parallel -j 32 --lb gfind {} $FINDOPTIONS ... ::: BUNCH_OF_PATHS
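To make the placeholders concrete, here is a minimal sketch that assumes three hypothetical subdirectories dir1, dir2 and dir3 as BUNCH_OF_PATHS, with -print standing in for $FINDOPTIONS and the rest of the find expression. The function name parallel_find is made up for this example, and it falls back to a sequential loop when GNU parallel is not installed so that the snippet still runs.

```shell
#!/bin/sh
# Sketch: one find process per starting directory, results merged and sorted.
# Assumptions: gfind is used when present, otherwise the system find.
FIND=$(command -v gfind || command -v find)

parallel_find() {
    # "$@" holds the starting directories (the BUNCH_OF_PATHS placeholder).
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        parallel -j 32 --lb "$FIND" {} -print ::: "$@"
    else
        # Fallback without GNU parallel: same searches, one after another.
        for dir in "$@"; do "$FIND" "$dir" -print; done
    fi
}

# Demo on a throwaway tree with three subdirectories.
demo=$(mktemp -d)
mkdir -p "$demo/dir1/a" "$demo/dir2/b" "$demo/dir3/c"
touch "$demo/dir1/a/f1" "$demo/dir2/b/f2" "$demo/dir3/c/f3"

# Job output order is not deterministic, so sort before using it.
result=$(parallel_find "$demo"/dir* | sort)
printf '%s\n' "$result"
rm -rf "$demo"
```

With only one starting path, parallel has a single job to run, so the number of paths after ::: is what bounds the achievable parallelism.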
As you haven't given the paths you wish to search in parallel, the only path at the moment is /, which means nothing can be done in parallel. You will need to run with --localpaths set to a set of places that are worth searching in parallel, or hack the script even more extensively. Though, to be honest, I am not sure why you would want to speed this up, because it should only be run relatively rarely, and then only at times when the system is quiet.
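If there are fewer top-level directories than jobs, one possible approach (an idea sketched here, not something taken from the gupdatedb script) is to list everything down to a fixed depth with a single find, then use the directories found at exactly that depth as starting points for the parallel runs. In the sketch below, split_find and its depth argument are hypothetical names; -mindepth, -maxdepth and -print0 are standard find options, and parallel -0 reads NUL-separated input.

```shell
#!/bin/sh
# Sketch: split one big find into a shallow listing plus parallel deep listings.
# Assumptions: gfind is used when present, otherwise the system find.
FIND=$(command -v gfind || command -v find)

split_find() {
    top=$1
    depth=$2
    # Part 1: everything from the top down to the chosen depth.
    "$FIND" "$top" -maxdepth "$depth" -print
    # Part 2: everything strictly below each directory found at that depth.
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        "$FIND" "$top" -mindepth "$depth" -maxdepth "$depth" -type d -print0 |
            parallel -0 -j 32 --lb "$FIND" {} -mindepth 1 -print
    else
        # Sequential fallback; it assumes no newlines in directory names.
        "$FIND" "$top" -mindepth "$depth" -maxdepth "$depth" -type d -print |
            while IFS= read -r dir; do "$FIND" "$dir" -mindepth 1 -print; done
    fi
}

# Demo: the split listing should match a plain recursive listing.
demo=$(mktemp -d)
mkdir -p "$demo/a/x" "$demo/a/y" "$demo/b/z/deep"
touch "$demo/f0" "$demo/a/x/f1" "$demo/a/y/f2" "$demo/b/z/deep/f3"
expected=$("$FIND" "$demo" -print | sort)
actual=$(split_find "$demo" 2 | sort)
if [ "$expected" = "$actual" ]; then echo "same listing"; else echo "listings differ"; fi
rm -rf "$demo"
```

The sketch leaves out the pruning expression that gupdatedb passes to find; in a real modification that expression would have to be added to each find invocation.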
Original answer
Go to around line 250 of the file /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb and comment it out with a hash sign so it looks like this:
for binary in $find $frcode
do
    # ':' is a no-op that keeps the loop body non-empty, which sh requires
    : #checkbinary $binary
done
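As for why the check fails in the first place: the loop iterates over the unquoted $find, so a value such as /usr/local/bin/parallel -j32 /usr/local/bin/gfind is split into three words and each one is tested separately. -j32 is not an executable file, hence the message. The sketch below reproduces that behaviour. check_exec is a hypothetical stand-in for the script's checkbinary, whose real body is not shown in this post, and /bin/sh stands in for the real program paths.

```shell
#!/bin/sh
# check_exec is a hypothetical stand-in for updatedb's checkbinary.
check_exec() {
    if test -x "$1"; then
        echo "ok: $1"
    else
        echo "updatedb needs to be able to execute $1, but cannot."
    fi
}

# A multi-word value shaped like the one assigned to 'find' in the question.
find_cmd="/bin/sh -j32 /bin/sh"

# Unquoted expansion: the shell splits the value on whitespace,
# so the option -j32 is checked as if it were a program.
report=$(for binary in $find_cmd; do check_exec "$binary"; done)
printf '%s\n' "$report"
```

Disabling the check avoids the error message, but it does not by itself make the multi-word value behave as intended in the rest of the script.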