How can I use the parallel command to exploit multi-core parallelism on my MacBook?


Question

I often use the find command on Linux and macOS. I just discovered the parallel command, and I would like to combine it with find if possible, because find takes a long time when searching for a specific file in large directories.

I have searched for this information, but the results are not precise enough. There appear to be many possible syntaxes, and I can't tell which one is relevant.

How do I combine the parallel command with find (or any other command) in order to benefit from all 16 cores of my MacBook?

From @OleTange, I think I have found the kind of command that interests me.

So, to learn more about these commands, I would like to know what the {} and ::: tokens do in the following command:

parallel -j8 find {} ::: *

1) Are these characters mandatory?

2) How can I insert classic find options such as -type f or -name '*.txt'?

3) For the moment, I have defined this function in my .zshrc:

ff () {
    find $1 -type f -iname $2 2> /dev/null
}

How could I do the equivalent with a fixed number of jobs (which I could also pass as a shell argument)?

Recommended answer

Parallel processing makes sense when your work is CPU bound (the CPU does the work, and the peripherals are mostly idle), but here you are trying to improve the performance of a task which is I/O bound (the CPU is mostly idle, waiting for a busy peripheral). In this situation, adding parallelism will only add congestion, as multiple tasks fight over the already-starved I/O bandwidth.

On macOS, the system already indexes all your data anyway (including the contents of word-processing documents, PDFs, email messages, etc.); there's a friendly magnifying glass on the menu bar at the upper right where you can access a much faster and more versatile search, called Spotlight. (Though I agree that some of the more sophisticated controls of find are missing, and the "user friendly" design gets in my way when it guesses what I want and guesses wrong.)

Some Linux distros offer a similar facility; I would expect that to be the norm for anything with a GUI these days, though the details will differ between systems.

A more traditional solution on any Unix-like system is the locate command, which performs a similar but more limited task; it creates a (very snappy) index of file names, so you can say

locate fnord

to very quickly obtain every file whose name matches fnord. The index is simply a copy of the results of a find run from last night (or however you schedule the back end to run). The command is already installed on macOS, though you have to enable the back end if you want to use it. (Just run locate locate to get further instructions.)

You could build something similar yourself if, for example, you find yourself often looking for files with a particular set of permissions and a particular owner (these are not attributes which locate records). Just run a nightly (or hourly, etc.) find which collects these attributes into a database, or even just a text file, which you can then search nearly instantly.
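As a rough illustration of that idea, here is a minimal sketch that records the `ls -l` line of every file in a text file and then searches it with grep. The temporary directory, the file names, and the index location are all made up for the demo; in practice a cron or launchd job would rerun the indexing step on a schedule and write to a fixed path of your choosing.

```shell
#!/bin/sh
# Throwaway tree standing in for the directory you would really index.
root=$(mktemp -d)
index=$(mktemp)
touch "$root/report.txt" "$root/secret.key"
chmod 644 "$root/report.txt"
chmod 600 "$root/secret.key"

# "Nightly" step: one slow walk of the tree, recording mode, owner and path.
find "$root" -type f -exec ls -ld {} + > "$index"

# Later searches only read the text file, so no disk walk is needed.
# Here: files readable and writable by their owner only.
private=$(grep '^-rw-------' "$index")
printf '%s\n' "$private"
```

The owner is the third column of each recorded line, so a similar grep or awk filter on that column would cover the "particular owner" case.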

For running jobs in parallel, you don't really need GNU parallel, though it does offer a number of conveniences and enhancements for many use cases; you already have xargs -P. (The xargs on macOS, which originates from BSD, is more limited than the GNU xargs you'll find on many Linuxes, but it does have the -P option.)

For example, here's how to run eight parallel find instances with xargs -P:

printf '%s\n' */ | xargs -I {} -P 8 find {} -name '*.ogg'

(This assumes the wildcard doesn't match directories whose names contain single quotes, newlines, or other shenanigans. GNU xargs has the -0 option to fix a large number of corner cases like that; you'd then use '%s\0' as the format string for printf.)
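The NUL-delimited variant mentioned above might look like the following sketch. It assumes your xargs accepts -0 (GNU xargs does, and as far as I know the BSD xargs shipped with macOS does too); the directory and file names are invented purely to exercise the awkward cases.

```shell
#!/bin/sh
# Throwaway tree with awkward directory names (a space and a single quote).
top=$(mktemp -d)
mkdir "$top/plain" "$top/with space" "$top/it's"
touch "$top/plain/a.ogg" "$top/with space/b.ogg" "$top/it's/c.ogg" "$top/plain/d.txt"

# NUL-delimited names survive spaces and quotes; -P 8 allows up to eight
# find processes at once, one per top-level directory.
found=$(cd "$top" && printf '%s\0' */ |
    xargs -0 -I {} -P 8 find {} -name '*.ogg' | sort)
printf '%s\n' "$found"
```

The output is piped through sort only because the parallel find processes finish in no particular order.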

As the parallel documentation readily explains, its general syntax is

parallel -options command ...

where {} will be replaced with the current input line (if it is missing, it will be implicitly added at the end of command ...), and the (obviously optional) ::: special token allows you to specify an input source on the command line instead of on standard input.
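To make the roles of {} and ::: concrete, the sketch below runs the same job three ways: with an explicit {}, with the implicit trailing {}, and with the inputs on standard input instead of after :::. It assumes GNU parallel is installed (it is not part of a stock macOS) and skips itself otherwise; the directory names a and b are invented, and -k is added only to keep the output in input order so the runs can be compared.

```shell
#!/bin/sh
# Only run when GNU parallel (not the moreutils tool of the same name) exists.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    demo=$(mktemp -d)
    mkdir "$demo/a" "$demo/b"
    touch "$demo/a/one.txt" "$demo/b/two.txt"

    # 1. Explicit placeholder, inputs given after :::
    explicit=$(cd "$demo" && parallel -k -j2 find {} ::: a b)
    # 2. No placeholder: the input is appended to the end of the command
    implicit=$(cd "$demo" && parallel -k -j2 find ::: a b)
    # 3. No ::: at all: inputs are read from standard input, one per line
    piped=$(cd "$demo" && printf '%s\n' a b | parallel -k -j2 find)

    [ "$explicit" = "$implicit" ] && [ "$explicit" = "$piped" ] &&
        echo 'all three forms produced the same listing'
else
    echo 'GNU parallel not found; skipping demo' >&2
fi
```

So neither token is strictly mandatory. Note, though, that find expects its starting directory before options such as -type, so once you add find options the implicit trailing {} ends up in the wrong place and you need to write {} explicitly.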

Anything outside of those special tokens is passed on verbatim, so you can add find options to your heart's content just by specifying them literally.

parallel -j8 find {} -type f -name '*.ogg' ::: */

I don't speak zsh, but refactored for regular POSIX sh, your function could be something like

ff () {
    parallel -j8 find {} -type f -iname "$2" ::: "$1"
}

though I would perhaps switch the arguments so you can specify a name pattern followed by a list of directories to search, à la grep.

ff () {
    # "local" is not POSIX but works in many sh versions
    local pat=$1
    shift
    parallel -j8 find {} -type f -iname "$pat" ::: "$@"
}
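If you would rather not depend on GNU parallel, and want the job count as an argument as the question asks, a variant built on xargs -P could look like the sketch below. The name ffj and its argument order (jobs, pattern, then directories) are my own invention for illustration, and -iname, while supported by both GNU and BSD find, is not strictly POSIX.

```shell
#!/bin/sh
# Hypothetical helper: ffj JOBS PATTERN [DIR...]
ffj () {
    # Plain sh has no "local", so these are ordinary global variables.
    njobs=$1
    pat=$2
    shift 2
    # Search the current directory when no directories are given.
    [ "$#" -gt 0 ] || set -- .
    printf '%s\0' "$@" |
        xargs -0 -P "$njobs" -I {} find {} -type f -iname "$pat" 2>/dev/null
}

# Small demo on a throwaway tree.
tree=$(mktemp -d)
mkdir "$tree/docs" "$tree/notes"
touch "$tree/docs/a.txt" "$tree/notes/B.TXT" "$tree/notes/c.ogg"
hits=$(ffj 4 '*.txt' "$tree/docs" "$tree/notes" | sort)
printf '%s\n' "$hits"
```

A call such as ffj 8 '*.txt' ~/Documents ~/Downloads would then run one find per listed directory, up to eight at a time. The parallelism is only across the directories you list; a single directory still gets a single find.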

But again, spinning your disk to find things which are already indexed is probably something you should stop doing, rather than facilitate.
