How can I use the parallel command to exploit multi-core parallelism on my MacBook?


Question

I often use the find command on Linux and macOS. I just discovered the parallel command, and I would like to combine it with find if possible, because find takes a long time when searching for a specific file in large directories.

I have searched for this information, but the results are not precise enough. There appear to be many possible syntaxes, and I can't tell which one is relevant.

How do I combine the parallel command with the find command (or any other command) in order to benefit from all 16 cores that I have on my MacBook?

From @OleTange, I think I have found the kind of command that interests me.

So, to learn more about these commands, I would like to know the purpose of the tokens {} and ::: in the following command:

parallel -j8 find {} ::: *

1) Are these characters mandatory?

2) How can I insert the usual options of the find command, such as -type f or -name '*.txt'?

3) For the moment, I have defined this function in my .zshrc:

ff () {
    find $1 -type f -iname $2 2> /dev/null
}

How could I do the equivalent with a fixed number of jobs (which I could also pass as a shell argument)?

Recommended answer

Parallel processing makes sense when your work is CPU bound (the CPU does the work, and the peripherals are mostly idle), but here you are trying to improve the performance of a task that is I/O bound (the CPU is mostly idle, waiting for a busy peripheral). In this situation, adding parallelism will only add congestion, as multiple tasks compete for I/O bandwidth that is already in short supply.

On macOS, the system already indexes all your data anyway (including the contents of word-processing documents, PDFs, email messages, etc.); there's a friendly magnifying glass on the menu bar at the upper right where you can access a much faster and more versatile search, called Spotlight. (Though I agree that some of the more sophisticated controls of find are missing, and the "user friendly" design gets in my way when it guesses what I want and guesses wrong.)

Some Linux distros offer a similar facility; I would expect that to be the norm for anything with a GUI these days, though the details will differ between systems.

A more traditional solution on any Unix-like system is the locate command, which performs a similar but more limited task; it creates a (very snappy) index of file names, so you can say

locate fnord

to very quickly obtain every file whose name matches fnord. The index is simply a copy of the results of a find run from last night (or whenever you schedule the back end to run). The command is already installed on macOS, though you have to enable the back end if you want to use it. (Just run locate locate to get further instructions.)

You could build something similar yourself if, for example, you often find yourself looking for files with a particular set of permissions and a particular owner (these are not attributes that locate records); just run a nightly (or hourly, etc.) find that collects these attributes into a database, or even just a text file, which you can then search nearly instantly.
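As a rough illustration of that idea, here is a minimal sketch in POSIX sh. The function names build_index and search_index, the INDEX_FILE variable, and the choice of "ls -ld" lines as the index format are all assumptions made for this example, not anything the answer prescribes.

```shell
# Sketch only: build_index, search_index and INDEX_FILE are hypothetical
# names, and the index location is an arbitrary choice for this example.
INDEX_FILE="${INDEX_FILE:-$HOME/.file-index.txt}"

# Walk the tree once (the slow, I/O-bound part) and record one "ls -ld"
# line per file: permissions, link count, owner, group, size, date, path.
# $1 = directory to index
build_index () {
    find "$1" -type f -exec ls -ld {} + > "$INDEX_FILE" 2> /dev/null
}

# Search the saved index: no disk crawl, just a scan of one text file.
# $1 = owner name, $2 = permission string such as -rw-r--r--
# The permission test is a prefix match because some systems append
# a marker such as "+", "." or "@" to the mode column.
search_index () {
    awk -v owner="$1" -v perms="$2" \
        '$3 == owner && index($1, perms) == 1' "$INDEX_FILE"
}
```

You would run build_index from cron (or launchd on macOS) and call search_index interactively, for example search_index "$USER" -rw-r--r--. The results are only as fresh as the last indexing run.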

For running jobs in parallel, you don't really need GNU parallel, though it does offer a number of conveniences and enhancements for many use cases; you already have xargs -P. (The xargs on macOS, which originates from BSD, is more limited than GNU xargs, which is what you'll find on many Linux systems; but it does have the -P option.)

For example, here's how to run eight parallel find instances with xargs -P:

printf '%s\n' */ | xargs -I {} -P 8 find {} -name '*.ogg'

(This assumes the wildcard doesn't match directories whose names contain single quotes, newlines, or other shenanigans. GNU xargs has the -0 option to fix a large number of corner cases like that; you would then use '%s\0' as the format string for printf.)
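The NUL-delimited variant mentioned in the parenthesis might look like the sketch below. It assumes an xargs that accepts -0 together with -I and -P; the function name search_dirs is hypothetical.

```shell
# Sketch, assuming an xargs that supports -0 alongside -I and -P.
# search_dirs is a hypothetical name.
# Usage: search_dirs PATTERN DIR...
search_dirs () {
    pattern=$1
    shift
    # printf emits each directory followed by a NUL byte; xargs -0 splits
    # on NUL only, so spaces, quotes and newlines in names are harmless.
    printf '%s\0' "$@" | xargs -0 -I {} -P 8 find {} -name "$pattern"
}
```

Called as search_dirs '*.ogg' */, this starts one find per top-level directory, at most eight at a time. Output from the different find processes arrives in no particular order.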

As the parallel documentation readily explains, its general syntax is

parallel -options command ...

where {} will be replaced with the current input line (if it is missing, it will be implicitly added at the end of command ...), and the (obviously optional) ::: special token lets you specify an input source on the command line instead of on standard input.

Anything outside of those special tokens is passed on verbatim, so you can add find options to your heart's content just by specifying them literally.

parallel -j8 find {} -type f -name '*.ogg' ::: */
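To make the roles of {} and ::: concrete, the three invocations below should be equivalent. This is a sketch that assumes GNU parallel is installed; some systems ship an unrelated parallel from moreutils with a different syntax, hence the version check. The arguments a, b and c are placeholders.

```shell
# Sketch, assuming GNU parallel (not the moreutils tool of the same name).
if parallel --version 2> /dev/null | grep -q 'GNU parallel'; then
    # Explicit {} with the arguments listed after :::
    parallel -j8 echo {} ::: a b c
    # {} omitted: each argument is appended to the end of the command
    parallel -j8 echo ::: a b c
    # ::: omitted: arguments are read from standard input, one per line
    printf '%s\n' a b c | parallel -j8 echo {}
fi
```

Each form runs echo once per argument. Because the jobs run concurrently, the lines may appear in any order unless you add the -k (keep order) option.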

I don't speak zsh, but refactored for regular POSIX sh, your function could be something like

ff () {
    parallel -j8 find {} -type f -iname "$2" ::: "$1"
}

though I would perhaps switch the arguments so you can specify a name pattern and a list of directories to search, à la grep.

ff () {
    # "local" is not POSIX but works in many sh versions
    local pat=$1
    shift
    parallel -j8 find {} -type f -iname "$pat" ::: "$@"
}
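For the part of the question about passing the number of jobs as an argument, one possible variant is sketched below. The name ffj and its argument order are assumptions for illustration. The -q option is added here on the assumption that GNU parallel re-evaluates the command through a shell, in which case an unquoted pattern such as *.ogg could be expanded against the current directory before find sees it.

```shell
# Sketch: ffj is a hypothetical name; assumes GNU parallel is installed.
# Usage: ffj JOBS PATTERN DIR...
ffj () {
    jobs_n=$1
    pat=$2
    shift 2
    # -j takes the job count from the first argument.
    # -q asks parallel to quote the command so the inner shell
    # does not expand the pattern itself.
    parallel -q -j "$jobs_n" find {} -type f -iname "$pat" ::: "$@" 2> /dev/null
}
```

For example, ffj 4 '*.ogg' */ would run at most four find processes at a time, one per top-level directory.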

But again, spinning your disk to find things that are already indexed is probably something you should stop doing, rather than facilitate.
