Parallel processing in awk?


Question

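Awk processes a file line by line. Assuming the operation on each line does not depend on any other line, is there a way to make awk process several lines in parallel? Is there any other text-processing tool that automatically takes advantage of parallelism to process data faster?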

Answer


The only awk implementation that attempted to provide parallel execution was parallel-awk, but the project appears to be dead now.

Otherwise, one way to parallelize awk is to split your input into chunks and process the chunks in parallel. However, the splitting step itself is single-threaded, which may defeat the goal of improving performance: the standard split command cannot split at line boundaries without reading every single line.
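To make that cost concrete, here is a minimal sketch of line-aligned splitting that uses only POSIX `wc -l` and `split -l`. It has to read the whole file once, serially, just to count lines before any parallel work can start. The function name `split_into_chunks` and its arguments are hypothetical, and `local` assumes bash or a similar shell.

```shell
# split_into_chunks INPUT COUNT PREFIX  (hypothetical helper)
# Splits INPUT into about COUNT pieces named PREFIXaa, PREFIXab, ...
# without breaking lines, using only POSIX wc and split -l.
split_into_chunks() {
    local input="$1" count="$2" prefix="$3" lines per_chunk
    # This pass reads every line of the input, on a single core.
    lines=$(wc -l < "$input")
    # Ceiling division: lines per chunk so that about COUNT chunks result.
    per_chunk=$(( (lines + count - 1) / count ))
    # split -l rejects 0, so use at least one line per chunk.
    [ "$per_chunk" -gt 0 ] || per_chunk=1
    split -l "$per_chunk" "$input" "$prefix"
}
```

For example, `split_into_chunks input.txt 8 /tmp/chunk_` produces up to eight files whose concatenation, in glob order, is the original input.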

If you have GNU split available, or another version that supports the -n l/N option, here is one optimized way to process your file in parallel, assuming you have 8 vCPUs:

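inputfile=input.txt
outputfile=output.txt
script=script.awk
count=8   # number of chunks, e.g. one per vCPU

# Split into $count chunks without breaking lines (needs GNU split -n l/N)
split -n "l/$count" "$inputfile" "/tmp/_pawk$$"
for file in /tmp/_pawk$$*; do
    # One background awk process per chunk
    awk -f "$script" "$file" > "${file}.out" &
done
wait   # block until all background awk processes have finished
# The glob sorts the chunk suffixes (aa, ab, ...), so output keeps input order
cat /tmp/_pawk$$*.out > "$outputfile"
rm /tmp/_pawk$$*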
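The script above uses predictable /tmp/_pawk$$ file names and does not notice if one of the background awk processes fails. Below is a minimal sketch of the same split/run/wait/concatenate technique wrapped in a function, assuming bash (for `local`) and GNU split; the function name `pawk` and its argument order are hypothetical, and `mktemp -d` replaces the fixed temporary prefix.

```shell
# pawk SCRIPT COUNT INPUT OUTPUT  (hypothetical wrapper)
# Runs `awk -f SCRIPT` on COUNT line-aligned chunks of INPUT in parallel
# and writes the results, in input order, to OUTPUT. Needs GNU split.
pawk() {
    local script="$1" count="$2" input="$3" output="$4"
    local tmpdir pids="" pid file failed=0
    # Private temporary directory instead of predictable /tmp names.
    tmpdir=$(mktemp -d) || return 1
    # COUNT chunks (chunk_aa, chunk_ab, ...) without breaking lines.
    split -n "l/$count" "$input" "$tmpdir/chunk_" || { rm -rf "$tmpdir"; return 1; }
    for file in "$tmpdir"/chunk_*; do
        awk -f "$script" "$file" > "$file.out" &
        pids="$pids $!"
    done
    # Wait on each job individually so a failing awk is not silently ignored.
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    # Glob expansion sorts the suffixes, so output order matches input order.
    if [ "$failed" -eq 0 ]; then
        cat "$tmpdir"/chunk_*.out > "$output" || failed=1
    fi
    rm -rf "$tmpdir"
    return "$failed"
}
```

For example, `pawk script.awk 8 input.txt output.txt`. Because each awk process sees only its own chunk, `NR` restarts at 1 in every chunk and any `BEGIN`/`END` blocks run once per chunk, so this only suits scripts whose per-line work is independent, as the question assumes.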
