Python code performance decreases with threading


Problem description

I've written a working program in Python that basically parses a batch of binary files, extracting data into a data structure. Each file takes around a second to parse, which translates to hours for thousands of files. I've successfully implemented a threaded version of the batch parsing method with an adjustable number of threads. I tested the method on 100 files with a varying number of threads, timing each run. Here are the results (0 threads refers to my original, pre-threading code; 1 thread to the new version run with a single thread spawned).

0 threads: 83.842 seconds
1 threads: 78.777 seconds
2 threads: 105.032 seconds
3 threads: 109.965 seconds
4 threads: 108.956 seconds
5 threads: 109.646 seconds
6 threads: 109.520 seconds
7 threads: 110.457 seconds
8 threads: 111.658 seconds

Though spawning a thread confers a small performance increase over having the main thread do all the work, increasing the number of threads actually decreases performance. I would have expected to see performance increases, at least up to four threads (one for each of my machine's cores). I know threads have associated overhead, but I didn't think this would matter so much with single-digit numbers of threads.
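A threaded batch-parsing setup along these lines might be sketched as follows. The names (`parse_file`, `parse_batch`) are hypothetical stand-ins for the asker's actual code, which is not shown; `parse_file` here just does some token CPU-bound work so the example is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_file(path):
    # Stand-in for the real binary parser: returns (path, parsed_data).
    return path, sum(ord(c) for c in path)

def parse_batch(paths, num_threads):
    # With num_threads == 1 this mirrors the single-thread run; raising
    # it adds worker threads but, for CPU-bound parsing on CPython,
    # does not add parallelism (see the GIL discussion in the answer).
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        return dict(executor.map(parse_file, paths))
```

Whatever the exact structure, this pattern reproduces the observed behavior: all threads are busy, but only one executes Python bytecode at a time.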

I've heard of the "global interpreter lock", but as I move up to four threads I do see the corresponding number of cores at work--with two threads two cores show activity during parsing, and so on.

I also tested some different versions of the parsing code to see if my program is IO bound. It doesn't seem to be; just reading in the file takes a relatively small proportion of time; processing the file is almost all of it. If I don't do the IO and process an already-read version of a file, adding a second thread damages performance and a third thread improves it slightly. I'm just wondering why I can't take advantage of my computer's multiple cores to speed things up. Please post any questions or ways I could clarify.

Answer

This is sadly how things are in CPython, mainly due to the Global Interpreter Lock (GIL). Python code that's CPU-bound simply doesn't scale across threads (I/O-bound code, on the other hand, might scale to some extent).
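The effect is easy to reproduce with a minimal experiment (the numbers below are illustrative, not measured results from the question): a pure-Python, CPU-bound function run twice serially versus once in each of two threads typically shows no speedup on CPython, and often a slowdown.

```python
import threading
import time

def count(n):
    # Pure-Python, CPU-bound loop: the thread running it holds the GIL
    # for essentially the whole computation.
    while n > 0:
        n -= 1

N = 2_000_000

start = time.perf_counter()
count(N)
count(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On CPython, `threaded` is usually no faster than `serial`, and GIL
# contention can make it slower -- matching the timings in the question.
print(f"serial: {serial:.3f}s  threaded: {threaded:.3f}s")
```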

There is a highly informative presentation by David Beazley where he discusses some of the issues surrounding the GIL. The video can be found here (thanks @Ikke!)

My recommendation would be to use the multiprocessing module instead of multiple threads.
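A minimal sketch of that switch, assuming a top-level `parse_file` function (worker functions must be importable so they can be pickled and sent to child processes; the file names here are hypothetical):

```python
from multiprocessing import Pool

def parse_file(path):
    # Placeholder for the real binary parser: some token CPU-bound work
    # so the example is self-contained.
    return path, sum(ord(c) for c in path)

if __name__ == "__main__":
    files = [f"data_{i:03d}.bin" for i in range(8)]  # hypothetical batch
    # Each worker is a separate process with its own interpreter and its
    # own GIL, so CPU-bound work genuinely runs on multiple cores.
    with Pool(processes=4) as pool:
        results = dict(pool.map(parse_file, files))
    print(len(results))
```

The `if __name__ == "__main__":` guard matters: on platforms that spawn rather than fork, child processes re-import the module, and unguarded top-level code would run again in every worker.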
