Python慢读性能问题 [英] Python slow read performance issue

查看：115 发布时间：2018/8/24 17:15:07 python performance perl io

本文介绍了Python慢读性能问题的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在早期的一个帖子之后，我把我的问题归结为它的问题，在从Perl脚本迁移到Python的过程中，我发现在Python中使用slurping文件存在巨大的性能问题。在Ubuntu Server上运行。

Following an earlier thread I boiled down my problem to it's bare bones, in migrating from a Perl script to a Python one I found a huge performance issue with slurping files in Python. Running this on Ubuntu Server.

注意：这不是X和Y线程我需要从根本上了解它是如何实现的，或者我是在做什么愚蠢。

NB: this is not a X vs. Y thread I need to know fundamentally if this is how it is or if I'm doing something stupid.

我创建了我的测试数据，50,000个10kb文件（这反映了我正在处理的平均文件大小）：

I created my test data, 50,000 10kb files (this mirrors the avg file size of what I'm processing):

mkdir 1
cd 1
for i in {1..50000}; do dd if=/dev/zero of=$i.xml bs=1 count=10000; done
cd ..
cp -r 1 2

创建我的2个脚本尽可能简单：

Created my 2 scripts as simply as possible:

Perl

foreach my $file (<$ARGV[0]/*.xml>){
    my $fh;
    open($fh, "< $file");
    my $contents = do { local $/; <$fh> };
    close($fh);
}

Python

import glob, sys
for file in glob.iglob(sys.argv[1] + '/*.xml'):
    with open(file) as x:
        f = x.read()

然后我清除了缓存并运行了我的2个slurp脚本，在每次运行之间我再次使用以下方法清理缓存：

I then cleared the caches and ran my 2 slurp scripts, between each run I cleaned the caches again using:

sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

然后进行监控以确保每次都从磁盘读取所有内容：

Then monitored to ensure it was reading everything from disk each time:

sudo iotop -a -u me

我在具有RAID 10磁盘的物理计算机上尝试了这一点，并在一个全新的VM上设置了VM在RAID 1 SSD上的设置，刚刚包含了我的VM的测试运行物理服务器差不多快。

I tried this on a physical machine with RAID 10 disks and on a brand new VM I setup where the VM is on RAID 1 SSDs, have just included the test runs from my VM as the physical server was much the same just faster.

$ time python readFiles.py 1
    real    5m2.493s
    user    0m1.783s
    sys     0m5.013s

$ time perl readFiles.pl 2
    real    0m13.059s
    user    0m1.690s
    sys     0m2.471s

$ time perl readFiles.pl 2
    real    0m13.313s
    user    0m1.670s
    sys     0m2.579s

$ time python readFiles.py 1
    real    4m43.378s
    user    0m1.772s
    sys     0m4.731s

当Perl运行DISK READ时，我在iotop上注意到了当运行Python DISK READ时，速度约为45 M / s，IOWAIT约为70％，而IOWAIT为97％。我不知道从哪里开始将它们煮至尽可能简单。

I noticed on iotop when Perl was running DISK READ was around 45 M/s and IOWAIT approx 70%, when running Python DISK READ was 2M/s and IOWAIT 97%. I'm not sure where to go from here having boiled them down to as simple as I can get.

如果相关

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2

$ perl -v
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi

根据要求提供更多信息

我运行了strace并获取了文件1000.xml的信息，但似乎都做了同样的事情：

I ran strace and grabbed the info for file 1000.xml but all seem to do the same things:

Perl

$strace -f -T -o trace.perl.1 perl readFiles.pl 2

32303 open("2/1000.xml", O_RDONLY)      = 3 <0.000020>
32303 ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff7f6f7b90) = -1 ENOTTY (Inappropriate ioctl for device) <0.000016>
32303 lseek(3, 0, SEEK_CUR)             = 0 <0.000016>
32303 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000016>
32303 fcntl(3, F_SETFD, FD_CLOEXEC)     = 0 <0.000017>
32303 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000030>
32303 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.005323>
32303 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 1808 <0.000022>
32303 read(3, "", 8192)                 = 0 <0.000019>
32303 close(3)                          = 0 <0.000017>

Python

$strace -f -T -o trace.python.1 python readFiles.py 1

32313 open("1/1000.xml", O_RDONLY)      = 3 <0.000021>
32313 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000017>
32313 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000019>
32313 lseek(3, 0, SEEK_CUR)             = 0 <0.000018>
32313 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000018>
32313 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa18820a000 <0.000019>
32313 lseek(3, 0, SEEK_CUR)             = 0 <0.000018>
32313 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.006795>
32313 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 1808 <0.000031>
32313 read(3, "", 4096)                 = 0 <0.000018>
32313 close(3)                          = 0 <0.000027>
32313 munmap(0x7fa18820a000, 4096)      = 0 <0.000022>

我注意到的一个区别是，不确定它是否相关，是Perl似乎针对所有文件运行此在它开始打开之前，而python没有：

One difference I noticed, not sure if it's relevant, is that Perl appears to run this against all files before it starts opening them whereas python doesn't:

32303 lstat("2/1000.xml", {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000022>

还使用-c运行strace（只需拨打几个电话）：

Also ran strace with -c (just took top few calls):

Perl

$ time strace -f -c perl readFiles.pl 2
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 44.07    3.501471          23    150018           read
 12.54    0.996490          10    100011           fstat
  9.47    0.752552          15     50000           lstat
  7.99    0.634904          13     50016           open
  6.89    0.547016          11     50017           close
  6.19    0.491944          10     50008     50005 ioctl
  6.12    0.486208          10     50014         3 lseek
  6.10    0.484374          10     50001           fcntl

real    0m37.829s
user    0m6.373s
sys     0m25.042s

Python

$ time strace -f -c python readFiles.py 1
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.97    4.186173          28    150104           read
 15.58    1.518304          10    150103           fstat
 10.51    1.023681          20     50242       174 open
 10.12    0.986350          10    100003           lseek
  7.69    0.749387          15     50047           munmap
  6.85    0.667576          13     50071           close
  5.90    0.574888          11     50073           mmap

real    5m5.237s
user    0m7.278s
sys     0m30.736s

在-T打开的情况下对strace输出进行了一些解析并计算了每个文件读取的第一个8192字节，很明显这是时间的地方，下面是50000首次读取所花费的总时间一个文件后跟每次读取的平均时间。

Did some parsing of the strace output with -T turned on and counted the first 8192 byte read for each file and it's clear this is where the time is going, below is the total time spent for the 50000 first reads of a file followed by the average time for each read.

300.247128000002 (0.00600446220302379)   - Python
11.6845620000003 (0.000233681892724297)  - Perl

不确定是否有帮助！

UPDATE 2
更新Python中的代码以使用os.open和os.read并只读取一次4096字节（这对我来说可以作为我想要的信息在文件的顶部），也消除了strace中的所有其他调用：

UPDATE 2 Updated code in Python to use os.open and os.read and just do a single read of first 4096 bytes (that would work for me as info I want is in top part of file), also eliminates all the other calls in strace:

18346 open("1/1000.xml", O_RDONLY)      = 3 <0.000026>
18346 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.007206>
18346 close(3)                          = 0 <0.000024>

$ time strace -f -c python readFiles.py 1
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.39    2.388932          48     50104           read
 22.86    0.986096          20     50242       174 open
 20.72    0.893579          18     50071           close

real    4m48.751s
user    0m3.078s
sys     0m12.360s

Total Time (avg read call)
282.28626 (0.00564290374812595)

仍然没有更好......接下来我将在Azure上创建一个虚拟机并尝试另一个例子!!

Still no better...next up I'm going to create a VM on Azure and try there for another example!!

更新3 - 为此大小道歉!!

在3个设置上使用（@JFSebastian）脚本确定一些有趣的结果，剥离输出在开始时为了简洁，还删除了所有刚从缓存运行超快的测试，看起来像：

Ok some interesting results using your (@J.F.Sebastian) script on 3 setups, stripped the output at start for brevity and also removed all the tests which just run super fast from cache and look like:

0.23user 0.26system 0:00.50elapsed 99%CPU (0avgtext+0avgdata 9140maxresident)k
0inputs+0outputs (0major+2479minor)pagefaults 0swaps

Azure A2标准VM（2核3.5GB RAM磁盘未知但速度慢）

$ uname -a
Linux servername 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
$ perl -v
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
(with 41 registered patches, see perl -V for more detail)

+ /usr/bin/time perl slurp.pl 1
1.81user 2.95system 3:11.28elapsed 2%CPU (0avgtext+0avgdata 9144maxresident)k
1233840inputs+0outputs (20major+2461minor)pagefaults 0swaps
+ clearcache
+ sync
+ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
+ /usr/bin/time python slurp.py 1
1.56user 3.76system 3:06.05elapsed 2%CPU (0avgtext+0avgdata 8024maxresident)k
1232232inputs+0outputs (14major+52273minor)pagefaults 0swaps
+ /usr/bin/time perl slurp.pl 2
1.90user 3.11system 6:02.17elapsed 1%CPU (0avgtext+0avgdata 9144maxresident)k
1233776inputs+0outputs (16major+2465minor)pagefaults 0swaps

两者的可比第一次啜饮结果，不是确定在第二次Perl啜饮期间发生了什么？

Comparable first slurp results for both, not sure what was going on during the 2nd Perl slurp?

我的VMWare Linux VM（2核8GB RAM磁盘RAID1 SSD）

$ uname -a
Linux servername 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
$ perl -v
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
(with 41 registered patches, see perl -V for more detail)

+ /usr/bin/time perl slurp.pl 1
1.66user 2.55system 0:13.28elapsed 31%CPU (0avgtext+0avgdata 9136maxresident)k
1233152inputs+0outputs (20major+2460minor)pagefaults 0swaps
+ clearcache
+ sync
+ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
+ /usr/bin/time python slurp.py 1
2.10user 4.67system 4:45.65elapsed 2%CPU (0avgtext+0avgdata 8012maxresident)k
1232056inputs+0outputs (14major+52269minor)pagefaults 0swaps
+ /usr/bin/time perl slurp.pl 2
2.13user 4.11system 5:01.40elapsed 2%CPU (0avgtext+0avgdata 9140maxresident)k
1233264inputs+0outputs (16major+2463minor)pagefaults 0swaps

这次和以前一样，Perl是在第一次啜饮时速度更快，不确定在第二次Perl啜食时发生了什么，虽然之前没有看到这种行为。再次测量measure.sh并且结果完全相同或者花费几秒钟。然后，我做了正常人会做的事情并更新内核以匹配Azure机器3.13.0-35-generic并再次运行measure.sh并且对结果没有任何影响。

This time, as before, Perl is way faster on first slurp, unsure what is happening on second Perl slurp though not seen this behaviour before. Ran measure.sh again and result was exactly the same give or take a few seconds. I then did what any normal person would do and updated the kernel to match the Azure machine 3.13.0-35-generic and ran measure.sh again and made no difference to results.

出于好奇，我随后在measure.sh中交换了1和2参数，发生了一些奇怪的事情.Perl放慢速度，Python加速了！

Out of curiosity I then swapped the 1 and 2 parameter in measure.sh and something strange happened..Perl slowed down and Python sped up!

+ /usr/bin/time perl slurp.pl 2
1.78user 3.46system 4:43.90elapsed 1%CPU (0avgtext+0avgdata 9140maxresident)k
1234952inputs+0outputs (21major+2458minor)pagefaults 0swaps
+ clearcache
+ sync
+ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
+ /usr/bin/time python slurp.py 2
1.19user 3.09system 0:10.67elapsed 40%CPU (0avgtext+0avgdata 8012maxresident)k
1233632inputs+0outputs (14major+52269minor)pagefaults 0swaps
+ /usr/bin/time perl slurp.pl 1
1.36user 2.32system 0:13.40elapsed 27%CPU (0avgtext+0avgdata 9136maxresident)k
1232032inputs+0outputs (17major+2465minor)pagefaults 0swaps

这让我更加困惑： - （

This has just confused me even further :-(

物理服务器（32核132 GB RAM磁盘RAID10 SAS）

$ uname -a
Linux servername 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ python
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
[GCC 4.6.3] on linux2
$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 55 registered patches, see perl -V for more detail)

+ /usr/bin/time perl slurp.pl 1
2.22user 2.60system 0:15.78elapsed 30%CPU (0avgtext+0avgdata 43728maxresident)k
1233264inputs+0outputs (15major+2984minor)pagefaults 0swaps
+ clearcache
+ sync
+ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
+ /usr/bin/time python slurp.py 1
2.51user 4.79system 1:58.53elapsed 6%CPU (0avgtext+0avgdata 34256maxresident)k
1234752inputs+0outputs (16major+52385minor)pagefaults 0swaps
+ /usr/bin/time perl slurp.pl 2
2.17user 2.95system 0:06.96elapsed 73%CPU (0avgtext+0avgdata 43744maxresident)k
1232008inputs+0outputs (14major+2987minor)pagefaults 0swaps

Perl似乎每次都赢了。

Here Perl seems to win every time.

困惑

鉴于本地虚拟机的奇怪之处，当我交换目录时，这是我最能控制的机器，我将尝试使用1或2作为数据目录运行python vs perl的所有可能选项的二进制方法，并尝试多次运行它们以保持一致性但它会需要一段时间，我会有点疯狂，所以首先需要休息！我想要的只是一致性： - （

Given the oddity on my local VM, when I swapped directories, which is the machine I have most control over I'm going to try a binary approach on all the possible options of running python vs perl using 1 or 2 as the data directory and try to run them multiple times for consistency but it'll take a while and I'm going a little crazy so break may be required first! All I want is consistency :-(

更新4 - 一致性

（下面是在ubuntu-14.04.1服务器虚拟机上运行，内核是3.13.0-35-通用＃62-Ubuntu）

(Below is run on an ubuntu-14.04.1-server VM, Kernel is 3.13.0-35-generic #62-Ubuntu)

我想我发现了一些一致性，为数据目录1/2上的Python / Perl slurp运行测试尽可能地发现以下内容：

I think I've found some consistency, running the tests every way possible for Python/Perl slurp on data dir 1/2 I found the following:

Python总是很慢在创建的文件上（即由dd创建）

Python总是快速复制文件（即由cp -r创建）

Perl总是很快在创建的文件上（即由dd创建）

Perl在复制的文件上总是很慢（即由cp -r创建）

Python is always slow on created files (i.e. created by dd)
Python is always fast on copied files (i.e. created by cp -r)
Perl is always fast on created files (i.e. created by dd)
Perl is always slow on copied files (i.e. created by cp -r)

所以我看了操作系统级复制，看起来像Ubuntu'cp'的行为方式与Python相同，即原始文件速度慢，复制文件速度快。

So I looked at OS level copying and it seems like on Ubuntu 'cp' behaves in the same way as Python, i.e. slow on original files and fast on copied files.

这就是我跑的和结果，我在一台配备SATA HD和RAID10系统的机器上做了几次，结果：

This is what I ran and the results, I did this a few times on a machine with a single SATA HD and on a RAID10 system, results:

$ mkdir 1
$ cd 1
$ for i in {1..50000}; do dd if=/dev/urandom of=$i.xml bs=1K count=10; done
$ cd ..
$ cp -r 1 2
$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time strace -f -c -o trace.copy2c cp -r 2 2copy
    real    0m28.624s
    user    0m1.429s
    sys     0m27.558s
$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time strace -f -c -o trace.copy1c cp -r 1 1copy
    real    5m21.166s
    user    0m1.348s
    sys     0m30.717s

跟踪结果显示时间花在哪里

Trace results show where time is being spent

$ head trace.copy1c trace.copy2c
==> trace.copy1c <==
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.09    2.541250          25    100008           read
 12.22    0.516799          10     50000           write
  9.62    0.406904           4    100009           open
  5.59    0.236274           2    100013           close
  4.80    0.203114           4     50004         1 lstat
  4.71    0.199211           2    100009           fstat
  2.19    0.092662           2     50000           fadvise64
  0.72    0.030418         608        50           getdents
==> trace.copy2c <==
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.86    0.802376           8    100008           read
 13.55    0.227108           5     50000           write
 13.02    0.218312           2    100009           open
  7.36    0.123364           1    100013           close
  6.83    0.114589           1    100009           fstat
  6.31    0.105742           2     50004         1 lstat
  3.38    0.056634           1     50000           fadvise64
  1.62    0.027191         544        50           getdents

因此，复制副本似乎比复制原始文件快得多，我目前的猜测是，复制时文件在磁盘上的排列比最初创建时更好，这使得它们更有效地阅读？

So it seems copying copies is much faster than copying original files, my current guess is that when copied the files get aligned on disk better than when they were originally created making them more efficient to read?

有趣的是'rsyn'和'cp '似乎以相反的方式工作eedwise，很像Perl和Python！

Interestingly 'rsyn' and 'cp' seem to work in opposite ways speedwise, much like Perl and Python!

$ rm -rf 1copy 2copy; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; echo "Rsync 1"; /usr/bin/time rsync -a 1 1copy; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; echo "Rsync 2"; /usr/bin/time rsync -a 2 2copy
Rsync 1
    3.62user 3.76system 0:13.00elapsed 56%CPU (0avgtext+0avgdata 5072maxresident)k
    1230600inputs+1200000outputs (13major+2684minor)pagefaults 0swaps
Rsync 2
    4.87user 6.52system 5:06.24elapsed 3%CPU (0avgtext+0avgdata 5076maxresident)k
    1231832inputs+1200000outputs (13major+2689minor)pagefaults 0swaps

$ rm -rf 1copy 2copy; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; echo "Copy 1"; /usr/bin/time cp -r 1 1copy; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; echo "Copy 2"; /usr/bin/time cp -r 2 2copy
Copy 1
    0.48user 6.42system 5:05.30elapsed 2%CPU (0avgtext+0avgdata 1212maxresident)k
    1229432inputs+1200000outputs (6major+415minor)pagefaults 0swaps
Copy 2
    0.33user 4.17system 0:11.13elapsed 40%CPU (0avgtext+0avgdata 1212maxresident)k
    1230416inputs+1200000outputs (6major+414minor)pagefaults 0swaps

Python慢读性能问题 [英] Python slow read performance issue

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

Python慢​​读性能问题 [英] Python slow read performance issue

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

Python慢读性能问题 [英] Python slow read performance issue

登录关闭