Python 5x slower than Perl for a MySQL query
Question
I am translating code from Perl to Python. Even though it works exactly the same, one part of the code is 5x slower in Python than in Perl, and I cannot figure out why.
Perl, Python, and the MySQL database are all on the same machine.
The code queries a database to download all columns of a table and then processes each row. There are more than 5 million rows to process, and the big issue is retrieving the data from the database into Python for processing.
Here are the two code samples. Python:
import os
import mysql.connector  # <--- changed to: import MySQLdb
import time

outDict = dict()

## DB parameters
db = mysql.connector.connect(  # <--- changed to: MySQLdb.connect(...
    host=dbhost,
    user=username,  # your username
    passwd=passw,  # your password
    db=database)  # name of the database
cur = db.cursor(prepared=True)
sql = "select chr,pos,lengthofrepeat,copyNum,region from db.Table_simpleRepeat;"
cur.execute(sql)
print('\t eDiVa public omics start')
s = time.time()
sz = 1000
rows = cur.fetchall()
for row in rows:
    ## process out dict
    pass
print(time.time() - s)
cur.close()
db.close()
And here is the equivalent Perl script:
use strict;
use Digest::MD5 qw(md5);
use DBI;
use threads;
use threads::shared;

my $dbh = DBI->connect('dbi:mysql:'.$database.';host='.$dbhost.'',$username,$pass)
    or die "Connection Error!!\n";
my $sql = "select chr,pos,lengthofrepeat,copyNum,region from db.Table_simpleRepeat;";
## prepare statement and query
my $stmt = $dbh->prepare($sql);
$stmt->execute or die "SQL Error!!\n";
my %edivaStr;  # result hash keyed by "chr;pos" (declaration not shown in the original snippet)
my $c = 0;
## process query result
while (my @res = $stmt->fetchrow_array)
{
    $edivaStr{ $res[0].";".$res[1] } = $res[4].",".$res[2];
    $c += 1;
}
print($c."\n");
## close DB connection
$dbh->disconnect();
The runtimes of the two scripts are:
- ~40s for the Perl script
- ~200s for the Python script
I cannot figure out why this happens. (I tried using fetchone() or fetchmany() to see whether memory was the issue, but the runtime dropped by at most 10% from the 200s.)
My main problem is understanding why there is such a large performance difference between two functionally equivalent code blocks.
Any ideas on how I can verify what is happening would be greatly appreciated.
Thanks!
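One way to check where the time goes is to profile the fetch-and-process loop. The following is a minimal sketch, not the code from the question: `fetch_and_process`, `profile_fetch`, and `make_demo_connection` are hypothetical names, and an in-memory SQLite table stands in for the MySQL table (without the `db.` schema prefix) so that the example runs without a server. With a real connection, a report dominated by Python-level frames inside the driver would suggest that the connector, rather than the processing loop, is the bottleneck.

```python
import cProfile
import io
import pstats
import sqlite3

# Stand-in for the query in the question (no "db." prefix, since SQLite has no such schema here).
DEMO_QUERY = "select chr,pos,lengthofrepeat,copyNum,region from Table_simpleRepeat"


def fetch_and_process(conn, query):
    """Fetch every row and build a dict keyed by "chr;pos", mirroring the Perl loop."""
    cur = conn.cursor()
    cur.execute(query)
    out = {}
    for row in cur.fetchall():
        out[f"{row[0]};{row[1]}"] = f"{row[4]},{row[2]}"
    cur.close()
    return out


def profile_fetch(conn, query, top=10):
    """Profile fetch_and_process; return its result and a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fetch_and_process, conn, query)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def make_demo_connection(n_rows=1000):
    """Assumption for illustration only: a small SQLite table replaces the MySQL one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Table_simpleRepeat "
        "(chr text, pos integer, lengthofrepeat integer, copyNum real, region text)"
    )
    conn.executemany(
        "insert into Table_simpleRepeat values (?, ?, ?, ?, ?)",
        (("chr1", i, 10, 2.5, "exonic") for i in range(n_rows)),
    )
    return conn


if __name__ == "__main__":
    demo_result, demo_report = profile_fetch(make_demo_connection(), DEMO_QUERY)
    print(len(demo_result), "rows processed")
    print(demo_report)
```

To profile the real workload, pass the connection object returned by the MySQL driver in place of the SQLite stand-in and use the original query.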
Peeyush's comment could be an answer, and I'd like him to post it because it allowed me to find a solution.
The problem is the Python connector. I just replaced it with the MySQLdb module, which is a compiled C module. That made the Python code slightly faster than the Perl code.
I marked the changes in the Python code with "<----" to show how easy it was to gain performance.
Recommended answer
The problem is the Python connector. I just replaced it with the MySQLdb module, which is a compiled C module. That made the Python code slightly faster than the Perl code.
I marked the changes in the Python code with "<----" to show how easy it was to gain performance.
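Because both connectors follow the Python DB-API 2.0 interface, the same timing loop can be pointed at either one to compare them. The sketch below is an illustration under stated assumptions, not the asker's benchmark: `time_fetch` and `demo_connect` are hypothetical names, and an in-memory SQLite database stands in for MySQL so the block runs as-is. The commented lines show how the two MySQL drivers would be plugged in; they are not executed here.

```python
import sqlite3
import time


def time_fetch(connect, query, batch_size=10000):
    """Time pulling every row of `query` through a DB-API 2.0 driver.

    `connect` is a zero-argument callable that returns an open connection.
    Returns (row_count, elapsed_seconds).
    """
    conn = connect()
    try:
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(query)
        count = 0
        while True:
            rows = cur.fetchmany(batch_size)  # fetch in batches to bound memory use
            if not rows:
                break
            count += len(rows)
        elapsed = time.perf_counter() - start
        cur.close()
    finally:
        conn.close()
    return count, elapsed


def demo_connect(n_rows=1000):
    """Assumption for illustration only: SQLite stand-in for the MySQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Table_simpleRepeat (chr text, pos integer)")
    conn.executemany(
        "insert into Table_simpleRepeat values (?, ?)",
        (("chr1", i) for i in range(n_rows)),
    )
    return conn


if __name__ == "__main__":
    n, secs = time_fetch(demo_connect, "select chr, pos from Table_simpleRepeat")
    print(n, "rows in", round(secs, 4), "s")
    # With the real drivers installed, the same function could be called as
    # (connection parameters are placeholders from the question):
    #   time_fetch(lambda: mysql.connector.connect(host=dbhost, user=username,
    #                                              password=passw, database=database), sql)
    #   time_fetch(lambda: MySQLdb.connect(host=dbhost, user=username,
    #                                      passwd=passw, db=database), sql)
```

Running both calls against the same table gives a like-for-like comparison of driver overhead, since the query and the loop stay identical.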