Python (2.5) reads an input file FASTER than pure C (Mingw)


Problem description


Both programs below read the same huge (~35MB) text file.
The file has 1000000 lines; each line is shorter than 99 chars.

Stable result:
Python runs ~0.65s
C : ~0.70s

Any thoughts?

import time
t=time.time()
f=open('D:\\some.txt','r')
z=f.readlines()
f.close()
print len(z)
print time.time()-t
m=input()
print z[m]

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <ctime>

using namespace std;
char vs[1002000][99];
FILE *fp=fopen("D:\\some.txt","r");

int main() {
    int i=0;
    while (true) {
        if (!fgets(vs[i],999,fp)) break;
        ++i;
    }
    fclose(fp);
    cout << i << endl;
    cout << clock()/CLOCKS_PER_SEC << endl;

    int m;
    cin >> m;
    cout << vs[m];
    system("pause");
    return 0;
}

Solution

On Apr 26, 11:10 am, n00m <n...@narod.ru> wrote:

Both programs below read the same huge (~35MB) text file.
The file has 1000000 lines; each line is shorter than 99 chars.

Stable result:
Python runs ~0.65s
C : ~0.70s

Any thoughts?

Yes.

Most of the dirty work in the Python example is spent in a tight loop
written in C. This is very likely to be faster on Python on Windows
than your "C" example for several reasons:

1. Python is compiled with Microsoft's C compiler, which produces more
optimal code than Mingw.

2. The Python readline() function has been in the library for a long
time and has had time for many developers to optimize its
performance.

3. Your "pure C" code isn't even C, let alone pure C. It's C++. On
most systems, the C++ iostream libraries have a lot more overhead than
C's stdio.

And, finally, we must not fail to observe that you measured these
times without startup, which is obviously much greater for Python. (Of
course, we only need to point this out so it's not misunderstood that
you're claiming this Python process will terminate faster than the C++
one.)

So, I must regrettably opine that your example isn't very meaningful.
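
To make the startup point concrete, here is a minimal sketch (written for Python 3, not the Python 2.5 used in the thread) that times only the readlines() call, the same way the original script does. The helper names make_sample_file and time_readlines and the sample size are assumptions made for illustration. The timer starts after the interpreter is already running, so startup cost is not part of the measured figure.

```python
import os
import tempfile
import time


def make_sample_file(num_lines, line_length=98):
    """Write num_lines lines of line_length characters to a temp file (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        line = "x" * line_length + "\n"
        for _ in range(num_lines):
            f.write(line)
    return path


def time_readlines(path):
    """Return (line_count, elapsed_seconds) for reading the file with readlines()."""
    start = time.perf_counter()  # interpreter startup has already happened here
    with open(path, "r") as f:
        lines = f.readlines()
    elapsed = time.perf_counter() - start
    return len(lines), elapsed


if __name__ == "__main__":
    sample = make_sample_file(100000)
    try:
        count, seconds = time_readlines(sample)
        print(count, "lines read in", round(seconds, 3), "s (startup not included)")
    finally:
        os.remove(sample)
```

Timing the whole process from outside (for example with the shell's own timing facility) would include interpreter startup as well, which is the distinction the reply is drawing.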


fgets() from C++ iostream library???

I guess if I'd come up with "Python reads SLOWER than C"
I'd get another (no less) smart explanation of "why it's so".



"n00m" <n0**@narod.ru> wrote in message
news:6a**********************************@a23g2000hsc.googlegroups.com...

import time
t=time.time()
f=open('D:\\some.txt','r')
z=f.readlines()
f.close()
print len(z)
print time.time()-t
m=input()
print z[m]

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <ctime>

using namespace std;
char vs[1002000][99];
FILE *fp=fopen("D:\\some.txt","r");

int main() {
    int i=0;
    while (true) {
        if (!fgets(vs[i],999,fp)) break;
        ++i;
    }

First of all I would rewrite the C loop to:

int main() {
    int i=0;
    while (fgets(vs[i],999,fp))
        ++i;
}

but I think that the difference comes from what you do at the beginning of
the C source:

char vs[1002000][99];

This reserves 99,198,000 bytes, so expect a lot of cache thrashing in the C
code!
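
The byte count follows directly from the array dimensions. A quick check, written in Python 3 for convenience; the constant names are made up for this example, and the file size is the approximate "~35MB" from the question taken as 35,000,000 bytes:

```python
ROWS = 1002000   # first dimension of char vs[1002000][99]
ROW_BYTES = 99   # second dimension; sizeof(char) is 1 byte

array_bytes = ROWS * ROW_BYTES
file_bytes = 35 * 1000 * 1000  # assumption: "~35MB" read as 35,000,000 bytes

print(array_bytes)                         # 99198000
print(round(array_bytes / file_bytes, 2))  # 2.83
```

So the static array is roughly 2.8 times the size of the data it is meant to hold.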

Is there an implementation of f.readlines on the internet somewhere?
I'd be interested to see how they implemented it. I'm pretty sure they did it
smarter than just reserving 100 MB of data :)
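
As a rough illustration of the kind of alternative hinted at here, the sketch below reads a stream in large chunks and stores each line as its own object, so memory grows with the actual data instead of being reserved up front. This is not CPython's implementation of readlines(); the function name read_lines_chunked and the 64 KiB default chunk size are assumptions made for this example (Python 3).

```python
import io


def read_lines_chunked(stream, chunk_size=64 * 1024):
    """Return a list of lines (newline kept) read from a text stream in chunks."""
    lines = []
    pending = ""  # partial line carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        parts = pending.split("\n")
        pending = parts.pop()  # last piece has no newline yet (may be empty)
        lines.extend(part + "\n" for part in parts)
    if pending:
        lines.append(pending)  # final line without a trailing newline
    return lines


if __name__ == "__main__":
    sample = io.StringIO("first\nsecond\nthird")
    print(read_lines_chunked(sample, chunk_size=4))
```

With the tiny chunk size used in the demo, lines are split across several reads, which exercises the carry-over logic; a real reader would use a much larger chunk.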


