Why is this program faster in Python than Objective-C?
Question
I got interested in this small example of an algorithm in Python for looping through a large word list. I am writing a few "tools" that will allow me to slice an Objective-C string or array in a similar fashion to Python.
Specifically, this elegant solution caught my attention because it executes very quickly and uses a string slice as a key element of the algorithm. Try solving this without a slice!
I have reproduced my local version below, using the Moby word list. You can use /usr/share/dict/words if you do not feel like downloading Moby. The source is just a large dictionary-like list of unique words.
#!/usr/bin/env python
count = 0
words = set(line.strip() for line in
            open("/Users/andrew/Downloads/Moby/mwords/354984si.ngl"))
for w in words:
    even, odd = w[::2], w[1::2]
    if even in words and odd in words:
        count += 1
print count
This script will a) be interpreted by Python; b) read the 4.1 MB, 354,983-word Moby dictionary file; c) strip the lines; d) place the lines into a set; and e) find all the words whose even-indexed and odd-indexed characters each form a word as well. This executes in about 0.73 seconds on a MacBook Pro.
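As a small illustration of step e), here is a minimal Python 3 sketch that runs the same even/odd slice test on a tiny in-memory set instead of the Moby file. The function name count_split_words and the sample words are made up for this example; they are not part of the original script.

```python
def count_split_words(words):
    """Count words whose even-indexed and odd-indexed characters
    each spell another word in the same set."""
    count = 0
    for w in words:
        even, odd = w[::2], w[1::2]  # characters 0,2,4,... and 1,3,5,...
        if even in words and odd in words:
            count += 1
    return count


# Hypothetical sample: "schooled" splits into "shoe" and "cold".
sample = {"schooled", "shoe", "cold", "python"}
result = count_split_words(sample)
print(result)
```

Only "schooled" qualifies here, because both of its slices are themselves members of the set.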
I tried to rewrite the same program in Objective-C. I am a beginner at this language, so go easy please, but please do point out the errors.
#import <Foundation/Foundation.h>

NSString *sliceString(NSString *inString, NSUInteger start, NSUInteger stop,
                      NSUInteger step){
    NSUInteger strLength = [inString length];

    if(stop > strLength) {
        stop = strLength;
    }

    if(start > strLength) {
        start = strLength;
    }

    NSUInteger capacity = (stop-start)/step;
    NSMutableString *rtr=[NSMutableString stringWithCapacity:capacity];

    for(NSUInteger i=start; i < stop; i+=step){
        [rtr appendFormat:@"%c",[inString characterAtIndex:i]];
    }
    return rtr;
}

NSSet * getDictWords(NSString *path){

    NSError *error = nil;
    NSString *words = [[NSString alloc] initWithContentsOfFile:path
                                        encoding:NSUTF8StringEncoding error:&error];
    NSCharacterSet *sep=[NSCharacterSet newlineCharacterSet];
    NSPredicate *noEmptyStrings =
        [NSPredicate predicateWithFormat:@"SELF != ''"];

    if (words == nil) {
        // deal with error ...
    }
    // ...

    NSArray *temp=[words componentsSeparatedByCharactersInSet:sep];
    NSArray *lines =
        [temp filteredArrayUsingPredicate:noEmptyStrings];

    NSSet *rtr=[NSSet setWithArray:lines];

    NSLog(@"lines: %lul, word set: %lul",[lines count],[rtr count]);
    [words release];

    return rtr;
}

int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    int count=0;

    NSSet *dict =
        getDictWords(@"/Users/andrew/Downloads/Moby/mwords/354984si.ngl");

    NSLog(@"Start");

    for(NSString *element in dict){
        NSString *odd_char=sliceString(element, 1,[element length], 2);
        NSString *even_char=sliceString(element, 0, [element length], 2);
        if([dict member:even_char] && [dict member:odd_char]){
            count++;
        }
    }
    NSLog(@"count=%i",count);

    [pool drain];
    return 0;
}
The Objective-C version produces the same result (13,341 words), but takes almost 3 seconds to do it. I must be doing something atrociously wrong for a compiled language to be more than 3X slower than a scripted language, but I'll be darned if I can see why.
The basic algorithm is the same: read the lines, strip them, and put them in a set.
My guess is that the slow part is the processing of the NSString elements, but I do not know of an alternative.
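To make the comparison concrete, the following Python 3 sketch mirrors what the sliceString helper does for each word: clamp the bounds, then copy one character at a time. The name slice_string is hypothetical, and the sketch assumes non-negative arguments and a step of at least 1. Python's built-in slice gives the same result in a single operation.

```python
def slice_string(s, start, stop, step):
    """Per-character equivalent of s[start:stop:step].

    Assumes start >= 0, stop >= 0 and step >= 1, like the
    Objective-C helper it mirrors.
    """
    stop = min(stop, len(s))    # clamp stop to the string length
    start = min(start, len(s))  # clamp start to the string length
    chars = []
    for i in range(start, stop, step):
        chars.append(s[i])      # one append per character
    return "".join(chars)


word = "schooled"
even = slice_string(word, 0, len(word), 2)
odd = slice_string(word, 1, len(word), 2)
print(even, odd)
```

The loop does one call per character, whereas word[::2] and word[1::2] each do the whole copy in one step.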
Edit
I edited the Python to be this:
#!/usr/bin/env python
import codecs
count = 0
words = set(line.strip() for line in
            codecs.open("/Users/andrew/Downloads/Moby/mwords/354984si.ngl",
                        encoding='utf-8'))
for w in words:
    if w[::2] in words and w[1::2] in words:
        count += 1
print count
I did this so that the Python strings are decoded as UTF-8, on the same footing as the UTF-8 NSString. This slowed the Python version down to 1.9 seconds.
I also switched the slice test to the short-circuit form, as suggested, for both the Python and Objective-C versions. Now they run at close to the same speed. I also tried using C arrays rather than NSStrings, and this was much faster, but not as easy. You also lose UTF-8 support doing that.
Python is really cool...
Edit 2
I found a bottleneck, and removing it sped things up considerably. Instead of using [rtr appendFormat:@"%c",[inString characterAtIndex:i]]; to append a character to the return string, I used this:
for(NSUInteger i=start; i < stop; i+=step){
    buf[0]=[inString characterAtIndex:i];
    [rtr appendString:[NSString stringWithCharacters:buf length:1]];
}
Now I can finally claim that the Objective-C version is faster than the Python version -- but not by much.
Accepted answer
Keep in mind that the Python version has been written to move a lot of the heavy lifting down into highly optimised C code when executed on CPython (especially the file input buffering, string slicing and the hash table lookups to check whether even and odd are in words).
That said, you seem to be decoding the file as UTF-8 in your Objective-C code, but leaving the file in binary in your Python code. Using Unicode NSString in the Objective-C version, but 8-bit byte strings in the Python version isn't really a fair comparison - I would expect the performance of the Python version to drop noticeably if you used codecs.open() to open the file with a declared encoding of "utf-8".
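One way to check the bytes-versus-Unicode point is to run the same slice test over both representations and time each. The sketch below is written for Python 3 and uses a tiny in-memory stand-in for the word file; the names count_split, timed, as_bytes and as_text are invented for this example. The timings it prints depend on the machine and Python version, and on a real word list they need not match the figures quoted in the question.

```python
import time


def count_split(words):
    """Count entries whose even and odd slices are both in the set.
    Works for a set of bytes objects or a set of str objects."""
    return sum(1 for w in words if w[::2] in words and w[1::2] in words)


def timed(label, build_words):
    """Build the set, run the test, and report the elapsed time."""
    start = time.perf_counter()
    words = build_words()
    result = count_split(words)
    elapsed = time.perf_counter() - start
    print(f"{label}: count={result}, {elapsed:.6f}s")
    return result


# Hypothetical stand-in for the raw contents of the word file.
raw = b"\n".join(b"schooled shoe cold python".split()) + b"\n"


def as_bytes():
    # Undecoded 8-bit strings, like the first Python version.
    return set(line.strip() for line in raw.splitlines())


def as_text():
    # Decoded as UTF-8 first, like the NSString version.
    return set(line.strip() for line in raw.decode("utf-8").splitlines())


bytes_count = timed("bytes", as_bytes)
text_count = timed("text", as_text)
```

Both runs should report the same count; only the cost of decoding and of hashing the two string types differs.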
You're also making a full second pass to strip the empty lines in your Objective-C, while no such step is present in the Python code.
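For comparison, blank lines can be dropped while the set is being built, so no separate filtering pass is needed. This is a Python 3 sketch of that idea only; load_words and the sample text are hypothetical, and it does not show how the equivalent single pass would be written with Foundation classes.

```python
import io


def load_words(lines):
    """Build the word set in one pass, skipping blank lines as they are read."""
    words = set()
    for line in lines:
        word = line.strip()
        if word:  # empty or whitespace-only lines are dropped here
            words.add(word)
    return words


# Hypothetical input containing an empty line and a whitespace-only line.
sample = io.StringIO("shoe\n\ncold\n  \nschooled\n")
words = load_words(sample)
print(sorted(words))
```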