在计算字母时,Haskell可以击败C吗? [英] Can Haskell beat C at counting letters?
问题描述
试图优化一个字母计数器来匹配C.我已经把它打到了2倍的赤字。
letterCount :: B.ByteString - > V.Vector Int
letterCount bs =
V.accumulate
(\ a _ - > a + 1)
(V.replicate 256 0)
letters1
where
len = B.length bs
letters1 = V.generate len(\i - >(fromIntegral $!B.index bs i,()))
一些注意事项:
- 在我将
Data.Vector
更改为Data.Vector.Unboxed
之前,它非常慢。为什么会这样? - 我认为大部分时间都会花在
accumulate
上。我错了。 - 70%的时间用于
generate
。 - Haskell代码遭受不必要地将Word8转换为Int;此外,
()
的无用部队可能实际上或可能不会实际创建。
<
导入限定的Data.ByteString为B
导入限定的Data.Vector.Unboxed为V
import System.Environment
import Text.Printf
letterCount :: B.ByteString - > V.Vector Int
letterCount bs =
V.accumulate
(\ a _ - > a + 1)
(V.replicate 256 0)
letters1
where
len = B.length bs
letters1 = V.generate len(\i - >(fromIntegral $!B.index bs i,()))
printCounts :: V.Vector Int - > IO()
printCounts cs =
mapM_
(uncurry $ printf%c:%d\\\
)
(zip(映射到Enum [0..255]: :String)(V.toList cs))
main :: IO()
main = do
filename< - fmap head getArgs
f< - B .readFile文件名
let counts = letterCount f
printCounts counts
竞争C代码:
#include< assert.h>
#include< stdio.h>
#include< string.h>
#include< sys / stat.h>
#include< stdlib.h>
int letcnt [256];
int * letter_count(unsigned char * s,unsigned int len)
{
int i;
memset(letcnt,0,256 * sizeof(int));
for(i = 0; i< len; i ++){
letcnt [*(s + i)] ++;
}
return(letcnt);
}
void print_counts(){
int i;
for(i = 0; i <256; i ++){
printf('%c':%d \ n,(unsigned char)i,letcnt [i]);
$ b $ // st_size
int main(int argc,char ** argv)
{
assert(argc == 2);
FILE * f = fopen(argv [1],r);
struct stat st;
stat(argv [1],& st);
off_t len = st.st_size;
unsigned char * contents = calloc(len,1);
fread(内容,len,1,f);
fclose(f);
letter_count(contents,len);
print_counts();
返回0;
}
定时;
$ time ./a.out / usr / share / dict / words> / dev / null
real 0m0.012s
用户0m0.005s
sys 0m0.005s
$ time ./lettercount / usr / share /字典/单词> / dev / null
real 0m0.017s
user 0m0.009s
sys 0m0.007s
更新
我认为性能上限取决于此错误: runST不是免费的。并不是说我相信不可能进一步优化,但只要runST带来一些开销,就不太可能接近C.
另外,根据@ Zeta的评论修正了C代码。 / p>
是的。如果使用 -fllvm
编译,则Haskell将匹配 User
时间中的C。令人惊讶的是,如果你切换到Lazy Bytestrings,那么Haskell版本将在 Real
时间的C版本上以小幅度但显着的优势胜过。
将限定的Data.ByteString.Lazy.Char8导入为B
将限定的Data.Vector.Unboxed导入为V
import System.Environment
import Text.Printf
letterCount :: B.ByteString - > V.Vector Int
letterCount bs =
V.unsafeAccumulate
(\ a_ - > a + 1)
(V.replicate 256 0)
(解析bs)
解析:: B.ByteString - > V.Vector(Int,())
parse = V.unfoldr step
where
step s = if B.null s
then Nothing
else Just(( fromIntegral。fromEnum $ B.head s,()),B.tail s)
{ - #INLINE parse# - }
printCounts :: V.Vector Int - > IO()
printCounts cs =
mapM_
(uncurry $ printf%c:%d\\\
)
(zip(映射到Enum [0..255]: :String)(V.toList cs))
main :: IO()
main = do
filename< - fmap head getArgs
f< - B .readFile文件名
let counts = letterCount f
printCounts counts
请记住编译例如:
ghc -O2 -fllvm letterCount.hs
$ p $所以,Vector + ByteString.Lazy + LLVM> C.我喜欢它!
更新
公平地更新了C代码以使用单个缓冲区,避免了预先大量分配(或任何分配),并且对缓存更友好。现在,Haskell和C代码没有显着差异,对于大型的150M输入文件,运行时间大约为190毫秒:
#include< assert.h>
#include< stdio.h>
#include< string.h>
#include< sys / stat.h>
#include< stdlib.h>
#define CHUNK 16384
int letcnt [256];
int * letter_count(unsigned char * s,unsigned int len)
{
int i;
for(i = 0; i< len; i ++){
letcnt [*(s + i)] ++;
}
return(letcnt);
}
int * letter_count_chunks(unsigned int len,FILE * f)
{
int i;
unsigned char chunk [CHUNK];
memset(letcnt,0,sizeof(letcnt));
for(i = 0; i< len - CHUNK; i + = CHUNK){
fread(chunk,CHUNK,1,f);
letter_count(chunk,CHUNK);
}
fread(chunk,len - i,1,f);
letter_count(chunk,len - i);
返回letcnt;
}
void print_counts(){
int i;
for(i = 0; i <256; i ++){
printf('%c':%d \ n,(unsigned char)i,letcnt [i]);
$ b $ // st_size
int main(int argc,char ** argv)
{
assert(argc == 2);
FILE * f = fopen(argv [1],r);
struct stat st;
stat(argv [1],& st);
off_t len = st.st_size;
letter_count_chunks(len,f);
fclose(f);
print_counts();
返回0;
}
Spoiler: Yes. See below.
Trying to optimize a letter counter to match C. I've fought it to a 2x deficit.
letterCount :: B.ByteString -> V.Vector Int letterCount bs = V.accumulate (\a _ -> a + 1) (V.replicate 256 0) letters1 where len = B.length bs letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))
Some notes:
- It was really slow until I changed
Data.Vector
toData.Vector.Unboxed
. Why is that?- I thought most of the time would be spent in
accumulate
. I was wrong.- 70% of the time is spent in
generate
.- Haskell code suffers from having to pointlessly convert Word8 to Int; Also, a useless army of
()
may or may not actually be created.Full listing:
import qualified Data.ByteString as B import qualified Data.Vector.Unboxed as V import System.Environment import Text.Printf letterCount :: B.ByteString -> V.Vector Int letterCount bs = V.accumulate (\a _ -> a + 1) (V.replicate 256 0) letters1 where len = B.length bs letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ())) printCounts :: V.Vector Int -> IO () printCounts cs = mapM_ (uncurry $ printf "%c: %d\n") (zip (map toEnum [0..255] :: String) (V.toList cs)) main :: IO () main = do filename <- fmap head getArgs f <- B.readFile filename let counts = letterCount f printCounts counts
Competing C code:
#include <assert.h> #include <stdio.h> #include <string.h> #include <sys/stat.h> #include <stdlib.h> int letcnt [256]; int* letter_count(unsigned char *s, unsigned int len) { int i; memset(letcnt, 0, 256 * sizeof(int)); for(i = 0; i < len; i++){ letcnt[*(s + i)]++; } return (letcnt); } void print_counts() { int i; for(i = 0; i < 256; i++) { printf("'%c': %d\n", (unsigned char) i, letcnt[i]); } } // st_size int main(int argc, char **argv) { assert(argc == 2); FILE* f = fopen(argv[1], "r"); struct stat st; stat(argv[1], &st); off_t len = st.st_size; unsigned char* contents = calloc(len, 1); fread(contents, len, 1, f); fclose(f); letter_count(contents, len); print_counts(); return 0; }
Timings;
$ time ./a.out /usr/share/dict/words > /dev/null real 0m0.012s user 0m0.005s sys 0m0.005s $ time ./lettercount /usr/share/dict/words > /dev/null real 0m0.017s user 0m0.009s sys 0m0.007s
Update
I think the performance ceiling is down to this bug: runST isn't free. Not that I believe it's impossible to optimize further but unlikely to approach C so long as runST imposes some overhead.
Also, fixed C-code based on @Zeta's comment.
解决方案Yes. If you compile with
-fllvm
then Haskell will match C inUser
time. The big surprise is if you switch to Lazy Bytestrings, the Haskell version will beat the C version onReal
time by a small but significant margin.import qualified Data.ByteString.Lazy.Char8 as B import qualified Data.Vector.Unboxed as V import System.Environment import Text.Printf letterCount :: B.ByteString -> V.Vector Int letterCount bs = V.unsafeAccumulate (\a _ -> a + 1) (V.replicate 256 0) (parse bs) parse :: B.ByteString -> V.Vector (Int, ()) parse = V.unfoldr step where step s = if B.null s then Nothing else Just ((fromIntegral . fromEnum $ B.head s, ()), B.tail s) {-# INLINE parse #-} printCounts :: V.Vector Int -> IO () printCounts cs = mapM_ (uncurry $ printf "%c: %d\n") (zip (map toEnum [0..255] :: String) (V.toList cs)) main :: IO () main = do filename <- fmap head getArgs f <- B.readFile filename let counts = letterCount f printCounts counts
Remember to compile like:
ghc -O2 -fllvm letterCount.hs
So, Vector + ByteString.Lazy + LLVM > C. I love it!
Update
In fairness to C I updated the C code to use a single buffer which avoids doing a big allocation up front (or any allocations at all) and will be more cache friendly. Now the Haskell and C codes show no significant difference, both about 190ms in run-time on a best-of-3 basis against a large, 150M input file:
#include <assert.h> #include <stdio.h> #include <string.h> #include <sys/stat.h> #include <stdlib.h> #define CHUNK 16384 int letcnt [256]; int* letter_count(unsigned char *s, unsigned int len) { int i; for(i = 0; i < len; i++){ letcnt[*(s + i)]++; } return (letcnt); } int* letter_count_chunks(unsigned int len, FILE* f) { int i; unsigned char chunk [CHUNK]; memset(letcnt, 0, sizeof(letcnt)); for(i = 0; i < len - CHUNK; i+= CHUNK) { fread(chunk, CHUNK, 1, f); letter_count(chunk, CHUNK); } fread(chunk, len - i, 1, f); letter_count(chunk, len - i); return letcnt; } void print_counts() { int i; for(i = 0; i < 256; i++) { printf("'%c': %d\n", (unsigned char) i, letcnt[i]); } } // st_size int main(int argc, char **argv) { assert(argc == 2); FILE* f = fopen(argv[1], "r"); struct stat st; stat(argv[1], &st); off_t len = st.st_size; letter_count_chunks(len, f); fclose(f); print_counts(); return 0; }
这篇关于在计算字母时,Haskell可以击败C吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!