在计算字母时,Haskell可以击败C吗? [英] Can Haskell beat C at counting letters?

查看:113
本文介绍了在计算字母时,Haskell可以击败C吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

剧透:是的。见下面。



试图优化一个字母计数器来匹配C.我已经把它打到了2倍的赤字。

  letterCount :: B.ByteString  - > V.Vector Int 
letterCount bs =
V.accumulate
(\ a _ - > a + 1)
(V.replicate 256 0)
letters1
where
len = B.length bs
letters1 = V.generate len(\i - >(fromIntegral $!B.index bs i,()))

一些注意事项:


  1. 在我将 Data.Vector 更改为 Data.Vector.Unboxed 之前,它非常慢。为什么会这样?

  2. 我认为大部分时间都会花在 accumulate 上。我错了。

  3. 70%的时间用于 generate

  4. Haskell代码遭受不必要地将Word8转换为Int;此外,()的无用部队可能实际上或可能不会实际创建。



<

 导入限定的Data.ByteString为B 
导入限定的Data.Vector.Unboxed为V
import System.Environment
import Text.Printf

letterCount :: B.ByteString - > V.Vector Int
letterCount bs =
V.accumulate
(\ a _ - > a + 1)
(V.replicate 256 0)
letters1
where
len = B.length bs
letters1 = V.generate len(\i - >(fromIntegral $!B.index bs i,()))

printCounts :: V.Vector Int - > IO()
printCounts cs =
mapM_
(uncurry $ printf%c:%d\\\

(zip(映射到Enum [0..255]: :String)(V.toList cs))

main :: IO()
main = do
filename< - fmap head getArgs
f< - B .readFile文件名
let counts = letterCount f
printCounts counts

竞争C代码:

  #include< assert.h> 
#include< stdio.h>
#include< string.h>
#include< sys / stat.h>
#include< stdlib.h>

int letcnt [256];

int * letter_count(unsigned char * s,unsigned int len)
{
int i;
memset(letcnt,0,256 * sizeof(int));
for(i = 0; i< len; i ++){
letcnt [*(s + i)] ++;
}
return(letcnt);
}

void print_counts(){
int i;
for(i = 0; i <256; i ++){
printf('%c':%d \ n,(unsigned char)i,letcnt [i]);

$ b $ // st_size
int main(int argc,char ** argv)
{
assert(argc == 2);
FILE * f = fopen(argv [1],r);
struct stat st;
stat(argv [1],& st);
off_t len = st.st_size;
unsigned char * contents = calloc(len,1);
fread(内容,len,1,f);
fclose(f);
letter_count(contents,len);
print_counts();
返回0;
}

定时;

  $ time ./a.out / usr / share / dict / words> / dev / null 

real 0m0.012s
用户0m0.005s
sys 0m0.005s

$ time ./lettercount / usr / share /字典/单词> / dev / null

real 0m0.017s
user 0m0.009s
sys 0m0.007s



更新



我认为性能上限取决于此错误: runST不是免费的。并不是说我相信不可能进一步优化,但只要runST带来一些开销,就不太可能接近C.

另外,根据@ Zeta的评论修正了C代码。 / p>

解决方案

是的。如果使用 -fllvm 编译,则Haskell将匹配 User 时间中的C。令人惊讶的是,如果你切换到Lazy Bytestrings,那么Haskell版本将在 Real 时间的C版本上以小幅度但显着的优势胜过。

 将限定的Data.ByteString.Lazy.Char8导入为B 
将限定的Data.Vector.Unboxed导入为V
import System.Environment
import Text.Printf

letterCount :: B.ByteString - > V.Vector Int
letterCount bs =
V.unsafeAccumulate
(\ a_ - > a + 1)
(V.replicate 256 0)
(解析bs)

解析:: B.ByteString - > V.Vector(Int,())
parse = V.unfoldr step
where
step s = if B.null s
then Nothing
else Just(( fromIntegral。fromEnum $ B.head s,()),B.tail s)
{ - #INLINE parse# - }

printCounts :: V.Vector Int - > IO()
printCounts cs =
mapM_
(uncurry $ printf%c:%d\\\

(zip(映射到Enum [0..255]: :String)(V.toList cs))

main :: IO()
main = do
filename< - fmap head getArgs
f< - B .readFile文件名
let counts = letterCount f
printCounts counts

请记住编译例如:

  ghc -O2 -fllvm letterCount.hs 
C.我喜欢它!

更新



公平地更新了C代码以使用单个缓冲区,避免了预先大量分配(或任何分配),并且对缓存更友好。现在,Haskell和C代码没有显着差异,对于大型的150M输入文件,运行时间大约为190毫秒:


  #include< assert.h> 
#include< stdio.h>
#include< string.h>
#include< sys / stat.h>
#include< stdlib.h>

#define CHUNK 16384

int letcnt [256];

int * letter_count(unsigned char * s,unsigned int len)
{
int i;
for(i = 0; i< len; i ++){
letcnt [*(s + i)] ++;
}
return(letcnt);
}

int * letter_count_chunks(unsigned int len,FILE * f)
{
int i;
unsigned char chunk [CHUNK];
memset(letcnt,0,sizeof(letcnt));
for(i = 0; i< len - CHUNK; i + = CHUNK){
fread(chunk,CHUNK,1,f);
letter_count(chunk,CHUNK);
}
fread(chunk,len - i,1,f);
letter_count(chunk,len - i);

返回letcnt;
}

void print_counts(){
int i;
for(i = 0; i <256; i ++){
printf('%c':%d \ n,(unsigned char)i,letcnt [i]);

$ b $ // st_size
int main(int argc,char ** argv)
{
assert(argc == 2);
FILE * f = fopen(argv [1],r);
struct stat st;
stat(argv [1],& st);
off_t len = st.st_size;
letter_count_chunks(len,f);
fclose(f);
print_counts();
返回0;
}


Spoiler: Yes. See below.

Trying to optimize a letter counter to match C. I've fought it to a 2x deficit.

letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
    V.accumulate
        (\a _ -> a + 1)
        (V.replicate 256 0)
        letters1
  where
    len = B.length bs
    letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))

Some notes:

  1. It was really slow until I changed Data.Vector to Data.Vector.Unboxed. Why is that?
  2. I thought most of the time would be spent in accumulate. I was wrong.
  3. 70% of the time is spent in generate.
  4. Haskell code suffers from having to pointlessly convert Word8 to Int; Also, a useless army of () may or may not actually be created.

Full listing:

import qualified Data.ByteString as B
import qualified Data.Vector.Unboxed as V
import System.Environment
import Text.Printf

letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
    V.accumulate
        (\a _ -> a + 1)
        (V.replicate 256 0)
        letters1
  where
    len = B.length bs
    letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))

printCounts :: V.Vector Int -> IO ()
printCounts cs =
    mapM_
        (uncurry $ printf "%c: %d\n")
        (zip (map toEnum [0..255] :: String) (V.toList cs))

main :: IO ()
main = do
    filename <- fmap head getArgs
    f <- B.readFile filename
    let counts = letterCount f
    printCounts counts

Competing C code:

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <stdlib.h>

    int letcnt [256];

    int* letter_count(unsigned char *s, unsigned int len)
    {
        int i;
        memset(letcnt, 0, 256 * sizeof(int));
        for(i = 0; i < len; i++){
            letcnt[*(s + i)]++;
        }
        return (letcnt);
    }

    void print_counts() {
        int i;
        for(i = 0; i < 256; i++) {
            printf("'%c': %d\n", (unsigned char) i, letcnt[i]);
        }
    }
    // st_size
    int main(int argc, char **argv)
    {
        assert(argc == 2);
        FILE* f = fopen(argv[1], "r");
        struct stat st;
        stat(argv[1], &st);
        off_t len = st.st_size;
        unsigned char* contents = calloc(len, 1);
        fread(contents, len, 1, f);
        fclose(f);
        letter_count(contents, len);
        print_counts();
        return 0;
    }

Timings;

$ time ./a.out /usr/share/dict/words > /dev/null

real  0m0.012s
user  0m0.005s
sys 0m0.005s

$ time ./lettercount /usr/share/dict/words > /dev/null

real  0m0.017s
user  0m0.009s
sys 0m0.007s

Update

I think the performance ceiling is down to this bug: runST isn't free. Not that I believe it's impossible to optimize further but unlikely to approach C so long as runST imposes some overhead.

Also, fixed C-code based on @Zeta's comment.

解决方案

Yes. If you compile with -fllvm then Haskell will match C in User time. The big surprise is if you switch to Lazy Bytestrings, the Haskell version will beat the C version on Real time by a small but significant margin.

import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.Vector.Unboxed as V
import System.Environment
import Text.Printf

letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
    V.unsafeAccumulate
        (\a _ -> a + 1)
        (V.replicate 256 0)
        (parse bs)

parse :: B.ByteString -> V.Vector (Int, ())
parse = V.unfoldr step
  where
    step s = if B.null s
        then Nothing
        else Just ((fromIntegral . fromEnum $ B.head s, ()), B.tail s)
{-# INLINE parse #-}

printCounts :: V.Vector Int -> IO ()
printCounts cs =
    mapM_
        (uncurry $ printf "%c: %d\n")
        (zip (map toEnum [0..255] :: String) (V.toList cs))

main :: IO ()
main = do
    filename <- fmap head getArgs
    f <- B.readFile filename
    let counts = letterCount f
    printCounts counts

Remember to compile like:

ghc -O2 -fllvm letterCount.hs

So, Vector + ByteString.Lazy + LLVM > C. I love it!

Update

In fairness to C I updated the C code to use a single buffer which avoids doing a big allocation up front (or any allocations at all) and will be more cache friendly. Now the Haskell and C codes show no significant difference, both about 190ms in run-time on a best-of-3 basis against a large, 150M input file:

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <stdlib.h>

#define CHUNK 16384

int letcnt [256];

int* letter_count(unsigned char *s, unsigned int len)
{
  int i;
  for(i = 0; i < len; i++){
    letcnt[*(s + i)]++;
  }
  return (letcnt);
}

int* letter_count_chunks(unsigned int len, FILE* f)
{
  int i;
  unsigned char chunk [CHUNK];
  memset(letcnt, 0, sizeof(letcnt));
  for(i = 0; i < len - CHUNK; i+= CHUNK) {
    fread(chunk, CHUNK, 1, f);
    letter_count(chunk, CHUNK);
  }
  fread(chunk, len - i, 1, f);
  letter_count(chunk, len - i);

  return letcnt;
}

void print_counts() {
  int i;
  for(i = 0; i < 256; i++) {
    printf("'%c': %d\n", (unsigned char) i, letcnt[i]);
  }
}
// st_size
int main(int argc, char **argv)
{
  assert(argc == 2);
  FILE* f = fopen(argv[1], "r");
  struct stat st;
  stat(argv[1], &st);
  off_t len = st.st_size;
  letter_count_chunks(len, f);
  fclose(f);
  print_counts();
  return 0;
}

这篇关于在计算字母时,Haskell可以击败C吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆