Why is [Char]-based input so much slower than [Char]-based output in Haskell?


Question

It is common knowledge that one does not use [Char] to read large amounts of data in Haskell; one uses ByteStrings instead. The usual explanation is that Chars are large and that lists add their own overhead.
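For comparison, here is a minimal sketch of the ByteString approach mentioned above, assuming the input is whitespace-separated decimal integers on standard input. It relies on `BS.words` and `BS.readInt` from `Data.ByteString.Char8` in the `bytestring` package that ships with GHC; the names `sumInts` and `parse` are illustrative and do not come from the original post.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.List (foldl')

-- Sum every whitespace-separated integer in a strict ByteString.
-- BS.words splits on whitespace; BS.readInt parses a decimal integer
-- and returns it together with whatever input it did not consume.
sumInts :: BS.ByteString -> Int
sumInts = foldl' (+) 0 . map parse . BS.words
  where
    -- Accept a word only if readInt consumed all of it.
    parse w = case BS.readInt w of
      Just (n, rest) | BS.null rest -> n
      _ -> error ("not an Int: " ++ BS.unpack w)

main :: IO ()
main = BS.getContents >>= print . sumInts
```

Like the String-based program in the question, it prints the sum of all numbers it reads; no timings were measured for this sketch.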

However, this does not seem to cause any problems with the output.

For example the following program:

main = interact $ const $ unwords $ map show $ replicate 500000 38000000

takes just 131 ms to run on my computer, while the following one:

import Data.List

sum' :: [Int] -> Int
sum' = foldl' (+) 0

main = interact $ show . sum' . map read . words

takes 3.38 seconds when fed the output of the first program as its input!

What is the reason for such a disparity between the input and output performance using Strings?

Solution

I don't think that this issue necessarily has to do with I/O. Rather, it demonstrates that the Read instance for Int is pretty inefficient.

First, consider the following program, which reads no input at all and just processes a lazy list (sum' is the function defined in the question). It takes 4.1s on my machine (compiled with -O2):

main = print $ sum' $ map read $ words
        $ unwords $ map show $ replicate 500000 38000000

Replacing the read function with length drops the time down to 0.48s:

main = print $ sum' $ map length $ words
        $ unwords $ map show $ replicate 500000 38000000

Furthermore, replacing the read function with a handwritten version results in a time of 0.52s:

main = print $ sum' $ map myread $ words
        $ unwords $ map show $ replicate 500000 38000000

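myread :: String -> Int
myread = loop 0
  where
    -- Accumulate digits left to right: n' = 10 * n + digit value.
    -- Assumes the word contains only '0'..'9' (no sign, no validation).
    loop n [] = n
    loop n (d:ds) = let d' = fromEnum d - fromEnum '0' :: Int
                        n' = 10 * n + d'
                    in loop n' ds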
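To connect this back to the question's `interact` program, here is a sketch (not from the original answer) that keeps the lazy String input but swaps `read` for a hand-written parser. `readDecimal` and `readDigits` are hypothetical names; unlike myread, this version accepts an optional leading '-' and raises an error on any non-digit character. If the diagnosis above is right, one would expect it to run much faster than the `map read` version even though the input is still a [Char].

```haskell
import Data.Char (isDigit, ord)
import Data.List (foldl')

-- Hypothetical helper: parse an optionally negative decimal Int.
readDecimal :: String -> Int
readDecimal ('-' : ds) = negate (readDigits ds)
readDecimal ds         = readDigits ds

-- Strict left fold over the digits; rejects empty input and non-digits.
readDigits :: String -> Int
readDigits [] = error "readDigits: empty number"
readDigits ds = foldl' step 0 ds
  where
    step n c
      | isDigit c = 10 * n + (ord c - ord '0')
      | otherwise = error ("readDigits: unexpected character " ++ show c)

-- The question's input program with read replaced by readDecimal;
-- the input still arrives as a lazy String via interact.
main :: IO ()
main = interact $ show . foldl' (+) 0 . map readDecimal . words
```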

My guess as to why read is so inefficient is that its implementation uses the Text.ParserCombinators.ReadP module, which may not be the fastest choice for the simple case of reading a single integer.
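To give a feel for the kind of machinery that guess points at, here is a tiny ReadP parser for an optionally negative integer. It is only an illustration: GHC's real Read Int instance does not use this code (as far as I know it goes through the more general lexer in Text.Read.Lex). The point is that `readP_to_S` returns a list of every successful parse as (result, remaining input) pairs, so the caller must then pick out a single complete parse, much as `read` has to reject ambiguous or incomplete parses. A tight loop like myread skips all of that.

```haskell
import Data.Char (isDigit, ord)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, option, readP_to_S)

-- Parse an optional '-', then one or more digits, then end of input.
intP :: ReadP Int
intP = do
  sign   <- option id (negate <$ char '-')
  digits <- munch1 isDigit
  eof
  return (sign (foldl (\n c -> 10 * n + (ord c - ord '0')) 0 digits))

-- readP_to_S yields all successful parses; accept exactly one full parse.
parseInt :: String -> Maybe Int
parseInt s = case readP_to_S intP s of
  [(n, "")] -> Just n
  _         -> Nothing

main :: IO ()
main = do
  print (readP_to_S intP "-123")
  print (parseInt "38000000")
  print (parseInt "12x")
```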
