Windows 上 Haskell 中的 Unicode 控制台 I/O [英] Unicode console I/O in Haskell on Windows

查看:26
本文介绍了Windows 上 Haskell 中的 Unicode 控制台 I/O的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 windows 下的 Haskell 中,让控制台 I/O 使用 Unicode 字符似乎相当困难.这是悲惨的故事:

It seems rather difficult to get console I/O to work with Unicode characters in Haskell under windows. Here is the tale of woe:

  1. (初步.)在你甚至考虑在 windows 下的控制台中进行 Unicode I/O 之前,你需要确保你使用的是可以呈现你想要的字符的控制台字体.光栅字体(默认)的覆盖率非常低(并且不允许复制粘贴它们无法表示的字符),并且 MS 提供的 truetype 选项(consolas、lucida 控制台)的覆盖率并不大(尽管这些将允许复制/粘贴他们不能代表的字符).您可能会考虑安装 DejaVu Sans Mono(按照底部的说明操作 此处;您可能需要重新启动才能工作).在排序之前,没有应用程序能够执行大量的 Unicode I/O;不仅仅是 Haskell.
  2. 完成此操作后,您会注意到某些应用程序将能够在 Windows 下执行控制台 I/O.但是让它工作仍然相当复杂.windows下写控制台基本上有两种方式.(以下内容适用于任何语言,不只是 Haskell;别担心,Haskell 稍后会进入图片!)...
  3. 选项A是使用通常的c-library风格的基于字节的i/o函数;希望操作系统会根据某种编码来解释这些字节,这些编码可以对您想要的所有奇怪而美妙的字符进行编码.例如,在 Mac OS X 上使用等效技术,其中标准系统编码通常是 UTF8,这很好用;你发送 utf8 输出,你会看到漂亮的符号.
  4. 在 Windows 上,它的效果不太好.Windows 期望的默认编码通常不会是涵盖所有 Unicode 符号的编码.因此,如果您想以这种方式或另一种方式查看漂亮的符号,您需要更改编码.一种可能性是您的程序使用 SetConsoleCP win32 命令.(因此,您需要绑定到 Win32 库.)或者,如果您不想这样做,您可以期望程序的用户为您更改代码页(然后他们将不得不调用 chcp 命令,然后再运行您的程序).
  5. 选项 B 是使用可识别 Unicode 的 win32 控制台 API 命令,例如 WriteConsoleW.在这里,您将 UTF16 直接发送到 windows,这会很好地呈现它:没有编码不匹配的危险,因为 windows 总是 期望 UTF16 具有这些功能.
  1. (Preliminary.) Before you even consider doing Unicode I/O in the console under windows, you need to make sure that you're using a console font which can render the characters you want. The raster fonts (the default) have infinitely poor coverage (and don't allow copy pasting of characters they can't represent), and the truetype options MS provides (consolas, lucida console) have not-great coverage (though these will allow copy/pasting of characters they cannot represent). You might consider installing DejaVu Sans Mono (follow the instructions at the bottom here; you may have to reboot before it works). Until this is sorted, no apps will be able to do much Unicode I/O; not just Haskell.
  2. Having done this, you will notice that some apps will be able to do console I/O under windows. But getting it to work remains quite complicated. There are basically two ways to write to the console under windows. (What follows is true for any language, not just Haskell; don't worry, Haskell will enter the picture in a bit!)...
  3. Option A is to use the usual c-library style byte-based i/o functions; the hope is that the OS will interpret these bytes according to some encoding which can encode all the weird and wonderful characters you want. For instance, using the equivalent technique on Mac OS X, where the standard system encoding is usually UTF8, this works great; you send out utf8 output, you see pretty symbols.
  4. On windows, it works less well. The default encoding that windows expects will generally not be an encoding covering all the Unicode symbols. So if you want to see pretty symbols this way, one way or another, you need to change the encoding. One possibility would be for your program to use the SetConsoleCP win32 command. (So then you need to bind to the Win32 library.) Or, if you'd rather not do that, you can expect your program's user to change the code page for you (they would then have to call the chcp command before they run your program).
  5. Option B is to use the Unicode-aware win32 console API commands like WriteConsoleW. Here you send UTF16 direct to windows, which renders it happily: there's no danger of an encoding mismatch because windows always expects UTF16 with these functions.

不幸的是,这些选项在 Haskell 中都不能很好地工作.首先,我知道没有使用选项 B 的库,所以这不是很容易.这留下了选项 A.如果您使用 Haskell 的 I/O 库(putStrLn 等),这就是该库将要做的.在现代版本的 Haskell 中,它会仔细询问 windows 当前的代码页是什么,并以正确的编码输出你的字符串.这种方法有两个问题:

Unfortunately, neither of these options works very well from Haskell. First, there are no libraries that I know of that use Option B, so that's not very easy. This leaves option A. If you use Haskell's I/O library (putStrLn and so on), this is what the library will do. In modern versions of Haskell, it will carefully ask windows what the current code page is, and output your strings in the proper encoding. There are two problems with this approach:

  • One 不是一个引人注目的人,但很烦人.如上所述,默认编码几乎不会对您想要的字符进行编码:您是用户需要更改的编码.因此,您的用户需要在运行您的程序之前chcp cp65001(您可能会发现强迫您的用户这样做很不愉快).或者您需要绑定到 SetConsoleCP 并在您的程序中执行等效操作(然后使用 hSetEncoding 以便 Haskell 库将使用新编码发送输出),这意味着您需要包装 win32 库的相关部分,使它们对 Haskell 可见.
  • 更严重的是,有一个 Windows 中的错误(解决方案:无法修复)导致 Haskell 中的错误,这意味着如果您选择了任何代码页,如 cp65001 可以覆盖所有 Unicode,Haskell 的 I/O 例程将出现故障并失败.所以本质上,即使您(或您的用户)将编码正确设置为涵盖所有精彩 Unicode 字符的某种编码,然后告诉 Haskell 使用该编码输出内容,然后做所有正确的事情",你还是输了.
  • One is not a showstopper, but is annoying. As mentioned above, the default encoding will almost never encode the characters you want: you are the user need to change to an encoding which does. Thus your user needs to chcp cp65001 before they run your program (you may find it distasteful to force your users to do this). Or you need to bind to SetConsoleCP and do the equivalent inside your program (and then use hSetEncoding so that the Haskell libraries will send output using the new encoding), which means you need to wrap the relevant part of the win32 libraries to make them Haskell-visible.
  • Much more seriously, there is a bug in windows (resolution: won't fix) which leads to a bug in Haskell which means that if you have selected any code page like cp65001 which can cover all of Unicode, Haskell's I/O routines will malfunction and fail. So essentially, even if you (or your user) set the encoding properly to some encoding which covers all the wonderful Unicode characters, and then 'do everything right' in telling Haskell to output things using that encoding, you still lose.

上面列出的错误仍未解决并列为低优先级;那里的基本结论是选项 A(在我上面的分类中)是行不通的,需要切换到选项 B 才能获得可靠的结果.目前尚不清楚解决此问题的时间表,因为它看起来需要大量工作.

The bug listed above is still unresolved and listed as low priority; the basic conclusion there is that Option A (in my classification above) is unworkable and one needs to switch to Option B to get reliable results. It is not clear what the timeframe will be for this being resolved, as it looks like some considerable work.

问题是:与此同时,任何人都可以提出一种解决方法来允许在 Windows 下的 Haskell 中使用 Unicode 控制台 I/O.

另请参阅此 python 错误跟踪器数据库条目,解决 Python 3 中的相同问题(修复已提出,但尚未被代码库接受)和 此 stackoverflow回答,在 Python 中给出了解决此问题的方法(基于我分类中的选项 B").

See also this python bug tracker database entry, grappling with the same problem in Python 3 (fix proposed, but not yet accepted into the codebase), and this stackoverflow answer, giving a workaround for this problem in Python (based on 'option B' in my classification).

推荐答案

我想我会回答我自己的问题,并列出一种可能的答案,以下是我的实际情况此刻正在做.很可能一个人可以做得更好,这就是我问这个问题的原因!但我认为向人们提供以下内容是有意义的.它基本上是从 Python 到 Haskell 的这个 python 解决方法的翻译同样的问题.它使用问题中提到的选项 B".

I thought I would answer my own question, and list as one possible answer, the following, which is what I'm actually doing at the moment. It is quite possible that one can do better, which is why I'm asking the question! But I thought it would make sense to make the following available to people. It's basically a translation from Python to Haskell of this python workaround for the same issue. It uses 'option B' mentioned in the question.

基本思路是你创建一个模块IOUtil.hs,包含以下内容,你可以import到你的代码中:

The basic idea is that you create a module IOUtil.hs, with the following content, which you can import into your code:

{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE NoImplicitPrelude #-}
module IOUtil (
  IOUtil.interact,
  IOUtil.putChar, IOUtil.putStr, IOUtil.putStrLn, IOUtil.print,
  IOUtil.getChar, IOUtil.getLine, IOUtil.getContents, IOUtil.readIO,
  IOUtil.readLn,
  ePutChar, ePutStr, ePutStrLn, ePrint,
  trace, traceIO
  ) where

#ifdef mingw32_HOST_OS

import System.Win32.Types (BOOL, HANDLE, DWORD, LPDWORD, LPWSTR, LPCWSTR, LPVOID)
import Foreign.C.Types (CWchar)
import Foreign
import Prelude hiding (getContents, putStr, putStrLn) --(IO, Read, Show, String)
--import qualified System.IO
import qualified System.IO (getContents)
import System.IO hiding (getContents, putStr, putStrLn)
import Data.Char (ord)

 {- <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
    HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
    returns INVALID_HANDLE_VALUE, NULL, or a valid handle -}

foreign import stdcall unsafe "GetStdHandle" win32GetStdHandle :: DWORD -> IO (HANDLE)

std_OUTPUT_HANDLE = -11 :: DWORD  -- all DWORD arithmetic is performed modulo 2^n
std_ERROR_HANDLE  = -12 :: DWORD

 {- <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
    DWORD WINAPI GetFileType(HANDLE hFile); -}

foreign import stdcall unsafe "GetFileType" win32GetFileType :: HANDLE -> IO (DWORD)
_FILE_TYPE_CHAR   = 0x0002 :: DWORD
_FILE_TYPE_REMOTE = 0x8000 :: DWORD

 {- <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
    BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode); -}

foreign import stdcall unsafe "GetConsoleMode" win32GetConsoleMode :: HANDLE -> LPDWORD -> IO (BOOL)
_INVALID_HANDLE_VALUE = (intPtrToPtr $ -1) :: HANDLE

is_a_console :: HANDLE -> IO (Bool)
is_a_console handle
  = if (handle == _INVALID_HANDLE_VALUE) then return False
      else do ft <- win32GetFileType handle
              if ((ft .&. complement _FILE_TYPE_REMOTE) /= _FILE_TYPE_CHAR) then return False
                else do ptr <- malloc
                        cm  <- win32GetConsoleMode handle ptr
                        free ptr
                        return cm

real_stdout :: IO (Bool)
real_stdout = is_a_console =<< win32GetStdHandle std_OUTPUT_HANDLE

real_stderr :: IO (Bool)
real_stderr = is_a_console =<< win32GetStdHandle std_ERROR_HANDLE

 {- BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
                              LPDWORD lpCharsWritten, LPVOID lpReserved); -}

foreign import stdcall unsafe "WriteConsoleW" win32WriteConsoleW
  :: HANDLE -> LPWSTR -> DWORD -> LPDWORD -> LPVOID -> IO (BOOL)

data ConsoleInfo = ConsoleInfo Int (Ptr CWchar) (Ptr DWORD) HANDLE

writeConsole :: ConsoleInfo -> [Char] -> IO ()
writeConsole (ConsoleInfo bufsize buf written handle) string
  = let fillbuf :: Int -> [Char] -> IO ()
        fillbuf i [] = emptybuf buf i []
        fillbuf i remain@(first:rest)
          | i + 1 < bufsize && ordf <= 0xffff = do pokeElemOff buf i asWord
                                                   fillbuf (i+1) rest
          | i + 1 < bufsize && ordf >  0xffff = do pokeElemOff buf i word1
                                                   pokeElemOff buf (i+1) word2
                                                   fillbuf (i+2) rest
          | otherwise                         = emptybuf buf i remain
          where ordf   = ord first
                asWord = fromInteger (toInteger ordf) :: CWchar
                sub    = ordf - 0x10000
                word1' = ((shiftR sub 10) .&. 0x3ff) + 0xD800
                word2' = (sub .&. 0x3FF)             + 0xDC00
                word1  = fromInteger . toInteger $ word1'
                word2  = fromInteger . toInteger $ word2'


        emptybuf :: (Ptr CWchar) -> Int -> [Char] -> IO ()
        emptybuf _ 0 []     = return ()
        emptybuf _ 0 remain = fillbuf 0 remain
        emptybuf ptr nLeft remain
          = do let nLeft'    = fromInteger . toInteger $ nLeft
               ret          <- win32WriteConsoleW handle ptr nLeft' written nullPtr
               nWritten     <- peek written
               let nWritten' = fromInteger . toInteger $ nWritten
               if ret && (nWritten > 0)
                  then emptybuf (ptr `plusPtr` (nWritten' * szWChar)) (nLeft - nWritten') remain
                  else fail "WriteConsoleW failed.
"

    in  fillbuf 0 string

szWChar = sizeOf (0 :: CWchar)

makeConsoleInfo :: DWORD -> Handle -> IO (Either ConsoleInfo Handle)
makeConsoleInfo nStdHandle fallback
  = do handle     <- win32GetStdHandle nStdHandle
       is_console <- is_a_console handle
       let bufsize = 10000
       if not is_console then return $ Right fallback
         else do buf     <- mallocBytes (szWChar * bufsize)
                 written <- malloc
                 return . Left $ ConsoleInfo bufsize buf written handle

{-# NOINLINE stdoutConsoleInfo #-}
stdoutConsoleInfo :: Either ConsoleInfo Handle
stdoutConsoleInfo = unsafePerformIO $ makeConsoleInfo std_OUTPUT_HANDLE stdout

{-# NOINLINE stderrConsoleInfo #-}
stderrConsoleInfo :: Either ConsoleInfo Handle
stderrConsoleInfo = unsafePerformIO $ makeConsoleInfo std_ERROR_HANDLE stderr

interact     :: (String -> String) -> IO ()
interact f   = do s <- getContents
                  putStr (f s)

conPutChar ci  = writeConsole ci . replicate 1
conPutStr      = writeConsole
conPutStrLn ci = writeConsole ci . ( ++ "
")

putChar      :: Char -> IO ()
putChar      = (either conPutChar  hPutChar ) stdoutConsoleInfo

putStr       :: String -> IO ()
putStr       = (either conPutStr   hPutStr  ) stdoutConsoleInfo

putStrLn     :: String -> IO ()
putStrLn     = (either conPutStrLn hPutStrLn) stdoutConsoleInfo

print        :: Show a => a -> IO ()
print        = putStrLn . show

getChar      = System.IO.getChar
getLine      = System.IO.getLine
getContents  = System.IO.getContents

readIO       :: Read a => String -> IO a
readIO       = System.IO.readIO

readLn       :: Read a => IO a
readLn       = System.IO.readLn

ePutChar     :: Char -> IO ()
ePutChar     = (either conPutChar  hPutChar ) stderrConsoleInfo

ePutStr     :: String -> IO ()
ePutStr      = (either conPutStr   hPutStr  ) stderrConsoleInfo

ePutStrLn   :: String -> IO ()
ePutStrLn    = (either conPutStrLn hPutStrLn) stderrConsoleInfo

ePrint       :: Show a => a -> IO ()
ePrint       = ePutStrLn . show

#else

import qualified System.IO
import Prelude (IO, Read, Show, String)

interact     = System.IO.interact
putChar      = System.IO.putChar
putStr       = System.IO.putStr
putStrLn     = System.IO.putStrLn
getChar      = System.IO.getChar
getLine      = System.IO.getLine
getContents  = System.IO.getContents
ePutChar     = System.IO.hPutChar System.IO.stderr
ePutStr      = System.IO.hPutStr System.IO.stderr
ePutStrLn    = System.IO.hPutStrLn System.IO.stderr

print        :: Show a => a -> IO ()
print        = System.IO.print

readIO       :: Read a => String -> IO a
readIO       = System.IO.readIO

readLn       :: Read a => IO a
readLn       = System.IO.readLn

ePrint       :: Show a => a -> IO ()
ePrint       = System.IO.hPrint System.IO.stderr

#endif

trace :: String -> a -> a
trace string expr = unsafePerformIO $ do
    traceIO string
    return expr

traceIO :: String -> IO ()
traceIO = ePutStrLn

然后,您可以使用其中包含的 I/O 函数而不是标准库函数.他们会检测输出是否被重定向;如果不是(即,如果我们正在写入真实"控制台),那么我们将绕过通常的 Haskell I/O 函数并使用 WriteConsoleW 直接写入 win32 控制台,unicode-aware win32控制台功能.在非 windows 平台上,条件编译意味着这里的函数只是调用标准库的函数.

then, you use the I/O functions therein contained instead of the standard library ones. They will detect whether output is redirected; if not (i.e. if we're writing to a 'real' console) then we'll bypass the usual Haskell I/O functions and write directly to the win32 console using WriteConsoleW, the unicode-aware win32 console function. On non-windows platforms, conditional compilation means that the functions here just call the standard-library ones.

如果你需要打印到 stderr,你应该使用(例如)ePutStrLn,而不是 hPutStrLn stderr;我们没有定义 hPutStrLn.(定义一个是读者的练习!)

If you need to print to stderr, you should use (e.g.) ePutStrLn, not hPutStrLn stderr; we don't define a hPutStrLn. (Defining one is an exercise for the reader!)

这篇关于Windows 上 Haskell 中的 Unicode 控制台 I/O的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆