How to hack GHCi (or Hugs) so that it prints Unicode chars unescaped?


Question

Here is the problem: normally, in an interactive Haskell environment, non-Latin Unicode characters that are part of a result are printed escaped, even if the locale allows such characters (as opposed to direct output through putStrLn or putChar, which looks fine and readable). The examples show GHCi and Hugs98:

$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/  :? for help
Prelude> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Prelude> 'Я'
'\1071'
Prelude> putStrLn "hello: привет"
hello: привет
Prelude> :q
Leaving GHCi.
$ hugs -98
__   __ __  __  ____   ___      _________________________________________
||   || ||  || ||  || ||__      Hugs 98: Based on the Haskell 98 standard
||___|| ||__|| ||__||  __||     Copyright (c) 1994-2005
||---||         ___||           World Wide Web: http://haskell.org/hugs
||   ||                         Bugs: http://hackage.haskell.org/trac/hugs
||   || Version: September 2006 _________________________________________

Hugs mode: Restart with command line option +98 for Haskell 98 mode

Type :? for help
Hugs> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Hugs> 'Я'
'\1071'
Hugs> putStrLn "hello: привет"
hello: привет

Hugs> :q
[Leaving Hugs]
$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ 

We can guess that this is because print and show are used to format the result, and these functions do their best to format the data in a canonical, maximally portable way -- so they prefer to escape the strange characters (perhaps this is even spelled out in a standard for Haskell):

$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/  :? for help
Prelude> show 'Я'
"'\\1071'"
Prelude> :q
Leaving GHCi.
$ hugs -98
Type :? for help
Hugs> show 'Я'
"'\\1071'"
Hugs> :q
[Leaving Hugs]
$ 
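
The guess can be checked outside the interpreter. The following is a minimal sketch, assuming only the standard Data.Char module: show on a Char goes through showLitChar, which renders any character above '\DEL' as a decimal escape, while putStrLn writes the characters as they are. The names shownChar, litChar, shownString and demo are made up for illustration.

```haskell
import Data.Char (showLitChar)

-- show wraps the escaped form in quotes; showLitChar is the piece that escapes.
shownChar :: String
shownChar = show 'Я'            -- quote, backslash, the digits 1071, quote

-- showLitChar on its own produces just the escape, without quotes.
litChar :: String
litChar = showLitChar 'Я' ""    -- backslash followed by the digits 1071

-- For a String, every Cyrillic letter becomes a decimal escape.
shownString :: String
shownString = show "привет"

-- putStrLn bypasses show, so the second line is written unescaped.
demo :: IO ()
demo = do
  putStrLn shownString
  putStrLn "привет"
```

Running demo in a UTF-8 locale should print the escaped form on the first line and the readable word on the second, which matches the behaviour seen in the sessions above.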

Still, it would be nice to know how to hack GHCi or Hugs to print these characters in a human-readable way, i.e. directly, unescaped. This matters when the interactive Haskell environment is used for educational purposes, for example in a tutorial or demonstration of Haskell in front of a non-English audience, where you want to show some Haskell working on data in their own language.

Actually, it's useful not only for educational purposes but for debugging as well, when you have functions defined on strings that represent words of other languages, with non-ASCII characters. If the program is language-specific, only words of another language make sense as data, and your functions are defined only on such words, then being able to see this data while debugging in GHCi is important.

To sum up my question: what ways are there to hack the existing interactive Haskell environments so that Unicode in results is printed in a friendlier way? ("Friendlier" even means "simpler" in my case: I'd like print in GHCi or Hugs to show non-Latin characters in the simple, direct way that putChar and putStrLn do, i.e. unescaped.)

(Perhaps, besides GHCi and Hugs98, I'll also have a look at existing Emacs modes for interacting with Haskell to see whether they can present the results in the pretty, unescaped fashion.)

Recommended answer

Option 1 (bad):

Modify this line of code:

https://github.com/ghc/packages-base/blob/ba98712/GHC/Show.lhs#L356

showLitChar c s | c > '\DEL' =  showChar '\\' (protectEsc isDec (shows (ord c)) s)

And recompile ghc.
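
The effect of such a change can be previewed without rebuilding GHC. The following is only a sketch, not the patch itself: showLitCharU and showStringU are hypothetical names, and using isPrint to decide which characters pass through is an assumption rather than something the answer specifies.

```haskell
import Data.Char (isPrint, showLitChar)

-- Hypothetical variant of showLitChar: printable characters above '\DEL'
-- are emitted as they are; everything else is escaped as usual.
showLitCharU :: Char -> ShowS
showLitCharU c
  | c > '\DEL' && isPrint c = showChar c
  | otherwise               = showLitChar c

-- Render a String the way show would, but with showLitCharU for the
-- characters. The double quote needs its own case because showLitChar
-- does not escape it.
showStringU :: String -> String
showStringU s = '"' : foldr step "\"" s
  where
    step '"' = showString "\\\""
    step c   = showLitCharU c
```

For plain ASCII input showStringU should agree with show, so the only visible difference is in the non-ASCII printable range.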

Option 2:

When GHCi type checks a parsed statement, it ends up in tcRnStmt, which relies on mkPlan (both in https://github.com/ghc/ghc/blob/master/compiler/typecheck/TcRnDriver.lhs). This attempts to type check several variants of the statement that was typed in, including:

let it = expr in print it >> return [coerce HVal it]

Specifically:

print_it  = L loc $ ExprStmt (nlHsApp (nlHsVar printName) (nlHsVar fresh_it))
                                      (HsVar thenIOName) placeHolderType

All that might need to change here is printName (which binds to System.IO.print). Suppose it were instead bound to something like printGhci, implemented like this:

class ShowGhci a where
    showGhci :: a -> String
    ...

-- Bunch of instances?

instance ShowGhci Char where
    ...  -- The instance we want to be different.

printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci

GHCi could then change what is printed by bringing different instances into context.
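
A user-level approximation of printGhci is possible without touching GHCi's type checker. The sketch below is an illustration under stated assumptions, not the design proposed above: instead of a new ShowGhci class it post-processes the output of the ordinary show, and the names unescapeShown and uprint are invented here.

```haskell
import Data.Char (chr, isDigit, isPrint)

-- Replace decimal escapes such as \1071 in the output of show with the
-- character they denote, but only when that character is non-ASCII and
-- printable. All other escapes (\n, \\, \", \DEL, ...) are left alone.
unescapeShown :: String -> String
unescapeShown ('\\' : rest@(d : _))
  | isDigit d
  , (digits, after) <- span isDigit rest
  , let n = read digits :: Integer
  , n > 127 && n <= 0x10FFFF
  , let c = chr (fromInteger n)
  , isPrint c
  = c : unescapeShown after
-- Any other backslash pair is copied through, so an escaped backslash
-- followed by digits is never mistaken for a numeric escape.
unescapeShown ('\\' : c : rest) = '\\' : c : unescapeShown rest
unescapeShown (c : rest)        = c : unescapeShown rest
unescapeShown []                = []

-- Drop-in replacement for print that shows Unicode text unescaped.
uprint :: Show a => a -> IO ()
uprint = putStrLn . unescapeShown . show
```

GHC 7.6 and later have a -interactive-print option that makes GHCi call a function of this type instead of print; the 7.0.1 release shown in the question predates it. Assuming the code lives in a loaded module named, say, UnescapedPrint, `:set -interactive-print=UnescapedPrint.uprint` should make an expression such as "hello: привет" echo unescaped.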
