How to hack GHCi (or Hugs) so that it prints Unicode chars unescaped?
Problem description
The problem: in the interactive Haskell environments, non-Latin Unicode characters that appear in results are normally printed escaped, even if the locale allows such characters (as opposed to direct output through putStrLn and putChar, which looks fine and readable). The examples below show GHCi and Hugs98:
$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Prelude> 'Я'
'\1071'
Prelude> putStrLn "hello: привет"
hello: привет
Prelude> :q
Leaving GHCi.
$ hugs -98
__   __ __  __  ____   ___      _________________________________________
||   || ||  || ||  || ||__      Hugs 98: Based on the Haskell 98 standard
||___|| ||__|| ||__||  __||     Copyright (c) 1994-2005
||---||         ___||           World Wide Web: http://haskell.org/hugs
||   ||                         Bugs: http://hackage.haskell.org/trac/hugs
||   || Version: September 2006 _________________________________________
Hugs mode: Restart with command line option +98 for Haskell 98 mode
Type :? for help
Hugs> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Hugs> 'Я'
'\1071'
Hugs> putStrLn "hello: привет"
hello: привет
Hugs> :q
[Leaving Hugs]
$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$
We can guess that this is because print and show are used to format the result, and these functions do their best to format the data in a canonical, maximally portable way, so they prefer to escape unusual characters (perhaps this is even spelled out in a Haskell standard):
$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> show 'Я'
"'\\1071'"
Prelude> :q
Leaving GHCi.
$ hugs -98
Type :? for help
Hugs> show 'Я'
"'\\1071'"
Hugs> :q
[Leaving Hugs]
$
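To make that guess concrete: the Prelude defines print as putStrLn . show, so the escaping can be observed in the String that show returns, before anything reaches the terminal. The following is a minimal sketch; the names shownChar, shownString and demo are made up for illustration.

```haskell
-- The escapes come from show, not from the terminal or the locale:
-- show already returns a String that contains a backslash and digits.

shownChar :: String
shownChar = show 'Я'               -- the seven characters '\1071'

shownString :: String
shownString = show "hello: привет" -- every Cyrillic letter becomes \NNNN

demo :: IO ()
demo = do
  putStrLn shownChar          -- what print 'Я' would write
  putStrLn shownString        -- what print "hello: привет" would write
  putStrLn "hello: привет"    -- no show involved, so no escaping
```

Loading this in GHCi and running demo should reproduce the three kinds of output from the sessions above.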
Still, it would be nice to know how to hack GHCi or Hugs to print these characters in a human-readable way, i.e. directly, unescaped. This matters when using the interactive Haskell environment for educational purposes, for example in a tutorial or demonstration of Haskell in front of a non-English audience to whom you want to show some Haskell working on data in their own language.
Actually, it is useful not only for educational purposes but for debugging as well, whenever you have functions defined on strings that represent words of other languages and contain non-ASCII characters. If the program is language-specific, only words of that language make sense as data, and your functions are defined only on such words, then being able to read this data in GHCi is important for debugging.
To sum up my question: what ways are there to hack the existing interactive Haskell environments so that they print Unicode in results in a friendlier way? ("Friendlier" here really means "simpler": I'd like print in GHCi or Hugs to show non-Latin characters the simple, direct way that putChar and putStrLn do, i.e. unescaped.)
(Perhaps, besides GHCi and Hugs98, I'll also have a look at existing Emacs modes for interacting with Haskell to see if they can present the results in the pretty, unescaped fashion.)
Option 1 (bad):
Modify this line of code:
https://github.com/ghc/packages-base/blob/ba98712/GHC/Show.lhs#L356
showLitChar c s | c > '\DEL' = showChar '\\' (protectEsc isDec (shows (ord c)) s)
And recompile ghc.
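A less invasive variation on the same idea, which is not part of the original answer, is to leave GHC alone and undo the numeric escapes in the output of show afterwards. The sketch below assumes that a backslash followed by decimal digits in the shown text always comes from a character or string literal; the names unescapeShown, dropEmptyEscape and uprint are hypothetical.

```haskell
import Data.Char (chr, isDigit, isPrint)

-- | Turn decimal escapes such as \1087 in the output of 'show' back into
-- the characters they stand for, when those characters are printable and
-- non-ASCII. Every other escape (\n, \", \\ and so on) is left untouched.
unescapeShown :: String -> String
unescapeShown ('\\' : '\\' : rest) =
  -- An escaped backslash: copy both characters so that the digits of a
  -- literal like "\\1087" are not mistaken for a numeric escape.
  '\\' : '\\' : unescapeShown rest
unescapeShown ('\\' : rest)
  | (digits@(_ : _), afterDigits) <- span isDigit rest
  , length digits <= 7
  , let n = read digits :: Int
  , n <= 0x10FFFF
  , let c = chr n
  , c > '\DEL'
  , isPrint c
  = c : unescapeShown (dropEmptyEscape afterDigits)
unescapeShown (c : rest) = c : unescapeShown rest
unescapeShown [] = []

-- | 'show' writes \& after a numeric escape that is followed by a digit;
-- once the escape is gone, that separator is no longer needed.
dropEmptyEscape :: String -> String
dropEmptyEscape ('\\' : '&' : rest) = rest
dropEmptyEscape s = s

-- | A drop-in alternative to 'print' for interactive use.
uprint :: Show a => a -> IO ()
uprint = putStrLn . unescapeShown . show
```

Calling uprint "hello: привет" at the prompt should then print the string in quotes with the Cyrillic letters intact, at the cost of an extra pass over the shown text.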
Option 2 (lots of work):
When GHCi type checks a parsed statement, it ends up in tcRnStmt, which relies on mkPlan (both in https://github.com/ghc/ghc/blob/master/compiler/typecheck/TcRnDriver.lhs). This attempts to type check several variants of the statement that was typed in, including:
let it = expr in print it >> return [coerce HVal it]
Specifically:
print_it = L loc $ ExprStmt (nlHsApp (nlHsVar printName) (nlHsVar fresh_it))
(HsVar thenIOName) placeHolderType
All that might need to change here is printName (which binds to System.IO.print). Suppose it instead bound to something like printGhci, implemented like this:
class ShowGhci a where
    showGhci :: a -> String
    ...

-- Bunch of instances?

instance ShowGhci Char where
    ...  -- The instance we want to be different.

printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci
GHCi could then change what is printed by bringing different instances into context.
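As a rough illustration of how the elided parts might be filled in, here is a self-contained sketch. Everything beyond the names ShowGhci, showGhci and printGhci is an assumption made for the example: the showGhciList method (borrowed from the showList trick in the standard Show class so that strings can be shown differently from other lists), the litChar helper, and the choice of example instances.

```haskell
import Data.Char (isPrint, showLitChar)
import Data.List (intercalate)

class ShowGhci a where
  showGhci :: a -> String
  -- Hypothetical extra method, mirroring showList in Show, so that
  -- [Char] can be rendered as a quoted string rather than as a list.
  showGhciList :: [a] -> String
  showGhciList xs = "[" ++ intercalate "," (map showGhci xs) ++ "]"

-- | Like showLitChar, but printable non-ASCII characters are kept as-is.
litChar :: Char -> ShowS
litChar c rest
  | c > '\DEL' && isPrint c = c : rest
  | otherwise = showLitChar c rest

-- The instance we want to be different.
instance ShowGhci Char where
  showGhci '\'' = "'\\''"
  showGhci c = "'" ++ litChar c "" ++ "'"
  showGhciList s = "\"" ++ foldr litStrChar "\"" s
    where
      litStrChar '"' rest = "\\\"" ++ rest
      litStrChar c rest = litChar c rest

instance ShowGhci a => ShowGhci [a] where
  showGhci = showGhciList

-- A couple of instances that simply reuse show.
instance ShowGhci Int where
  showGhci = show

instance ShowGhci Bool where
  showGhci = show

printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci
```

In GHC 7.6 and later, GHCi has an -interactive-print flag that names the function to use in place of print, so a function shaped like printGhci could probably be tried without patching the compiler, provided the module that defines it is available to GHCi. That flag does not exist in the GHCi 7.0.1 shown in the question.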