How to hack GHCi (or Hugs) so that it prints Unicode chars unescaped?


Problem description


Look at the problem: normally, in the interactive Haskell environment, non-Latin Unicode characters that form part of a result are printed escaped, even if the locale allows such characters (unlike direct output through putStrLn or putChar, which looks fine and readable). The following examples show GHCi and Hugs98:

$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/  :? for help
Prelude> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Prelude> 'Я'
'\1071'
Prelude> putStrLn "hello: привет"
hello: привет
Prelude> :q
Leaving GHCi.
$ hugs -98
__   __ __  __  ____   ___      _________________________________________
||   || ||  || ||  || ||__      Hugs 98: Based on the Haskell 98 standard
||___|| ||__|| ||__||  __||     Copyright (c) 1994-2005
||---||         ___||           World Wide Web: http://haskell.org/hugs
||   ||                         Bugs: http://hackage.haskell.org/trac/hugs
||   || Version: September 2006 _________________________________________

Hugs mode: Restart with command line option +98 for Haskell 98 mode

Type :? for help
Hugs> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Hugs> 'Я'
'\1071'
Hugs> putStrLn "hello: привет"
hello: привет

Hugs> :q
[Leaving Hugs]
$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ 

We can guess that this is because print and show are used to format the result, and these functions do their best to format the data in a canonical, maximally portable way, so they prefer to escape the strange characters (perhaps this is even spelled out in a Haskell standard):

$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/  :? for help
Prelude> show 'Я'
"'\\1071'"
Prelude> :q
Leaving GHCi.
$ hugs -98
Type :? for help
Hugs> show 'Я'
"'\\1071'"
Hugs> :q
[Leaving Hugs]
$ 
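A minimal sketch of the mechanism guessed at above, assuming GHC and a UTF-8 source file. The Haskell Report defines print x = putStrLn (show x), so whatever show produces is exactly what gets echoed. The names printViaShow, escapedChar and escapedString are made up for illustration.

```haskell
-- Illustrative re-definition of print: System.IO.print is specified this way,
-- so anything show escapes is also escaped when the result is printed.
printViaShow :: Show a => a -> IO ()
printViaShow x = putStrLn (show x)

-- show escapes every Char above '\DEL' as a decimal escape.
escapedChar :: String
escapedChar = show 'Я'          -- the 7 characters '\1071'

-- The same happens character by character inside a String.
escapedString :: String
escapedString = show "привет"   -- a quoted run of six \NNNN escapes
```

Loaded into GHCi, printViaShow "привет" prints the same escaped line as typing "привет" at the prompt, while putStrLn "привет" prints the Cyrillic text directly.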

But it would still be nice to know how to hack GHCi or Hugs so that they print these characters in a human-readable way, i.e. directly and unescaped. This matters when the interactive Haskell environment is used for educational purposes, for example in a Haskell tutorial or demonstration for a non-English audience to whom you want to show Haskell working on data in their own language.

Actually, this is useful not only for educational purposes but also for debugging, for example when you have functions defined on strings that represent words of other languages and contain non-ASCII characters. If the program is language-specific, so that only words of another language make sense as its data and your functions are defined only on such words, then seeing this data readably is important when debugging in GHCi.

To sum up my question: what ways are there to hack the existing interactive Haskell environments so that they print Unicode in results in a friendlier way? ("Friendlier" even means "simpler" in my case: I'd like print in GHCi or Hugs to show non-Latin characters the simple, direct way putChar and putStrLn do, i.e. unescaped.)

(Perhaps, besides GHCi and Hugs98, I'll also have a look at existing Emacs modes for interacting with Haskell to see if they can present the results in the pretty, unescaped fashion.)

Solution

Option 1 (bad):

Modify this line of code:

https://github.com/ghc/packages-base/blob/ba98712/GHC/Show.lhs#L356

showLitChar c s | c > '\DEL' =  showChar '\\' (protectEsc isDec (shows (ord c)) s)

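Then recompile GHC. Deleting this guard, for instance, would let characters above '\DEL' fall through to the later c >= ' ' clause of showLitChar, which emits them unchanged.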
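The effect of Option 1 can be tried without rebuilding GHC, at least for code that calls the functions explicitly. This is a minimal sketch, not part of GHC or base: showLitCharU, showCharU and showStringU are hypothetical names. showLitCharU behaves like Data.Char.showLitChar with the c > '\DEL' case removed, so characters above '\DEL' are emitted as-is while ASCII control characters keep their usual escapes. GHCi's own result printing is unaffected, since it still goes through show.

```haskell
import Data.Char (showLitChar)

-- Like Data.Char.showLitChar, minus the c > '\DEL' guard quoted above:
-- such characters are emitted unchanged instead of as \NNNN escapes.
showLitCharU :: Char -> ShowS
showLitCharU c
  | c > '\DEL' = showChar c
  | otherwise  = showLitChar c

-- Render a Char the way show does, e.g. 'Я' becomes "'Я'".
showCharU :: Char -> String
showCharU '\'' = "'\\''"                -- show also escapes a lone quote
showCharU c    = '\'' : showLitCharU c "'"

-- Render a String the way show does, but leave non-ASCII characters alone.
-- Each step receives the rest of the output, so showLitChar can still insert
-- \& where needed (e.g. "\SO" followed by 'H').
showStringU :: String -> String
showStringU s = '"' : foldr step "\"" s
  where
    step '"' acc = '\\' : '"' : acc     -- double quotes are escaped in strings
    step c   acc = showLitCharU c acc
```

For example, with a UTF-8 locale, putStrLn (showStringU "hello: привет") prints "hello: привет" including the quotes, while ASCII-only strings render exactly as show renders them.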

Option 2 (lots of work):

When GHCi type-checks a parsed statement, it ends up in tcRnStmt, which relies on mkPlan (both in https://github.com/ghc/ghc/blob/master/compiler/typecheck/TcRnDriver.lhs). mkPlan attempts to type-check several variants of the statement that was typed in, including:

let it = expr in print it >> return [coerce HVal it]

Specifically:

print_it  = L loc $ ExprStmt (nlHsApp (nlHsVar printName) (nlHsVar fresh_it))
                                      (HsVar thenIOName) placeHolderType

All that might need to change here is printName (which binds to System.IO.print). If it instead bound to something like printGhci which was implemented like:

class ShowGhci a where
    showGhci :: a -> String
    ...

-- Bunch of instances?

instance ShowGhci Char where
    ...  -- The instance we want to be different.

printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci

GHCi could then change what is printed by bringing different instances into scope.
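The ShowGhci idea can be fleshed out into a small, loadable sketch. Only the names ShowGhci, showGhci and printGhci come from the answer; the helper litChar and the chosen instances are added here for illustration and cover just a handful of types. GHC 7.6 and later added an -interactive-print flag that lets GHCi use a function of a type like printGhci's in place of System.IO.print, which is essentially the hook Option 2 describes; consult the GHC User's Guide for how that function must be brought into scope.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Char (showLitChar)

-- Class from the answer: like Show, but free to leave non-ASCII unescaped.
class ShowGhci a where
  showGhci :: a -> String

-- Hypothetical helper: characters above '\DEL' pass through unchanged;
-- everything else is escaped exactly as Data.Char.showLitChar escapes it.
litChar :: Char -> ShowS
litChar c
  | c > '\DEL' = showChar c
  | otherwise  = showLitChar c

-- The instance "we want to be different": 'Я' renders as 'Я', not '\1071'.
instance ShowGhci Char where
  showGhci '\'' = "'\\''"
  showGhci c    = '\'' : litChar c "'"

-- Strings as quoted literals; the [Char] instance head needs FlexibleInstances.
instance ShowGhci [Char] where
  showGhci s = '"' : foldr step "\"" s
    where
      step '"' acc = '\\' : '"' : acc
      step c   acc = litChar c acc

-- Types that never contain characters can reuse Show unchanged.
instance ShowGhci Int     where showGhci = show
instance ShowGhci Integer where showGhci = show
instance ShowGhci Bool    where showGhci = show

-- Instances compose, so structured results are covered too.
instance (ShowGhci a, ShowGhci b) => ShowGhci (a, b) where
  showGhci (a, b) = "(" ++ showGhci a ++ "," ++ showGhci b ++ ")"

-- Replacement for print, as proposed in the answer.
printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci
```

With this loaded, printGhci ("Я", 'Я') prints ("Я",'Я'). A value whose type has no ShowGhci instance (Double, say) is rejected at type-checking time, which is why the answer anticipates a "bunch of instances" before this could stand in for print.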

