Clojure & ClojureScript: clojure.core/read-string, clojure.edn/read-string and cljs.reader/read-string

Question

I am not clear about the relationship between all these read-string functions. It is clear that clojure.core/read-string can read any serialized string output by pr or prn, or even by print-dup. It is also clear that clojure.edn/read-string reads strings formatted according to the EDN specification.

However, I am just starting with ClojureScript, and it is not clear whether cljs.reader/read-string complies with it. This question was triggered by the fact that I had a web service emitting Clojure code serialized this way:

(with-out-str (binding [*print-dup* true] (prn tags)))

That produced an object serialization that includes the datatypes. However, cljs.reader/read-string could not read it. I always got an error of this type:

Could not find tag parser for = in ("inst" "uuid" "queue" "js")  Format should have been EDN (default)

At first, I thought this error was thrown by cljs-ajax, but after testing cljs.reader/read-string in a Rhino REPL I got the same error, which means it is thrown by cljs.reader/read-string itself. It is thrown by the maybe-read-tagged-type function in cljs.reader, but it is not clear whether this is because the reader only works with EDN data, or if...?

Also, the only thing the Differences from Clojure document says on the subject is:

The read and read-string functions are located in the cljs.reader namespace

This suggests that they should have exactly the same behavior.

Recommended answer

Summary: Clojure is a superset of EDN. By default, pr, prn and pr-str, when given Clojure data structures, produce valid EDN. *print-dup* changes that and makes them use the full power of Clojure to give stronger guarantees about the "sameness" of the objects in memory after a round-trip. ClojureScript can only read EDN, not full Clojure.
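The summary above can be checked with a small sketch on the JVM side. The sample map and the var names below are made up for illustration; the point is only that the default printer emits a string the EDN reader accepts, while the *print-dup* output contains #= forms that only the full Clojure reader understands.

```clojure
(require '[clojure.edn :as edn])

;; Sample data, chosen only for illustration.
(def data {:a 1 :b [1 2 3]})

;; Default printing: plain EDN.
(def plain (pr-str data))

;; With *print-dup* bound to true, the concrete map type is encoded in a #= form.
(def dup (binding [*print-dup* true] (pr-str data)))

(println plain)
(println dup)

;; The EDN reader accepts the plain string.
(def round-trip (edn/read-string plain))

;; The EDN reader rejects the *print-dup* output; the exception is captured here.
(def edn-error
  (try
    (edn/read-string dup)
    nil
    (catch Exception e e)))

;; The full Clojure reader (with *read-eval* at its default) reads it back.
(def core-copy (read-string dup))
```

The failure captured in edn-error is the JVM-side counterpart of the "Could not find tag parser for =" error from the question: an EDN reader has no meaning for the = that follows #.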

Easy solution: do not set *print-dup* to true, and only pass pure data from Clojure to ClojureScript.

Harder solution: use tagged literals, with a (possibly shared) associated reader on both sides. (This will still not involve *print-dup*, though.)
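As a rough sketch of the tagged-literal route, here is what the Clojure side could look like. The Point record and the example/point tag are hypothetical names invented for this illustration; they are not part of the original question.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical record and tag name, used only for illustration.
(defrecord Point [x y])

;; Print Point values as a tagged literal of the form: #example/point [x y]
(defmethod print-method Point [p ^java.io.Writer w]
  (.write w "#example/point ")
  (print-method [(:x p) (:y p)] w))

;; Reader functions, keyed by tag symbol, that rebuild the value from its printed form.
(def readers {'example/point (fn [[x y]] (->Point x y))})

;; What would travel over the wire.
(def wire (pr-str (->Point 1 2)))

;; Reading it back with the restricted EDN reader plus our tag reader.
(def back (edn/read-string {:readers readers} wire))

(println wire)
```

On the ClojureScript side, the matching reader function would need to be registered with cljs.reader (for example through cljs.reader/register-tag-parser!) before calling cljs.reader/read-string on the same string.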

Tangentially related: most use-cases for EDN are covered by Transit, which is faster, especially on the ClojureScript side.

Let's start with the Clojure part. Clojure had, from the start, a clojure.core/read-string function, which reads a string in the old Lispy sense of the Read-Eval-Print-Loop, i.e. it gives access to the actual reader used in the compilation of Clojure.[0]

Later on, Rich Hickey & co decided to promote the data notation of Clojure and published the EDN spec. EDN is a subset of Clojure; it is limited to the data elements of the Clojure language.

As Clojure is a Lisp and, like all Lisps, touts the "code is data is code" philosophy, the actual implications of the above paragraph may not be completely clear. I am not sure there is a detailed diff anywhere, but a careful examination of the Clojure Reader description and the previously mentioned EDN spec shows a few differences. The most obvious ones are around macro characters, in particular the # dispatch character, which has many more targets in Clojure than in EDN. For example, the #(* % %) notation is valid Clojure, which the Clojure reader turns into the equivalent of the following EDN: (fn [x] (* x x)). Of particular importance for this question is the scarcely documented #= special reader macro, which can be used to execute arbitrary code right inside the reader.
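The two reader features mentioned above can be observed directly at a Clojure REPL. This is a small sketch with made-up var names; it only uses clojure.core/read-string with *read-eval* left at its default value.

```clojure
;; The Clojure reader expands the anonymous-function shorthand into an
;; ordinary list whose first element is the fn* special form; the argument
;; names are generated by the reader.
(def expanded (read-string "#(* % %)"))
(println expanded)

;; The expansion is plain data: a list containing a vector and another list.
(def expanded-shape [(list? expanded) (vector? (second expanded))])

;; #= evaluates its form while the string is being read, so the result of
;; read-string is the value of (+ 1 2), not the list (+ 1 2).
(def evaluated (read-string "#=(+ 1 2)"))
(println evaluated)
```

Neither notation is part of EDN, which is why an EDN-only reader has no way to make sense of them.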

As the complete language is available to the Clojure reader, it is possible to embed code into the character string that the reader is reading and have it evaluated right then and there in the reader. A few examples can be found here.

The clojure.edn/read-string function is strictly limited to the EDN format, not the whole Clojure language. In particular, its operation is not influenced by the *read-eval* variable, and it cannot read every valid fragment of Clojure code.
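A short sketch of that difference follows. The try-read helper is a hypothetical name introduced here so that a reader failure shows up as the keyword :rejected instead of an exception.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper: returns what read-fn reads, or :rejected if it throws.
(defn try-read [read-fn s]
  (try
    (read-fn s)
    (catch Exception _ :rejected)))

;; clojure.core/read-string honours *read-eval*: with the default binding the
;; #= form is evaluated, with *read-eval* bound to false it is refused.
(def core-default (try-read read-string "#=(+ 1 2)"))
(def core-restricted
  (binding [*read-eval* false]
    (try-read read-string "#=(+ 1 2)")))

;; clojure.edn/read-string refuses the form whatever *read-eval* is bound to.
(def edn-result
  (binding [*read-eval* true]
    (try-read edn/read-string "#=(+ 1 2)")))

;; Plain data is still read normally by the EDN reader.
(def edn-data (try-read edn/read-string "{:a [1 2 3]}"))
```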

It turns out that the Clojure reader is, for mostly historical reasons, written in Java. As it is a significant piece of software, works well, and has been largely debugged and battle-tested by a few years of active Clojure usage in the wild, Rich Hickey decided to reuse it in the ClojureScript compiler (this is the main reason why the ClojureScript compiler runs on the JVM). The ClojureScript compilation process happens mostly on the JVM, where the Clojure reader is available, and thus ClojureScript code is parsed by the clojure.core/read-string function (or rather its close cousin, clojure.core/read).

But your web application does not have access to a running JVM. Requiring a Java applet for ClojureScript applications did not look like a very promising idea, especially as the main objective of ClojureScript was to extend the reach of the Clojure language beyond the confines of the JVM (and the CLR). So the decision was taken that ClojureScript would not have access to its own reader, and consequently would not have access to its own compiler either (i.e. there is no eval, read or read-string in ClojureScript). This decision and its implications are discussed in greater detail here, by someone who actually knows how things happened (I was not there, so there may be some inaccuracies in the historical perspective of this explanation).

So ClojureScript has no equivalent of clojure.core/read-string (and some would argue that it is therefore not a true Lisp). Still, it would be nice to have some way to communicate Clojure data structures between a Clojure server and a ClojureScript client, and indeed that was one of the motivating factors in the EDN effort. Just as Clojure got a restricted (and safer) reading function (clojure.edn/read-string) after the publication of the EDN spec, ClojureScript also got an EDN reader in the standard distribution, as cljs.reader/read-string. It may be argued that a little more consistency between the names of these two functions (or rather their namespaces) would have been good.

Before we can finally answer your original question, we need one more little piece of context regarding *print-dup*. Remember that *print-dup* was part of Clojure 1.0, which means it predates EDN, the notion of tagged literals, and records. I would argue that EDN and tagged literals offer a better alternative for most of the use-cases of *print-dup*. As Clojure is generally built on top of a few data abstractions (list, vector, set, map, and the usual scalars), the default behaviour of the print/read cycle is to preserve the abstract shape of the data (a map is a map), but not necessarily its concrete type. For example, Clojure has multiple implementations of the map abstraction, such as PersistentArrayMap for small maps and PersistentHashMap for bigger ones. The default behaviour of the language assumes that you do not care about the concrete type.

For the rare cases where you do, or for the more specialized types (defined with deftype or defstruct, at the time), you might want more control over how these are read, and that is what print-dup is for.
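To make the "abstract shape versus concrete type" distinction concrete, here is a small sketch using a sorted map, which is one of the built-in types where the difference is easy to see. The var names are made up for illustration.

```clojure
;; A sorted map is a map whose concrete type (PersistentTreeMap) matters.
(def m (sorted-map :b 2 :a 1))

;; Default print/read cycle: the result is an equal map, but an ordinary one.
(def plain-copy (read-string (pr-str m)))

;; print-dup cycle: the printed form names the concrete class, so the full
;; Clojure reader rebuilds a sorted map.
(def dup-string (binding [*print-dup* true] (pr-str m)))
(def dup-copy (read-string dup-string))

(println (type m) (type plain-copy) (type dup-copy))
(println dup-string)
```

All three values are equal as maps; only the print-dup round-trip keeps the sorted implementation, at the price of a string that is no longer EDN.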

The point is, with *print-dup* set to true, pr and family will not produce valid EDN, but rather Clojure data that includes explicit #=(eval build-my-special-type) forms, which are not valid EDN.

[0]: In Lisps, the compiler is explicitly defined in terms of data structures, rather than in terms of character strings. While that may seem like a small difference from usual compilers (which do indeed transform the character stream into data structures during their processing), the defining characteristic of Lisp is that the data structures emitted by the reader are the data structures commonly used in the language. In other words, the compiler is basically just a function available at all times in the language. This is not as unique as it used to be, as most dynamic languages support some form of eval; what is unique to Lisp is that eval takes a data structure, not a character string, which makes dynamic code generation and evaluation much easier. One important implication of the compiler being "just another function" is that the compiler actually runs with the whole language already defined and available, and all of the code read so far also available, which opens the door to the Lisp macro system.
