How would you go about parsing Markdown?


Question

I recently learned about a project called CommonMark (http://commonmark.org/), which correctly identifies and deals with the ambiguities in the original Markdown specification. It has great C# library support.

You can find the syntax here.

The source that comes with the download is written in Perl, which I have no intention of honoring. It is riddled with regular expressions, and it relies on MD5 hashes to escape certain characters. Something is just wrong about that!

I'm about to hand-code a parser for Markdown. What is your experience with this?

If you don't have anything meaningful to say about the actual parsing of Markdown, spare me the time. (This might sound harsh, but yes, I'm looking for insight, not a solution in the form of a third-party library.)

To help a bit with the answers: regular expressions are meant to identify patterns, NOT to parse an entire grammar. That people consider doing so is foobar.

  • If you think about Markdown, it's fundamentally based around the concept of paragraphs.
  • As such, a reasonable approach might be to split the input into paragraphs.
  • There are many kinds of paragraphs: headings, text, lists, blockquotes, and code, for example.
  • The challenge is thus to identify these paragraphs and the context in which they occur.
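
The paragraph-first idea can be sketched roughly as follows. This is only an illustration in Python, not a conforming Markdown parser: the function names `split_blocks` and `classify` are hypothetical, blocks are assumed to be separated by blank lines, and the sketch deliberately ignores context (nested lists, blank lines inside code blocks, lazy continuation lines), which is exactly the hard part.

```python
def split_blocks(text):
    """Split the input into blocks (lists of lines) separated by blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the block collected so far.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def classify(block):
    """Guess the kind of a block from its leading characters only."""
    # Assumption: a block is code when every line is indented
    # by four spaces or a tab.
    if all(line.startswith(("    ", "\t")) for line in block):
        return "code"
    first = block[0].lstrip()
    if first.startswith("#"):
        return "heading"
    if first.startswith(">"):
        return "blockquote"
    if first[:2] in ("* ", "- ", "+ "):
        return "list"
    return "text"


sample = "# Title\n\nSome text\nmore text\n\n* a\n* b\n\n> quote\n\n    code here\n"
kinds = [classify(block) for block in split_blocks(sample)]
```

Classifying by the first few characters is only a first pass. A real implementation would also have to track the enclosing container (a list item, a blockquote) to decide what an indented line means.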

I'll be back with a solution once I have one that is worth sharing.

Recommended answer

The only Markdown implementation I know of that uses an actual parser is John MacFarlane's peg-markdown. Its parser is based on a Parsing Expression Grammar parser generator called peg.
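
To give a feel for what a Parsing Expression Grammar does, here is a minimal hand-written sketch in Python of PEG-style ordered choice with backtracking, applied to inline emphasis. It is not peg-markdown's grammar and does not use the peg generator. The toy grammar and the function names (`parse_delimited`, `parse_one`, `parse`) are assumptions made for illustration:

    Inline <- Strong / Emph / Char
    Strong <- "**" (!"**" Inline)+ "**"
    Emph   <- "*"  (!"*"  Inline)+ "*"

```python
def parse_delimited(s, i, delim, tag):
    """Rule: delim (!delim Inline)+ delim. Returns ((tag, children), pos) or None."""
    if not s.startswith(delim, i):
        return None
    j = i + len(delim)
    children = []
    # (!delim Inline)+ : consume inlines until the closing delimiter shows up.
    while j < len(s) and not s.startswith(delim, j):
        node, j = parse_one(s, j)
        children.append(node)
    if children and s.startswith(delim, j):
        return (tag, children), j + len(delim)
    return None  # no closing delimiter: this alternative fails, caller backtracks


def parse_one(s, i):
    """Rule: Inline <- Strong / Emph / Char (ordered choice, first match wins)."""
    for delim, tag in (("**", "strong"), ("*", "em")):
        result = parse_delimited(s, i, delim, tag)
        if result is not None:
            return result
    return s[i], i + 1  # fallback alternative: a single literal character


def parse(s):
    """Parse a whole string into a list of characters and (tag, children) nodes."""
    nodes, i = [], 0
    while i < len(s):
        node, i = parse_one(s, i)
        nodes.append(node)
    return nodes
```

With this sketch, `parse("a*b*")` yields `['a', ('em', ['b'])]`, and an unclosed `*a` falls back to two literal characters. Because the toy `Emph` rule stops at any `*`, strong text inside emphasis is not handled. Without memoization (packrat parsing), this kind of backtracking can also become slow on pathological input.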

Mauricio Fernandez recently released his Simple Markup Markdown parser, which he wrote as part of his OcsiBlog Weblog Engine. Because the parser is written in OCaml, it is extremely simple and short (268 SLOC for the parser, 43 SLOC for the HTML emitter), yet blazingly fast (20% faster than discount, which is written in hand-optimized C, and six hundred times faster than BlueCloth, which is written in Ruby), despite the fact that it isn't even optimized for performance yet. Because it is only intended for internal use by Mauricio himself for his weblog, there are a few deviations from the official Markdown specification, but Mauricio has created a branch which reverts most of those changes.
