How would you go about parsing Markdown?

Question

I recently learned about a project called CommonMark (http://commonmark.org/), which identifies and resolves the ambiguities in the original Markdown specification. It has good C# library support.

You can find the syntax here.

The source code that comes with the download is written in Perl, which I have no intention of using. It is riddled with regular expressions, and it relies on MD5 hashes to escape certain characters. Something about that is just wrong!
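
For context on the MD5 trick: as far as I can tell, Markdown.pl replaces each backslash-escaped character with the MD5 hex digest of that character, so that later regular-expression passes cannot match it, and swaps the digests back at the very end. The Python sketch below is a minimal illustration of that idea under stated assumptions (the function names, the escapable-character subset and the single emphasis pass are hypothetical, and hashlib.md5 stands in for Perl's md5_hex); it is not the Perl code itself:

```python
import hashlib
import re

# Hypothetical, illustrative subset of characters that can be backslash-escaped.
ESCAPABLE = "\\`*_{}[]()#+-.!"

# Map each escapable character to an opaque placeholder: its MD5 hex digest.
ESCAPE_TABLE = {ch: hashlib.md5(ch.encode("utf-8")).hexdigest() for ch in ESCAPABLE}


def hide_escapes(text: str) -> str:
    """Replace every backslash-escaped character with its placeholder digest."""
    pattern = r"\\([" + re.escape(ESCAPABLE) + r"])"
    return re.sub(pattern, lambda m: ESCAPE_TABLE[m.group(1)], text)


def emphasize(text: str) -> str:
    """A later regex pass; it cannot see asterisks that were escaped."""
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


def restore_escapes(text: str) -> str:
    """Swap every placeholder back for the literal character."""
    for ch, digest in ESCAPE_TABLE.items():
        text = text.replace(digest, ch)
    return text


def render(text: str) -> str:
    return restore_escapes(emphasize(hide_escapes(text)))


print(render(r"*real* and \*literal\*"))
# prints: <em>real</em> and *literal*
```

The digests only work because they are unlikely to occur in real input; a tokenizer that records escapes as tokens of their own does not need placeholders at all.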

I'm about to hand-write a parser for Markdown. What is your experience with this?

If you don't have anything meaningful to say about the actual parsing of Markdown, spare me the time. (This might sound harsh, but I'm looking for insight, not a ready-made solution such as a third-party library.)

To help a bit with the answers: regular expressions are meant to identify patterns, NOT to parse an entire grammar. That people consider doing so is foobar.

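  • If you think about Markdown, it is basically built on the concept of paragraphs.
  • So a reasonable approach might be to split the input into paragraphs.
  • There are many kinds of paragraphs, e.g. headings, text, lists, blockquotes and code.
  • So the challenge is to identify these paragraphs and the context in which they appear.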
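
The paragraph-based approach in the bullets above could be sketched roughly as follows. This is a Python illustration under simplifying assumptions (blank lines always separate blocks, the first line alone decides most block kinds, nothing is nested), and all names are hypothetical:

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    kind: str          # "heading", "code", "blockquote", "list" or "paragraph"
    lines: List[str]


def split_blocks(text: str) -> List[List[str]]:
    """Split the input into blank-line-separated chunks (the 'paragraphs')."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def classify(chunk: List[str]) -> Block:
    """Use small regexes only to identify a chunk's kind, not to parse it."""
    first = chunk[0]
    if re.match(r"#{1,6} ", first):
        kind = "heading"
    elif all(line.startswith("    ") for line in chunk):
        kind = "code"
    elif first.startswith(">"):
        kind = "blockquote"
    elif re.match(r"(?:[*+-]|\d+\.) ", first):
        kind = "list"
    else:
        kind = "paragraph"
    return Block(kind, chunk)


def parse_blocks(text: str) -> List[Block]:
    return [classify(chunk) for chunk in split_blocks(text)]
```

For an input containing a "# Title" line, a plain sentence, a "> quoted" line, a two-item "- a / - b" list and an indented code line, each separated by blank lines, this yields the kinds heading, paragraph, blockquote, list and code in order. The last bullet is where such a naive version falls short: a blank line inside an indented code block or a loose list splits the block in two, so a real parser has to track which container (list item, blockquote, code block) is open when it reads each line.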

I'll be back with a solution once I find one worth sharing.

Recommended answer

The only Markdown implementation I know of that uses an actual parser is John MacFarlane's peg-markdown. Its parser is built with peg, a parser generator for Parsing Expression Grammars (PEGs).
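
To give a feel for what a PEG-based parser does, here is a tiny hand-written Python sketch that follows PEG semantics (ordered choice, the "!" not-predicate, greedy repetition, and backtracking when an alternative fails) for a small inline subset. The grammar in the comment is my own hypothetical illustration, not peg-markdown's actual grammar, and a generator such as peg would produce this kind of code from a grammar file rather than by hand:

```python
from typing import List, Optional, Tuple, Union

# Illustrative PEG for a tiny inline subset (not peg-markdown's grammar):
#   Inline <- Strong / Emph / Code / Char
#   Strong <- "**" (!"**" Inline)+ "**"
#   Emph   <- "*"  (!"*"  Inline)+ "*"
#   Code   <- "`"  (!"`"  .)+ "`"
#   Char   <- .
Node = Union[str, tuple]             # a text run, or (tag, children)
Result = Optional[Tuple[Node, int]]  # (node, next position), or None on failure


def append_node(nodes: List[Node], node: Node) -> None:
    """Merge adjacent text so the tree holds text runs, not single characters."""
    if isinstance(node, str) and nodes and isinstance(nodes[-1], str):
        nodes[-1] += node
    else:
        nodes.append(node)


def parse_delimited(src: str, pos: int, delim: str, tag: str) -> Result:
    """delim (!delim Inline)+ delim; returns None (so the caller backtracks) if unclosed."""
    if not src.startswith(delim, pos):
        return None
    p = pos + len(delim)
    children: List[Node] = []
    # Greedy repetition guarded by the not-predicate !delim.
    # Char always succeeds here because p < len(src).
    while p < len(src) and not src.startswith(delim, p):
        node, p = parse_inline(src, p)
        append_node(children, node)
    if not children or not src.startswith(delim, p):
        return None
    return (tag, children), p + len(delim)


def parse_code(src: str, pos: int) -> Result:
    if not src.startswith("`", pos):
        return None
    end = src.find("`", pos + 1)
    if end <= pos + 1:  # no closing backtick (-1) or an empty span
        return None
    return ("code", [src[pos + 1:end]]), end + 1


def parse_char(src: str, pos: int) -> Result:
    return (src[pos], pos + 1) if pos < len(src) else None


def parse_inline(src: str, pos: int) -> Result:
    # Ordered choice: alternatives are tried in order and the first success wins,
    # so "**" is always tried as Strong before Emph gets a chance.
    for alternative in (
        lambda s, p: parse_delimited(s, p, "**", "strong"),
        lambda s, p: parse_delimited(s, p, "*", "em"),
        parse_code,
        parse_char,
    ):
        result = alternative(src, pos)
        if result is not None:
            return result
    return None


def parse(src: str) -> List[Node]:
    nodes: List[Node] = []
    pos = 0
    while pos < len(src):
        node, pos = parse_inline(src, pos)
        append_node(nodes, node)
    return nodes
```

Because the choice is ordered, there is never any ambiguity to resolve: "a **b *c***" becomes a strong node containing an em node, while an unclosed "*a" simply makes the Emph alternative fail, the parser backtracks to Char, and the result is the plain text "*a".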

Mauricio Fernandez recently released his Simple Markup Markdown parser, which he wrote as part of his OcsiBlog Weblog Engine. Because the parser is written in OCaml, it is extremely simple and short (268 SLOC for the parser, 43 SLOC for the HTML emitter), yet blazingly fast: 20% faster than discount (written in hand-optimized C) and six hundred times faster than BlueCloth (Ruby), even though it hasn't been optimized for performance yet. Because it is only intended for Mauricio's own weblog, it deviates from the official Markdown specification in a few places, but Mauricio has created a branch that reverts most of those changes.
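
The SLOC split above (a parser plus a separate, much smaller HTML emitter) reflects a common design: the parser builds a syntax tree, and the emitter only walks it. Below is a minimal Python sketch of the emitter side, assuming a hypothetical AST of my own; it is not Simple Markup's code, which is OCaml and whose types are not reproduced here:

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


# Hypothetical AST; in OCaml these would naturally be variant types.
@dataclass
class Text:
    value: str


@dataclass
class Code:
    value: str


@dataclass
class Emph:
    children: List["Inline"]


Inline = Union[Text, Code, Emph]


@dataclass
class Heading:
    level: int
    children: List[Inline]


@dataclass
class Paragraph:
    children: List[Inline]


Block = Union[Heading, Paragraph]


def emit_inline(node: Inline) -> str:
    # All HTML escaping happens here, in the emitter, never in the parser.
    if isinstance(node, Text):
        return escape(node.value, quote=False)
    if isinstance(node, Code):
        return "<code>" + escape(node.value, quote=False) + "</code>"
    if isinstance(node, Emph):
        return "<em>" + "".join(emit_inline(c) for c in node.children) + "</em>"
    raise TypeError(f"unknown inline node: {node!r}")


def emit_block(block: Block) -> str:
    if isinstance(block, Heading):
        inner = "".join(emit_inline(c) for c in block.children)
        return f"<h{block.level}>{inner}</h{block.level}>"
    if isinstance(block, Paragraph):
        inner = "".join(emit_inline(c) for c in block.children)
        return f"<p>{inner}</p>"
    raise TypeError(f"unknown block node: {block!r}")


def emit_html(doc: List[Block]) -> str:
    return "\n".join(emit_block(b) for b in doc)
```

Since the emitter sees only the tree, escaping lives in one place, and a different output format would need just another small walker over the same types.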
