How can I match a Markdown code block with RegEx?


Problem description

I am trying to extract a code block from a Markdown document using PCRE RegEx. For the uninitiated, a code block in Markdown is defined thus:

To produce a code block in Markdown, simply indent every line of the block by at least 4 spaces or 1 tab. A code block continues until it reaches a line that is not indented (or the end of the article).

So, given this text:

This is a code block:

    I need capturing along with
    this line

This is a code fence below (to be ignored):

``` json
This must have three backticks
flanking it
```

I love `inline code` too but don't capture

and one more short code block:

    Capture me

So far I have this RegEx:

(?:[ ]{4,}|\t{1,})(.+)

But it simply captures each line prefixed with at least four spaces or one tab. It doesn't capture the whole block.
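
The line-by-line behaviour described above can be reproduced with a minimal sketch. It assumes Python's built-in `re` module instead of PCRE (the pattern itself is unchanged, since it uses no PCRE-only syntax), and the names `LINE_RE` and `SAMPLE` are made up for this illustration. `SAMPLE` is a shortened copy of the test text.

```python
import re

# The pattern from the question, unchanged. It is not anchored and has no
# repetition over lines, so each indented line becomes its own match.
LINE_RE = re.compile(r"(?:[ ]{4,}|\t{1,})(.+)")

# Shortened copy of the test text (hypothetical name, for illustration only).
SAMPLE = (
    "This is a code block:\n"
    "\n"
    "    I need capturing along with\n"
    "    this line\n"
    "\n"
    "and one more short code block:\n"
    "\n"
    "    Capture me\n"
)

captures = LINE_RE.findall(SAMPLE)
print(captures)
# ['I need capturing along with', 'this line', 'Capture me']
```

The first two captures belong to the same code block but arrive as separate matches, which is the problem the question asks about.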

What I need help with is how to set the condition to capture everything after 4 spaces or 1 tab until you either get to a line that is not indented or the end of the text.

Here's an online work in progress:

https://www.regex101.com/r/yMQCIG/5

Solution

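You should anchor the pattern with ^ and $ in combination with the m modifier, so that they match at the start and end of each line rather than only at the start and end of the whole string. Also, your test text had only 3 leading spaces in the final block: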

^((?:(?:[ ]{4}|\t).*(\R|$))+)

With \R and the repetition you match one whole block with each single match, instead of a line per match.
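
As a rough sketch of the block-per-match behaviour, here is the same idea in Python. One substitution is made: Python's built-in `re` module does not support `\R`, so this version uses `(?:\r?\n|$)` in place of `(\R|$)`, which covers only `\n` and `\r\n` line endings. The names `BLOCK_RE`, `find_indented_blocks` and `SAMPLE` are hypothetical and exist only for this example.

```python
import re

# Adapted from the answer's pattern. Assumption: (?:\r?\n|$) stands in for
# (\R|$) because Python's `re` has no \R. re.MULTILINE plays the role of
# the m modifier, so ^ and $ match at line boundaries.
BLOCK_RE = re.compile(r"^((?:(?:[ ]{4}|\t).*(?:\r?\n|$))+)", re.MULTILINE)


def find_indented_blocks(text):
    """Return each run of consecutive indented lines as one string."""
    return [m.group(1) for m in BLOCK_RE.finditer(text)]


# The test text from the question, with the last block indented by 4 spaces.
SAMPLE = (
    "This is a code block:\n"
    "\n"
    "    I need capturing along with\n"
    "    this line\n"
    "\n"
    "This is a code fence below (to be ignored):\n"
    "\n"
    "``` json\n"
    "This must have three backticks\n"
    "flanking it\n"
    "```\n"
    "\n"
    "I love `inline code` too but don't capture\n"
    "\n"
    "and one more short code block:\n"
    "\n"
    "    Capture me"
)

for block in find_indented_blocks(SAMPLE):
    print(repr(block))
# '    I need capturing along with\n    this line\n'
# '    Capture me'
```

Each match keeps its leading indentation and line breaks. A blank line inside a code block would end the match in this sketch, because an empty line does not start with 4 spaces or a tab.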

See demo on regex101

Disclaimer: The rules of Markdown are more complicated than the presented example text shows. For instance, when (nested) lists have code blocks in them, these need to be prefixed with 8, 12 or more spaces. Regular expressions are not suitable for identifying such code blocks, or other code blocks embedded in Markdown that uses a wider range of format combinations.
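
One way to see this limitation is to run the pattern against a list item. The sketch below assumes Python's `re` module, with `(?:\r?\n|$)` swapped in for `(\R|$)` because `re` has no `\R`. The names `BLOCK_RE` and `LIST_TEXT` and the sample list are invented for illustration. In the original Markdown syntax, a 4-space indent inside a list item marks a continuation paragraph, and code inside the item needs 8 spaces.

```python
import re

# Adapted from the answer's pattern (assumption: (?:\r?\n|$) replaces (\R|$)).
BLOCK_RE = re.compile(r"^((?:(?:[ ]{4}|\t).*(?:\r?\n|$))+)", re.MULTILINE)

# Hypothetical list item: a continuation paragraph (4 spaces) followed by a
# code block nested in the item (8 spaces).
LIST_TEXT = (
    "1.  First item\n"
    "\n"
    "    Continuation paragraph of the item, not code.\n"
    "\n"
    "        actual_code()\n"
)

matches = [m.group(1) for m in BLOCK_RE.finditer(LIST_TEXT)]
for block in matches:
    print(repr(block))
# '    Continuation paragraph of the item, not code.\n'
# '        actual_code()\n'
```

The first match is a false positive: the pattern has no notion of list context, so it cannot tell an indented paragraph from indented code.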
