Markdown: how to show a preview (such as the first N words)

Question

I am using Rails 4 and Kramdown, but I believe that this question extends to any (web-) programming language with Markdown support.

I am making a blogging website. On the overview page, I want to show the start of each of the articles.

As an article can be very long, I only want to show the first part.

A bare-bones idea would be to just truncate the article after N characters. A slightly better idea would be to truncate the article after N words.

Of course, when dealing with a document that contains additional markup, such as Markdown, this can and will break stuff, so another solution is needed.

How can I show only the first, say, 100 words of a Markdown document without breaking the Markdown markup?

Solution

Let's use this **sample document** with _various_ types of [Markdown](http://daringfireball.net/projects/markdown/) markup.

Now, let's assume you take the first 20 chars. You would get:

Let's use this **sam

and the first 80 chars give you:

Let's use this **sample document** with _various_ types of [Markdown](http://dar
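To see both failure modes concretely, plain string slicing in Python reproduces the two cuts above, and splitting on whitespace shows that counting words instead of characters breaks the markup in the same way. This is only an illustration; `sample` is a hypothetical variable holding the sample Markdown.

```python
# Hypothetical variable holding the sample Markdown source from above.
sample = ("Let's use this **sample document** with _various_ types of "
          "[Markdown](http://daringfireball.net/projects/markdown/) markup.")

# Naive character truncation: slice the raw Markdown source.
first_20_chars = sample[:20]
first_80_chars = sample[:80]

# Naive word truncation: split on whitespace and keep the first N words.
first_4_words = " ".join(sample.split()[:4])

# In all three results the bold marker or the link is left unclosed.
print(first_20_chars)  # Let's use this **sam
print(first_80_chars)  # Let's use this **sample document** with _various_ types of [Markdown](http://dar
print(first_4_words)   # Let's use this **sample
```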

While those char lengths are arbitrary and probably not lengths you would use, the point is that each of them breaks the Markdown syntax. A better approach would be to parse the document to HTML, then take the beginning of the HTML document.

Of course, for the same reasons, you would probably want to use an HTML document model of some sort rather than splitting the HTML on raw char length. Why not simply take the first paragraph? If the paragraph is too long, break on the Nth char, but count only the chars in the body text, not the chars that make up the HTML markup. How to do that depends on which tool/library you are using to handle the HTML, and this is not the place to make tool recommendations (and I'm not very familiar with Ruby/Rails; I'm more of a Python guy).
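As a rough sketch of "count only the body text", Python's standard-library `html.parser` can report just the text between tags. `TextLengthCounter` and `visible_length` are hypothetical names, and the snippet assumes the Markdown has already been converted to the HTML held in `SAMPLE_HTML`.

```python
from html.parser import HTMLParser


class TextLengthCounter(HTMLParser):
    """Adds up the length of text content, ignoring tags and attribute values."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain
        # characters before handle_data sees them, so "&amp;" counts as 1.
        super().__init__(convert_charrefs=True)
        self.length = 0

    def handle_data(self, data):
        self.length += len(data)


def visible_length(html_fragment):
    """Hypothetical helper: number of body-text characters in an HTML fragment."""
    counter = TextLengthCounter()
    counter.feed(html_fragment)
    counter.close()  # flush any text still buffered by the parser
    return counter.length


SAMPLE_HTML = ("<p>Let's use this <strong>sample document</strong> with "
               "<em>various</em> types of "
               '<a href="http://daringfireball.net/projects/markdown/">Markdown</a>'
               " markup.</p>")

print(len(SAMPLE_HTML))             # raw length, markup included
print(visible_length(SAMPLE_HTML))  # 69: only the text a reader sees
```

The raw string is much longer than 69 characters because of the tags and the `href`; the counter only sees what a reader would see, which is the number you want to compare against N.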

Note that the second example I give above breaks the Markdown in the middle of a link's URL. If you first convert the Markdown to HTML and break while counting only text chars, then the URL will remain intact even if the link text (label) gets truncated. In that case, though, it might be better to truncate the text after the end of the link. That depends on how complicated you want to make your code.

A natural next step is to ask: why not do all of that with the Markdown text instead of converting the entire document to HTML first? You could, but then you would be re-implementing your own Markdown parser... unless you happen to use a Markdown parser which gives you access to the internals (through some plug-in API) or outputs a parse tree. If you are using a parser which returns a parse tree, you could truncate the parse tree, then pass it on to the renderer. Short of that, using parsed HTML is probably the best option.

Either way, let's work through an example. The HTML for the above example would look something like this:

<p>Let's use this <strong>sample document</strong> with <em>various</em> types of <a href="http://daringfireball.net/projects/markdown/">Markdown</a> markup.</p>

Now, let's represent that document as some sort of pseudo document object (using JSON):

[{
    "type": "element",
    "tag": "p",
    "children":
        [
            {
                "type": "text",
                "text": "Let's use this "
            },
            {
                "type": "element",
                "tag": "strong",
                "children":
                    [
                        {
                            "type": "text",
                            "text": "sample document"
                        }
                    ]
            },
            {
                "type": "text",
                "text": " with "
            },
            {
                "type": "element",
                "tag": "em",
                "children":
                    [
                        {
                            "type": "text",
                            "text": "various"
                        }
                    ]
            },
            {
                "type": "text",
                "text": " types of "
            },
            {
                "type": "element",
                "tag": "a",
                "href": "http://daringfireball.net/projects/markdown/",
                "children":
                    [
                        {
                            "type": "text",
                            "text": "Markdown"
                        }
                    ]
            },
            {
                "type": "text",
                "text": " markup."
            }
        ]
}]
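One way to get a structure like this without writing an HTML parser yourself is to subclass Python's standard-library `html.parser.HTMLParser`. This is a minimal sketch: `DocumentBuilder` and `parse_html` are hypothetical names, attributes are stored under an `attrs` key (rather than at the top level like `href` above) so that an attribute named `type` cannot clash with the node type, and the void-element list is deliberately incomplete.

```python
import json
from html.parser import HTMLParser

# Elements that never have content or an end tag (partial list for this sketch).
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class DocumentBuilder(HTMLParser):
    """Builds nested dicts shaped like the pseudo document object above."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive as plain text
        self.root = {"type": "element", "tag": None, "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"type": "element", "tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1]["children"].append({"type": "text", "text": data})


def parse_html(html):
    """Hypothetical helper: return the top-level nodes of an HTML fragment."""
    builder = DocumentBuilder()
    builder.feed(html)
    builder.close()
    return builder.root["children"]


SAMPLE_HTML = ("<p>Let's use this <strong>sample document</strong> with "
               "<em>various</em> types of "
               '<a href="http://daringfireball.net/projects/markdown/">Markdown</a>'
               " markup.</p>")

doc = parse_html(SAMPLE_HTML)
print(json.dumps(doc, indent=2))
```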

Now, just walk through that document (and its children, recursively), counting only the chars in the "text" field of "text" nodes until you reach your maximum. Cut the text node where the limit falls and drop every element after that point in the document. When the document is rendered (using a proper HTML renderer), all the HTML elements will be properly closed. Obviously, the exact process depends on what sort of document object you are working with (which may depend on the HTML parser and/or Markdown parser you are using).
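The loop described above can be sketched as a recursive function over the same dict shape. `truncate`, `text_length`, `text_node`, `element` and the `atomic_tags` parameter are hypothetical names; attributes are assumed to live under an `attrs` key. `atomic_tags` is one way to implement the earlier idea of cutting after the end of a link instead of inside its label, at the cost of overshooting the limit by the link's length.

```python
import copy


# Hypothetical helpers that build nodes shaped like the pseudo document object.
def text_node(text):
    return {"type": "text", "text": text}


def element(tag, *children, **attrs):
    return {"type": "element", "tag": tag, "attrs": attrs, "children": list(children)}


SAMPLE = [element(
    "p",
    text_node("Let's use this "),
    element("strong", text_node("sample document")),
    text_node(" with "),
    element("em", text_node("various")),
    text_node(" types of "),
    element("a", text_node("Markdown"), href="http://daringfireball.net/projects/markdown/"),
    text_node(" markup."),
)]


def text_length(node):
    """Number of body-text characters in a node and all of its descendants."""
    if node["type"] == "text":
        return len(node["text"])
    return sum(text_length(child) for child in node["children"])


def truncate(nodes, limit, atomic_tags=frozenset()):
    """Keep at most `limit` text characters; return (kept_nodes, remaining_limit).

    Elements whose tag is in `atomic_tags` (e.g. {"a"}) are kept whole instead
    of being cut, which may overshoot the limit. The input is not modified.
    """
    kept = []
    for node in nodes:
        if limit <= 0:
            break  # budget used up: drop everything that follows
        if node["type"] == "text":
            piece = node["text"][:limit]
            kept.append(text_node(piece))
            limit -= len(piece)
        elif node["tag"] in atomic_tags:
            kept.append(copy.deepcopy(node))
            limit -= text_length(node)
        else:
            children, limit = truncate(node["children"], limit, atomic_tags)
            kept.append({**node, "children": children})  # shallow copy, new children
    return kept, limit


preview, _ = truncate(SAMPLE, 20)
print(preview)
```

With `limit=20` this yields the same structure as the truncated example that follows (plus empty `attrs`). With `limit=55`, the link label is cut to `"Ma"` by default but stays `"Markdown"` when `atomic_tags={"a"}` is passed.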

In any event, the document truncated to 20 chars would result in this:

[{
    "type": "element",
    "tag": "p",
    "children":
        [
            {
                "type": "text",
                "text": "Let's use this "
            },
            {
                "type": "element",
                "tag": "strong",
                "children":
                    [
                        {
                            "type": "text",
                            "text": "sampl"
                        }
                    ]
            }
        ]
}]

Which would render as:

<p>Let's use this <strong>sampl</strong></p>
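The result is well-formed because the closing tags are produced by the renderer, not copied from the source. Here is a minimal serializer sketch in Python, assuming the same dict shape with optional attributes under an `attrs` key (`render`, `render_attrs` and `VOID_TAGS` are hypothetical names). It uses the standard library's `html.escape` for text and attribute values.

```python
from html import escape

# Void elements have no closing tag (partial list, enough for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def render_attrs(attrs):
    parts = []
    for name, value in attrs.items():
        # Boolean attributes parsed from HTML may carry a value of None.
        parts.append(f" {name}" if value is None else f' {name}="{escape(value)}"')
    return "".join(parts)


def render(nodes):
    """Serialize a (possibly truncated) node list back to HTML."""
    out = []
    for node in nodes:
        if node["type"] == "text":
            out.append(escape(node["text"], quote=False))
            continue
        out.append(f"<{node['tag']}{render_attrs(node.get('attrs', {}))}>")
        if node["tag"] in VOID_TAGS:
            continue
        out.append(render(node["children"]))
        out.append(f"</{node['tag']}>")  # every opened element is closed here
    return "".join(out)


# The 20-character result from above, in the same dict shape.
truncated = [{
    "type": "element", "tag": "p", "children": [
        {"type": "text", "text": "Let's use this "},
        {"type": "element", "tag": "strong", "children": [
            {"type": "text", "text": "sampl"},
        ]},
    ],
}]

print(render(truncated))  # <p>Let's use this <strong>sampl</strong></p>
```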

Note that only the text (Let's use this sampl) counts toward the 20 chars.

While the above examples use chars, you could certainly use the same principles and count words instead.
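For words, the same walk works if the budget is decremented per word rather than per character. This sketch (hypothetical `truncate_words`, `text_node`, `element`) treats each run of non-whitespace inside a text node as one word, so a word split across elements, such as `**un**believable`, would be counted twice; handling that needs look-ahead across node boundaries, which is left out here.

```python
import re


# Hypothetical helpers that build nodes shaped like the pseudo document object.
def text_node(text):
    return {"type": "text", "text": text}


def element(tag, *children, **attrs):
    return {"type": "element", "tag": tag, "attrs": attrs, "children": list(children)}


SAMPLE = [element(
    "p",
    text_node("Let's use this "),
    element("strong", text_node("sample document")),
    text_node(" with "),
    element("em", text_node("various")),
    text_node(" types of "),
    element("a", text_node("Markdown"), href="http://daringfireball.net/projects/markdown/"),
    text_node(" markup."),
)]


def truncate_words(nodes, limit):
    """Keep at most `limit` words of body text; return (kept_nodes, remaining)."""
    kept = []
    for node in nodes:
        if limit <= 0:
            break  # budget used up: drop everything that follows
        if node["type"] == "text":
            pieces = []
            # Split into alternating runs of whitespace and non-whitespace.
            for token in re.findall(r"\s+|\S+", node["text"]):
                if limit == 0:
                    break
                pieces.append(token)
                if not token.isspace():
                    limit -= 1  # only non-whitespace runs count as words
            kept.append(text_node("".join(pieces)))
        else:
            children, limit = truncate_words(node["children"], limit)
            kept.append({**node, "children": children})
    return kept, limit


preview, _ = truncate_words(SAMPLE, 4)
print(preview)
```

With a budget of 4 words the sample becomes `<p>Let's use this <strong>sample</strong></p>` once rendered.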