Check image URLs using python-markdown

Views: 77 · Published: 2020/5/6 3:48:54 · Tags: python, markdown

Question

On a website I'm creating I'm using Python-Markdown to format news posts. To avoid issues with dead links and HTTP-content-on-HTTPS-page problems I'm requiring editors to upload all images to the site and then embed them (I'm using a markdown editor which I've patched to allow easy embedding of those images using standard markdown syntax).

However, I'd like to enforce the no-external-images policy in my code.

One way would be to write a regex that extracts image URLs from the Markdown source code, or even to run the source through the Markdown renderer and use a DOM parser to extract the src attribute of every img tag.
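The render-then-inspect idea mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the asker's code: the names `ImgSrcCollector` and `external_image_urls` are hypothetical, the sample HTML is hard-coded rather than produced by the renderer, and "external" is assumed to mean "the URL carries a host part", which also covers protocol-relative URLs such as `//cdn.example.com/a.png`.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def external_image_urls(html):
    """Return the image URLs in `html` that point at another host."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    # Assumption: a URL is external when it has a network location.
    return [src for src in collector.sources if urlsplit(src).netloc]


# Hard-coded sample standing in for rendered Markdown output.
rendered = (
    '<p><img alt="Alt text" src="/path/to/img.jpg" />\n'
    '<img alt="Alt text" src="http://remote.com/path/to/img.jpg" /></p>'
)
print(external_image_urls(rendered))
```

The drawback of this route is that the post is rendered before it is validated, so the check runs as a separate pass over the output rather than during parsing.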

However, I'm curious if there's some way to hook into Python-Markdown to extract all image links or execute custom code (e.g. raising an exception if the link is external) during parsing.

Recommended answer

One approach would be to intercept the <img> node at a lower level just after Markdown parses and constructs it:

import re
from markdown import Markdown
from markdown.inlinepatterns import ImagePattern, IMAGE_LINK_RE

# NOTE: written against the Python-Markdown 2.x inline pattern API
# (ImagePattern, IMAGE_LINK_RE, URL in match group 9); later releases
# reworked inline patterns, so this may need adapting there.

# matches absolute http: and https: URLs
RE_REMOTEIMG = re.compile('^(http|https):.+')

class CheckImagePattern(ImagePattern):

    def handleMatch(self, m):
        # let the stock pattern build the <img> element first
        node = ImagePattern.handleMatch(self, m)
        # check 'src' to ensure it is local
        src = node.attrib.get('src')
        if src and RE_REMOTEIMG.match(src):
            print('ILLEGAL:', m.group(9))
            # or alternately you could raise an error immediately
            # raise ValueError("illegal remote url: %s" % m.group(9))
        return node

DATA = '''
![Alt text](/path/to/img.jpg)
![Alt text](http://remote.com/path/to/img.jpg)
'''

mk = Markdown()
# patch in the customized image pattern matcher with url checking
mk.inlinePatterns['image_link'] = CheckImagePattern(IMAGE_LINK_RE, mk)
result = mk.convert(DATA)
print(result)

Output:

ILLEGAL: http://remote.com/path/to/img.jpg
<p><img alt="Alt text" src="/path/to/img.jpg" />
<img alt="Alt text" src="http://remote.com/path/to/img.jpg" /></p>
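For Python-Markdown 3.x, where the inline pattern classes changed, a similar check could be attached as a tree processor that runs after inline parsing. The following is a sketch under assumptions rather than the answer's method: `is_external`, `LocalImagesOnly`, `LocalImagesExtension`, and the registered name `local_images_only` are hypothetical, and priority 15 is chosen on the assumption that the built-in inline tree processor is registered at 20. The helper is also stricter than the regex above: it treats any URL with a scheme or a host as non-local, which rejects protocol-relative URLs (`//remote.com/img.jpg`) that `RE_REMOTEIMG` would let through.

```python
from urllib.parse import urlsplit

from markdown import Markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor


def is_external(url):
    """Assumption: any URL with a scheme or a host is not site-local."""
    parts = urlsplit(url)
    return bool(parts.scheme or parts.netloc)


class LocalImagesOnly(Treeprocessor):
    """Raise ValueError if any <img> element points off-site."""

    def run(self, root):
        for img in root.iter("img"):
            src = img.get("src", "")
            if is_external(src):
                raise ValueError("illegal remote url: %s" % src)
        return None  # leave the tree unchanged


class LocalImagesExtension(Extension):
    def extendMarkdown(self, md):
        # Priority below the inline processor so <img> nodes already exist.
        md.treeprocessors.register(LocalImagesOnly(md), "local_images_only", 15)


md = Markdown(extensions=[LocalImagesExtension()])
print(md.convert("![Alt text](/path/to/img.jpg)"))
```

Converting a post that embeds a remote image then raises `ValueError` from `convert()`, which the calling code can catch and report back to the editor.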
