Sanitising user input using Python


Question

What is the best way to sanitize user input for a Python-based web application? Is there a single function to remove HTML characters and any other necessary character combinations to prevent an XSS or SQL injection attack?

Recommended answer

Here is a snippet that will remove all tags not on the whitelist, and all tag attributes not on the attribute whitelist (so you can't use onclick).

It is a modified version of http://www.djangosnippets.org/snippets/205/, with a regex applied to the attribute values to prevent people from using href="javascript:...", and the other cases described at http://ha.ckers.org/xss.html.
(e.g. <a href="ja&#x09;vascript:alert('hi')"> or <a href="ja vascript:alert('hi')">, etc.)
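To make the attribute-value regex concrete, here is a minimal standalone sketch of the same pattern the snippet builds, applied to the obfuscated examples above. The names `SEP`, `RE_SCRIPTS` and `strip_script_schemes` are illustrative and not part of the original answer.

```python
import re

# Between every letter of the scheme, allow optional whitespace and an
# optional hex character reference such as &#x09; (same pattern as the snippet).
SEP = r'[\s]*(&#x.{1,7})?'
RE_SCRIPTS = re.compile(
    '(%s)|(%s)' % (SEP.join('javascript:'), SEP.join('vbscript:')),
    re.IGNORECASE,
)

def strip_script_schemes(value):
    """Remove javascript:/vbscript: prefixes, including obfuscated spellings."""
    return RE_SCRIPTS.sub('', value)

print(strip_script_schemes("ja&#x09;vascript:alert('hi')"))
print(strip_script_schemes("ja vascript:alert('hi')"))
print(strip_script_schemes("http://example.com/"))
```

The first two calls lose their scheme prefix, while the ordinary URL passes through unchanged.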

As you can see, it uses the (awesome) BeautifulSoup library.

# Targets Python 2 and BeautifulSoup 3 (note the `urlparse` and `BeautifulSoup` imports).
import re
from urlparse import urljoin
from BeautifulSoup import BeautifulSoup, Comment

def sanitizeHtml(value, base_url=None):
    # Match "javascript:" / "vbscript:" even when whitespace or hex character
    # references (e.g. &#x09;) are inserted between the letters.
    rjs = r'[\s]*(&#x.{1,7})?'.join(list('javascript:'))
    rvb = r'[\s]*(&#x.{1,7})?'.join(list('vbscript:'))
    re_scripts = re.compile('(%s)|(%s)' % (rjs, rvb), re.IGNORECASE)
    validTags = 'p i strong b u a h1 h2 h3 pre br img'.split()  # Tag whitelist
    validAttrs = 'href src width height'.split()  # Attribute whitelist
    urlAttrs = 'href src'.split() # Attributes which should have a URL
    soup = BeautifulSoup(value)
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        # Get rid of comments
        comment.extract()
    for tag in soup.findAll(True):
        if tag.name not in validTags:
            # Drop the tag itself but keep rendering its contents
            tag.hidden = True
        # Rebuild the attribute list, keeping only whitelisted attributes
        attrs = tag.attrs
        tag.attrs = []
        for attr, val in attrs:
            if attr in validAttrs:
                val = re_scripts.sub('', val) # Remove scripts (vbs & js)
                if attr in urlAttrs:
                    val = urljoin(base_url, val) # Calculate the absolute url
                tag.attrs.append((attr, val))

    return soup.renderContents().decode('utf8')
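
The snippet above relies on Python 2 and BeautifulSoup 3. As a rough illustration of the same whitelist idea on Python 3, the sketch below swaps BeautifulSoup for the standard library's `html.parser`. It is not the answer's code: the class `WhitelistSanitizer` and the function `sanitize_html` are hypothetical names, and the behaviour is only an approximation of the original (non-whitelisted tags are dropped, their text is kept and escaped, comments are discarded).

```python
import re
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

VALID_TAGS = set('p i strong b u a h1 h2 h3 pre br img'.split())
VALID_ATTRS = set('href src width height'.split())
URL_ATTRS = {'href', 'src'}   # attributes which should hold a URL
VOID_TAGS = {'br', 'img'}     # tags that never get a closing tag

SEP = r'[\s]*(&#x.{1,7})?'
RE_SCRIPTS = re.compile(
    '(%s)|(%s)' % (SEP.join('javascript:'), SEP.join('vbscript:')),
    re.IGNORECASE,
)

class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self, base_url=None):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in VALID_TAGS:
            return  # drop the tag, but its text still reaches handle_data
        kept = []
        for name, val in attrs:
            if name in VALID_ATTRS and val is not None:
                val = RE_SCRIPTS.sub('', val)  # remove js/vbs schemes
                if name in URL_ATTRS and self.base_url:
                    val = urljoin(self.base_url, val)  # absolute URL
                kept.append(' %s="%s"' % (name, escape(val, quote=True)))
        self.out.append('<%s%s>' % (tag, ''.join(kept)))

    def handle_endtag(self, tag):
        if tag in VALID_TAGS and tag not in VOID_TAGS:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    # Comments are discarded: the default handle_comment does nothing.

def sanitize_html(value, base_url=None):
    parser = WhitelistSanitizer(base_url)
    parser.feed(value)
    parser.close()  # flush any buffered trailing text
    return ''.join(parser.out)

print(sanitize_html('<b>bold</b><i onclick="x()">it</i>'))
print(sanitize_html('<a href="/x">l</a>', base_url='http://example.com/'))
```

Unlike BeautifulSoup, this sketch does not repair unbalanced markup, so treat it as a starting point for experimentation, not as a drop-in replacement.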

As the other posters have said, pretty much all Python DB libraries take care of SQL injection as long as you pass values as query parameters instead of building the SQL string yourself, so this should pretty much cover you.
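
As a small illustration of what "taking care of SQL injection" looks like in practice, the sketch below uses the standard library's `sqlite3` module with a `?` placeholder and contrasts it with string formatting. The table, the data and the function names `find_user` / `find_user_unsafe` are made up for this example; other DB-API drivers use different placeholder styles (for example `%s`).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.execute("INSERT INTO users (name) VALUES ('alice')")

def find_user(conn, name):
    # The value is passed separately from the SQL text, so the driver
    # treats it purely as data.
    cur = conn.execute('SELECT name FROM users WHERE name = ?', (name,))
    return cur.fetchall()

def find_user_unsafe(conn, name):
    # Anti-pattern, shown for contrast only: the input becomes part of the SQL.
    cur = conn.execute("SELECT name FROM users WHERE name = '%s'" % name)
    return cur.fetchall()

hostile = "x' OR '1'='1"
print(find_user(conn, hostile))         # no row is named like the hostile string
print(find_user_unsafe(conn, hostile))  # the injected OR clause matches every row
```

With the placeholder, the hostile string is just an unusual name that matches nothing; with string formatting, it rewrites the query.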
