Remove non-ASCII characters from a string using Python / Django


Question

I have a string of HTML stored in a database. Unfortunately, it contains characters such as ®. I want to replace these characters with their HTML equivalents, either in the DB itself or with a find-and-replace in my Python / Django code.

Any suggestions on how I can do this?

Recommended answer

You can rely on the fact that the ASCII characters are the first 128 code points (0 through 127): get the code point of each character with ord and strip the character if it is out of range.

# -*- coding: utf-8 -*-

def strip_non_ascii(string):
    '''Return the string without non-ASCII characters.'''
    # ASCII covers code points 0 through 127 inclusive.
    stripped = (c for c in string if ord(c) < 128)
    return ''.join(stripped)


test = u'éáé123456tgreáé@€'
print(test)
print(strip_non_ascii(test))

Result

éáé123456tgreáé@€
123456tgre@

Note that @ is kept because it is, after all, an ASCII character. If you want to keep only a particular subset (such as digits and uppercase and lowercase letters), you can narrow the range by consulting an ASCII table.
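
As a minimal sketch of narrowing the range, the variant below keeps only ASCII digits and letters. The name keep_ascii_alnum is hypothetical, and the allowed set is built from the standard string module instead of explicit ord ranges.

```python
import string

# Hypothetical helper (not from the original answer): keep only ASCII
# letters and digits, dropping everything else, including '@'.
ALLOWED = set(string.ascii_letters + string.digits)


def keep_ascii_alnum(text):
    '''Return text with every character outside A-Z, a-z, 0-9 removed.'''
    return ''.join(c for c in text if c in ALLOWED)


print(keep_ascii_alnum(u'éáé123456tgreáé@€'))
```

With the same test string as above, the @ is now dropped along with the accented characters.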

EDITED: After reading your question again, maybe you need to escape your HTML code so that all those characters appear correctly once rendered. You can use the escape filter in your templates.
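
If the goal is to replace characters such as ® with an HTML equivalent instead of dropping them, one possible approach is Python's built-in xmlcharrefreplace error handler. The sketch below assumes numeric character references (for example &#174;) are acceptable, and the name to_html_entities is hypothetical. It leaves ASCII markup characters such as < and & untouched, so existing HTML tags stay intact.

```python
def to_html_entities(text):
    '''Replace each non-ASCII character with a numeric character reference.'''
    # 'xmlcharrefreplace' turns every character that cannot be encoded
    # as ASCII into a reference of the form &#NNN;
    encoded = text.encode('ascii', 'xmlcharrefreplace')
    # Decode back so the result is a text string rather than bytes.
    return encoded.decode('ascii')


print(to_html_entities(u'<p>Acme® costs 5€</p>'))
```

The same function could be applied once to the stored values in the database, or each time the string is read, depending on where the replacement is wanted.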
