Remove non-ASCII characters from a string using python / django

Problem description

I have a string of HTML stored in a database. Unfortunately it contains characters such as ®. I want to replace these characters with their HTML equivalents, either in the DB itself or with a find-and-replace in my Python / Django code.

Any suggestions on how I can do this?

Solution

ASCII characters are the first 128 code points (0 through 127), so you can get the code point of each character with ord and strip the character if it is out of range:

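# -*- coding: utf-8 -*-

def strip_non_ascii(string):
    '''Return the string without non-ASCII characters.'''
    # Keeps code points 1..126: ASCII without NUL (0) and DEL (127).
    stripped = (c for c in string if 0 < ord(c) < 127)
    return ''.join(stripped)


test = u'éáé123456tgreáé@€'
# print(...) with a single argument works on both Python 2 and Python 3.
print(test)
print(strip_non_ascii(test))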

Result

éáé123456tgreáé@€
123456tgre@

Note that @ is kept because it is an ASCII character. If you want to keep only a particular subset (for example, just digits and uppercase and lowercase letters), you can narrow the range by consulting an ASCII table.
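As a minimal sketch of narrowing the allowed set, the variant below keeps only ASCII letters and digits. The helper name keep_alphanumeric and the ALLOWED set are hypothetical, chosen here for illustration; they are not part of the original answer.

```python
import string

# Assumption for illustration: the subset to keep is ASCII letters plus digits.
ALLOWED = set(string.ascii_letters + string.digits)

def keep_alphanumeric(text):
    '''Return text with everything except ASCII letters and digits removed.'''
    return ''.join(c for c in text if c in ALLOWED)

print(keep_alphanumeric(u'éáé123456tgreáé@€'))  # 123456tgre
```

Using a set of allowed characters rather than a numeric range makes it easy to add or drop individual characters, such as the space or underscore.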

EDITED: After reading your question again, maybe you need to escape your HTML code so that all those characters appear correctly once rendered. You can use the escape filter in your templates.
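If the goal is to replace the characters rather than strip them, one possible approach is Python's built-in xmlcharrefreplace error handler, which turns every character that cannot be encoded as ASCII into a numeric character reference. This is a sketch rather than the method from the answer above, and the helper name to_html_entities is hypothetical.

```python
def to_html_entities(text):
    '''Replace each non-ASCII character with a numeric character reference.'''
    # Encoding to ASCII with 'xmlcharrefreplace' emits &#NNN; for any
    # character outside the ASCII range; decoding returns a text string.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(to_html_entities(u'Acme\u00ae caf\u00e9'))  # Acme&#174; caf&#233;
```

ASCII characters, including existing markup such as < and >, pass through unchanged, so the stored HTML keeps its structure.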
