Google App Engine TextProperty and UTF-8: When to Encode/Decode


Question





I am on Google App Engine (Python 2.5) with Django templates and the webapp framework.

db.TextProperty, UTF-8, Unicode, and decode/encode have confused me so much. I would really appreciate it if some experts could offer some suggestions. I have googled for the whole night and still have so many questions.

What I am trying to do:

[utf-8 form input] => [Python, Store in db.TextProperty] => [When Needed, Replace Japanese with English] => [HTML, UTF-8]

Following this answer (Zipping together unicode strings in Python), every .py file starts with

# -*- coding: utf-8 -*-

and is saved in UTF-8 format.

Here is my code:

#Model.py
class MyModel(db.Model):
  content = db.TextProperty()

#Main.py
def post(self):
    content=cgi.escape(self.request.get('content'))
    #what is the type of content? Unicode? Str? or Other?
    obj = MyModel(content=content)
    #obj = MyModel(content=unicode(content))
    #obj = MyModel(content=unicode(content,'utf-8'))
    #which one is the best?
    obj.put()

    #Replace one Japanese word with English word in the content
    content=obj.content
    #what is the type of content here? db.Text? Unicode? Str? or Other?
    #content=unicode(obj.content, 'utf-8') #Is this necessary?
    content=content.replace(u'ひと',u'hito')

    #Output to HTML
    self.response.out.write(template.render(path, {'content':content}))
    #self.response.out.write(template.render(path, {'content':content.encode('utf-8')}))

Hope some Google App Engine engineer can see this question and offer some help. Thanks a lot!

Solution

First, read this. And this.

In a nutshell, whenever you're dealing with a text string in your app, it should be a unicode string. You should encode into a byte string (an instance of 'str' instead of 'unicode') when you want to send data as bytes - for instance, over HTTP - and you should decode from a byte string when you receive bytes that represent text (and you know their encoding). The only operation you should ever perform on a byte string that contains encoded text is to decode it.
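The decode-on-input, encode-on-output rule can be sketched without any framework. This is only an illustration: handle_submission is a made-up name, and the input is assumed to be UTF-8 encoded bytes. It is written to avoid version-specific syntax, so the same lines behave the same way on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: handle_submission is a hypothetical name, not part of webapp.

def handle_submission(raw_bytes, encoding='utf-8'):
    """Decode on the way in, work on text, encode on the way out."""
    # 1. Bytes arrive from the network: decode once, with a known encoding.
    text = raw_bytes.decode(encoding)

    # 2. All processing happens on the unicode string.
    text = text.replace(u'ひと', u'hito')

    # 3. Bytes leave for the network: encode once, at the very end.
    return text.encode(encoding)


# Simulate a UTF-8 encoded form field (assumed input for this sketch).
incoming = u'ひと is a Japanese word'.encode('utf-8')
outgoing = handle_submission(incoming)
```

Only steps 1 and 3 touch byte strings; the replace in step 2 never sees bytes.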

Fortunately, most frameworks get this right; webapp and webapp2, for instance (I can see you're using webapp) should return unicode strings from all the request methods, and encode any strings you pass to them appropriately. Make sure all the strings you're responsible for are unicode, and you should be fine.

Note that a byte string can store any sort of data - encoded text, an executable, an image, random bytes, encrypted data, and so forth. Without metadata, such as the knowledge that it's text and what encoding it's in, you cannot sensibly do anything with it other than store and retrieve it.
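A small sketch of why the encoding has to be known: the same two bytes decode without error under two different encodings, but only one result is the text that was meant. The choice of character and the variable names are just for illustration.

```python
# The same two bytes, read under two different assumptions.
raw = u'\u00e9'.encode('utf-8')        # e-acute, stored as UTF-8 bytes

as_utf8 = raw.decode('utf-8')          # the intended reading: one character
as_latin1 = raw.decode('latin-1')      # a wrong guess that still "succeeds"

# The wrong guess produces two unrelated characters instead of one.
same_text = (as_utf8 == as_latin1)
```

Nothing in the bytes themselves says which reading is correct, which is why the encoding has to travel with the data as metadata.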

Don't ever try to decode a unicode string, or encode a byte string; it will not do what you expect, and things will go horribly wrong.

Regarding the datastore, db.Text is a subclass of unicode; to all intents and purposes it is a unicode string - it's only different so the datastore can tell it shouldn't be indexed. Likewise, db.Blob is a subclass of str, for storing byte strings.
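To see what "a subclass of unicode" means in practice without the App Engine SDK, here is a sketch using a stand-in class. FakeText is hypothetical and is not the real db.Text; it only mimics the one property described above, namely that it subclasses the text type.

```python
# -*- coding: utf-8 -*-
# FakeText is a hypothetical stand-in for db.Text, used only for illustration.
# type(u'') is `unicode` on Python 2 and `str` on Python 3.

class FakeText(type(u'')):
    """A text subclass with no extra behavior."""


content = FakeText(u'ひと です')

# Ordinary text methods work directly; no decoding step is involved.
replaced = content.replace(u'ひと', u'hito')
is_text = isinstance(content, type(u''))
```

If db.Text behaves like this stand-in, the value read back from obj.content is already text, so there is nothing to decode before calling replace.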
