Internal string encoding

Question

I'm trying to understand how classic ASP handles strings internally. I've googled and debugged, but I still don't know how a string is encoded within an ASP script.

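See the illustration below.

Is a conversion performed so that all string variables have the same encoding, regardless of the input data source?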

Most ASP pages are saved on disk as UTF-8. They do, however, #include ASP files that are saved with another encoding. At the top of the front-end pages I set the response encoding to UTF-8:

Response.CodePage = 65001;   // code page 65001 is UTF-8
Response.Charset = 'utf-8';

Recommended answer

First of all, it's worth considering that UTF-8 and Windows-1252 (and ISO-8859-1 and others) are all based on US-ASCII. The first 128 characters in all of these code pages are identical: they use exactly the same byte values and each occupies just one byte.
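The overlap described above can be checked directly. The following is a small Python sketch (Python is used here only as a convenient way to inspect bytes; it is not part of ASP), where `cp1252` and `latin-1` are Python's codec names for Windows-1252 and ISO-8859-1:

```python
# Bytes 0x00-0x7F decode to the same characters in all three encodings.
ascii_bytes = bytes(range(128))
decoded = {enc: ascii_bytes.decode(enc) for enc in ("utf-8", "cp1252", "latin-1")}
same_below_128 = len(set(decoded.values())) == 1

# Outside the ASCII range the encodings diverge.
e_acute_utf8 = "\u00e9".encode("utf-8")    # e-acute: two bytes in UTF-8
e_acute_1252 = "\u00e9".encode("cp1252")   # e-acute: one byte in Windows-1252

print(same_below_128, e_acute_utf8, e_acute_1252)
```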

In many cases the vast majority of the content is within the US-ASCII range, so it's hard to tell that there is any difference. Frequently the whole file uses only US-ASCII characters, and hence the files are identical regardless of the chosen encoding (save perhaps for the BOM at the start of the file).
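As a rough illustration of that point, this Python sketch encodes an ASCII-only line three ways. The sample text is made up for the example, and `utf-8-sig` is Python's name for "UTF-8 with BOM":

```python
import codecs

text = "Response.Write x"                  # ASCII-only sample content
as_utf8 = text.encode("utf-8")
as_1252 = text.encode("cp1252")
as_utf8_bom = text.encode("utf-8-sig")     # UTF-8 with a leading BOM

# ASCII-only content produces identical bytes in both encodings.
identical = as_utf8 == as_1252
# The BOM (EF BB BF) is the only thing that differs.
bom_only_difference = as_utf8_bom == codecs.BOM_UTF8 + as_utf8

print(identical, bom_only_difference)
```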

Basic script processing

First the processor combines an ASP file with all its includes, and the includes of those includes. This is done very simply, by sequentially replacing each include marker with the content of the include file it references. This happens purely at the byte level; no attempt is made to convert files of different encodings.
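To see why byte-level merging matters, here is a Python simulation of the include step as described above. It is only a sketch: the file name `inc.asp` and the sample contents are hypothetical, and the real ASP processor is not implemented this way in Python.

```python
# A page saved as UTF-8 that includes a file saved as Windows-1252.
marker = b'<!--#include file="inc.asp"-->'
main_page = "caf\u00e9 ".encode("utf-8") + marker
include_file = "na\u00efve".encode("cp1252")

# Byte-level substitution, with no transcoding of the included bytes.
merged = main_page.replace(marker, include_file)

# The merged stream now mixes two encodings and is no longer valid UTF-8.
try:
    merged.decode("utf-8")
    merged_is_valid_utf8 = True
except UnicodeDecodeError:
    merged_is_valid_utf8 = False

print(merged, merged_is_valid_utf8)
```

If the include held only ASCII characters, the merged bytes would still decode cleanly, which is why the problem often goes unnoticed.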

Next the combined version of the file is parsed, tokenized, even "compiled" into a tight, interpreter-friendly form. It's at this point that chunks of content in the file (the stuff outside of script code blocks) are turned into a special form of Response.Write. It's special in that, when script execution reaches one of these special writes, the processor simply copies the bytes verbatim, as found in the file, directly to the output stream. Again, no attempt is made to convert any encodings.

Script code and character encoding

The ASP processor just doesn't cope well with anything that isn't ASCII. All your code, and especially the string literals in your code, should be in ASCII only.

What can be a bit confusing is that, once a script is executing, all string variables are stored using Unicode encoding.

When code writes content to the response using the proper Response.Write method, this is where Response.CodePage comes into effect. It encodes the Unicode string the script provides to the response code page before adding it to the output stream.
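The encoding step described above can be imitated in Python. This is only a stand-in for illustration: `response_write` and `CODEPAGE_TO_CODEC` are hypothetical names invented for this sketch, not part of any ASP API.

```python
# Hypothetical mapping from Windows code page numbers to Python codec names.
CODEPAGE_TO_CODEC = {65001: "utf-8", 1252: "cp1252"}

def response_write(text: str, codepage: int) -> bytes:
    """Return the bytes that would be appended to the output stream."""
    return text.encode(CODEPAGE_TO_CODEC[codepage])

# The same Unicode string yields different bytes per code page.
euro_as_utf8 = response_write("\u20ac", 65001)   # euro sign, three bytes
euro_as_1252 = response_write("\u20ac", 1252)    # euro sign, one byte

print(euro_as_utf8, euro_as_1252)
```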

What is the effect of Response.Charset

It adds the charset attribute to the Content-Type HTTP header. That is it; it has no other impact. If you set this to one character set but send a different one, either because your Response.CodePage doesn't match it or because the byte content of the files is not in that encoding, then you can expect problems.
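A short Python sketch of such a mismatch, assuming a browser that trusts the declared charset and substitutes U+FFFD for undecodable bytes (the sample word is arbitrary):

```python
# Header says UTF-8, but the body was actually encoded as Windows-1252.
declared_charset = "utf-8"
body = "caf\u00e9".encode("cp1252")              # b'caf\xe9'

# The browser decodes with the declared charset, not the real one.
seen_by_browser = body.decode(declared_charset, errors="replace")

# The reverse mismatch: UTF-8 bytes read as Windows-1252.
reverse_mismatch = "caf\u00e9".encode("utf-8").decode("cp1252")

print(repr(seen_by_browser), repr(reverse_mismatch))
```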

Input encoding

Things get really messy here. When form data is posted to the server, there is no provision in the form URL encoding standard to declare the code page used. Browsers can be told what encoding to use, and they will default to the charset of the HTML page containing the form, but there is no mechanism to communicate that choice to the server.

ASP takes the view that the code page of posted form fields will be the same as the code page of the response it's about to send. Take a moment to absorb that. This means that, quite counterintuitively, the Response.CodePage value has an impact on the strings returned by Request.Form. For this reason it's important to get the correct code page set early; doing some form processing first and then setting the code page later, just before sending a response, can lead to unexpected results.
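The effect on form fields can be mimicked in Python. This sketch assumes a form posted from a UTF-8 page with a made-up field called `name`; `unquote_to_bytes` from the standard library undoes the percent-encoding, and the two `decode` calls stand in for reading the field under each code page.

```python
from urllib.parse import unquote_to_bytes

# What a browser sends for the value "café" from a UTF-8 page.
posted = "name=caf%C3%A9"
raw = unquote_to_bytes(posted.split("=", 1)[1])   # b'caf\xc3\xa9'

# The same bytes, interpreted under two different code pages.
read_with_65001 = raw.decode("utf-8")     # code page matches the browser
read_with_1252 = raw.decode("cp1252")     # code page left at Windows-1252

print(repr(read_with_65001), repr(read_with_1252))
```

Nothing in the posted bytes says which reading is right, which is why the code page has to be set before the form is read.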

The classic "page looks fine but the data in the DB is corrupt" gotcha

One common gotcha this behaviour leads to is where the developer has set CharSet="UTF-8" but left the code page at something like Windows-1252.

What ends up happening is that the user enters text, which is sent to the server in UTF-8 encoding, but the script code reads it as 1252. This corrupt string gets stored in the database. A subsequent web page reads this data, pulling the corrupt string from the DB. The string is then sent by Response.Write using 1252 encoding, but the destination page is told it's UTF-8. This has the effect of reversing the corruption, and everything looks fine to the user.

However, when another component, say a report generator, creates content from the database, the data appears corrupt, because it is.
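The whole round trip can be traced in a few lines of Python. This is a simulation of the scenario described above, with `cp1252` standing in for a code page left at Windows-1252 and an arbitrary sample word:

```python
typed = "caf\u00e9"                        # what the user entered

# Browser posts UTF-8 bytes; the script reads them as Windows-1252.
on_the_wire = typed.encode("utf-8")
stored_in_db = on_the_wire.decode("cp1252")    # corrupt value is saved

# Later page: the corrupt string is written out using code page 1252...
sent_back = stored_in_db.encode("cp1252")
# ...and the browser, told the page is UTF-8, decodes it accordingly.
shown_to_user = sent_back.decode("utf-8")

print(repr(stored_in_db), repr(shown_to_user))
```

The two mistakes cancel out in the browser, while anything else reading the database, such as the report generator, sees the corrupt stored value.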

Bottom line

You are already doing the correct thing: get CharSet and CodePage set early and consistently. Where other files are not saved as UTF-8, you will have problems if there is non-ASCII content in them; otherwise you will be fine.

Many included ASP files are purely code with no content, and since that code ought to be purely ASCII, their encoding doesn't really matter.
