Simple high-ascii character encoding


Problem description





Hi,

I have an Html document that declares that it uses the utf-8 character
set. As this document is editable via a web interface I need to make
sure that high-ascii characters that may be accidentally entered are
properly represented when the document is served. My programming
language allows me to get the ascii value for any individual character
so what I am doing when a change is saved is to look at each character
in the content and if the ascii value for a character > 127 then I
replace 'character' with '&#AsciiValue;'.

I am not very well up on character sets and document encoding
mechanisms so I would like to know, is this a sensible idea?

TIA

Chandy

Solution

ch****@totalise.co.uk wrote:

> I have an Html document that declares that it uses the utf-8 character
> set.

Does it do that properly? Prove it, show us the URL! :-)

> As this document is editable via a web interface I need to make
> sure that high-ascii characters that may be accidentally entered are
> properly represented when the document is served.

There are no high-ascii characters. Ascii stops at 127, has always
stopped, and will always stop.

If your document is adequately UTF-8 encoded, then form data sent via a
form on the page will appear as UTF-8 encoded, too, though naturally it
will _also_ be encoded as specified for form data encoding in general.

> My programming
> language allows me to get the ascii value for any individual character
> so what I am doing when a change is saved is to look at each character
> in the content and if the ascii value for a character > 127 then I
> replace 'character' with '&#AsciiValue;'.

Why would you do that, given the fact that there are no Ascii values
greater than 127 and the fact that your form data handler gets the data
in UTF-8 encoding? What would be the point in replacing it by a
character reference, when the page itself is UTF-8 encoded?
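
To make the concern about UTF-8 form data concrete, here is a rough Python sketch. It is not from the thread. It assumes that the poster's language returns one value per byte rather than per character, which the thread does not confirm, and the helper names `per_byte_refs` and `per_char_refs` are hypothetical.

```python
def per_byte_refs(data: bytes) -> str:
    """Replace every byte above 127 with '&#<byte value>;' (the risky variant)."""
    return "".join(chr(b) if b < 128 else f"&#{b};" for b in data)


def per_char_refs(text: str) -> str:
    """Replace every character above U+007F with '&#<code point>;'."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


# "Acme" followed by the registered sign (U+00AE), as a UTF-8 form would send it.
submitted = "Acme\u00ae".encode("utf-8")

print(per_byte_refs(submitted))                  # Acme&#194;&#174;
print(per_char_refs(submitted.decode("utf-8")))  # Acme&#174;
```

Under that assumption, the per-byte variant emits two references for a single character, and a browser would show two unrelated characters instead of the registered sign. Decoding the bytes as UTF-8 first yields the single Unicode character number.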


On Thu, 25 Aug 2005 ch****@totalise.co.uk wrote under the
heading:

> Simple high-ascii character encoding

Hmmm. What's that supposed to mean in an HTML context?

> I have an Html document that declares that it uses the utf-8 character
> set.

Terminology again! utf-8 is not a "character set", but a character
encoding scheme of Unicode. I can't help it that, way back, MIME chose
the attribute name of "charset=" for this, which in current terminology
is very misleading, but utf-8 still isn't a "character set".
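
The distinction between a character's number and its encoded bytes can be seen in a few lines of Python. This is only an illustrative sketch, not part of the original post, and the choice of sample character is arbitrary.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The character has a single number in Unicode...
print(ord(ch))                  # 233

# ...but its byte representation depends on the encoding scheme chosen.
print(ch.encode("utf-8"))       # b'\xc3\xa9'
print(ch.encode("utf-16-be"))   # b'\x00\xe9'
print(ch.encode("iso-8859-1"))  # b'\xe9'
```

The character number stays at 233 throughout, and only the bytes change, which is why "charset=utf-8" names an encoding rather than a set of characters.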

> As this document is editable via a web interface I need to make
> sure that high-ascii characters that may be accidentally entered

I think you'd benefit from getting rid of this obsolete term
"high-ascii". ASCII is a 7-bit code, containing a mere 95 displayable
characters, whereas the document character set of HTML is Unicode,
containing vastly more characters than ASCII.

Modern OSes often define input methods for wide ranges of these
non-ASCII characters...

> are properly represented when the document is served.

Details depend on your OS and editing application, but modern OSes don't
mind storing utf-8, and serving it out as such.

> My programming language allows me to get the ascii value for any
> individual character

But most of the characters aren't in ASCII, so how could they have
an "ascii value"? Character representation in HTML isn't hard, but
you *do* have to use the terms with some care, if you want to make
sense.

> so what I am doing when a change is saved is to look at each character
> in the content and if the ascii value for a character > 127

There ARE no ASCII characters with a value above 127!

> then I replace 'character' with '&#AsciiValue;'.

There *are* no ASCII values greater than 127.

Representing non-ASCII characters as &#number;, using their character
number in Unicode, is a feasible approach - but rather voluminous if you
have many of them.
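
A minimal sketch of the &#number; approach in Python, offered as an illustration rather than as anything from the thread. It relies on the standard `xmlcharrefreplace` error handler, which substitutes a decimal character reference for each character that cannot be encoded as ASCII. The function name `to_numeric_refs` and the sample text are hypothetical.

```python
def to_numeric_refs(text: str) -> str:
    """Return text as pure ASCII, with non-ASCII characters as &#number; references."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "Caf\u00e9 \u00a9 2005 Acme\u2122"
print(to_numeric_refs(sample))  # Caf&#233; &#169; 2005 Acme&#8482;
```

Each reference takes six or more ASCII characters in place of one character, which is the "voluminous" cost mentioned above. The input must already be decoded text (a `str`), not raw bytes.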

I have a checklist that's been quite widely peer-reviewed: I'd
recommend that you work your way down the scenarios, and pick one that
seems to fit your needs.

http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist

Hope this helps a bit.


Yep, clearly I make no sense to people who understand this better than
I do :) Okay, the language returns integer values for the standard as
well as 'extended' ascii characters (as detailed, for example, on
http://www.asciitable.com/). My document is not public but starts
with:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

The system is publishing content in English to the web but is
potentially for world-wide consumption. Generally the extra characters
I have to represent will be items like &reg;, &copy; and &trade; and
some accented letters, but I was wanting to avoid having to have a
lookup of ascii value->Html Entity by just changing the character for
&#Value; when it seemed to have a value that put it outwith the
standard ascii range. I'll re-ask the question 'is this sensible'
while I read through the document you referred to.

Thanks!

Chandy

