Weird characters on HTML page


Question

I am using the Last.fm API to fetch some information about artists. I save the information in a database and then display it on my web page. But characters like “ (double quote) are shown as â€œ.

Example artist info: http://www.last.fm/music/David+Penn

The first line comes out as "Producer, arranger, dj and musician from Madrid-Spain. He has his own record company â€œZen Recordsâ€, and".

My database is UTF-8, but I don't know why this error still occurs.

Recommended answer

You should be using UTF-8 all the way through. Check that:


  1. your connection to the database is UTF-8 (using mysql_set_charset);

  2. the pages you're outputting are marked as UTF-8 (<meta http-equiv="Content-Type" content="text/html;charset=utf-8">);

  3. when you output strings from the database, you HTML-encode them using htmlspecialchars() and not htmlentities().
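The checklist above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names use_utf8_connection and render_artist_page are hypothetical, mysqli's set_charset() is swapped in for the older mysql_set_charset, 'utf8mb4' is used as MySQL's name for full UTF-8, and the sample string stands in for a value read from the database. It assumes PHP 7 or later for the \u{...} escapes.

```php
<?php
// Hypothetical helper: switch an already-open mysqli connection to UTF-8.
// mysqli's set_charset() is used here in place of mysql_set_charset().
function use_utf8_connection(mysqli $db): bool
{
    return $db->set_charset('utf8mb4'); // MySQL's name for full UTF-8
}

// Hypothetical helper: build a page that declares UTF-8 and escapes $bio.
function render_artist_page(string $bio): string
{
    // Escapes only the HTML-special characters; non-ASCII bytes pass through untouched.
    $safe = htmlspecialchars($bio, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html><head>\n"
        . '<meta http-equiv="Content-Type" content="text/html;charset=utf-8">' . "\n"
        . "</head><body><p>{$safe}</p></body></html>\n";
}

// Stand-in for a UTF-8 string fetched from the database.
$bio = "He has his own record company \u{201C}Zen Records\u{201D} & more";
echo render_artist_page($bio);
```

The database helper is only declared, not called, so the sketch runs without a live connection.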

htmlentities HTML-encodes all non-ASCII characters, and by default assumes you are passing it bytes in ISO-8859-1. So if you pass it “ encoded as UTF-8 (bytes 0xE2, 0x80, 0x9C), you'd get &acirc;&#128;&#156; instead of the expected &ldquo; or &#8220;. This can be fixed by passing in utf-8 as the optional $charset argument.
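A small sketch of that mismatch, assuming PHP 7 or later (for the \u{...} escape). The charset is passed explicitly in both calls because newer PHP versions default to UTF-8 rather than ISO-8859-1; the variable names are illustrative only.

```php
<?php
// U+201C LEFT DOUBLE QUOTATION MARK; in UTF-8 this is the bytes E2 80 9C.
$quote = "\u{201C}";
echo bin2hex($quote), "\n";                              // e2809c

// Told the bytes are ISO-8859-1, htmlentities() reads 0xE2 as a separate "â".
$wrong = htmlentities($quote, ENT_QUOTES, 'ISO-8859-1');
echo $wrong, "\n";                                       // begins with &acirc;

// Told the bytes are UTF-8, it sees one character and emits its entity.
$right = htmlentities($quote, ENT_QUOTES, 'UTF-8');
echo $right, "\n";                                       // &ldquo;
```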

However, it's usually easier to just use htmlspecialchars() instead, as this leaves non-ASCII characters alone, as raw bytes instead of HTML entity references. This results in smaller page output, so it is preferable as long as you're sure the HTML you're producing will keep its charset information (which you can usually rely on, except in contexts like sending snippets of HTML in a mail or something).
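To illustrate the size difference, here is a minimal comparison of the two functions on the same UTF-8 string. It assumes PHP 7 or later, and the sample text is made up for the example.

```php
<?php
// Sample UTF-8 text with curly quotes (U+201C and U+201D).
$text = "\u{201C}Zen Records\u{201D}";

// htmlspecialchars() only touches & < > and quotes, so nothing changes here.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() replaces every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

var_dump($special === $text);                            // bool(true)
echo $entities, "\n";                                    // &ldquo;Zen Records&rdquo;
echo strlen($special), " vs ", strlen($entities), " bytes\n"; // 17 vs 25 bytes
```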

htmlspecialchars() does have an optional $charset argument too, but setting it to utf-8 is not critical, since that results in no change of behaviour over the default ISO-8859-1 charset. If you are producing output in old-school multibyte encodings like Shift-JIS, you do have to worry about setting this argument correctly, but today that's quite rare, as most sane people use UTF-8 in preference.
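A quick check of that claim for valid UTF-8 input, assuming PHP 7 or later; the sample string is invented for the example.

```php
<?php
// Valid UTF-8 input containing both non-ASCII and HTML-special characters.
$text = "\u{201C}Rock & Roll\u{201D} <live>";

$latin1 = htmlspecialchars($text, ENT_QUOTES, 'ISO-8859-1');
$utf8   = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Only & < > and quotes are rewritten, so both charsets give the same bytes.
var_dump($latin1 === $utf8);                             // bool(true)
echo $utf8, "\n";                                        // “Rock &amp; Roll” &lt;live&gt;
```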
