Why do quotes turn into funny characters when submitted in an HTML form?

Question

I have an HTML form, and some users are copy/pasting text from MS Word. When there are single quotes or double quotes, they get translated into funny characters like:

'€™ and ’

The database column uses the utf8_general_ci collation.

How do I get the correct characters?

Edit:

Problem solved. Here's how I fixed it:

Ran mysql_query("SET NAMES 'utf8'"); before adding to or retrieving from the database (thanks to Donal's comment below).

Somewhat oddly, the PHP function urlencode($text) was being applied when displaying the text, so that had to be removed.
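To see why URL-encoding at display time mangles the text, here is a minimal Python sketch. It is only a rough analogue of the PHP situation: quote_plus stands in for urlencode, html.escape stands in for whatever HTML escaping the page should use, and the sample string is made up for illustration.

```python
import html
from urllib.parse import quote_plus

# Hypothetical sample input: a curly apostrophe (U+2019) plus some markup.
text = "it\u2019s <b>here</b>"

# URL-encoding is meant for query strings. Applied to display text, it turns
# the apostrophe into percent-escaped UTF-8 bytes and spaces into '+'.
url_encoded = quote_plus(text)

# HTML escaping is the appropriate step for display: it neutralises the
# markup characters and leaves the apostrophe untouched.
html_escaped = html.escape(text)

print(url_encoded)
print(ascii(html_escaped))
```

The first value is what a reader would see if the URL-encoded string were written straight into the page; the second keeps the apostrophe intact.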

I also made sure that the headers for the page and the Ajax request/response all specified UTF-8.

Recommended answer

This looks like a classic case of Unicode (most likely UTF-8) characters being interpreted as ISO-8859-1. There are several places along the way where the characters can get corrupted. First, the client's browser has to send the data. It might corrupt the data if it can't convert the characters properly to the page's character encoding. Then the server reads the data and decodes the bytes into characters. If the client and server disagree about the encoding used, the characters will be corrupted. Then the data is stored in the database; again there is potential for corruption. Finally, when the data is written to the page (for display in the browser), the browser may misinterpret the bytes if the page doesn't adequately indicate its encoding.
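The misinterpretation can be reproduced in a few lines. The sketch below uses Python purely for illustration and assumes the wrong decoder is Windows-1252 (cp1252), which is what browsers actually apply when a page is labelled ISO-8859-1. It is a demonstration of the mechanism, not a diagnosis of any particular setup.

```python
# U+2019 is the curly apostrophe that MS Word substitutes for a plain '.
original = "\u2019"

# Encoded as UTF-8 it becomes three bytes: E2 80 99.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes with the wrong codec yields three separate characters
# (a-circumflex, euro sign, trademark sign) instead of one apostrophe.
garbled = utf8_bytes.decode("cp1252")

# If nothing else has altered the text, reversing the wrong step restores it.
repaired = garbled.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex(), ascii(garbled), repaired == original)
```

The round trip in the last step only works when the text was garbled exactly once. The more robust fix is to keep every stage on UTF-8.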

You need to ensure that you are using UTF-8 throughout. The traditional default for web pages is ISO-8859-1, so your web pages should be served with a Content-Type header or a meta tag that declares UTF-8:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

(make sure you really are serving the text in that encoding).
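As a small illustration of keeping the declared charset and the actual bytes in agreement, the Python sketch below builds a response whose body is encoded with the same encoding named in its Content-Type header. The function name build_response and its dictionary-based headers are hypothetical, not part of any framework.

```python
def build_response(text, encoding="utf-8"):
    """Return (headers, body) where the body bytes match the declared charset."""
    # Encode once, with the same encoding that the header will announce.
    body = text.encode(encoding)
    headers = {
        "Content-Type": f"text/html; charset={encoding.upper()}",
        # Content-Length counts bytes, not characters.
        "Content-Length": str(len(body)),
    }
    return headers, body


# Hypothetical page fragment containing a curly apostrophe (U+2019).
headers, body = build_response("it\u2019s")
print(headers["Content-Type"], headers["Content-Length"])
```

Deriving both the header and the bytes from the same encoding argument means the two cannot drift apart.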

By using UTF-8 at every stage of the process, you will avoid these problems with all working web browsers and databases.
