Removing the Unicode bullet character


Question

I'm having an issue that I believe is related to Unicode text. When the user enters a string that contains the Unicode bullet character, MySQL is not able to save that field (the rest of the update query works, though). Here's how I've been trying to deal with it:

$str = "· Close up the server";

$str = preg_replace("\u2022", "•", $str);

...however, this still doesn't work.

Recommended answer

So many things can go wrong here, because the database, form submissions and source code string literals are all involved. I'll assume you want to use UTF-8, because with any other typical encoding (CP1252, Latin1) you'll be stuck as soon as you want to use the json_* functions or accept more than ~200 different characters.

The first thing to do is remove any conversion code that was written in an attempt to fix encoding issues, such as utf8_encode, htmlentities, *_replace, whatever.

Source encoding

$str = "· Close up the server";

For the literal above to work, the PHP source file itself must be physically saved as UTF-8. If you are on Windows, you must do this explicitly or configure your editor for it. UTF-8 doesn't happen magically on Windows.
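One quick way to see how a file was actually saved is to dump the bytes of a literal. This is only a minimal sketch; the helper name `literal_bytes_hex` is made up for illustration and is not part of the original answer.

```php
<?php
// Hypothetical helper: return the raw bytes of a string as lowercase hex.
function literal_bytes_hex(string $s): string
{
    return bin2hex($s);
}

// If this file is saved as UTF-8, the bullet (U+2022) is the three bytes e2 80 a2.
// Saved as Windows-1252 it would be the single byte 95 instead.
$bullet = "•";
echo literal_bytes_hex($bullet), "\n";
```

If the output is not `e280a2`, the editor is saving the file in some other encoding and the literal will not match UTF-8 input.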

Form submission

When a user submits a form, the payload will be in whatever encoding you declared the page to be. You can declare it like so:

header("Content-Type: text/html; charset=utf-8");

But anyone can submit arbitrary bytes to your server, so you should validate that the input is UTF-8 before proceeding. mb_check_encoding is good for this.
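The validation step might look roughly like this. It is a sketch that assumes the mbstring extension is loaded; the helper name `utf8_or_null` is hypothetical.

```php
<?php
// Hypothetical helper: return the value unchanged if it is a valid UTF-8
// string, otherwise null so the caller can reject the request.
function utf8_or_null($value): ?string
{
    if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
        return null;
    }
    return $value;
}

// Valid UTF-8: the bullet encoded as e2 80 a2.
var_dump(utf8_or_null("\xE2\x80\xA2 Close up the server") !== null); // bool(true)
// Not valid UTF-8: 0x95 is the bullet in CP1252, a stray continuation byte here.
var_dump(utf8_or_null("\x95 Close up the server") !== null);         // bool(false)
```

In a real handler the argument would be something like `$_POST['comment'] ?? null` (the field name is an assumption), and a null result would be answered with an error rather than "repaired".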

Database

Since your data is coming in as UTF-8 at this point, your input strings are in UTF-8. You must tell the database so by specifying a connection encoding right after connecting.

mysql_set_charset("utf8"); // after making the connection, and before any queries
// or, with mysqli (needed on PHP 7+, where the mysql_* functions no longer exist):
// $mysqli->set_charset("utf8");

This makes the database read your input as UTF-8 and encode its output as UTF-8. You will also want to set your columns, tables and databases to UTF-8.
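With mysqli, the connection step could be wrapped as below. This is a sketch, not the answerer's code: the function name `connect_utf8` and all connection parameters are placeholders, and it assumes the mysqli extension is available.

```php
<?php
// Hypothetical helper; host, user, password and database name are placeholders
// supplied by the caller.
function connect_utf8(
    string $host,
    string $user,
    string $pass,
    string $db,
    string $charset = 'utf8'
): mysqli {
    $mysqli = new mysqli($host, $user, $pass, $db);
    if ($mysqli->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $mysqli->connect_error);
    }
    // Set the connection encoding before running any queries.
    if (!$mysqli->set_charset($charset)) {
        throw new RuntimeException('set_charset failed: ' . $mysqli->error);
    }
    return $mysqli;
}
```

The default of `utf8` follows the answer. MySQL's `utf8` stores at most three bytes per character, which covers the bullet (U+2022); passing `utf8mb4` instead is a common choice when characters outside the Basic Multilingual Plane are expected.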

The Unicode escape sequences \uxxxx, \uhhhh\ullll and \Uxxxxxxxx are not supported in PHP string literals. (PHP 7.0 and later do accept the braced form \u{2022} in double-quoted strings.)
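For the bullet itself, one way to match it is a delimited PCRE pattern with the u modifier, where \x{2022} names the code point. The sketch below assumes the goal is to strip the bullet and the whitespace after it; the helper name `strip_bullets` is hypothetical.

```php
<?php
// Hypothetical helper: remove U+2022 BULLET plus any whitespace that follows it.
function strip_bullets(string $s): string
{
    // Pattern delimiters are required. The u modifier makes PCRE treat the
    // pattern and subject as UTF-8, and \x{2022} names the code point.
    $result = preg_replace('/\x{2022}\s*/u', '', $s);
    // preg_replace returns null on failure, e.g. when $s is not valid UTF-8;
    // in that case hand back the input untouched.
    return $result ?? $s;
}

$str = "\xE2\x80\xA2 Close up the server"; // UTF-8 bytes of U+2022, then text
echo strip_bullets($str), "\n";            // Close up the server
```

Writing the bullet as the byte escape `"\xE2\x80\xA2"` keeps the example independent of how the source file is saved.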

