Replace Unicode character

Problem description

I am trying to replace a certain character in a string with another. They are quite obscure Latin characters. I want to replace the character with hex code point 259 (U+0259) with 4d9 (U+04D9), so I tried this:

str_replace("\x02\x59","\x04\xd9",$string);

This didn't work. How do I go about this?

Additional information.

Thanks bobince, that has done the trick. Although, I want to replace the uppercase schwa also and it is not working for some reason. I calculated U+018F (Ə) as UTF-8 0xC68F and this is to be replaced with U+04D8 (0xD398):

$string = str_replace("\xC9\x99", "\xD3\x99", $_POST['string_with_schwa']); //lc 259->4d9
$string = str_replace( "\xC6\8F", "\xD3\x98" , $string); //uc 18f->4d8

I am copying the 'Ə' into a textbox and posting it. The first str_replace works fine on the lowercase, but does not detect the uppercase in the second str_replace, strange. It remains as U+018F. Guess I could run the string through strtolower but this should work though.

Recommended answer

U+0259 Latin Small Letter Schwa is only encoded as the byte sequence 0x02,0x59 in the UTF-16BE encoding. It is very unlikely you will be working with byte strings in the UTF-16BE encoding as it's not an ASCII-compatible encoding and almost no-one uses it.
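As a quick check of that claim, here is a minimal sketch that packs the code point as a big-endian 16-bit integer, which is how UTF-16BE stores characters in the Basic Multilingual Plane. The helper name `utf16be_bmp` is made up for illustration, and it deliberately ignores code points above U+FFFF.

```php
<?php
// Hypothetical helper: UTF-16BE bytes for a code point in the BMP
// (U+0000..U+FFFF, surrogates excluded). Not a general-purpose encoder.
function utf16be_bmp(int $codePoint): string
{
    return pack('n', $codePoint); // 'n' = unsigned 16-bit, big-endian byte order
}

echo bin2hex(utf16be_bmp(0x0259)), "\n"; // 0259, i.e. the bytes "\x02\x59"
```

So `"\x02\x59"` only matches the schwa if `$string` is UTF-16BE, which is rarely the case for form data.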

The encoding you want to be working with (the only ASCII-superset encoding to support both Latin Schwa and Cyrillic Schwa, as it supports all Unicode characters) is UTF-8. Ensure your input is in UTF-8 format (if it is coming from form data, serve the page containing the form as UTF-8). Then, in UTF-8, the character U+0259 is represented using the byte sequence 0xC9,0x99.

$string = str_replace("\xC9\x99", "\xD3\x99", $string);
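To see where those byte values come from, the following sketch derives the two-byte UTF-8 form by hand. The function name `utf8_two_byte` is hypothetical, and it only covers the range U+0080..U+07FF, which is enough for all four schwa characters mentioned here.

```php
<?php
// Hypothetical helper: encode a code point in U+0080..U+07FF as two UTF-8 bytes.
function utf8_two_byte(int $codePoint): string
{
    if ($codePoint < 0x80 || $codePoint > 0x7FF) {
        throw new InvalidArgumentException('this sketch only handles U+0080..U+07FF');
    }
    $lead  = 0xC0 | ($codePoint >> 6);   // 110xxxxx: upper 5 bits of the code point
    $trail = 0x80 | ($codePoint & 0x3F); // 10xxxxxx: lower 6 bits of the code point
    return chr($lead) . chr($trail);
}

echo bin2hex(utf8_two_byte(0x0259)), "\n"; // c999 (Latin small schwa)
echo bin2hex(utf8_two_byte(0x04D9)), "\n"; // d399 (Cyrillic small schwa)
```

The same calculation gives 0xC6,0x8F for U+018F and 0xD3,0x98 for U+04D8, matching the values worked out in the question.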

If you make sure to save your .php file as UTF-8-no-BOM in the text editor, you can skip the escaping and just directly say:

$string = str_replace('ə', 'ә', $string);
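For the uppercase case raised in the question, a possible sketch that handles both letters in one call is shown below. The wrapper name `latin_schwa_to_cyrillic` is an assumption for illustration. Note that the second search string in the question is written `"\xC6\8F"`; PHP does not treat `\8` as an escape sequence, so the missing `x` is a likely reason the uppercase letter is never matched.

```php
<?php
// Hypothetical helper: map Latin schwa to Cyrillic schwa in a UTF-8 string.
function latin_schwa_to_cyrillic(string $utf8): string
{
    return str_replace(
        ["\xC9\x99", "\xC6\x8F"], // U+0259 (ə) and U+018F (Ə), Latin
        ["\xD3\x99", "\xD3\x98"], // U+04D9 (ә) and U+04D8 (Ә), Cyrillic
        $utf8
    );
}

// Uppercase then lowercase Latin schwa, written with escapes so the
// result does not depend on how this file is saved.
echo bin2hex(latin_schwa_to_cyrillic("\xC6\x8F\xC9\x99")), "\n"; // d398d399
```

A byte-level `str_replace` is enough here because the input is assumed to be valid UTF-8, where these two-byte sequences cannot appear inside the encoding of another character.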
