Removing control characters in a UTF-8 string

Views: 155 | Published: 2017/8/16 21:15:35 | Tags: javascript, php, encoding, utf-8

Question

So I am removing control characters (tab, CR, LF, \v and all other invisible characters) on the client side (after input), but since the client cannot be trusted, I have to remove them on the server too.

According to this chart, http://www.utf8-chartable.de/, the control characters are x00 to x1F and x7F to x9F. Thus my client-side (JavaScript) control-character removal function is:

return s.replace(/[\x00-\x1F\x7F-\x9F]/g, "");
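Below is a minimal runnable sketch of the client-side filter above, wrapped in a helper called `stripControlChars`. The helper name is hypothetical, since the question only shows the `return` line. JavaScript strings are sequences of UTF-16 code units. Because of that, the ranges in the regex match the characters U+0000 to U+001F and U+007F to U+009F, never individual UTF-8 bytes. This appears to be why ς (U+03C2) survives on the client.

```javascript
// Hypothetical helper wrapping the client-side regex from the question.
// JS regexes match UTF-16 code units, so \x80-\x9F means the characters
// U+0080..U+009F (the C1 controls), not raw UTF-8 bytes.
function stripControlChars(s) {
  return s.replace(/[\x00-\x1F\x7F-\x9F]/g, "");
}

// Tab, LF, DEL and U+0085 (NEL, a C1 control) are removed; ς (U+03C2) is kept.
console.log(stripControlChars("a\tb\nc\x7Fς\u0085")); // prints abcς
```

Characters outside the Basic Multilingual Plane, such as emoji, are stored as surrogate pairs (0xD800 to 0xDFFF). These code units also lie outside both ranges, so the filter leaves them alone.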

And my PHP (server-side) control-character removal function is:

$s = preg_replace('/[\x00-\x1F\x7F-\x9F]/', '', $s);

Now this seems to cause problems in PHP only, with international UTF-8 characters such as ς (bytes xCF x82), because x82 falls inside the second range (x7F to x9F). The JavaScript equivalent does not cause any problems.
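The PHP symptom can be reproduced in JavaScript by filtering the string's UTF-8 bytes instead of its characters. This is roughly what a PCRE pattern without the `u` modifier does. The sketch below is an emulation rather than PHP itself, and `stripControlBytes` is a hypothetical name. In PHP, the commonly used remedy is the `u` (UTF-8) pattern modifier, i.e. `'/[\x00-\x1F\x7F-\x9F]/u'`, which makes PCRE match whole code points. Note that with `u`, `preg_replace` returns `null` if the subject is not valid UTF-8.

```javascript
// Emulate a byte-oriented filter (like a PCRE pattern without `u`) by
// operating on the UTF-8 bytes of the string. Hypothetical helper name.
function stripControlBytes(s) {
  const bytes = new TextEncoder().encode(s); // UTF-8 bytes of s
  // Drop every byte in 0x00-0x1F or 0x7F-0x9F, even when it is a
  // continuation byte of a multi-byte character.
  const kept = bytes.filter((b) => !(b <= 0x1f || (b >= 0x7f && b <= 0x9f)));
  // Non-fatal decoding turns the damaged sequence into U+FFFD.
  return new TextDecoder("utf-8").decode(kept);
}

const sigma = "ς"; // U+03C2
console.log(Array.from(new TextEncoder().encode(sigma), (b) => b.toString(16))); // [ 'cf', '82' ]
console.log(stripControlBytes(sigma) === "\uFFFD"); // true: 0x82 was removed, leaving a lone 0xCF
```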

Now my question is: should I remove the control characters from 7F to 9F at all? As I understand it, bytes from 127 to 159 (7F to 9F) can obviously be part of a valid UTF-8 string?
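As code points, U+007F (DEL) and U+0080 to U+009F (the C1 controls) are genuine control characters. In UTF-8, however, U+0080 to U+009F are encoded as two bytes, C2 80 to C2 9F. One way to keep the filter while avoiding the byte collision is to decode the input first and then filter code points. The sketch below does this in JavaScript on raw bytes. `stripControlCodePoints` is a hypothetical name. Rejecting malformed input via `fatal: true` is an assumed policy that mirrors PHP's `null` return under `/u`.

```javascript
// Decode raw UTF-8 input first, then filter whole code points.
// stripControlCodePoints is a hypothetical helper name.
function stripControlCodePoints(bytes) {
  // fatal: true makes decode() throw a TypeError on malformed UTF-8
  // instead of silently inserting U+FFFD.
  const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  // With the `u` flag the class matches whole code points:
  // U+0000-U+001F (C0), U+007F (DEL) and U+0080-U+009F (C1).
  return text.replace(/[\u0000-\u001F\u007F-\u009F]/gu, "");
}

const enc = new TextEncoder();
console.log(Array.from(enc.encode("\u0085"), (b) => b.toString(16))); // [ 'c2', '85' ]
console.log(stripControlCodePoints(enc.encode("ς\u0085x"))); // prints ςx
```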

Also, maybe I shouldn't even filter the 00 to 31 control characters, because some of those bytes might also appear inside some unusual (Japanese? Chinese?) but valid UTF-8 characters?
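The worry about CJK characters can be checked empirically. In UTF-8, lead bytes of multi-byte sequences are 0xC2 to 0xF4 and continuation bytes are 0x80 to 0xBF. So no byte of a non-ASCII character should ever fall in 0x00 to 0x7F. Continuation bytes in 0x80 to 0x9F, however, are common. The sketch below inspects an arbitrary sample string, which is an assumption for illustration. `analyzeUtf8Bytes` is a hypothetical name.

```javascript
// For each non-ASCII character, inspect its UTF-8 bytes and report whether
// any byte is below 0x80, and which characters contain a byte in 0x80-0x9F
// (the ones a byte-level \x7F-\x9F filter would corrupt). Hypothetical helper.
function analyzeUtf8Bytes(s) {
  const enc = new TextEncoder();
  let anyAsciiByte = false;
  const hitsC1Range = [];
  for (const ch of s) { // iterates by code point, not UTF-16 unit
    if (ch.codePointAt(0) < 0x80) continue; // skip ASCII characters
    const bytes = enc.encode(ch);
    if (bytes.some((b) => b < 0x80)) anyAsciiByte = true;
    if (bytes.some((b) => b >= 0x80 && b <= 0x9f)) hitsC1Range.push(ch);
  }
  return { anyAsciiByte, hitsC1Range };
}

const r = analyzeUtf8Bytes("日本語中文ςé");
console.log(r.anyAsciiByte, r.hitsC1Range.join("")); // prints false 日本語文ς
```

For this sample, none of the multi-byte characters contains a byte in 0x00 to 0x1F, but five of the seven contain a continuation byte in 0x80 to 0x9F.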