How can I delete special characters?

Question

I'm practicing with Ruby and regular expressions to delete certain unwanted characters. For example, to strip HTML tags:

input = input.gsub(/<\/?[^>]*>/, '')

And for special characters, such as ☻ or ™:

input = input.gsub('&#', '')

This leaves only the numbers, which is fine. But it only works if the user enters the special character as a code, like this:

&#153;

My question: how can I delete special characters if the user enters them directly, without a code, like this:

™ ☻

Recommended answer

First of all, I think it might be easier to define what constitutes "correct input" and remove everything else. For example:

input = input.gsub(/[^0-9A-Za-z]/, '')

If that's not what you want (for example, you need to support non-Latin alphabets), then I think you should make a list of the glyphs you want to remove (like ™ or ☻) and remove them one by one, since it's hard to tell a Chinese or Arabic character apart from a pictograph programmatically.
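As a rough sketch of the list-based approach, the snippet below builds one pattern from a blocklist and removes every match. The names UNWANTED_GLYPHS, UNWANTED_PATTERN, strip_unwanted_glyphs and keep_letters_and_digits are made up for illustration and are not part of the original answer. The second method is an optional alternative that relies on Ruby's Unicode property classes (\p{L} for letters, \p{N} for numbers); it may be too strict for some inputs because it also drops punctuation.

```ruby
# Hypothetical blocklist: extend it with whatever glyphs you want to drop.
UNWANTED_GLYPHS = ["™", "☻"].freeze

# One regex that matches any glyph in the list.
UNWANTED_PATTERN = Regexp.union(UNWANTED_GLYPHS)

# Remove every blocklisted glyph and leave all other characters alone.
def strip_unwanted_glyphs(input)
  input.gsub(UNWANTED_PATTERN, '')
end

# Alternative (an assumption, not from the answer): keep letters and digits
# from any script plus ASCII whitespace, and remove everything else.
def keep_letters_and_digits(input)
  input.gsub(/[^\p{L}\p{N}\s]/, '')
end

by_blocklist = strip_unwanted_glyphs("Ruby™ ☻ 中文")
by_property  = keep_letters_and_digits("Ruby™ ☻ 中文")
```

The blocklist version only removes what you name explicitly, so it never touches Chinese or Arabic text by accident; the property-based version needs no list but removes every symbol and punctuation mark.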

Finally, you might want to normalize your input by converting to or from HTML escape sequences, so that every special character reaches your filter in a single form.
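One possible way to normalize is to decode decimal numeric references into the characters they name before filtering. This is only a minimal sketch: decode_numeric_references is a hypothetical helper, it handles decimal references only (not hexadecimal or named ones), and it does not apply the browser remapping of legacy values such as &#153;, which is a control character rather than ™ when read as a Unicode code point.

```ruby
# Replace decimal numeric character references such as "&#8482;" with the
# character they name, so later filters see only one form.
def decode_numeric_references(input)
  input.gsub(/&#(\d+);/) do |match|
    code_point = Regexp.last_match(1).to_i
    # Leave out-of-range or surrogate values untouched rather than raising.
    valid = code_point <= 0x10FFFF && !(0xD800..0xDFFF).cover?(code_point)
    valid ? [code_point].pack('U') : match
  end
end

decoded = decode_numeric_references("Ruby&#8482; &#9787;")

# After decoding, the glyphs can be removed directly (String#delete takes a
# set of characters to drop).
cleaned = decoded.delete("™☻")
```

Decoding first means a single removal step covers both ways a user might type the character.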
