Delete non-UTF characters from a string in Ruby?

Question

How do I delete non-UTF-8 characters from a Ruby string? I have a string that contains, for example, the byte "\xC2". I want to remove that byte from the string so that it becomes valid UTF-8.

This:

text.gsub!(/\xC2/, '')

returns an error:

incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)

I was looking at text.unpack('U*') and string.pack as well, but did not get anywhere.

Recommended answer

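You can use encode for that:

text.encode('UTF-8', :invalid => :replace, :undef => :replace)

By default this substitutes the replacement character (U+FFFD) for each invalid byte rather than deleting it. To drop the bytes entirely, also pass :replace => ''.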
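As a minimal sketch of the approach above, the following compares the default replacement behavior with actually deleting the invalid byte. The sample string "abc\xC2def" and the variable names are made up for illustration, and the sketch assumes Ruby 2.1 or later, where String#scrub is available. Older Ruby documentation described encoding a string to its own encoding as a no-op, so this may not behave the same on those versions.

```ruby
# Hypothetical sample: a UTF-8 string containing a stray lead byte (\xC2)
# that is not followed by a continuation byte, so the string is invalid.
text = "abc\xC2def"
puts text.valid_encoding?

# Default behavior: each invalid byte becomes U+FFFD (the replacement character).
replaced = text.encode('UTF-8', invalid: :replace, undef: :replace)

# Passing an empty replacement string deletes the invalid bytes instead.
cleaned = text.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')

# String#scrub is a shorter way to do the same thing for invalid byte sequences.
scrubbed = text.scrub('')

puts cleaned
puts cleaned.valid_encoding?
```

Neither call modifies text in place; use encode! or scrub! if the original string should be changed, as the gsub! attempt in the question intended.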

For more information, see the Ruby documentation for String#encode.
