Converting integers to UTF-8 (Korean)

Views: 337 | Published: 2020/7/13 5:06:18 | Tags: ruby, utf-8


Question

I'm running Ruby 1.9.2 and trying to fix some broken UTF-8 text input. The text is literally "\\354\\203\\201\\355\\221\\234\\353\\252\\205", and I want to change it into the correct Korean, "상표명".

However, after searching for a while and trying a few methods, I still get gibberish. It's confusing, because the escaped-characters example (the second puts below) works fine:

# encoding: utf-8
puts "상표명" # Target string
# Output: "상표명"

puts "\354\203\201\355\221\234\353\252\205" # Works with escaped characters like this
# Output: "상표명"

# Real input is a string
input = "\\354\\203\\201\\355\\221\\234\\353\\252\\205"

# After some manipulation got it into an array of numbers
puts [354, 203,201,355,221,234,353,252,205].pack('U*').force_encoding('UTF-8')
# Output: ŢËÉţÝêšüÍ (gibberish)

I'm sure this must have been answered somewhere, but I haven't managed to find it.

Recommended answer

This is what you want to do to get your UTF-8 Korean text:

s = "\\354\\203\\201\\355\\221\\234\\353\\252\\205"
k = s.scan(/\d+/).map { |n| n.to_i(8) }.pack("C*").force_encoding('utf-8')
# "상표명"

Here's how it works:

The input string is nice and regular, so we can use scan to pull out the individual numbers.
Then a map with to_i(8) converts the octal values (as noted by Henning Makholm) to integers.
Now we need to convert our list of integers to bytes, so we pack('C*') to get a byte string. This string will have the BINARY encoding (AKA ASCII-8BIT).
We happen to know that the bytes really do represent UTF-8, so we can force the issue with force_encoding('utf-8').
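
The steps above can also be run one at a time to see each intermediate value. This is only an illustrative sketch: the variable names (octal_digits, bytes, binary, korean) are made up for the walkthrough and do not come from the original answer, and the dup call is added here so that the BINARY string is left untouched for inspection.

```ruby
# The literal input: backslash characters followed by octal digits.
s = "\\354\\203\\201\\355\\221\\234\\353\\252\\205"

# Step 1: pull out each run of digits as a string.
octal_digits = s.scan(/\d+/)
# => ["354", "203", "201", "355", "221", "234", "353", "252", "205"]

# Step 2: interpret each run as a base-8 number.
bytes = octal_digits.map { |n| n.to_i(8) }
# => [236, 131, 129, 237, 145, 156, 235, 170, 133]

# Step 3: pack each integer as one unsigned 8-bit byte.
# The result is tagged Encoding::ASCII_8BIT (alias Encoding::BINARY).
binary = bytes.pack("C*")

# Step 4: relabel the same bytes as UTF-8. force_encoding changes only the
# encoding tag, not the bytes; dup keeps `binary` as it was (illustration only).
korean = binary.dup.force_encoding("utf-8")

puts korean
```

Every value in step 2 fits in the range 0..255, which is a quick sanity check that the digits really were octal byte values and not code points.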

The main thing you were missing was the pack format. 'U' means "UTF-8 character" and expects an array of Unicode code points, each represented by a single integer. 'C' expects an array of bytes, and that's what we had.
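
To make the difference between the two formats concrete, here is a small sketch that packs the same data both ways. The variable names are hypothetical, and the bytes are written as Ruby octal literals (leading 0) purely for readability; the last line reproduces the attempt from the question, where the digits were read as decimal and packed with 'U*'.

```ruby
# The nine byte values from the input, as Ruby octal literals.
bytes = [0354, 0203, 0201, 0355, 0221, 0234, 0353, 0252, 0205]

# 'C*' writes each integer as one raw byte; relabelling the result as UTF-8
# gives three Korean characters (three bytes each).
from_bytes = bytes.pack("C*").force_encoding("UTF-8")

# 'U*' goes the other way: each integer is a code point that pack encodes
# into UTF-8 itself. unpack("U*") recovers the code points of the string.
code_points = from_bytes.unpack("U*")   # => [49345, 54364, 47749]
from_code_points = code_points.pack("U*")

# The question's attempt: nine decimal numbers treated as nine code points,
# which yields nine unrelated Latin characters instead of three Korean ones.
mistaken = [354, 203, 201, 355, 221, 234, 353, 252, 205].pack("U*")

puts from_bytes
puts from_code_points
puts mistaken
```

So 'U*' would only have been the right format if the input had been the three code points rather than the nine UTF-8 bytes.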
