How to convert from GBK encoding to UTF-8 encoding in Perl

Problem description

I have a simple question that I do not know how to solve in Perl. I know how to convert from UTF-8 to GBK, for example from e4b8ad to d6d0. But I am not sure how to go backward: given d6d0, how do I get e4b8ad?

Please enlighten me! Many thanks.

Solution

When you have hex digits, pack is your friend. The following REPL session shows the UTF-8 to GB18030 direction. Notes:

  • To reverse the direction, pack the hex digits into octets, decode the GB octets into a character string, encode the character string into UTF-8 octets, and unpack the octets into hex digits.
  • GBK is superseded. Use of GB18030 (provided by Encode::HanExtra in Perl) has been mandatory for five years already.
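
Since Encode::HanExtra is a CPAN module rather than part of core Perl, a script may want to check whether GB18030 is actually available before relying on it. The following is a minimal sketch of such a check. The fallback to the core GBK encoding and the variable names ($have_hanextra, $source_encoding) are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Encode::HanExtra comes from CPAN; try to load it without dying if it is absent.
my $have_hanextra = eval { require Encode::HanExtra; 1 };

# Prefer GB18030 when it is available, otherwise fall back to GBK,
# which the core Encode distribution supports.
# $source_encoding is a name chosen for this sketch.
my $source_encoding =
    ($have_hanextra && find_encoding('GB18030')) ? 'GB18030' : 'GBK';

print "decoding with $source_encoding\n";
```

GB18030 extends GBK, so for a character such as the one in this question both encodings give the same two octets.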

$ use Encode qw(decode encode); use Encode::HanExtra; use Devel::Peek qw(Dump);

$ 'e4b8ad'
e4b8ad                                  # hex digits

$ pack('H*', 'e4b8ad')
中

$ Dump(pack('H*', 'e4b8ad'))
SV = PV(0x3657680) at 0x36b7188
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK)
  PV = 0x36c0768 "\344\270\255"\0           # octets of UTF-8 encoded data
  CUR = 3
  LEN = 8

$ decode('UTF-8', pack('H*', 'e4b8ad'))
中

$ Dump(decode('UTF-8', pack('H*', 'e4b8ad')))
SV = PV(0x326c3a0) at 0x36a50c8
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK,UTF8)
  PV = 0x3698a48 "\344\270\255"\0 [UTF8 "\x{4e2d}"]     # character string
  CUR = 3
  LEN = 8

$ encode('GB18030', decode('UTF-8', pack('H*', 'e4b8ad')))
"\xd6\xd0"

$ Dump(encode('GB18030', decode('UTF-8', pack('H*', 'e4b8ad'))))
SV = PV(0x36a2da0) at 0x36b6d98
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK)
  PV = 0x36db3e8 "\326\320"\0               # octets of GB18030 encoded data
  CUR = 2
  LEN = 8

$ unpack('H*', encode('GB18030', decode('UTF-8', pack('H*', 'e4b8ad'))))
d6d0                            # hex digits
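
The session above runs in the UTF-8 to GB18030 direction. The reverse direction asked about in the question, following the steps in the first note, might look like the sketch below. It assumes the core GBK encoding so that it runs without extra modules; with Encode::HanExtra installed, 'GB18030' could be used instead. The helper name gbk_hex_to_utf8_hex is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes hex digits of GBK octets,
# returns hex digits of the equivalent UTF-8 octets.
sub gbk_hex_to_utf8_hex {
    my ($hex) = @_;
    my $gbk_octets  = pack('H*', $hex);               # hex digits -> GBK octets
    my $characters  = decode('GBK', $gbk_octets);     # GBK octets -> character string
    my $utf8_octets = encode('UTF-8', $characters);   # character string -> UTF-8 octets
    return unpack('H*', $utf8_octets);                # UTF-8 octets -> hex digits
}

print gbk_hex_to_utf8_hex('d6d0'), "\n";   # prints e4b8ad
```

The intermediate character string is the important part: decode and encode always go through Perl's internal characters, never directly from one set of octets to another.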
