perl:将字符串转换为utf-8进行json解码 [英] perl: convert a string to utf-8 for json decode

查看:330
本文介绍了perl:将字符串转换为utf-8进行json解码的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在爬网网站并从其JSON收集信息.结果保存在哈希中.但是有些页面给我"JSON字符串中的UTF-8格式错误的字符"错误.我注意到"cafe"中的最后一个字母将产生错误.我认为这是由于字符类型的混合.因此,现在我正在寻找一种将所有类型的字符转换为utf-8的方法(希望有一种完美的方法).我尝试了utf8 :: all,但它根本不起作用(也许我没有正确执行).我是菜鸟请帮助,谢谢.

I'm crawling a website and collecting information from its JSON. The results are saved in a hash. But some of the pages give me "malformed UTF-8 character in JSON string" error. I notice that the the last letter in "cafe" will produce error. I think it is because of the mix of character types. So now I'm looking for a way to convert all types of character to utf-8 (hope there is a way perfect like that). I tried utf8::all, it just doesn't work (maybe I didn't do it right). I'm a noob. Please help, thanks.

UPDATA

好吧,在我阅读文章"

Well, after I read the article "Know the difference between character strings and UTF-8 strings" Posted by brian d foy. I solve the problem with the codes:

use utf8;
use Encode qw(encode_utf8);
use JSON;


my $json_data = qq( { "cat" : "Büster" } );
$json_data = encode_utf8( $json_data );

my $perl_hash = decode_json( $json_data );

希望这对其他人有所帮助.

Hope this help some one else.

推荐答案

decode_json期望JSON已使用UTF-8编码.

decode_json expects the JSON to have been encoded using UTF-8.

虽然您的源文件是使用UTF-8编码的,但您还是可以使用use utf8;(需要)对Perl进行解码.这意味着您的字符串包含Unicode字符,而不是代表这些字符的UTF-8字节.

While your source file is encoded using UTF-8, you have Perl decode it by using use utf8; (as you should). This means your string contains Unicode characters, not the UTF-8 bytes that represent those characters.

如所示,可以在将字符串传递给decode_json之前对其进行编码.

As you've shown, you could encode the string before passing it to decode_json.

use utf8;
use Encode qw( encode_utf8 );
use JSON   qw( decode_json );

my $data_json = qq( { "cat" : "Büster" } );
my $data = decode_json(encode_utf8($data_json));

但是您可以简单地告诉JSON字符串已被解码.

But you could simply tell JSON that the string is already decoded.

use utf8;
use JSON qw( );

my $data_json = qq( { "cat" : "Büster" } );
my $data = JSON->new->utf8(0)->decode($data_json);

这篇关于perl:将字符串转换为utf-8进行json解码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆