使用散列键使用XML :: LibXML编写xml文件时,会出现编码错误 [英] Getting encoding error when using hash keys to write xml files with XML::LibXML

查看:190
本文介绍了使用散列键使用XML :: LibXML编写xml文件时,会出现编码错误的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题与这个问题有关:
当我取消注释#utf8: :$($ name); 行或注释掉 $ hash {'müller'} ='洋红色'; p>

 #!/ usr / bin / env perl 
使用警告;
使用5.014;
使用utf8;
binmode STDOUT,':encoding(utf-8)';
使用XML :: LibXML;

#从文件中读入的哈希值:
#...
我的%hash =('müller'=>'绿色','schneider'=>'蓝色','bäcker'=>'红色');
#...

#更改或添加东西
$ hash {'müller'} ='洋红色'

#写入Hash到xml文件
我的$ doc = XML :: LibXML :: Document-> new('1.0','UTF-8');
my $ root = $ doc-> createElement('my_test');

为我的$名称(键%hash){
#utf8 :: upgrade($ name);
我的$ tag = $ doc-> createElement('item');
$ tag-> setAttribute('name'=> $ name);
我的$ tag_color = $ doc-> createElement('color');
$ tag_color-> appendTextNode($ hash {$ name});
$ tag-> appendChild($ tag_color);
$ root-> appendChild($ tag);
}
$ doc-> setDocumentElement($ root);
说$ doc-> serialize(1);
$ doc-> toFile('my_test.xml',1);






输出:

 错误:字符串不在UTF-8 
编码错误:由于转换错误,输出转换失败,字节0xFC 0x6C 0x6C 0x65
I / O错误:编码器错误
<?xml version =1.0encoding =ISO-8859-1?>
< my_test>
< item name =m
i18n错误:由于转换错误,输出转换失败,字节0xFC 0x6C 0x6C 0x65
I / O错误:编码器错误


解决方案

根据XML :: LibXML,是否'müller'eq' müller'是true或false取决于字符串在内部的存储方式,这是一个错误,具体来说,将含义分配给UTF8标志被称为Unicode Bug,XML :: LibXML是记录在此页面的编码支持部分中。 / p>

错误是已知,但是由于向后兼容性原因,它不能被干净地修复。Perl提供了两个工具来解决Unicode Bug的实例:

  utf8 :: upgrade($ sv);#切换到UTF8 = 1存储格式
utf8 :: downgrade( $ sv);#切换到UTF8 = 0存储格式

前者将成为适当的工具在这里使用。

  sub _up {my($ s)= @_; UTF8 :: ugprade($ S); $ s} 
$ tag_color-> appendTextNode(_up $ hash {$ name});

注意:您可以使用 utf8 :: upgrade 即使你不做使用utf8; 。如果您的源代码是UTF-8,则只能使用使用utf8;


This question is related to this question: Hash keys encoding: Why do I get here with Devel::Peek::Dump two different results?
When I uncomment the # utf8::upgrade( $name ); line or comment out the $hash{'müller'} = 'magenta'; line it works.

#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use XML::LibXML;

# Hash read in from a file:
# ... 
my %hash = ( 'müller' => 'green', 'schneider' => 'blue', 'bäcker' => 'red' );
# ...

# change or add something
$hash{'müller'} = 'magenta';

# writing Hash to xml file
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8' );
my $root = $doc->createElement( 'my_test' );

for my $name ( keys %hash ) {
    # utf8::upgrade( $name );
    my $tag = $doc->createElement( 'item' );
    $tag->setAttribute( 'name' => $name );
    my $tag_color = $doc->createElement( 'color' );
    $tag_color->appendTextNode( $hash{$name} );
    $tag->appendChild( $tag_color );
    $root->appendChild( $tag );
}
$doc->setDocumentElement($root);
say $doc->serialize( 1 );
$doc->toFile( 'my_test.xml', 1 );


Output:

error : string is not in UTF-8  
encoding error : output conversion failed due to conv error, bytes 0xFC 0x6C 0x6C 0x65  
I/O error : encoder error  
<?xml version="1.0" encoding="ISO-8859-1"?>  
<my_test>  
  <item name="m    
i18n error : output conversion failed due to conv error, bytes 0xFC 0x6C 0x6C 0x65
I/O error : encoder error

解决方案

According to XML::LibXML, whether 'müller' eq 'müller' is true or false depends on how the strings have been stored internally. That's a bug. Specifically, assigning meaning to the UTF8 flag is known as "The Unicode Bug", and XML::LibXML is documented to do exactly that in the "encodings support" section of this page.

The bug is known, but it can't be fixed cleanly for backwards compatibility reasons. Perl provides two tools to work around instances of The Unicode Bug:

utf8::upgrade( $sv );    # Switch to the UTF8=1 storage format
utf8::downgrade( $sv );  # Switch to the UTF8=0 storage format

The former would be be the appropriate tool to use here.

sub _up { my ($s) = @_; utf8::ugprade($s); $s }
$tag_color->appendTextNode( _up $hash{$name} );

Note: You can use utf8::upgrade even if you don't do use utf8;. Only use use utf8; if your source code is UTF-8.

这篇关于使用散列键使用XML :: LibXML编写xml文件时,会出现编码错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆