How to make Mason2 UTF-8 clean?


Question

Reformulating the question, because

  • @optional asked me to
  • it wasn't clear, and it linked to one HTML::Mason-based solution, "Four easy steps to make Mason UTF-8 Unicode clean with Apache, mod_perl, and DBI", which caused confusion
  • the original is 4 years old and in the meantime (in 2012) "Poet" was created

Comment: This question has already earned the "popular question" badge, so I'm probably not the only hopeless person. :)

Unfortunately, demonstrating the full problem stack leads to a very long question, and it is very Mason-specific.

First, the opinions-only part :)

I have been using HTML::Mason for ages, and am now trying to use Mason2. Poet and Mason are the most advanced frameworks on CPAN. I have found nothing comparable that, out of the box, allows writing such clean (but very hackable :)) web apps, with many batteries included (logging, caching, config management, native PSGI support, etc.).

Unfortunately, the author doesn't care about the rest of the world: by default it is ASCII-only, with no manual, FAQ, or advice about how to use it with Unicode.

Now the facts. Demo: create a Poet app:

poet new my #the "my" directory is the $poet_root
mkdir -p my/comps/xls
cd my/comps/xls

and add the following to dhandler.mc (it demonstrates the two basic problems):

<%class>
    has 'dwl';
    use Excel::Writer::XLSX;
</%class>
<%init>
    my $file = $m->path_info;

    $file =~ s/[^\w\.]//g;
    my $cell = lc join ' ', "ÅNGSTRÖM", "in the", $file;

    if( $.dwl ) {
        #create xlsx in the memory
        my $excel;
        open my $fh, '>', \$excel or die "Failed open scalar: $!";
        my $workbook  = Excel::Writer::XLSX->new( $fh );
        my $worksheet = $workbook->add_worksheet();
        $worksheet->write(0, 0, $cell);
        $workbook->close();

        #poet/mason output
        $m->clear_buffer;
        $m->res->content_type("application/vnd.ms-excel");
        $m->print($excel);
        $m->abort();
    }
</%init>
<table border=1>
<tr><td><% $cell %></td></tr>
</table>
<a href="?dwl=yes">download <% $file %></a>

and run the app:

../bin/run.pl

Go to http://0:5000/xls/hello.xlsx and you will get:

+----------------------------+
| ÅngstrÖm in the hello.xlsx |
+----------------------------+
download hello.xlsx

Clicking "download hello.xlsx" puts hello.xlsx in your downloads.

The above demonstrates the first problem: the component's source isn't compiled under use utf8;, so lc doesn't understand the characters.
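The effect can be reproduced outside Mason. The following is a minimal sketch, not Mason code: it writes the UTF-8 octets of "ÅNGSTRÖM" with escapes (so it does not depend on how this file is saved) and compares lc on the raw octets with lc on the decoded characters. The helper hex_dump is a hypothetical name used only for display.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# UTF-8 octets of "ÅNGSTRÖM", written with escapes so the example does not
# depend on the encoding of this source file.
my $octets = "\xC3\x85NGSTR\xC3\x96M";

# Without decoding, lc sees 10 bytes and lowercases only the ASCII letters.
my $lc_bytes = lc $octets;

# After decoding, lc sees 8 characters and lowercases Å and Ö as well.
my $lc_chars = lc decode_utf8($octets);

# Hypothetical display helper: show a byte string as hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

print "lc on octets:     ", hex_dump($lc_bytes), "\n";
print "lc on characters: ", hex_dump(encode_utf8($lc_chars)), "\n";
```

In the first line of output the bytes C3 85 and C3 96 (Å and Ö) survive unchanged, which matches the "ÅngstrÖm" seen in the browser.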

The second problem is the following. Try http://0:5000/xls/hélló.xlsx or http://0:5000/xls/h%C3%A9ll%C3%B3.xlsx and you will see:

+--------------------------+
| ÅngstrÖm in the hll.xlsx |
+--------------------------+
download hll.xlsx
#note the wrong filename

Of course, the input (the path_info) isn't decoded; the script works with the UTF-8 encoded octets and not with Perl characters.
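A small standalone sketch of why the filename loses its accented letters. It assumes the path arrives as percent-decoded octets, and percent_decode is a hypothetical helper written here only to build that input. The same substitution is then applied to the octets and to the decoded characters.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: turn %XX escapes into raw octets.
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

my $octets = percent_decode('h%C3%A9ll%C3%B3.xlsx');

# On undecoded octets, \w matches only ASCII word characters here,
# so the bytes that make up é and ó are stripped.
(my $from_octets = $octets) =~ s/[^\w.]//g;

# On decoded characters, é and ó are word characters and survive.
(my $from_chars = decode_utf8($octets)) =~ s/[^\w.]//g;

print "from octets: $from_octets\n";
print "characters kept after decoding: ", length($from_chars), "\n";
```

The first result is the "hll.xlsx" shown above; the decoded variant keeps all ten characters of the name.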

So, telling Perl "the source is in UTF-8" by adding use utf8; to the <%class> block results in:

+--------------------------+
| �ngstr�m in the hll.xlsx |
+--------------------------+
download hll.xlsx

Adding use feature 'unicode_strings' (or use 5.014;) is even worse:

+----------------------------+
| �ngstr�m in the h�ll�.xlsx |
+----------------------------+
download h�ll�.xlsx

Of course, the source now contains wide characters, so it needs Encode::encode_utf8 at the output.
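A hedged illustration of where the replacement characters likely come from: a character string whose code points are all below 256 is written out as one byte per character (effectively Latin-1), and those bytes are not valid UTF-8. The sketch below simulates that with an explicit ISO-8859-1 encode; is_valid_utf8 is a hypothetical helper, and the exact output path inside Poet/Plack is an assumption.

```perl
use strict;
use warnings;
use Encode qw(encode decode encode_utf8);

# "ångström" as 8 Perl characters (what lc produces under use utf8).
my $chars = "\x{E5}ngstr\x{F6}m";

# What is needed at the output: explicit UTF-8 octets (10 bytes).
my $utf8 = encode_utf8($chars);

# Assumption: what effectively goes out when nothing encodes the string.
my $latin1 = encode('ISO-8859-1', $chars);

# Hypothetical helper: strict check that a byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($octets) = @_;
    my $ok = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

printf "utf8:   %d bytes, valid UTF-8: %d\n", length $utf8,   is_valid_utf8($utf8);
printf "latin1: %d bytes, valid UTF-8: %d\n", length $latin1, is_valid_utf8($latin1);
```

A browser that is told (or guesses) UTF-8 cannot interpret the second byte string, so it shows a replacement character for each stray byte.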

One could try to use a filter such as:

<%filter uencode><% Encode::encode_utf8($yield->()) %></%filter>

and filter the whole output:

% $.uencode {{
<table border=1>
<tr><td><% $cell %></td></tr>
</table>
<a href="?dwl=yes">download <% $file %></a>
% }}

But this helps only partially, because you still need to take care of the encoding in the <%init> or <%perl> blocks. Encoding and decoding in many places inside the Perl code (read: not at the borders) leads to spaghetti code.

The encoding/decoding should clearly be done somewhere at the Poet/Mason borders; Plack, of course, operates at the byte level.
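To make the "borders" idea concrete, here is a minimal sketch as a bare PSGI-style code reference. It is not Poet or Mason code and loads no Plack modules; the handler body and the hand-built environment hash are assumptions for illustration. Octets are decoded once on the way in, everything in the middle works on characters, and the result is encoded once on the way out.

```perl
use 5.014;    # enables strict and the unicode_strings feature
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# A PSGI application is a code reference that takes an environment hash
# and returns [ status, headers, body ], with the body as octets.
my $app = sub {
    my ($env) = @_;

    # Border in: PATH_INFO holds octets, decode exactly once.
    my $path = decode_utf8($env->{PATH_INFO});

    # Middle: character semantics only, no encode/decode calls.
    my ($file) = $path =~ m{([^/]*)\z};
    $file =~ s/[^\w.]//g;
    my $cell = lc join ' ', "\x{C5}NGSTR\x{D6}M", 'in the', $file;

    # Border out: encode exactly once and declare the charset.
    return [
        200,
        [ 'Content-Type' => 'text/plain; charset=UTF-8' ],
        [ encode_utf8($cell) ],
    ];
};

# Usage with a hand-built environment (the octets of "/xls/hélló.xlsx").
my $res = $app->({ PATH_INFO => "/xls/h\xC3\xA9ll\xC3\xB3.xlsx" });
print $res->[2][0], "\n";
```

The point of the design is that the middle section never needs to know which encoding is on the wire.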

Partial solution

Happily, Poet cleverly allows modifying its (and Mason's) parts, so in $poet_root/lib/My/Mason you could modify Compilation.pm to:

override 'output_class_header' => sub {
    return join("\n",
        super(), qq(
        use 5.014;
        use utf8;
        use Encode;
        )
    );
};

which will insert the wanted preamble into every Mason component. (Don't forget to touch every component, or simply remove the compiled objects from $poet_root/data/obj.)

You could also try to handle the requests/responses at the borders, by editing $poet_root/lib/My/Mason/Request.pm to:

#found this code somewhere on the net
use Encode;
override 'run' => sub {
    my($self, $path, $args) = @_;

    #decode values - but still missing the "keys" decode
    foreach my $k (keys %$args) {
        $args->set($k, decode_utf8($args->get($k)));
    }

    my $result = super();

    #encode the output - BUT THIS BREAKS the inline XLS
    $result->output( encode_utf8($result->output()) );
    return $result;
};

Encoding everything is the wrong strategy; it breaks, for example, the XLS download.
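Why a blanket encode corrupts binary output can be shown in a few lines. This is a sketch under assumptions: encode_body_for is a hypothetical helper, and the rule "encode only text/plain and text/html" is one possible policy, not something Poet or Mason provides.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# A few bytes resembling the start of a zip container such as .xlsx.
my $binary = "PK\x03\x04\x80\xFF";

# encode_utf8 treats every byte >= 0x80 as a character and expands it to
# two bytes, so the binary payload is no longer the same file.
my $broken = encode_utf8($binary);

# Hypothetical helper: encode only bodies with a textual content type.
sub encode_body_for {
    my ($content_type, $body) = @_;
    return $body
        unless defined $content_type
        && $content_type =~ m{^text/(?:plain|html)\b}i;
    return encode_utf8($body);
}

printf "original: %d bytes, blindly encoded: %d bytes\n",
    length $binary, length $broken;

my $xlsx_out = encode_body_for('application/vnd.ms-excel', $binary);
my $html_out = encode_body_for('text/html', "h\x{E9}ll\x{F3}");
```

With such a check, the spreadsheet bytes pass through untouched while HTML is still encoded exactly once.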

So, 4 years later (I asked the original question in 2011), I still don't know :( how to use Unicode correctly in Mason2 applications, and there still isn't any documentation or helper for it. :(

The main questions are: where (which methods should be modified with Moose's method modifiers) and how to correctly decode the inputs and encode the output in a Poet/Mason app,

  • but only the textual ones, e.g. text/plain or text/html and such...
  • and do the above "surprise free", i.e. in a way that simply works. ;)

Could someone please help with real code: what should I modify in the above?

Recommended answer

OK, I've tested this with Firefox. The HTML displays the UTF-8 correctly and leaves the zip (the .xlsx download) alone, so it should work everywhere.

If you start with poet new My, then to apply the patch you need patch -p1 -i...path/to/thisfile.diff.

diff -ruN orig/my/comps/Base.mc new/my/comps/Base.mc
--- orig/my/comps/Base.mc   2015-05-20 21:48:34.515625000 -0700
+++ new/my/comps/Base.mc    2015-05-20 21:57:34.703125000 -0700
@@ -2,9 +2,10 @@
 has 'title' => (default => 'My site');
 </%class>

-<%augment wrap>
-  <html>
+<%augment wrap><!DOCTYPE html>
+  <html lang="en-US">
     <head>
+      <meta charset="utf-8">
       <link rel="stylesheet" href="/static/css/style.css">
 % $.Defer {{
       <title><% $.title %></title>
diff -ruN orig/my/comps/xls/dhandler.mc new/my/comps/xls/dhandler.mc
--- orig/my/comps/xls/dhandler.mc   1969-12-31 16:00:00.000000000 -0800
+++ new/my/comps/xls/dhandler.mc    2015-05-20 21:53:42.796875000 -0700
@@ -0,0 +1,30 @@
+<%class>
+    has 'dwl';
+    use Excel::Writer::XLSX;
+</%class>
+<%init>
+    my $file = $m->path_info;
+    $file = decode_utf8( $file );
+    $file =~ s/[^\w\.]//g;
+    my $cell = lc join ' ', "ÅNGSTRÖM", "in the", $file ;
+    if( $.dwl ) {
+        #create xlsx in the memory
+        my $excel;
+        open my $fh, '>', \$excel or die "Failed open scalar: $!";
+        my $workbook  = Excel::Writer::XLSX->new( $fh );
+        my $worksheet = $workbook->add_worksheet();
+        $worksheet->write(0, 0, $cell);
+        $workbook->close();
+
+        #poet/mason output
+        $m->clear_buffer;
+        $m->res->content_type("application/vnd.ms-excel");
+        $m->print($excel);
+        $m->abort();
+    }
+</%init>
+<table border=1>
+<tr><td><% $cell %></td></tr>
+</table>
+<p> <a href="%c3%85%4e%47%53%54%52%c3%96%4d%20%68%c3%a9%6c%6c%c3%b3">ÅNGSTRÖM hélló</a>
+<p> <a href="?dwl=yes">download <% $file %></a>
diff -ruN orig/my/lib/My/Mason/Compilation.pm new/my/lib/My/Mason/Compilation.pm
--- orig/my/lib/My/Mason/Compilation.pm 2015-05-20 21:48:34.937500000 -0700
+++ new/my/lib/My/Mason/Compilation.pm  2015-05-20 21:49:54.515625000 -0700
@@ -5,11 +5,13 @@
 extends 'Mason::Compilation';

 # Add customizations to Mason::Compilation here.
-#
-# e.g. Add Perl code to the top of every compiled component
-#
-# override 'output_class_header' => sub {
-#      return join("\n", super(), 'use Foo;', 'use Bar qw(baz);');
-# };
-
+override 'output_class_header' => sub {
+    return join("\n",
+        super(), qq(
+        use 5.014;
+        use utf8;
+        use Encode;
+        )
+    );
+};
 1;
\ No newline at end of file
diff -ruN orig/my/lib/My/Mason/Request.pm new/my/lib/My/Mason/Request.pm
--- orig/my/lib/My/Mason/Request.pm 2015-05-20 21:48:34.968750000 -0700
+++ new/my/lib/My/Mason/Request.pm  2015-05-20 21:55:03.093750000 -0700
@@ -4,20 +4,27 @@

 extends 'Mason::Request';

-# Add customizations to Mason::Request here.
-#
-# e.g. Perform tasks before and after each Mason request
-#
-# override 'run' => sub {
-#     my $self = shift;
-#
-#     do_tasks_before_request();
-#
-#     my $result = super();
-#
-#     do_tasks_after_request();
-#
-#     return $result;
-# };
+use Encode qw/ encode_utf8 decode_utf8 /;

-1;
\ No newline at end of file
+override 'run' => sub {
+    my($self, $path, $args) = @_;
+    foreach my $k (keys %$args) {
+        my $v = $args->get($k);
+        $v=decode_utf8($v);
+        $args->set($k, $v);
+    }
+    my $result = super();
+    my( $ctype, $charset ) = $self->res->headers->content_type_charset;
+    if( ! $ctype ){
+        $ctype = 'text/html';
+        $charset = 'UTF-8';
+        $self->res->content_type( "$ctype; charset=$charset");
+        $result->output( encode_utf8(''.( $result->output())) );
+    } elsif( ! $charset and $ctype =~ m{text/(?:plain|html)} ){
+        $charset = 'UTF-8';
+        $self->res->content_type( "$ctype; charset=$charset");
+        $result->output( encode_utf8(''.( $result->output())) );
+    }
+    return $result;
+};
+1;
