How to decode quoted-printable chars (from quoted-printable to a char)?


Question

I have a text containing quoted-printable sequences. Here is an example of such a text (from a Wikipedia article):


If you believe that truth=3Dbeauty, then surely=20=
mathematics is the most beautiful branch of philosophy.

I am looking for a Java class that decodes the encoded form to characters, e.g., =20 to a space.

UPDATE: Thanks to The Elite Gentleman, I know that I need to use QuotedPrintableCodec:

import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.net.QuotedPrintableCodec;
import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20=mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws DecoderException
    {
        QuotedPrintableCodec.decodeQuotedPrintable( TXT.getBytes() );
    }
}

However, I keep getting the following exception:

org.apache.commons.codec.DecoderException: Invalid URL encoding: not a valid digit (radix 16): 109
    at org.apache.commons.codec.net.Utils.digit16(Utils.java:44)
    at org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable(QuotedPrintableCodec.java:186)

What am I doing wrong?

UPDATE 2: I have found this question on SO and learned about MimeUtility:

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;

import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws MessagingException, IOException
    {
        InputStream is = new ByteArrayInputStream(TXT.getBytes());

        // Wrap the raw bytes in a stream that decodes quoted-printable on the fly
        BufferedReader br = new BufferedReader( new InputStreamReader( MimeUtility.decode(is, "quoted-printable") ));
        StringWriter writer = new StringWriter();

        String line;
        while( (line = br.readLine() ) != null )
        {
            writer.append(line);
        }
        System.out.println("INPUT:  " + TXT);
        System.out.println("OUTPUT: " + writer.toString() );
    }
}

However, the output is still not perfect; it contains '=':

INPUT:  If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.
OUTPUT: If you believe that truth=beauty, then surely = mathematics is the most beautiful branch of philosophy.

What am I doing wrong now?

Recommended answer

The Apache Commons Codec QuotedPrintableCodec class is an implementation of the Quoted-Printable section of RFC 1521.

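Update: Your quoted-printable string is wrong, because the example on Wikipedia uses soft line breaks. In the original, the '=' after "surely=20" is the last character on its line; in your string it is followed directly by "mathematics" (or by a space), so it no longer marks a soft line break.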
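The 109 in the exception message from the question appears to be the character code of 'm': after "=20" the decoder meets "=m" and tries to read 'm' as a hexadecimal digit. The following is only a sketch to illustrate that reading; the class and method names are hypothetical and are not part of Commons Codec.

```java
class HexDigitDemo {
    // Returns the first character that follows an '=' but is not a
    // hexadecimal digit, or -1 if every "=XX" escape in the string is valid.
    static int firstInvalidEscapeChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '=') {
                continue;
            }
            for (int j = i + 1; j <= i + 2; j++) {
                if (j >= s.length()) {
                    return -1; // truncated escape at end of input, ignored in this sketch
                }
                if (Character.digit(s.charAt(j), 16) < 0) {
                    return s.charAt(j);
                }
            }
            i += 2; // skip the two hex digits that were just checked
        }
        return -1;
    }

    public static void main(String[] args) {
        String txt = "If you believe that truth=3Dbeauty, then surely=20=mathematics";
        // "=3D" and "=20" are valid escapes; "=m" is not, and (int) 'm' is 109
        System.out.println(firstInvalidEscapeChar(txt));
    }
}
```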

Soft line breaks:

Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
      that encoded lines be no more than 76 characters long. If longer
      lines are to be encoded with the Quoted-Printable encoding, 'soft'
      line breaks must be used. An equal sign as the last character on a
      encoded line indicates such a non-significant ('soft') line break
      in the encoded text. Thus if the "raw" form of the line is a
      single unencoded line that says:

          Now's the time for all folk to come to the aid of
          their country.

      This can be represented, in the Quoted-Printable encoding, as

          Now's the time =
          for all folk to come=
           to the aid of their country.

      This provides a mechanism with which long lines are encoded in
      such a way as to be restored by the user agent.  The 76 character
      limit does not count the trailing CRLF, but counts all other
      characters, including any equal signs.

So your text should be constructed as follows:

private static final String CRLF = "\r\n";
private static final String S = "If you believe that truth=3Dbeauty, then surely=20=" + CRLF + "mathematics is the most beautiful branch of philosophy.";
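To check what such a string should decode to, here is a minimal sketch of a quoted-printable decoder that honors soft line breaks, using only the standard library. The class QpSketch and its decode method are hypothetical names chosen for illustration; this is neither the Commons Codec nor the JavaMail implementation, and it assumes single-byte (ISO-8859-1) content.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class QpSketch {
    // Decodes "=XX" escapes and drops soft line breaks ("=" followed by CRLF or LF).
    static String decode(String encoded) {
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            if (in[i] != '=') {
                out.write(in[i]); // literal byte, copied unchanged
                i++;
                continue;
            }
            // Soft line break: the '=' and the line terminator are both discarded
            if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;
                continue;
            }
            if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;
                continue;
            }
            // Otherwise '=' must be followed by two hexadecimal digits
            if (i + 2 < in.length) {
                int hi = Character.digit((char) in[i + 1], 16);
                int lo = Character.digit((char) in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            throw new IllegalArgumentException("Invalid escape at index " + i);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "If you believe that truth=3Dbeauty, then surely=20=" + "\r\n"
                 + "mathematics is the most beautiful branch of philosophy.";
        System.out.println(decode(s));
    }
}
```

With the CRLF in place, "=3D" becomes '=', "=20" becomes a space, and the trailing '=' plus line break disappear, so the two lines are joined into one sentence.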

The Javadoc clearly states:


Rules #3, #4, and #5 of the quoted-printable spec are not implemented yet because the complete quoted-printable spec does not lend itself well into the byte[] oriented codec framework. Complete the codec once the streamable codec framework is ready. The motivation behind providing the codec in a partial form is that it can already come in handy for those applications that do not require quoted-printable line formatting (rules #3, #4, #5), for instance Q codec.

There is also a bug logged against Apache QuotedPrintableCodec because it does not support soft line breaks.
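One possible workaround, given that limitation, is to remove the soft line breaks yourself before passing the text to a decoder that only understands "=XX" escapes. The sketch below uses only the standard library; SoftBreakStripper and stripSoftLineBreaks are hypothetical names, and handing the result to QuotedPrintableCodec.decodeQuotedPrintable afterwards is an assumption that is not exercised here.

```java
import java.util.regex.Pattern;

class SoftBreakStripper {
    // An '=' immediately followed by CRLF (or a bare LF) is a soft line break
    private static final Pattern SOFT_BREAK = Pattern.compile("=\\r?\\n");

    // Removes every soft line break, leaving "=XX" escapes and hard line breaks untouched
    static String stripSoftLineBreaks(String encoded) {
        return SOFT_BREAK.matcher(encoded).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "If you believe that truth=3Dbeauty, then surely=20=" + "\r\n"
                 + "mathematics is the most beautiful branch of philosophy.";
        // The result still contains "=3D" and "=20", but no trailing '=' or CRLF
        System.out.println(stripSoftLineBreaks(s));
    }
}
```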
