How to decode quoted-printable chars (from quoted-printable to a char)?

Views: 176   Published: 2017/8/16 23:37:40   Tags: java, encoding

Question

I have a text with quoted-printables. Here is an example of such a text (from a Wikipedia article):

If you believe that truth=3Dbeauty, then surely=20=
mathematics is the most beautiful branch of philosophy.

I am looking for a Java class that decodes the encoded form to chars, e.g., =20 to a space.
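As a minimal illustration (not part of the original post), each `=XX` escape stands for one byte whose value is the two-digit hexadecimal number `XX`: `=3D` is 0x3D (`'='`) and `=20` is 0x20 (a space). The class and method names below are hypothetical:

```java
public class QpEscapeDemo {

    // Hypothetical helper: turns the two hex digits that follow "=" into a char.
    // Casting straight to char is only meaningful for ASCII values (0x00-0x7F);
    // higher bytes have to be decoded with the text's charset.
    static char decodeEscape(String hexDigits) {
        return (char) Integer.parseInt(hexDigits, 16);
    }

    public static void main(String[] args) {
        System.out.println("[" + decodeEscape("3D") + "]"); // prints [=]
        System.out.println("[" + decodeEscape("20") + "]"); // prints [ ]
    }
}
```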

UPDATE: Thanks to The Elite Gentleman, I know that I need to use QuotedPrintableCodec:

import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.net.QuotedPrintableCodec;
import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20=mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws DecoderException {
        QuotedPrintableCodec.decodeQuotedPrintable(TXT.getBytes());
    }
}

However I keep getting the following exception:

org.apache.commons.codec.DecoderException: Invalid URL encoding: not a valid digit (radix 16): 109
    at org.apache.commons.codec.net.Utils.digit16(Utils.java:44)
    at org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable(QuotedPrintableCodec.java:186)

What am I doing wrong?

UPDATE 2: I found a related question on SO and learned about MimeUtility:

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;

import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws MessagingException, IOException {
        InputStream is = new ByteArrayInputStream(TXT.getBytes());

        // Wrap the raw bytes in a quoted-printable decoding stream and read it line by line
        BufferedReader br = new BufferedReader(new InputStreamReader(MimeUtility.decode(is, "quoted-printable")));
        StringWriter writer = new StringWriter();

        String line;
        while ((line = br.readLine()) != null) {
            writer.append(line);
        }
        System.out.println("INPUT:  " + TXT);
        System.out.println("OUTPUT: " + writer.toString());
    }
}

However, the output is still not perfect; it contains '=':

INPUT:  If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.
OUTPUT: If you believe that truth=beauty, then surely = mathematics is the most beautiful branch of philosophy.

What am I doing wrong now?

Answer

The Apache Commons Codec QuotedPrintableCodec class is an implementation of the Quoted-Printable section of RFC 1521.

Update: your quoted-printable string is wrong, because the Wikipedia example uses soft line breaks.
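This also explains the exception in the first update. In `surely=20=mathematics` the codec reads every `=` as the start of an `=XX` escape, so the second `=` is followed by `ma`, and 109 is simply the decimal code of `'m'`, which is not a hexadecimal digit. A quick check (the class name is only for illustration):

```java
public class DigitCheck {
    public static void main(String[] args) {
        char next = 'm'; // the character right after the second "=" in "surely=20=mathematics"
        System.out.println((int) next);                // 109, the value in the exception message
        System.out.println(Character.digit(next, 16)); // -1: 'm' is not a hex digit
    }
}
```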

Soft line breaks:

Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
      that encoded lines be no more than 76 characters long. If longer
      lines are to be encoded with the Quoted-Printable encoding, 'soft'
      line breaks must be used. An equal sign as the last character on a
      encoded line indicates such a non-significant ('soft') line break
      in the encoded text. Thus if the "raw" form of the line is a
      single unencoded line that says:

          Now's the time for all folk to come to the aid of
          their country.

      This can be represented, in the Quoted-Printable encoding, as

          Now's the time =
          for all folk to come=
           to the aid of their country.

      This provides a mechanism with which long lines are encoded in
      such a way as to be restored by the user agent.  The 76 character
      limit does not count the trailing CRLF, but counts all other
      characters, including any equal signs.
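A minimal hand-written decoder that honors rule #5 could look like the sketch below. It is not the Commons Codec or JavaMail implementation; the class name `SoftBreakQpDecoder` is hypothetical, it assumes the decoded bytes are UTF-8, it tolerates a bare LF after a soft-break `=`, and it rejects any other `=` that is not followed by two hex digits. It is run here on the RFC example above:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SoftBreakQpDecoder {

    // Decodes quoted-printable text: "=XX" becomes the byte 0xXX, and a soft
    // line break ("=" directly followed by CRLF, or by a bare LF for leniency)
    // is dropped. Hard line breaks and all other characters are copied as-is.
    // Assumption: the decoded bytes are UTF-8 text.
    public static String decode(String encoded) {
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {
                out.write(b);
                i++;
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3; // soft line break "=\r\n"
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2; // soft line break "=\n"
            } else if (i + 2 < in.length
                    && Character.digit(in[i + 1], 16) >= 0
                    && Character.digit(in[i + 2], 16) >= 0) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                out.write((hi << 4) | lo);
                i += 3; // escaped octet "=XX"
            } else {
                throw new IllegalArgumentException(
                        "Invalid quoted-printable sequence at index " + i);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String rfcExample = "Now's the time =\r\n"
                + "for all folk to come=\r\n"
                + " to the aid of their country.";
        // prints: Now's the time for all folk to come to the aid of their country.
        System.out.println(decode(rfcExample));
    }
}
```

Applied to the Wikipedia text with a real CRLF after `surely=20=`, the same method returns "If you believe that truth=beauty, then surely mathematics is the most beautiful branch of philosophy.", while the question's one-line `surely=20=mathematics` is rejected, just as it is by the codec.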

So your text should be written as follows:

private static final String CRLF = "\r\n";
private static final String S = "If you believe that truth=3Dbeauty, then surely=20=" + CRLF + "mathematics is the most beautiful branch of philosophy.";
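To confirm that text like this also respects the 76-character limit from rule #5 (which does not count the trailing CRLF), each encoded line can be measured. A small sketch; the helper `withinLimit` is hypothetical and checks only the length rule, nothing else in the spec:

```java
public class LineLengthCheck {

    // Hypothetical helper: true if every CRLF-separated line is at most 76 characters.
    static boolean withinLimit(String encoded) {
        for (String line : encoded.split("\r\n", -1)) {
            if (line.length() > 76) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String crlf = "\r\n";
        String s = "If you believe that truth=3Dbeauty, then surely=20=" + crlf
                + "mathematics is the most beautiful branch of philosophy.";
        System.out.println(withinLimit(s)); // true: the lines are 51 and 55 characters long
    }
}
```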

The Javadoc clearly states:

Rules #3, #4, and #5 of the quoted-printable spec are not implemented yet because the complete quoted-printable spec does not lend itself well into the byte[] oriented codec framework. Complete the codec once the streamable codec framework is ready. The motivation behind providing the codec in a partial form is that it can already come in handy for those applications that do not require quoted-printable line formatting (rules #3, #4, #5), for instance Q codec.

There is also a bug logged against Apache QuotedPrintableCodec because it doesn't support soft line breaks.
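Until the codec handles rule #5 itself, one workaround is to remove soft line breaks before handing the text to a decoder that only understands `=XX` escapes. This is safe because in valid quoted-printable text a literal `=` is always encoded as `=3D`, so a bare `=` at the end of a line can only be a soft break. A sketch; `stripSoftLineBreaks` is a hypothetical helper, and the comment only marks where the `QuotedPrintableCodec.decodeQuotedPrintable` call from the question would go (it is not invoked here):

```java
public class StripSoftBreaks {

    // Hypothetical helper: removes quoted-printable soft line breaks, i.e. an "="
    // immediately followed by CRLF (or a bare LF). Escapes like "=3D" are left
    // untouched, because their "=" is followed by hex digits, not a line break.
    static String stripSoftLineBreaks(String encoded) {
        return encoded.replaceAll("=\\r?\\n", "");
    }

    public static void main(String[] args) {
        String s = "If you believe that truth=3Dbeauty, then surely=20=\r\n"
                + "mathematics is the most beautiful branch of philosophy.";
        String joined = stripSoftLineBreaks(s);
        // prints: If you believe that truth=3Dbeauty, then surely=20mathematics is the most beautiful branch of philosophy.
        System.out.println(joined);
        // The joined text can then go to an "=XX"-only decoder, e.g. the
        // QuotedPrintableCodec.decodeQuotedPrintable(byte[]) call from the question.
    }
}
```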
