Decoding quoted-printable messages in Swift


Question


I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert this to "The cost would be £1,000".


I'm just converting text manually at the moment and this doesn't cover all cases. I'm sure there is just one line of code that will help with this.

Here is my code:

func decodeUTF8(message: String) -> String
{
    var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: "", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)

    return newMessage
}

Thanks

Answer


An easy way is to use the (NS)String method stringByRemovingPercentEncoding for this purpose. This was pointed out in the thread "decoding quoted-printables", so the first solution is mainly a Swift translation of the answers in that thread.


The idea is to replace the quoted-printable "=NN" encoding by the percent encoding "%NN" and then use the existing method to remove the percent encoding.


Continuation lines (soft line breaks, i.e. a "=" at the end of a line) are handled separately by removing them. Also, percent characters in the input string must be encoded first; otherwise they would be treated as the leading character of a percent encoding.

import Foundation

func decodeQuotedPrintable(message : String) -> String? {
    return message
        .stringByReplacingOccurrencesOfString("=\r\n", withString: "") // remove soft line breaks (CRLF)
        .stringByReplacingOccurrencesOfString("=\n", withString: "")   // remove soft line breaks (LF)
        .stringByReplacingOccurrencesOfString("%", withString: "%25")  // escape literal "%" first
        .stringByReplacingOccurrencesOfString("=", withString: "%")    // turn "=NN" into "%NN"
        .stringByRemovingPercentEncoding                               // decode "%NN" bytes as UTF-8
}
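In current Swift, the same percent-encoding trick can be spelled with the renamed Foundation APIs replacingOccurrences(of:with:) and removingPercentEncoding. This is a sketch assuming Swift 5 and Foundation, not the answer's original code. Like the version above, it decodes the escaped bytes as UTF-8 only.

```swift
import Foundation

// Swift 5 spelling of the percent-encoding approach (sketch).
// Returns nil for malformed escapes or bytes that are not valid UTF-8.
func decodeQuotedPrintable(_ message: String) -> String? {
    return message
        .replacingOccurrences(of: "=\r\n", with: "") // soft line break (CRLF)
        .replacingOccurrences(of: "=\n", with: "")   // soft line break (LF)
        .replacingOccurrences(of: "%", with: "%25")  // protect literal "%" characters
        .replacingOccurrences(of: "=", with: "%")    // "=HH" -> "%HH"
        .removingPercentEncoding                     // decode percent escapes as UTF-8
}
```

Escaping "%" before rewriting "=" matters. Without that step, an input such as "100% =3D done" would turn "% " into a bogus percent escape.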


The function returns an optional string which is nil for invalid input. Invalid input can be:

  • A "=" character which is not followed by two hexadecimal digits, e.g. "=XX".
  • A "=NN" sequence which does not decode to a valid UTF-8 sequence, e.g. "=E2=64".
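The following snippet shows why the two inputs above fail. It is an illustration only, not part of the answer's code. "XX" does not parse as a hexadecimal byte. The byte 0xE2 announces a three-byte UTF-8 sequence, but 0x64 is not a continuation byte (those are 0x80...0xBF).

```swift
import Foundation

// "=XX": "XX" is not a two-digit hexadecimal number.
let invalidHex = UInt8("XX", radix: 16)

// "=E2=64": 0xE2 must be followed by two continuation bytes (0x80...0xBF),
// but 0x64 is the ASCII letter "d", so the bytes are not valid UTF-8.
let invalidUTF8 = String(bytes: [0xE2, 0x64] as [UInt8], encoding: .utf8)

// For comparison, a complete three-byte sequence decodes to "…".
let validUTF8 = String(bytes: [0xE2, 0x80, 0xA6] as [UInt8], encoding: .utf8)
```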

Example:

if let decoded = decodeQuotedPrintable("=C2=A31,000") {
    print(decoded) // £1,000
}

if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
    print(decoded) // “Hello … world!”
}




Update 1: The code above assumes that the message uses the UTF-8 encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding of "£", and E2 80 A4 is the UTF-8 encoding of U+2024 (ONE DOT LEADER).
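To see which "=HH" escapes a character produces under UTF-8, you can print the hex values of its UTF-8 bytes. The helper name qpEscapes(for:) is hypothetical and used here only for illustration.

```swift
import Foundation

// Hypothetical helper: the quoted-printable escapes for the UTF-8 bytes of a
// string, e.g. "£" -> "=C2=A3".
func qpEscapes(for text: String) -> String {
    return text.utf8.map { byte -> String in
        let hex = String(byte, radix: 16, uppercase: true)
        return "=" + (hex.count == 1 ? "0" + hex : hex) // pad to two hex digits
    }.joined()
}

print(qpEscapes(for: "£"))        // =C2=A3
print(qpEscapes(for: "…"))        // =E2=80=A6
print(qpEscapes(for: "\u{2024}")) // =E2=80=A4
```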


If the input is "Rub=E9n", then the message is using the Windows-1252 encoding (the byte E9 followed by "n" is not valid UTF-8). To decode that correctly, you have to replace

.stringByRemovingPercentEncoding

with

.stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)
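The difference between the two encodings can be checked directly on the bytes of "Rub=E9n" after unescaping. This is a small Swift 5 illustration, not part of the original answer.

```swift
import Foundation

// Bytes of "Rub=E9n" after replacing the escape: "R" "u" "b" 0xE9 "n".
let rubenBytes: [UInt8] = [0x52, 0x75, 0x62, 0xE9, 0x6E]

// As UTF-8, 0xE9 would have to start a three-byte sequence, so decoding fails.
let rubenAsUTF8 = String(bytes: rubenBytes, encoding: .utf8)

// In Windows-1252, 0xE9 is the single character "é".
let rubenAsCP1252 = String(bytes: rubenBytes, encoding: .windowsCP1252)
```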


There are also ways to detect the encoding from a "Content-Type" header field; see, for example, https://stackoverflow.com/a/32051684/1187415.
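As a rough illustration of that idea, here is a minimal sketch that pulls the charset parameter out of a Content-Type value and maps a few common names to String.Encoding. The function stringEncoding(fromContentType:) and its small lookup table are hypothetical. On Apple platforms, CoreFoundation's CFStringConvertIANACharSetNameToEncoding covers many more charset names.

```swift
import Foundation

// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value and maps a few common IANA names to String.Encoding.
func stringEncoding(fromContentType header: String) -> String.Encoding? {
    let known: [String: String.Encoding] = [
        "utf-8": .utf8,
        "us-ascii": .ascii,
        "iso-8859-1": .isoLatin1,
        "windows-1252": .windowsCP1252,
    ]
    // Parameters follow the media type and are separated by ";".
    for parameter in header.split(separator: ";").dropFirst() {
        let parts = parameter.split(separator: "=", maxSplits: 1)
        guard parts.count == 2 else { continue }
        let name = String(parts[0]).trimmingCharacters(in: .whitespaces).lowercased()
        guard name == "charset" else { continue }
        // Strip whitespace and optional surrounding double quotes.
        let value = String(parts[1])
            .trimmingCharacters(in: .whitespaces)
            .trimmingCharacters(in: CharacterSet(charactersIn: "\""))
            .lowercased()
        return known[value] // nil for names not in the table
    }
    return nil // no charset parameter present
}
```

The result can then be passed as the encoding argument of the decoder.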


Update 2: The stringByReplacingPercentEscapesUsingEncoding method is marked as deprecated, so the above code will always generate a compiler warning. Unfortunately, it seems that no alternative method has been provided by Apple.


So here is a new, completely self-contained decoding method that does not cause any compiler warnings. This time I have written it as an extension method on String. Explanatory comments are in the code.

import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding. 
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences. 
        return self
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .decodeQuotedPrintableSequences(enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = rangeOfString("=", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< range.startIndex])
            position = range.startIndex

            // Decode one or more successive "=HH" sequences to a byte array:
            let bytes = NSMutableData()
            repeat {
                let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                if hexCode.characters.count < 2 {
                    return nil // Incomplete hex code
                }
                guard var byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.appendBytes(&byte, length: 1)
                position = position.advancedBy(3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.appendContentsOf(dec)
        }

        // Copy remaining characters to the result:
        result.appendContentsOf(self[position ..< endIndex])

        return result
    }
}

Example usage:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
    print(decoded) // Rubén
}


Update for Swift 4 (and later):

import Foundation

extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding:     A string encoding. The default is UTF-8.
    /// - returns:                The decoded string, or `nil` for invalid input.

    func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .replacingOccurrences(of: "=\r\n", with: "")
            .replacingOccurrences(of: "=\n", with: "")
            .decodeQuotedPrintableSequences(encoding: enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.

    private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = range(of: "=", range: position..<endIndex) {
            result.append(contentsOf: self[position ..< range.lowerBound])
            position = range.lowerBound

            // Decode one or more successive "=HH" sequences to a byte array:
            var bytes = Data()
            repeat {
                let hexCode = self[position...].dropFirst().prefix(2)
                if hexCode.count < 2 {
                    return nil // Incomplete hex code
                }
                guard let byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.append(byte)
                position = index(position, offsetBy: 3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.append(contentsOf: dec)
        }

        // Copy remaining characters to the result:
        result.append(contentsOf: self[position ..< endIndex])

        return result
    }
}

Example usage:

if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}

if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}

if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
    print(decoded) // Rubén
}
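The extension above works on String indices and Characters. For comparison, here is a minimal sketch of a byte-level variant that scans the UTF-8 code units, removes soft line breaks inline, and decodes all collected bytes once at the end with the requested encoding. The name decodeQuotedPrintableBytes is hypothetical. The sketch assumes the unescaped text is plain ASCII, as quoted-printable bodies normally are. Literal non-ASCII characters would be re-decoded with encoding and could come out wrong for non-UTF-8 encodings.

```swift
import Foundation

// Hypothetical byte-level variant of the decoder (sketch).
// Returns nil for malformed escapes or bytes invalid in `encoding`.
func decodeQuotedPrintableBytes(_ message: String,
                                encoding: String.Encoding = .utf8) -> String? {
    let input = Array(message.utf8)
    var output: [UInt8] = []
    output.reserveCapacity(input.count)

    // Value of one ASCII hex digit, or nil if it is not a hex digit.
    func hexValue(_ c: UInt8) -> UInt8? {
        switch c {
        case UInt8(ascii: "0")...UInt8(ascii: "9"): return c - UInt8(ascii: "0")
        case UInt8(ascii: "A")...UInt8(ascii: "F"): return c - UInt8(ascii: "A") + 10
        case UInt8(ascii: "a")...UInt8(ascii: "f"): return c - UInt8(ascii: "a") + 10
        default: return nil
        }
    }

    var i = 0
    while i < input.count {
        let c = input[i]
        guard c == UInt8(ascii: "=") else {
            output.append(c) // ordinary byte, copy as-is
            i += 1
            continue
        }
        // Soft line break "=\r\n": drop all three bytes.
        if i + 2 < input.count, input[i + 1] == 13, input[i + 2] == 10 {
            i += 3
            continue
        }
        // Soft line break "=\n": drop both bytes.
        if i + 1 < input.count, input[i + 1] == 10 {
            i += 2
            continue
        }
        // Escape sequence "=HH".
        guard i + 2 < input.count,
              let hi = hexValue(input[i + 1]),
              let lo = hexValue(input[i + 2]) else {
            return nil // Incomplete or invalid hex code
        }
        output.append(hi << 4 | lo)
        i += 3
    }
    return String(bytes: output, encoding: encoding)
}
```

Decoding the whole buffer at once means a multi-byte character split across a soft line break (e.g. "=C2=\r\n=A3") is still reassembled correctly.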

