Converting utf-8 characters to scandic letters

Problem description

I am struggling to convert a string in which Scandic letters appear as UTF-8 byte values. For example, I would like to convert the following string:

test_string = "\xc3\xa4\xc3\xa4abc"

into the form:

test_string = "ääabc"

The end goal is to send this string to a Slack channel via the API. I did some testing and found that Slack handles Scandic letters properly. I have tried the following command:

test_string = test_string.encode('latin1').decode('utf-8')

but this does not change the string at all.

Same goes for the more brute-force method:

def simple_scand_convert(string):
   string = string.replace("\xc3\xa4", "ä")

Again, this does not change the string at all. Any tips, or pointers to material where I could look for a solution?

Solution

I can't reproduce the code snippet in which you read the soup message from an incoming webhook; therefore, my answer is based on hard-coded data and shows in detail how the Python-specific text encodings raw_unicode_escape and unicode_escape work:

test_string = "\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4"    # hard-coded
print('test_string                  ', test_string)
print('.encode("raw_unicode_escape")',
  test_string.encode( 'raw_unicode_escape'))
print('.decode(    "unicode_escape")',
  test_string.encode( 'raw_unicode_escape').decode( 'unicode_escape'))
print('.encode("latin1").decode()   ', 
  test_string.encode( 'raw_unicode_escape').decode( 'unicode_escape').
              encode( 'latin1').decode( 'utf-8'))

Output: \SO\68069394.py

test_string                   \xc3\xa5\xc3\xa4___Ã¥Ã¤
.encode("raw_unicode_escape") b'\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4'
.decode(    "unicode_escape") Ã¥Ã¤___Ã¥Ã¤
.encode("latin1").decode()    åä___åä
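
The three steps above could be wrapped in a small helper, sketched below. The name fix_utf8_escapes is hypothetical and not part of any library. The sketch assumes the incoming text contains only characters below U+0100 and no backslash sequences that should stay literal. If the real webhook data holds literal backslash escapes (an assumption, since that data is not shown here), this would also explain why .encode('latin1').decode('utf-8') on its own left the string unchanged.

```python
def fix_utf8_escapes(text: str) -> str:
    """Repair text that carries UTF-8 bytes as literal hex escapes or as Latin-1 mojibake.

    Hypothetical helper for illustration only.
    """
    # Step 1: str -> bytes. Backslashes are kept unchanged and every
    # character below U+0100 becomes a single byte.
    raw = text.encode("raw_unicode_escape")
    # Step 2: bytes -> str. Escape sequences such as \xc3 are interpreted,
    # and all remaining bytes are read as Latin-1 characters.
    unescaped = raw.decode("unicode_escape")
    # Step 3: map each character back to one byte, then decode as UTF-8.
    return unescaped.encode("latin1").decode("utf-8")


literal = r"\xc3\xa4\xc3\xa4abc"    # backslashes are real characters here
mojibake = "\xc3\xa4\xc3\xa4abc"    # Python reads this literal as 'Ã¤Ã¤abc'

fixed_literal = fix_utf8_escapes(literal)      # 'ääabc'
fixed_mojibake = fix_utf8_escapes(mojibake)    # 'ääabc'
```

Text that already contains characters above U+00FF would raise UnicodeEncodeError in step 3, so a helper like this should only be applied to input known to be damaged in this way.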
