Working with Unicode-encoded strings from Active Directory via python-ldap

Question

I have already asked about this problem, but after some testing I decided to create a new question with more specific information:

I am reading user accounts from our Active Directory with python-ldap (on Python 2.7). This works well, but I have problems with special characters. They look like UTF-8 encoded strings when printed on the console. The goal is to write them into a MySQL database, but I can't get those strings into proper UTF-8 to begin with.

Example (fullentries is my array with all the AD entries):

fullentries[23][1].decode('utf-8', 'ignore')    
print fullentries[23][1].encode('utf-8', 'ignore')
print fullentries[23][1].encode('latin1', 'ignore')
print repr(fullentries[23][1])

A second test, with a string entered by hand:

testentry = "M\xc3\xbcller"
testentry.decode('utf-8', 'ignore')
print testentry.encode('utf-8', 'ignore')
print testentry.encode('latin1', 'ignore')
print repr(testentry)

The output of the first example is:

M\xc3\xbcller
M\xc3\xbcller
u'M\\xc3\\xbcller'

If I try to replace the double backslashes with .replace('\\\\','\\'), the output remains the same.

The output of the second example:

Müller
M�ller
'M\xc3\xbcller'

Is there any way to get the AD output properly encoded? I have already read a lot of documentation, but it all states that LDAPv3 gives you strictly UTF-8 encoded strings, and Active Directory uses LDAPv3.

My older question on this topic is here: Writing UTF-8 String to MySQL with Python

Edit: added repr info.

Recommended answer

First, know that printing to a Windows console is often the step that garbles data, so for your tests, you should print repr(s) to see the precise bytes you have in your string.

You need to find out how the data from AD is encoded. Again, print repr(s) will let you see the content of the data.
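To illustrate why repr() matters here, the following minimal sketch compares a value holding genuine UTF-8 bytes with one holding the literal characters backslash, x, c, 3. It is written for Python 3, where bytes and text are separate types, and the variable names are illustrative rather than taken from the question. The two values can look alike in casual output, but repr() and len() tell them apart.

```python
# Two values that are easy to confuse when only looking at printed output.
real_bytes = b"M\xc3\xbcller"      # 7 bytes: genuine UTF-8 for "Mueller" with u-umlaut
escaped_text = "M\\xc3\\xbcller"   # 13 characters: literal backslash escapes

# repr() shows exactly what each value contains.
print(repr(real_bytes))
print(repr(escaped_text))
print(len(real_bytes), len(escaped_text))

# Only the genuine bytes decode directly to the expected name.
decoded = real_bytes.decode("utf-8")
print(decoded == "M\u00fcller")
```

If the value coming back from the directory resembles escaped_text rather than real_bytes, a plain UTF-8 decode will not be enough.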

Update:

OK, it looks like you're getting strange strings somehow. There might be a way to get them better, but you can adapt in any case, though it isn't pretty:

u.decode('unicode_escape').encode('iso8859-1').decode('utf8')
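The one-liner above targets Python 2.7, where u is a unicode object whose contents are literal backslash escapes. As a hedged sketch, the same three steps might look like this in Python 3. The function name unescape_utf8 is hypothetical, and the code assumes the input contains only characters in the Latin-1 range.

```python
def unescape_utf8(escaped):
    """Turn text with literal \\xNN escapes into the UTF-8 text it represents."""
    # Step 1: interpret the literal \xNN escapes; each one becomes a single
    # code point in the range U+0000..U+00FF.
    code_points = escaped.encode("latin-1").decode("unicode_escape")
    # Step 2: map those code points back to the raw bytes they stand for.
    raw_bytes = code_points.encode("latin-1")
    # Step 3: decode the raw bytes as the UTF-8 they actually are.
    return raw_bytes.decode("utf-8")


fixed = unescape_utf8("M\\xc3\\xbcller")
print(fixed == "M\u00fcller")
```

The Latin-1 round trip works because Latin-1 maps each code point from 0 to 255 to the byte with the same value, so no information is lost between steps 1 and 2.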

You might want to look into whether you can get the data in a more natural format.
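One possible reading of "a more natural format" is to decode the raw attribute values as soon as they arrive. python-ldap search results are lists of (dn, attributes) pairs in which each attribute maps to a list of byte strings. The sketch below is for Python 3 and uses hand-written sample data instead of a live directory; decode_entry, TEXT_ATTRS, and the sample entry are hypothetical names introduced for illustration, and the choice of text attributes is an assumption.

```python
# Attributes assumed to hold UTF-8 text. Binary attributes such as
# objectGUID must not be decoded this way.
TEXT_ATTRS = ("cn", "sn", "givenName", "displayName")


def decode_entry(attrs, text_attrs=TEXT_ATTRS):
    """Return a dict of the text attributes with values decoded from UTF-8."""
    decoded = {}
    for name in text_attrs:
        if name in attrs:
            decoded[name] = [value.decode("utf-8") for value in attrs[name]]
    return decoded


# Hand-written stand-in for the attribute dict of one search result.
sample_attrs = {
    "sn": [b"M\xc3\xbcller"],
    "objectGUID": [b"\x01\x02\xff\xfe"],
}

entry = decode_entry(sample_attrs)
print(entry["sn"][0] == "M\u00fcller")
print("objectGUID" in entry)
```

This only helps when the directory really returns UTF-8 bytes. If the values contain literal backslash escapes, as in the question, they need the unescaping step from the answer first.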
