Python Dictionary Contains Encoded Values
Question
I have a pandas data frame, oParameterData, which I built by querying Hadoop over a Hive ODBC connection. I am using it to populate a Python dictionary called oParameter:
import pyodbc
import pandas
oConnexionString = 'Driver={ClouderaHive};[...]'
oConnexion = pyodbc.connect(oConnexionString, autocommit=True)
oConnexion.setencoding(encoding='utf-8')
oQueryParameter = "select * from my_db.my_table;"
oParameterData = pandas.read_sql(oQueryParameter, oConnexion)
oCursor = oConnexion.cursor()
for oRow in oParameterData.index:
    oParameter = {}
    oParameter['pTableName'] = oParameterData.loc[oRow,'game']
    oParameter['pDataPartition'] = oParameterData.loc[oRow,'partition']
    oParameter['pDataLocation'] = oParameterData.loc[oRow,'data_path']
    oParameter['pAvroSchemaURL'] = oParameterData.loc[oRow,'schema_path']
When I print the whole dictionary, I get the following:
>>> print(oParameter)
>>> {'pDataLocation': '/\x00d\x00a\x00t\x00a\x00/\x00d\x00a\x00t\x00a\x00l\x00a\x00k\x00e\x00/\x00t\x00m\x00p\x00/\x00k\x00a\x00f\x00k\x00a\x00d\x00u\x00m\x00p\x00e\x00r\x00/\x00d\x00a\x00t\x00a\x00/\x00H\x00e\x00r\x00o\x00/\x00c\x00o\x00n\x00t\x00e\x00x\x00t\x00.\x00s\x00t\x00a\x00r\x00t\x00.\x00G\x00a\x00m\x00e\x00M\x00o\x00d\x00e\x00\x00/\x00v\x00=\x001\x00.\x00x\x00', 'pAvroSchemaURL': '/\x00d\x00a\x00t\x00a\x00/\x00d\x00a\x00t\x00a\x00l\x00a\x00k\x00e\x00/\x00t\x00m\x00p\x00/\x00k\x00a\x00f\x00k\x00a\x00d\x00u\x00m\x00p\x00e\x00r\x00/\x00d\x00a\x00t\x00a\x00/\x00H\x00e\x00r\x00o\x00/\x00c\x00o\x00n\x00t\x00e\x00x\x00t\x00.\x00s\x00t\x00a\x00r\x00t\x00.\x00G\x00a\x00m\x00e\x00M\x00o\x00d\x00e\x00\x00/\x00c\x00o\x00n\x00t\x00e\x00x\x00t\x00.\x00s\x00t\x00a\x00r\x00t\x00.\x00G\x00a\x00m\x00e\x00M\x00o\x00d\x00e\x00_\x001\x00.\x00x\x00.\x00a\x00v\x00s\x00c\x00', 'pTableName': 'h\x00e\x00r\x00o\x00_c\x00o\x00n\x00t\x00e\x00x\x00t\x00', 'pDataPartition': 'd\x00t\x00'}
But when I print the values one by one, they display properly:
>>> print(oParameter['pTableName'])
>>> 'hero_game_context_gamemode'
>>> print(oParameter['pDataPartition'])
>>> 'dt'
Could you please explain why this happens and how to get the dictionary properly encoded? I am using these parameters in subsequent queries, described in "Hive ParseException in Drop Table Statement", and I am guessing those queries fail because of this encoding issue.
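The difference between the two printouts can be reproduced without a database. Printing a dictionary shows the repr() of each value, which escapes NUL characters as \x00, whereas printing a single string writes the NUL characters to the terminal, where they are usually invisible. The sketch below uses a made-up value shaped like the ones above. It suggests that the values are identical in both printouts and that only the display differs.

```python
# Made-up value shaped like the ones in the question: a NUL after each character.
value = "d\x00t\x00"
params = {"pDataPartition": value}

# print(params) writes str(params), which uses repr() for each value,
# so the NUL characters appear as the four-character escape \x00.
as_dict_text = str(params)

# print(value) writes the string itself; the NULs are still there,
# but most terminals render them as nothing.
as_value_text = str(value)

print(as_dict_text)
print(len(value))        # 4 characters, not 2
print(value == "dt")     # False: the string only looks like 'dt'
```

If this is what is happening, the strings are not equal to the text they appear to show, which would be consistent with the later queries failing.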
Recommended Answer
After investigating further, I found that the encoding was not set correctly when connecting to Hadoop with pyodbc.
I was connecting like this:
import pyodbc
import pandas
oConnexionString = 'Driver={ClouderaHive};[...]'
oConnexion = pyodbc.connect(oConnexionString, autocommit=True)
oConnexion.setencoding(encoding='utf-8')
I changed the connection to this:
import pyodbc
import pandas
oConnexionString = 'Driver={ClouderaHive};[...]'
oConnexion = pyodbc.connect(oConnexionString, autocommit=True)
oConnexion.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
oConnexion.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
oConnexion.setencoding(encoding='utf-8')
Now, when I build my dictionary from the data frame, it displays properly.
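As a rough illustration of how such values can arise: a NUL after every real character is the typical signature of text that was decoded with a narrower codec than the one that encoded it. The post does not establish which mismatch occurred with this driver, so the UTF-32-LE/UTF-16-LE pair below is an assumption, chosen only because it reproduces the pattern. keys_with_nul is a hypothetical helper, not part of pyodbc or pandas, for checking a parameter dictionary before it is used in a query.

```python
def keys_with_nul(params):
    """Return the keys whose string values contain a NUL character.

    Hypothetical helper for illustration; not part of pyodbc or pandas.
    """
    return [key for key, value in params.items()
            if isinstance(value, str) and "\x00" in value]


original = "dt"

# Assumption: the bytes were UTF-32-LE but were decoded as UTF-16-LE.
# Each character then comes back followed by a NUL character.
garbled = original.encode("utf-32-le").decode("utf-16-le")

bad_params = {"pDataPartition": garbled}
good_params = {"pDataPartition": original}

print(repr(garbled))               # 'd\x00t\x00'
print(keys_with_nul(bad_params))   # ['pDataPartition']
print(keys_with_nul(good_params))  # []
```

Running a check like this on the dictionary before and after changing the setdecoding calls is one way to confirm that the change took effect, rather than relying on how the values look when printed.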