Encoding Unicode in the Dictionary Key to Japanese

Views: 117 Published: 2020/5/5 14:38:43 Tags: python dictionary unicode


Problem description

I just started working on text clustering in Japanese with Python 2. However, when I create a dictionary from these Japanese words/terms, the dictionary keys are displayed as Unicode escape sequences instead of Japanese. The code is as follows:

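import numpy as np
import pandas as pd
import scipy.sparse as sp

# load data (the file is encoded as CP932, the Windows variant of Shift-JIS)
allWrdMat10 = pd.read_csv("../../data/allWrdMat10.csv.gz",
                          encoding='CP932')

## Set X as CSR Sparse Matrix
X = np.array(allWrdMat10)
X = sp.csr_matrix(X)

## create dictionary mapping each term (column name) to its column index
dict_index = {t: i for i, t in enumerate(allWrdMat10.columns)}

# list() keeps this line valid on Python 3 as well; on Python 2 it is a no-op copy
freqrank = np.array(list(dict_index.values())).argsort()
X_transform = X[:, freqrank < 1000].transpose().toarray()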

The output of allWrdMat10.columns still shows Japanese:

Index([u'?', u'．', u'・', u'％', u'０', u'１', u'１０月', u'１１月', u'１２月', u'１つ',
...
u'瀋陽', u'疆', u'盧', u'籠', u'絆', u'胚', u'諫早', u'趙', u'鉉', u'鎔基'],dtype='object', length=8655)

However, dict_index.keys() gives the following:

[u'\u77ed\u9283',
 u'\u5efa\u3066',
 u'\u4f0a',
 u'\u5e73\u5b89',
 u'\u6025\u9a30',
 u'\u897f\u65e5\u672c',
 u'\u5e03\u9663',
 ...]

Is there any way to keep the Japanese words/terms in the dictionary keys? Or is there a way to convert the Unicode escapes back to Japanese words/terms? Thanks.
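For what it is worth, here is a minimal sketch suggesting that nothing needs converting. In Python 2, displaying a list shows the repr() of each element, and the repr() of a unicode string escapes non-ASCII characters, so the keys listed above are most likely still the Japanese terms. The small dict_index below is a hypothetical stand-in built from three of those keys, not the real data, and the term 西日本 is simply what the escapes \u897f\u65e5\u672c decode to.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Three of the keys from the question's output, written as escape sequences.
# (Hypothetical stand-in for allWrdMat10.columns; the real data is not used.)
escaped_keys = ['\u897f\u65e5\u672c', '\u5e73\u5b89', '\u4f0a']

# Built the same way as dict_index in the question: term -> column index.
dict_index = {t: i for i, t in enumerate(escaped_keys)}

# The escaped form and the Japanese text are the same string,
# so a lookup with the Japanese term works directly.
assert '\u897f\u65e5\u672c' == '西日本'
assert dict_index['西日本'] == 0

# Printing a key on its own shows the characters rather than the escapes,
# provided the terminal encoding can represent them.
for term in dict_index:
    print(term)

# To view many keys at once, join them into a single string before printing.
joined = ', '.join(sorted(dict_index, key=dict_index.get))
print(joined)
```

The escapes only appear when the whole list or dict is echoed; print applied to an individual key, or to a joined string, should show the Japanese text.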
