Set locale encoding in Python


Problem description

I'm calling a Java program from my Python code in the following way:

subprocess.check_output(["java", "-classpath", "/Users/feralvam/Programas/semanticvectors-3.4/semanticvectors-3.4.jar:/Users/feralvam/Programas/lucene-3.5.0/lucene-core-3.5.0.jar:/Users/feralvam/Programas/lucene-3.5.0/contrib/demo/lucene-demo-3.5.0.jar:", "pitt.search.semanticvectors.CompareTerms", "-queryvectorfile","/Users/feralvam/termvectors.bin",term1,term2])

"term1" and "term2" are strings read from a text file that is in UTF-8 encoding.

When I run this command from PyDev (version 2.5 in Eclipse 3.7.2) I get the following output: (here, "term1" = "Eles" and "term2" = "é")

Jun 26, 2012 11:20:55 AM pitt.search.semanticvectors.CompareTerms main
INFO: Opened query vector store from file: /Users/feralvam/termvectors.bin
Jun 26, 2012 11:20:55 AM pitt.search.semanticvectors.CompareTerms main
INFO: Couldn't open Lucene index at 
Jun 26, 2012 11:20:55 AM pitt.search.semanticvectors.CompareTerms main
INFO: No Lucene index for query term weighting, so all query terms will have same weight.
Didn't find vector for 'Eles'
No vector for 'Eles'
Didn't find vector for '??'
No vector for '??'
Jun 26, 2012 11:20:55 AM pitt.search.semanticvectors.CompareTerms main
INFO: Outputting similarity of "Eles" with "??" ...

But if I run the same command from the terminal, I get:

Jun 26, 2012 11:30:26 AM pitt.search.semanticvectors.CompareTerms main
INFO: Opened query vector store from file: /Users/feralvam/termvectors.bin
Jun 26, 2012 11:30:26 AM pitt.search.semanticvectors.CompareTerms main
INFO: Couldn't open Lucene index at 
Jun 26, 2012 11:30:26 AM pitt.search.semanticvectors.CompareTerms main
INFO: No Lucene index for query term weighting, so all query terms will have same weight.
Didn't find vector for 'Eles'
No vector for 'Eles'
Found vector for 'é'
Jun 26, 2012 11:30:26 AM pitt.search.semanticvectors.CompareTerms main
INFO: Outputting similarity of "Eles" with "é" ...

Leaving aside how SemanticVectors works, the problem is that in the second case "term2" is passed with the correct encoding, but in the first case it is not.

Now, using this command:

print locale.getpreferredencoding(), sys.getdefaultencoding()

I get the following information: US-ASCII utf-8 (in PyDev) and UTF-8 ascii (in terminal)

So what I think is happening is that the US-ASCII encoding is being used to pass the arguments, and the result is wrong because the words are not encoded properly. By the way, I'm using Python 2.7.

Is there any way to change this?

Thanks in advance for any help you could give.

Solution

You can pass the locale name in the LANG environment variable when you start the process. Do something like:

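import os
import subprocess

env = os.environ.copy()      # copy, so the parent's environment stays untouched
env['LANG'] = 'en_US.UTF-8'  # locale name the child process will see
subprocess.check_output(..., env=env)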
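
To check that the override actually reaches the child process, something like the following sketch may help. It is not the Java call from the question: the child here is a stand-in Python one-liner that just reports the LANG value it sees, and `utf8_env` is a hypothetical helper name introduced only for this example.

```python
import os
import subprocess
import sys


def utf8_env(lang="en_US.UTF-8"):
    """Return a copy of the current environment with LANG overridden.

    Hypothetical helper for illustration; the default locale name is the
    one used in the answer above.
    """
    env = os.environ.copy()
    env["LANG"] = lang
    return env


# Stand-in child process (an assumption, replacing the java command):
# a Python one-liner that prints the LANG value it inherited.
child = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
seen = subprocess.check_output(child, env=utf8_env()).decode("ascii").strip()
print(seen)  # en_US.UTF-8
```

In the real case the same `env=utf8_env()` argument would go on the `check_output` call that starts java. Note that if LC_ALL or LC_CTYPE is set in the inherited environment, it takes precedence over LANG, so it may be worth checking those variables too.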
