Python Load UTF-8 JSON

Problem description

I have the following JSON (for simplicity's sake I'll only use one but there are 100 entries in reality):

{
    "Active": false, 
    "Book": "US Derivat. London, Mike Übersax/Michael Jealous", 
    "ExpirationDate": "2006-10-12", 
    "Isin": "CH0013096497", 
    "IssueDate": "2001-10-09", 
    "KbForXMonths": "0", 
    "KbPeriodDay": "Period", 
    "KbType": "Prozent", 
    "KbYear": "0.5", 
    "Keyinvest_IssueRetro": "0.50%", 
    "Keyinvest_RecurringRetro": "1.00% pro rata temporis", 
    "Keyinvest_RetroPayment": "Every month", 
    "LastImportDate": "2008-12-31", 
    "LiberierungDate": "1900-01-01", 
    "NominalCcy": "USD", 
    "NominalStueck": "5,000", 
    "PrimaryCCR": "0", 
    "QuoteType": "Nominal", 
    "RealValor": "0", 
    "Remarks": "", 
    "RwbeProductId_CCR": "034900", 
    "RwbeProductId_EFS": "034900", 
    "SecName": "Cliquet GROI on Nasdaq", 
    "SecType": "EQ", 
    "SubscriptionEndDate": "1900-01-01", 
    "TerminationDate": "2003-10-19", 
    "TradingCcy": "USD", 
    "Valor": 1309649
}

I'm trying to read this JSON in order to save it as a .csv (so that I can import it into a database).

However, when I try to write this JSON data as a CSV like so:

with codecs.open('EFSDUMP.csv', 'w', 'utf-8-sig') as csv_file:
    content_writer = csv.writer(csv_file, delimiter=',')
    content_writer.writerow(data.values())

I get an error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 25: ordinal not in range(128)

That is because there's an umlaut in the JSON (see attribute "Book").

I try to read the JSON like this:

data = json.loads(open('EFSDUMP.json').read().decode('utf-8-sig'))

What's interesting is that this:

print data

Gives me this:

{u'PrimaryCCR': u'0', u'SecType': u'EQ', u'Valor': 1309649, u'KbType': u'Prozent', u'Book': u'US Derivat. London, Mike \xdcbersax/Michael Jealous', u'Keyinvest_RecurringRetro': u'1.00% pro rata temporis', u'TerminationDate': u'2003-10-19', u'RwbeProductId_CCR': u'034900', u'SubscriptionEndDate': u'1900-01-01', u'ExpirationDate': u'2006-10-12', u'Keyinvest_RetroPayment': u'Every month', u'Keyinvest_IssueRetro': u'0.50%', u'QuoteType': u'Nominal', u'KbYear': u'0.5', u'LastImportDate': u'2008-12-31', u'Remarks': u'', u'RealValor': u'0', u'SecName': u'Cliquet GROI on Nasdaq', u'Active': False, u'KbPeriodDay': u'Period', u'Isin': u'CH0013096497', u'LiberierungDate': u'1900-01-01', u'IssueDate': u'2001-10-09', u'KbForXMonths': u'0', u'NominalCcy': u'USD', u'RwbeProductId_EFS': u'034900', u'TradingCcy': u'USD', u'NominalStueck': u'5,000'}

Clearly the umlaut became a '\xdc'.

However, when I do this:

print data['Book']

Meaning I access the attribute directly, I get:

US Derivat. London, Mike Übersax/Michael Jealous

So the umlaut is an actual umlaut again.

I'm pretty sure that the JSON is UTF-8 without BOM (Notepad++ claims so).

I have already tried all of the suggestions here without any success: Python load json file with UTF-8 BOM header

How can I properly read the UTF-8 JSON file in order to be able to write it as .csv?

Any help is greatly appreciated.

Python version: 2.7.2

Solution

In Python 2, the csv module does not support writing Unicode. You need to encode it manually here, as otherwise your Unicode values are encoded for you using ASCII (which is why you got the encoding exception).
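To see where the exception comes from, here is a minimal sketch of the same failure. It assumes Python 3 (the question targets Python 2.7, where the csv module performs this ASCII encoding implicitly), and the helper name try_ascii is made up for illustration.

```python
# Python 3 sketch; try_ascii is a hypothetical helper, not part of any library.
book = "US Derivat. London, Mike Übersax/Michael Jealous"


def try_ascii(text):
    """Return None if text is pure ASCII, else the UnicodeEncodeError raised."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


error = try_ascii(book)

# The error records which codec failed and where the offending character sits.
print(error.encoding, error.start, ascii(error.object[error.start]))

# Encoding explicitly as UTF-8 succeeds; the umlaut takes two bytes.
utf8_bytes = book.encode("utf-8")
print(len(book), len(utf8_bytes))
```

The position reported here is the same position 25 named in the traceback from the question, which is the index of the umlaut in the "Book" value.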

This also means you need to write the UTF-8 BOM manually, but only if you really need it. UTF-8 can only be written one way, so a Byte Order Mark is not needed to read UTF-8 files. Microsoft likes to add it to files to make the task of detecting file encodings easier for their tools, but the UTF-8 BOM may actually make it harder for other tools to work correctly, as they won't ignore the extra initial character.
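As a rough illustration of what that "extra initial character" is, the sketch below (Python 3 assumed, variable names are illustrative) builds a BOM-prefixed byte string and decodes it with both the utf-8 and utf-8-sig codecs.

```python
import codecs

payload = "Übersax".encode("utf-8")
with_bom = codecs.BOM_UTF8 + payload

# The UTF-8 BOM is simply these three bytes at the start of the data.
bom_is_three_bytes = codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Plain utf-8 keeps the BOM as a leading U+FEFF character ...
plain = with_bom.decode("utf-8")

# ... while utf-8-sig drops a leading BOM if there is one,
# and decodes BOM-less data without complaint.
stripped = with_bom.decode("utf-8-sig")
no_bom = payload.decode("utf-8-sig")

print(ascii(plain), ascii(stripped), ascii(no_bom))
```

A tool that decodes with plain utf-8 ends up with U+FEFF glued to the first field, which is the kind of trouble described above.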

Use:

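import codecs
import csv

with open('EFSDUMP.csv', 'wb') as csv_file:
    csv_file.write(codecs.BOM_UTF8)  # optional: only if the consumer needs a BOM
    content_writer = csv.writer(csv_file)
    # Python 2's csv module works on byte strings, so encode each value as UTF-8
    content_writer.writerow([unicode(v).encode('utf8') for v in data.values()])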

Note that this'll write your values in arbitrary (dictionary) order. The unicode() call converts non-string types (such as the boolean and integer values in your data) to unicode strings before encoding.
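If the column order matters for the database import, one option is to fix the field order up front. The following is only a sketch under assumptions: it uses Python 3 rather than the Python 2.7 from the question, sorts the keys alphabetically as an arbitrary but stable choice, and rows_to_csv is a hypothetical helper name.

```python
import csv
import io


def rows_to_csv(records, fieldnames=None):
    """Serialize a list of dicts to CSV text with a stable column order."""
    if fieldnames is None:
        # Assumption: alphabetical order is acceptable; pass fieldnames otherwise.
        fieldnames = sorted(records[0])
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# A cut-down record using three of the keys from the question.
records = [{"Valor": 1309649, "Book": "Mike Übersax", "Active": False}]
csv_text = rows_to_csv(records)
print(ascii(csv_text))
```

In Python 3 the csv module handles text directly, so no manual encoding is needed. To write to disk instead of a string, the same DictWriter can be given a file opened with open('EFSDUMP.csv', 'w', encoding='utf-8-sig', newline=''), where utf-8-sig adds the BOM and plain utf-8 leaves it out.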

To be explicit: you've loaded the JSON data just fine. It is the CSV writing that failed for you.
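The '\xdc' in the printed dictionary is not a sign of a bad load either. In Python 2, printing a dict shows the repr() of each value, and the repr() of a unicode string escapes non-ASCII characters, while printing the string itself shows the real character. A small sketch of the same effect, assuming Python 3, where ascii() produces that escaped form:

```python
book = "Mike Übersax"
record = {"Book": book}

# Escaped display of the container: the umlaut shows up as \xdc.
escaped = ascii(record)

# The value itself is untouched; it still holds the real character.
direct = record["Book"]

print(escaped)
print(len(direct), direct == book)
```

Both views describe the same twelve-character string; only the display differs.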
