Python Writing Weird Unicode to CSV

Problem description

I'm attempting to extract article information using the python newspaper3k package and then write to a CSV file. While the info is downloaded correctly, I'm having issues with the output to CSV. I don't think I fully understand unicode, despite my efforts to read about it.

from newspaper import Article
import csv

first_article = Article(url="http://www.bloomberg.com/news/articles/2016-09-07/asian-stock-futures-deviate-as-s-p-500-ends-flat-crude-tops-46")

first_article.download()
if first_article.is_downloaded:
    first_article.parse()
    # nlp is a method; it must be called to populate keywords and summary
    first_article.nlp()

article_array = []
collate = {}

collate['title'] = first_article.title
collate['content'] = first_article.text
collate['keywords'] = first_article.keywords
collate['url'] = first_article.url
collate['summary'] = first_article.summary
print(collate['content'])
article_array.append(collate)

keys = article_array[0].keys()
# the with block closes the file on exit, so no explicit close() is needed
with open('bloombergtest.csv', 'w') as output_file:
    csv_writer = csv.DictWriter(output_file, keys)
    csv_writer.writeheader()
    csv_writer.writerows(article_array)

When I print collate['content'], which is first_article.text, the console outputs the article's content just fine. Everything shows up correctly, apostrophes and all. When I write to the CSV, the content cell text has odd characters in it. For example:

“At the end of the day, Europe’s economy isn’t in great shape, inflation doesn’t look exciting and there are a bunch of political risks to reckon with.

So far I have tried:

with open('bloombergtest.csv', 'w', encoding='utf-8') as output_file:

to no avail. I also tried utf-16 instead of utf-8, but that just resulted in the cells being written in an odd order: the text looked correct, but the cells were not created correctly in the CSV. I've also tried calling .encode('utf-8') on various variables, but nothing has worked.

What's going on? Why would the console print the text correctly, while the CSV file has odd characters? How can I fix this?

Recommended answer

That's most probably a problem with the software that you use to open or print the CSV file: it doesn't "understand" that the CSV is encoded in UTF-8 and assumes ASCII, Latin-1 (ISO-8859-1), Windows-1252 or a similar single-byte encoding for it. Each multi-byte UTF-8 character, such as a curly quote or apostrophe, is then displayed as several unrelated characters.
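The garbling can be reproduced in a few lines. The sketch below assumes the viewer decodes the file as Windows-1252 (cp1252), which matches the characters shown in the question; the sample string and variable names are made up for illustration.

```python
# Sketch: reproduce the garbled text by decoding UTF-8 bytes with the wrong codec.
# Assumption: the program opening the CSV treats the bytes as Windows-1252.
original = "\u201cEurope\u2019s economy isn\u2019t in great shape"

utf8_bytes = original.encode("utf-8")   # what Python wrote to the file
misread = utf8_bytes.decode("cp1252")   # what a cp1252 viewer displays
print(misread)

# The bytes themselves are intact: reversing the wrong decode restores the text.
roundtrip = misread.encode("cp1252").decode("utf-8")
```

Because the round trip restores the original string, the file on disk is fine; only the viewer's choice of encoding is wrong.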

You can aid that software in recognizing the CSV file's encoding by placing a BOM sequence in the beginning of your file (which, in general, is not recommended for UTF-8).
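In Python 3, one way to write that BOM is the standard "utf-8-sig" codec. The following is a minimal sketch, not the asker's exact script: the helper name write_csv_with_bom, the sample row and the output filename are hypothetical, and whether a given spreadsheet program honors the BOM depends on that program.

```python
import csv

# Hypothetical sample row standing in for the scraped article data.
rows = [{
    "title": "Example",
    "content": "\u201cEurope\u2019s economy isn\u2019t in great shape.\u201d",
}]

def write_csv_with_bom(path, rows):
    """Write a list of dicts as UTF-8 CSV, prefixed with a BOM."""
    # "utf-8-sig" emits the bytes EF BB BF once at the start of the file.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

write_csv_with_bom("bloombergtest_bom.csv", rows)
```

When reading such a file back in Python, opening it with encoding="utf-8-sig" strips the BOM again, so it does not end up inside the first column name.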
