导出 pandas 数据框,文本列包含utf-8文本和URL到Excel [英] Export Pandas data frame with text column containg utf-8 text and URLs to Excel
问题描述
我想使用 .to_csv
或 .to_excel
这些都是由熊猫提供的,但我无法使其正常工作。
当我使用 .to_csv
时,我的问题是它在数据框的文本部分保持失败。我已经玩过不同的分隔符,但文件从来没有100%对齐。文本列似乎包含混淆Excel的选项卡,管道字符等。
df.to_csv('test.csv' sep ='\t',encoding ='utf-8')
当我尝试使用 .to_excel
连同 xlsxwriter
引擎我遇到了一个不同的问题,这就是我的文本列包含很多网址(我想)。 xlswriter
尝试制作这些URL的特殊点击链接,而不是将其作为字符串处理。我已经找到了一些关于如何规避这些信息的信息,但是我再也无法让它工作。
应该使用以下代码来禁用功能,我认为是造成麻烦:
workbook = xlsxwriter.Workbook(filename,{'strings_to_urls':False})
但是,当使用 to_excel
在将数据框加载到Excel文件之前,要调整此工作簿对象的设置。
简而言之,如何从Pandas数据导出具有极大变化文本的列框架到Excel了解的东西?
编辑:
示例:
@geertwilderspvv @telegraaf ach Wilders toch,nep-voorzitter van een nep-partij met maaréénlid,\\\
zeur niet over nep-premier of parlement!
所以在这种情况下显然是我的数据线路刹车。我会尝试找到更多的例子。
edit2:
<?xml version =1.0encoding =UTF-8standalone =yes?>
< recoveryLog xmlns =http://schemas.openxmlformats.org/spreadsheetml/2006/main>< logFileName> error047600_01.xml< / logFileName>< summary> Er zijn fouten aangetroffen in bestand C :\Users\Guy Mahieu\Desktop\Vu ipython notebook\pandas_simple.xlsx< / summary>< removedRecords summary =Hier volgt een lijst van verwijderde records:>< removedRecord> Verwijderde records:Formule货车
荷兰语翻译:
错误被发现在文件。以下是删除记录的列表:已删除的记录:部分的公式/xl/worksheets/sheet1.xml
不能认为目前可以通过Pandas API传递XlsxWriter构造函数,但可以解决 strings_to_url
问题,如下所示:
import pandas as pd
df = pd.DataFrame({'Data':['http://python.org']})
#使用XlsxWriter作为引擎创建一个Pandas Excel writer。
writer = pd.ExcelWriter('pandas_simple.xlsx',engine ='xlsxwriter')
#不要将类似url的字符串转换为urls。
writer.book.strings_to_urls = False
#将数据框转换为XlsxWriter Excel对象。
df.to_excel(writer,sheet_name ='Sheet1')
#关闭Pandas Excel编写器并输出Excel文件。
writer.save()
更新:在最近的版本的熊猫,您可以直接将XlsxWriter构造函数选项传递给 ExcelWriter()
,而不需要设置 writer.book.strings_to_urls
间接地:
writer = pd.ExcelWriter('pandas_simple.xlsx',
engine ='xlsxwriter',
options = {'strings_to_urls':False})
My Pandas data frame consists of Tweets and meta data of each tweet (300.000 rows). Some of my colleagues need to work with this data in Excel which is why I need to export it.
I wanted to use either .to_csv
or .to_excel
which are both provided by Pandas but I can't get it to work properly.
When I use .to_csv
my problem is that it keeps failing in the text part of the data frame. I've played around with different separators but the file is never 100% aligned. The text column seems to contain tabs, pipe characters etc. which confuses Excel.
df.to_csv('test.csv', sep='\t', encoding='utf-8')
When I try to use .to_excel
together with the xlsxwriter
engine I'm confronted with a different problem, which is that my text column contains to many URLs (I think). xlswriter
tries to make special clickable links of these URLs instead of just handling them as strings. I've found some information on how to circumvent this but, again, I can't get it to work.
The following bit of code should be used to disable the function that I think is causing trouble:
workbook = xlsxwriter.Workbook(filename, {'strings_to_urls': False})
However, when using to_excel
I can't seem to adjust this setting of the Workbook object before I load the data frame into the Excel file.
In short how do I export a column with wildly varying text from a Pandas data frame to something that Excel understands?
edit: example:
@geertwilderspvv @telegraaf ach Wilders toch, nep-voorzitter van een nep-partij met maar één lid, \nzeur niet over nep-premier of parlement!
So in this case It is obviously a line brake that is my data. I will try to find some more examples.
edit2:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><logFileName>error047600_01.xml</logFileName><summary>Er zijn fouten aangetroffen in bestand C:\Users\Guy Mahieu\Desktop\Vu ipython notebook\pandas_simple.xlsx</summary><removedRecords summary="Hier volgt een lijst van verwijderde records:"><removedRecord>Verwijderde records: Formule van het onderdeel /xl/worksheets/sheet1.xml</removedRecord></removedRecords></recoveryLog>
Translation of Dutch stuff:
Errors were found in "file". Here follows a list of removed records: removed records: formula of the part /xl/worksheets/sheet1.xml
I don't think it is currently possible to pass XlsxWriter constructor options via the Pandas API but you can workaround the strings_to_url
issue as follows:
import pandas as pd
df = pd.DataFrame({'Data': ['http://python.org']})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
# Don't convert url-like strings to urls.
writer.book.strings_to_urls = False
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file.
writer.save()
Update: In recent version of Pandas you can pass XlsxWriter constructor options to ExcelWriter()
directly and you do not need to set writer.book.strings_to_urls
indirectly:
writer = pd.ExcelWriter('pandas_simple.xlsx',
engine='xlsxwriter',
options={'strings_to_urls': False})
这篇关于导出 pandas 数据框,文本列包含utf-8文本和URL到Excel的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!