以编程方式将Pandas数据框转换为Markdown表 [英] Programmatically convert pandas dataframe to markdown table

查看:82
本文介绍了以编程方式将Pandas数据框转换为Markdown表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个从数据库生成的Pandas Dataframe,其中包含混合编码的数据.例如:

I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:

+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

如所见,英语和挪威语混合使用(我认为数据库中编码为ISO-8859-1).我需要以Markdown表的形式获取此Dataframe输出的内容,但不会出现编码问题.我遵循了这个答案(来自问题

As seen there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database I think). I need to get the contents of this Dataframe output as a Markdown table, but without getting problems with encoding. I followed this answer (from the question Generate Markdown tables?) and got the following:

import sys, sqlite3

db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()

rows = []
for index, row in df.iterrows():
    items = (row['date'], 
             row['path'], 
             row['language'], 
             row['shortest_sentence'],
             row['longest_sentence'], 
             row['number_words'], 
             row['readability_consensus'])
    rows.append(items)

headings = ['Date', 
            'Path', 
            'Language',
            'Shortest Sentence', 
            'Longest Sentence since', 
            'Words',
            'Grade level']

fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
         ('^','^'), ('^','^')]

table(sys.stdout, rows, fields, headings, align)

但是,这会产生UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128)错误.如何将数据框输出为Markdown表?也就是说,出于将该代码存储在文件中以用于编写Markdown文档的目的.我需要输出看起来像这样:

However, this yields an UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128) error. How can I output the Dataframe as a Markdown table? That is, for the purpose of storing this code in a file for use in writing a Markdown document. I need the output to look like this:

| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |

推荐答案

对,所以我摘自 Rohit ( Python-编码字符串-瑞典字母),扩展了<一个href ="https://stackoverflow.com/a/33187706/603387">他的答案,并提出了以下建议:

Right, so I've taken a leaf from a question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:

# Enforce UTF-8 encoding
import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('UTF-8')

# SQLite3 database
import sqlite3
# Pandas: Data structures and data analysis tools
import pandas as pd

# Read database, attach as Pandas dataframe
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']

# Parse Dataframe and apply Markdown, then save as 'table.md'
cols = df.columns
df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
df3 = pd.concat([df2, df])
df3.to_csv("table.md", sep="|", index=False)

一个重要的先兆是shortest_sentencelongest_sentence列不包含不必要的换行符,在提交到SQLite数据库之前对它们应用.replace('\n', ' ').replace('\r', '')可以删除这些换行符.看来解决方案不是强制执行特定于语言的编码(对于挪威语,是ISO-8859-1),而是使用了UTF-8而不是默认的ASCII.

An important precursor to this is that the shortest_sentence and longest_sentence columns do not contain unnecessary line breaks, as removed by applying .replace('\n', ' ').replace('\r', '') to them before submitting into the SQLite database. It appears that the solution is not to enforce the language-specific encoding (ISO-8859-1 for Norwegian), but rather that UTF-8 is used instead of the default ASCII.

我在IPython笔记本(Python 2.7.10)中运行了此操作,并得到了一个如下表(此处固定显示间距):

I ran this through my IPython notebook (Python 2.7.10) and got a table like the following (fixed spacing for appearance here):

| Path                    | Language | Date       | Shortest Sentence                                                                            | Longest Sentence                                                                                                                                                                                                                                         | Words | Readability Consensus |
|-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
| data/Eng/Something1.txt | Eng      | 2015-09-17 | I am able to relocate to London on short notice.                                             | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant.       | 306   | 11th and 12th grade   |
| data/Nor/NoeNorrønt.txt | Nor      | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe.                                     | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205   | 18th and 19th grade   |
| data/Nor/Ørret.txt.txt  | Nor      | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160   | 18th and 19th grade   |

因此,Markdown表没有编码问题.

Thus, a Markdown table without problems with encoding.

这篇关于以编程方式将Pandas数据框转换为Markdown表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆