How to read excel cell and retain or detect its format in Python

Question

I have been given an Excel file that contains some text formatting. Some text is bold, some italic, some superscript (what I call "supercase"), and there are a few other formats (though less common than the three mentioned).

Examples:

  • Ku'lah 2ku.lah v; definition: some def; usage: some usage;
  • He'lahsa 2he.lah.sa n; definition: some def; usage: some usage;
  • And so on

Now, since each cell is to become an entry in a dictionary database (a real, human-language dictionary), I would like to retain the cell's formatting, because it conveys how the word is used (in the example above, bold marks the word type, v for verb, and italic marks the start of a new section).

But it is all in the excel cell.

When I try to read the Excel file directly with a database tool such as Toad for Oracle, the formatting is gone!

  1. Is there any way to read the Excel file and still retain the formatting?
  2. Alternatively, is there any way to detect the formatting? As long as I can detect it, I can replace the text with HTML markup such as <b>v</b>, and that will do the job. I only want to know how to retain or detect Excel cell text formatting in Python (in particular these three formats: bold, italic, and supercase, i.e. superscript).

Edit:

I tried to get the text format with the xlrd package, but I can't find a way to get the text style, because the cell object only has ctype, value, and xf_index. It carries no information about the text format, and when I create the instance with formatting_info=True:

book = xlrd.open_workbook("HuluHalaDict.xlsx", sys.stdout, 0, xlrd.USE_MMAP, None, None, \
                          formatting_info=True, on_demand=False, ragged_rows=False)

I got the following error:

NotImplementedError: formatting_info=True not yet implemented

Raised by this line in the xlsx.py file of the xlrd package:

if formatting_info:
    raise NotImplementedError("formatting_info=True not yet implemented")

I find this strange, since I use xlrd version 0.9.4 (the latest) and the documentation says that formatting information has been included since version 0.6.1:

Default Formatting

Default formatting is applied to all empty cells (those not described by a cell record). Firstly row default information (ROW record, Rowinfo class) is used if available. Failing that, column default information (COLINFO record, Colinfo class) is used if available. As a last resort the worksheet/workbook default cell format will be used; this should always be present in an Excel file, described by the XF record with the fixed index 15 (0-based). By default, it uses the worksheet/workbook default cell style, described by the very first XF record (index 0).

Formatting features not included in xlrd version 0.6.1

  • Rich text i.e. strings containing partial bold italic and underlined text, change of font inside a string, etc. See OOo docs s3.4 and s3.2
  • Asian phonetic text (known as "ruby"), used for Japanese furigana. See OOo docs s3.4.2 (p15)
  • Conditional formatting. See OOo docs s5.12, s6.21 (CONDFMT record), s6.16 (CF record)
  • Miscellaneous sheet-level and book-level items e.g. printing layout, screen panes.

Modern Excel file versions don't keep most of the built-in "number formats" in the file; Excel loads formats according to the user's locale. Currently xlrd's emulation of this is limited to a hard-wired table that applies to the US English locale. This may mean that currency symbols, date order, thousands separator, decimals separator, etc are inappropriate. Note that this does not affect users who are copying XLS files, only those who are visually rendering cells.

Did I make any mistake here? My code is simply as shown:

book = xlrd.open_workbook("HuluHalaDict.xlsx", sys.stdout, 0, xlrd.USE_MMAP, None, None, \
                          formatting_info=True, on_demand=False, ragged_rows=False)


Edit 2:

The example shown in the post creates the class instance (book) with formatting_info=True. But when I try the same thing in my implementation, it raises the error above. Any idea?

Solution

I suggest the xlrd library: https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966

It is also on GitHub: https://github.com/python-excel/xlrd

You can find a simple example of using xlrd to determine the font style in "Using XLRD module and Python to determine cell font style (italics or not)".

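Here is a practical example. Note that it opens an .xls workbook; the NotImplementedError in the question is raised from xlrd's xlsx.py, that is, when formatting_info=True is used with an .xlsx file: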

from xlrd import open_workbook

path = '/Users/.../Desktop/Workbook1.xls'
wb = open_workbook(path, formatting_info=True)
sheet = wb.sheet_by_name("Sheet1")
cell = sheet.cell(0, 0) # The first cell
print("cell.xf_index is", cell.xf_index)
fmt = wb.xf_list[cell.xf_index]
print("type(fmt) is", type(fmt))
print("Dumped Info:")
fmt.dump()

It outputs the following:

cell.xf_index is 62
type(fmt) is <class 'xlrd.formatting.XF'>
Dumped Info:
_alignment_flag: 0
_background_flag: 0
_border_flag: 0
_font_flag: 1
_format_flag: 0
_protection_flag: 0
alignment (XFAlignment object):
    hor_align: 0
    indent_level: 0
    rotation: 0
    shrink_to_fit: 0
    text_direction: 0
    text_wrapped: 0
    vert_align: 2
background (XFBackground object):
    background_colour_index: 65
    fill_pattern: 0
    pattern_colour_index: 64
border (XFBorder object):
    bottom_colour_index: 0
    bottom_line_style: 0
    diag_colour_index: 0
    diag_down: 0
    diag_line_style: 0
    diag_up: 0
    left_colour_index: 0
    left_line_style: 0
    right_colour_index: 0
    right_line_style: 0
    top_colour_index: 0
    top_line_style: 0
font_index: 6
format_key: 0
is_style: 0
lotus_123_prefix: 0
parent_style_index: 0
protection (XFProtection object):
    cell_locked: 1
    formula_hidden: 0
xf_index: 62
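
The font_index: 6 entry in the dump points into wb.font_list, and xlrd's Font objects carry bold, italic, and escapement (1 for superscript, 2 for subscript) attributes. Formatting that changes inside a cell is exposed through sheet.rich_text_runlist_map as (offset, font_index) pairs. What follows is only a rough sketch of turning that into HTML, assuming an .xls file opened with formatting_info=True. The helper names (font_tags, wrap, runs_to_html, cell_to_html) and the sample fonts are made up for illustration and are not part of xlrd.

```python
from html import escape
from types import SimpleNamespace


def font_tags(font):
    """Return the HTML tag names implied by an xlrd-style Font object."""
    tags = []
    if font.bold:
        tags.append("b")
    if font.italic:
        tags.append("i")
    if font.escapement == 1:  # xlrd: 1 = superscript, 2 = subscript
        tags.append("sup")
    return tags


def wrap(text, font):
    """Escape text and wrap it in the tags for one font."""
    out = escape(text, quote=False)  # keep apostrophes such as Ku'lah readable
    for tag in reversed(font_tags(font)):
        out = f"<{tag}>{out}</{tag}>"
    return out


def runs_to_html(text, base_font_index, runs, fonts):
    """Convert a cell's text plus its rich-text runs into HTML.

    runs is a list of (offset, font_index) pairs; text before the first
    offset uses the cell's own font (base_font_index).
    """
    pieces = []
    start, font_index = 0, base_font_index
    # A sentinel run at len(text) flushes the final segment.
    for offset, next_font in list(runs) + [(len(text), None)]:
        if offset > start:
            pieces.append(wrap(text[start:offset], fonts[font_index]))
        start, font_index = offset, next_font
    return "".join(pieces)


def cell_to_html(path, sheet_name, rowx, colx):
    """Hypothetical helper: read one cell of an .xls workbook as HTML."""
    import xlrd  # imported lazily so the helpers above run without xlrd

    wb = xlrd.open_workbook(path, formatting_info=True)
    sheet = wb.sheet_by_name(sheet_name)
    cell = sheet.cell(rowx, colx)
    base_font_index = wb.xf_list[cell.xf_index].font_index
    runs = sheet.rich_text_runlist_map.get((rowx, colx), [])
    return runs_to_html(str(cell.value), base_font_index, runs, wb.font_list)


# Stand-in fonts and runs (invented data) mimicking the first example entry.
fonts = [
    SimpleNamespace(bold=0, italic=0, escapement=0),  # plain
    SimpleNamespace(bold=0, italic=0, escapement=1),  # superscript
    SimpleNamespace(bold=1, italic=0, escapement=0),  # bold
]
text = "Ku'lah 2ku.lah v"
runs = [(7, 1), (8, 0), (15, 2)]
html = runs_to_html(text, 0, runs, fonts)
print(html)
```

With a real workbook, a call such as cell_to_html(path, "Sheet1", 0, 0) would replace the stand-in data. Keeping the conversion separate from the file access makes it easy to check the tag logic without a spreadsheet at hand.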

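Note that _font_flag: 1 does not by itself mean the cell is bold. As far as the xlrd documentation describes it, for a cell XF this flag only says that the font attributes of this XF record are used rather than those of its parent style. Whether the text is bold is stored on the font itself: look up wb.font_list[fmt.font_index] (font_index: 6 above) and check its bold attribute; italic is stored there in the same way.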
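
For the .xlsx file in the question, re-saving it as .xls is one way to use the xlrd approach. Another option, not part of the original answer, is a different library. The sketch below assumes openpyxl 3.1 or later, where load_workbook accepts rich_text=True and a cell with mixed formatting comes back as a list of plain strings and TextBlock objects (each with text and font attributes). The function names are hypothetical, and the sample blocks are stand-ins so the conversion can be tried without a workbook.

```python
from html import escape
from types import SimpleNamespace


def block_to_html(block):
    """Render one piece of a cell: a plain str or a TextBlock-like object."""
    if isinstance(block, str):
        return escape(block, quote=False)
    out = escape(block.text, quote=False)
    font = block.font
    if font.vertAlign == "superscript":
        out = f"<sup>{out}</sup>"
    if font.i:
        out = f"<i>{out}</i>"
    if font.b:
        out = f"<b>{out}</b>"
    return out


def blocks_to_html(blocks):
    """Join the rendered pieces of one cell."""
    return "".join(block_to_html(block) for block in blocks)


def xlsx_cell_to_html(path, sheet_name, coordinate):
    """Hypothetical helper: read one cell of an .xlsx workbook as HTML."""
    from openpyxl import load_workbook  # lazy import; assumes openpyxl >= 3.1

    wb = load_workbook(path, rich_text=True)
    cell = wb[sheet_name][coordinate]
    value = cell.value
    if value is None:
        return ""
    if isinstance(value, list):  # rich text: mixed str / TextBlock items
        return blocks_to_html(value)
    # Uniformly formatted cell: apply the cell-level font to the whole text.
    return block_to_html(SimpleNamespace(text=str(value), font=cell.font))


# Stand-in blocks (invented data) shaped like openpyxl's rich-text pieces.
blocks = [
    "Ku'lah ",
    SimpleNamespace(text="2", font=SimpleNamespace(b=None, i=None, vertAlign="superscript")),
    "ku.lah ",
    SimpleNamespace(text="v", font=SimpleNamespace(b=True, i=None, vertAlign=None)),
]
html_from_blocks = blocks_to_html(blocks)
print(html_from_blocks)
```

Plain string pieces inside a rich-text cell are emitted without tags here; if the cell itself is styled bold or italic, those pieces would need the cell-level font applied as well.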
