openpyxl: please do not assume text is a number when importing

Problem description

There are numerous questions about how to stop Excel from interpreting text as a number, or how to output number formats with openpyxl, but I haven't seen any solutions to this problem:

I have an Excel spreadsheet given to me by someone else, so I did not create it. When I open the file with Excel, I have certain values like "5E12" (clone numbers, if anyone cares) that appear to display correctly, but there's a little green arrow next to each one warning me that "This appears to be a number stored as text". Excel then asks me if I would like to convert it to a number, and if I say yes, I get 5000000000000, which then converts automatically to scientific notation and displays 5E12 again, only this time a text output would show the full number with zeroes. Note that before the conversion, this really is text, even to Excel, and I'm only being warned/offered to convert it.

So, when reading this file in with openpyxl (from openpyxl.reader.excel import load_workbook), the 5E12 is getting converted automatically to 5000000000000. I assume that openpyxl is making the same assumption that Excel made, only the conversion happens without a prompt or input on my part.

How can I prevent this from happening? I do not want text that looks like "numbers stored as text" to be converted to numbers. They are text unless I say so.

So far, the only solution I have found is to add single quotes to the front of each cell, but this is not an ideal solution, as it's manual labor rather than a programmatic solution. Also, the solution needs to be general, since I don't always know where this problem might occur (I'm reading millions of lines per day, so I don't want to have to do anything by hand).

I think this is a problem with openpyxl. There is a Google Groups discussion from the beginning of 2011 that mentions this problem, but assumes it's too rare to matter. https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/HZfpShMp8Tk

So, any suggestions?

Recommended answer

If you want to use openpyxl again (for whatever reason), the following changes to the worksheet reader routine do the trick of keeping the strings as strings:

diff --git a/openpyxl/reader/worksheet.py b/openpyxl/reader/worksheet.py
--- a/openpyxl/reader/worksheet.py
+++ b/openpyxl/reader/worksheet.py
@@ -134,8 +134,10 @@
             data_type = element.get('t', 'n')
             if data_type == Cell.TYPE_STRING:
                 value = string_table.get(int(value))
-
-            ws.cell(coordinate).value = value
+                ws.cell(coordinate).set_value_explicit(value=value,
+                                                data_type=Cell.TYPE_STRING)
+            else:
+                ws.cell(coordinate).value = value

         # to avoid memory exhaustion, clear the item after use
         element.clear()

Cell.value is a property; on assignment it calls Cell._set_value, which in turn calls Cell.bind_value, whose docstring reads: "Given a value, infer type and display options". Since the types of the values are recorded in the XML file, those should be used (here I only do that for strings) instead of doing something 'smart'.
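To make the difference concrete, here is a minimal sketch of inferring a type from the text versus trusting a declared type. It is not openpyxl's code: `NUMERIC_RE`, `infer_value` and `explicit_value` are hypothetical names invented for this illustration, and the type codes 's' and 'n' simply mirror the `t` attribute used in the patch above.

```python
import re

# Hypothetical pattern for "looks like a number", including scientific notation.
NUMERIC_RE = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$")


def infer_value(raw):
    """Guess a type from the text alone, ignoring any declared type."""
    if isinstance(raw, str) and NUMERIC_RE.match(raw):
        number = float(raw)
        # Collapse whole floats to int so 5E12 becomes 5000000000000.
        return int(number) if number.is_integer() else number
    return raw


def explicit_value(raw, declared_type):
    """Trust the declared type: 's' keeps the text, anything else is inferred."""
    if declared_type == "s":
        return raw
    return infer_value(raw)


inferred = infer_value("5E12")        # the text is reinterpreted as a number
kept = explicit_value("5E12", "s")    # the text stays text
print(repr(inferred))
print(repr(kept))
```

The inferring path is where a clone number like "5E12" is lost; the explicit path never looks at what the string resembles.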

As you can see from the code, the test for whether the value is a string was already there.
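If patching the library is not an option, the declared types can also be read straight from the worksheet XML. The following is a rough sketch using only the standard library, under several assumptions: the sample archive is a stripped-down stand-in for a real .xlsx (only the two parts needed here), the sheet is assumed to live at xl/worksheets/sheet1.xml, and `build_sample` and `read_cells` are hypothetical helper names. Only shared strings (t="s") and plain numbers are handled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Minimal stand-ins for the shared string table and one worksheet.
SHARED = (
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<si><t>5E12</t></si></sst>"
)
SHEET = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheetData><row r="1">'
    '<c r="A1" t="s"><v>0</v></c>'   # text cell: index into the shared strings
    '<c r="B1"><v>5E12</v></c>'      # no t attribute: a genuine number
    "</row></sheetData></worksheet>"
)


def build_sample():
    """Build an in-memory archive holding just the two parts used below."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/sharedStrings.xml", SHARED)
        zf.writestr("xl/worksheets/sheet1.xml", SHEET)
    buf.seek(0)
    return buf


def read_cells(fileobj):
    """Return {coordinate: value}, keeping cells declared as strings as str."""
    with zipfile.ZipFile(fileobj) as zf:
        sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
        sheet = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))
    strings = [
        "".join(t.text or "" for t in si.iter(NS + "t"))
        for si in sst.findall(NS + "si")
    ]
    cells = {}
    for c in sheet.iter(NS + "c"):
        v = c.find(NS + "v")
        if v is None:
            continue
        if c.get("t", "n") == "s":
            cells[c.get("r")] = strings[int(v.text)]  # declared text stays text
        else:
            cells[c.get("r")] = float(v.text)         # only untyped cells convert
    return cells


cells = read_cells(build_sample())
print(cells)
```

Here A1 comes back as the string "5E12" while B1, which carries no string marker, comes back as a float, which is the same distinction the patch restores inside openpyxl.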
