Generate table schema inspecting Excel (CSV) and import data

Question

How would I go about creating a MySQL table schema by inspecting an Excel (or CSV) file? Are there any ready-made Python libraries for the task?

Column headers would be sanitized into column names. Data types would be estimated from the contents of each spreadsheet column. Once that is done, the data would be loaded into the table.

I have an Excel file of ~200 columns that I want to start normalizing.

Recommended answer

Just for (my) reference, I documented below what I did:


  1. xlrd is practical; however, I simply saved the Excel data as CSV so that I could use LOAD DATA INFILE.
  2. I copied the header row and started writing the import and normalization script.
  3. The script does: CREATE TABLE with all columns as TEXT, except for the primary key.
  4. It then queries MySQL: LOAD DATA LOCAL INFILE, loading all the CSV data into the TEXT fields.
  5. Based on the output of PROCEDURE ANALYSE, I was able to ALTER TABLE to give the columns the right types and lengths. PROCEDURE ANALYSE returns ENUM for any column with few distinct values, which is not what I needed, but I found that useful later for normalization. Eyeballing 200 columns was a breeze with PROCEDURE ANALYSE. The output of phpMyAdmin's "Propose table structure" was junk.
  6. I wrote some normalization code, mostly using SELECT DISTINCT on columns and INSERTing the results into separate tables. I first added an FK column to the old table. Right after each INSERT, I took the new row's ID and UPDATEd the FK column. When the loop finished, I dropped the old column, leaving only the FK column. I did the same for groups of multiple dependent columns. It was much faster than I expected.
  7. I ran (Django) python manage.py inspectdb, copied the output to models.py and added all the ForeignKey fields by hand, since FKs do not exist on MyISAM. Wrote a little Python for views.py and urls.py, a few templates... TADA
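
For reference, steps 3 to 5 could be scripted roughly as below. This is only a minimal sketch, not the script from the answer: the helper names `sanitize` and `build_statements`, the table name `import_raw`, the path `data.csv`, the surrogate `id` column and the CSV delimiter settings are all assumptions made for illustration. It only builds the SQL strings; running them against a MySQL server is left out. Note also that PROCEDURE ANALYSE() was removed in MySQL 8.0, so the last statement applies only to older servers.

```python
import csv
import io
import re


def sanitize(header, seen):
    """Turn a spreadsheet header into a safe, unique column name (hypothetical helper)."""
    # Lower-case, then collapse every run of non-word characters into one underscore.
    name = re.sub(r"\W+", "_", header.strip().lower()).strip("_")
    # Avoid empty names and names that start with a digit.
    if not name or name[0].isdigit():
        name = "col_" + name
    # Make duplicates unique by appending _2, _3, ...
    base, n = name, 2
    while name in seen:
        name = f"{base}_{n}"
        n += 1
    seen.add(name)
    return name


def build_statements(header_row, table, csv_path):
    """Build CREATE TABLE, LOAD DATA and PROCEDURE ANALYSE statements as strings."""
    seen = set()
    columns = [sanitize(h, seen) for h in header_row]

    # Step 3: every imported column is TEXT; only the primary key is typed.
    column_defs = ",\n  ".join(f"`{c}` TEXT" for c in columns)
    create = (
        f"CREATE TABLE `{table}` (\n"
        "  `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
        f"  {column_defs}\n"
        ")"
    )

    # Step 4: load the CSV, skipping the header line.
    # csv_path is interpolated without escaping; fine for a trusted local path only.
    column_list = ", ".join(f"`{c}`" for c in columns)
    load = (
        f"LOAD DATA LOCAL INFILE '{csv_path}' INTO TABLE `{table}`\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 LINES\n"
        f"({column_list})"
    )

    # Step 5: ask MySQL (before 8.0) to suggest types for each column.
    analyse = f"SELECT * FROM `{table}` PROCEDURE ANALYSE()"
    return columns, create, load, analyse


# Made-up sample data standing in for the real ~200-column export.
sample = io.StringIO("Customer Name,Order #,2010 Sales,Order #\nAcme,17,12.50,18\n")
header = next(csv.reader(sample))
columns, create, load, analyse = build_statements(header, "import_raw", "data.csv")
print(columns)
print(create)
print(load)
print(analyse)
```

Loading everything as TEXT first means the import cannot fail on a bad value; the types are tightened afterwards with ALTER TABLE once the suggested types have been reviewed.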
