Generate a table schema by inspecting an Excel (CSV) file and import the data
Question
How would I go about creating a MySQL table schema by inspecting an Excel (or CSV) file? Are there any ready-made Python libraries for the task?
Column headers would be sanitized into column names. The data type of each column would be estimated from the contents of the corresponding spreadsheet column. Once the schema exists, the data would be loaded into the table.
I have an Excel file of ~200 columns that I want to start normalizing.
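As a rough illustration of what the question describes, the sketch below sanitizes header names and guesses a MySQL type per column using only the Python standard library. The function names (`sanitize_header`, `infer_mysql_type`, `create_table_sql`) and the type rules are assumptions made for this example, not part of any existing library, and the guesses would still need a manual review (for instance, ZIP codes with leading zeros would be classified as integers).

```python
import csv
import io
import re


def sanitize_header(header, used):
    """Turn a spreadsheet header into a safe, unique column name."""
    name = re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")
    if not name:
        name = "col"
    if name[0].isdigit():
        name = "col_" + name
    base, n = name, 2
    while name in used:  # de-duplicate repeated headers
        name = f"{base}_{n}"
        n += 1
    used.add(name)
    return name


def infer_mysql_type(values):
    """Guess a MySQL column type from the non-empty string values of one column."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return "TEXT"
    if all(re.fullmatch(r"[+-]?[0-9]+", v) for v in values):
        # Conservative: anything longer than 9 digits may overflow INT.
        big = any(len(v.lstrip("+-")) > 9 for v in values)
        return "BIGINT" if big else "INT"
    if all(re.fullmatch(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)", v) for v in values):
        return "DOUBLE"
    if all(re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", v) for v in values):
        return "DATE"
    longest = max(len(v) for v in values)
    return f"VARCHAR({longest})" if longest <= 255 else "TEXT"


def create_table_sql(csv_file, table):
    """Build a CREATE TABLE statement from a CSV file object with a header row."""
    reader = csv.reader(csv_file)
    headers = next(reader)
    rows = list(reader)
    used = set()
    names = [sanitize_header(h, used) for h in headers]
    # Collect each column's values; short rows simply contribute nothing.
    columns = [[row[i] for row in rows if i < len(row)] for i in range(len(names))]
    defs = [f"  `{n}` {infer_mysql_type(c)}" for n, c in zip(names, columns)]
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(defs) + "\n);"


sample = (
    "First Name,Age,Joined,2nd Phone #\n"
    "Ann,34,2011-01-05,555-0100\n"
    "Bob,41,2012-11-30,\n"
)
print(create_table_sql(io.StringIO(sample), "people"))
```

For a real file, `create_table_sql(open("data.csv", newline=""), "people")` would be the equivalent call; the file name is a placeholder.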
Recommended answer
Just for (my) reference, here is what I did:
- xlrd is practical, but I simply saved the Excel data as CSV so that I could use `LOAD DATA INFILE`.
- I copied the header row and started writing the import and normalization script.
- The script runs `CREATE TABLE` with all columns as TEXT, except for the primary key.
- It then queries MySQL with `LOAD DATA LOCAL INFILE`, loading all CSV data into the TEXT fields.
- Based on the output of `PROCEDURE ANALYSE`, I was able to `ALTER TABLE` to give the columns the right types and lengths. `PROCEDURE ANALYSE` returns `ENUM` for any column with few distinct values, which is not what I needed, but I found that useful later for normalization. Eyeballing 200 columns was a breeze with `PROCEDURE ANALYSE`. The output of phpMyAdmin's "Propose table structure" was junk.
- I wrote some normalization, mostly using `SELECT DISTINCT` on columns and `INSERT`ing the results into separate tables. I first added an FK column to the old table. Right after each `INSERT`, I took the new row's ID and `UPDATE`d the FK column. When the loop finished, I dropped the old column, leaving only the FK column. I did the same for groups of multiple dependent columns. It was much faster than I expected.
- I ran (Django) `python manage.py inspectdb`, copied the output to models.py and added all the `ForeignKey` fields by hand, since FKs do not exist on MyISAM. Wrote a little Python for views.py and urls.py, a few templates... TADA