Using Pentaho Kettle, how do I load multiple tables from a single table while keeping referential integrity?

Views: 133  Published: 2020/4/25 11:45:24  Tags: mysql etl pentaho kettle

Problem description

I need to load data from a single file with 100,000+ records into multiple tables on MySQL while maintaining the relationships defined in the file/tables; that is, the relationships already match. The solution should work on the latest version of MySQL and needs to use the InnoDB engine, since MyISAM does not support foreign keys.

I am completely new to Pentaho Data Integration (aka Kettle), and any pointers would be appreciated.

I might add that it is a requirement that the foreign key constraints are NOT disabled. It is my understanding that if there is something wrong with the database's referential integrity, MySQL will not check for it when the foreign key constraints are turned back on. SOURCE: 5.1.4. Server System Variables -- foreign_key_checks
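The concern about re-enabling checks can be reproduced in miniature. The sketch below is an analogy, not MySQL itself: it uses Python's standard-library sqlite3 module, where `PRAGMA foreign_keys` plays roughly the role of MySQL's `foreign_key_checks`. The two tables are a cut-down, illustrative schema, and the IDs are made up.

```python
import sqlite3

# Autocommit mode: SQLite ignores PRAGMA foreign_keys inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Employee2Office ("
    " employee_id INTEGER NOT NULL REFERENCES Employee(id),"
    " office_id INTEGER NOT NULL)"
)

# With enforcement off, a row pointing at a missing employee is accepted.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("INSERT INTO Employee2Office VALUES (99, 1)")

# Turning enforcement back on does not re-validate rows that already exist.
conn.execute("PRAGMA foreign_keys = ON")
fk_enabled = conn.execute("PRAGMA foreign_keys").fetchone()[0]

# The orphan only shows up when the data is checked explicitly.
orphans = conn.execute("PRAGMA foreign_key_check").fetchall()
print(fk_enabled, len(orphans))
```

The orphaned row survives the switch back to enforcement and is only found by an explicit check, which is why keeping the constraints enabled during the load is the safer requirement.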

All approaches should include some form of validation and a rollback strategy in case an insert fails or fails to maintain referential integrity.
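One way to picture the rollback requirement is an all-or-nothing batch: every insert runs inside one transaction, and any constraint violation undoes the whole batch. This is a minimal sketch under assumptions, using Python's sqlite3 module as a stand-in for MySQL/InnoDB; `load_links` is a hypothetical helper and the IDs are invented for the demonstration.

```python
import sqlite3

def load_links(conn, links):
    """Insert (employee_id, office_id) pairs atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO Employee2Office VALUES (?, ?)", links)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # constraints stay enabled throughout
conn.executescript("""
CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Office (id INTEGER PRIMARY KEY, office_number INTEGER NOT NULL);
CREATE TABLE Employee2Office (
  employee_id INTEGER NOT NULL REFERENCES Employee(id),
  office_id INTEGER NOT NULL REFERENCES Office(id),
  PRIMARY KEY (employee_id, office_id));
INSERT INTO Employee VALUES (1, 'John Smith');
INSERT INTO Office VALUES (1, 501), (2, 601);
""")

def link_count(conn):
    return conn.execute("SELECT COUNT(*) FROM Employee2Office").fetchone()[0]

# Employee 7 does not exist, so the third row violates the foreign key
# and the two rows inserted before it are rolled back as well.
bad_ok = load_links(conn, [(1, 1), (1, 2), (7, 1)])
count_after_bad = link_count(conn)

good_ok = load_links(conn, [(1, 1), (1, 2)])
count_after_good = link_count(conn)
print(bad_ok, count_after_bad, good_ok, count_after_good)
```

With MySQL the same shape would apply to InnoDB tables, since InnoDB is transactional; the connection and driver details would differ.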

Again, I am completely new to this and doing my best to provide as much information as possible. If you have any questions or requests for clarification, just let me know.

If you are able to post the XML from the kjb and ktr files (jobs/transformations), that would be SUPER. I might even hunt down every comment/answer you've ever made anywhere and upvote them... :-) ...really, it's important to me to find an answer for this.

Thanks!

SAMPLE DATA: To better elaborate with an example, let's assume I am trying to load a tab-separated file containing employee names, the offices they have occupied in the past, and their job title history.

File:

EmployeeName<tab>OfficeHistory<tab>JobLevelHistory
John Smith<tab>501<tab>Engineer
John Smith<tab>601<tab>Senior Engineer
John Smith<tab>701<tab>Manager
Alex Button<tab>601<tab>Senior Assistant
Alex Button<tab>454<tab>Manager

NOTE: The single-table database is completely normalized (as much as a single table may be). For example, in the case of "John Smith" there is only one John Smith, meaning there are no duplicates that would lead to conflicts in referential integrity.

The MyOffice database schema has the following tables:

Employee (nId, name)
Office (nId, number)
JobTitle (nId, titleName)
Employee2Office (nEmpID, nOfficeId)
Employee2JobTitle (nEmpId, nJobTitleID)

So in this case, the tables should look like:

Employee
1 John Smith
2 Alex Button

Office
1 501
2 601
3 701
4 454

JobTitle
1 Engineer
2 Senior Engineer
3 Manager
4 Senior Assistant

Employee2Office
1 1
1 2
1 3
2 2
2 4

Employee2JobTitle
1 1
1 2
1 3
2 4
2 3
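The expected tables above follow mechanically from the sample file if each distinct value receives the next ID in first-seen order. Here is a minimal pure-Python sketch of that mapping; `assign_ids` is a hypothetical helper, and real AUTO_INCREMENT values would only match if the tables start empty and rows are inserted in this order.

```python
# Sample rows from the file: (EmployeeName, OfficeHistory, JobLevelHistory).
rows = [
    ("John Smith", 501, "Engineer"),
    ("John Smith", 601, "Senior Engineer"),
    ("John Smith", 701, "Manager"),
    ("Alex Button", 601, "Senior Assistant"),
    ("Alex Button", 454, "Manager"),
]

def assign_ids(values):
    """Map each distinct value to a 1-based ID in first-seen order."""
    ids = {}
    for value in values:
        ids.setdefault(value, len(ids) + 1)
    return ids

employee = assign_ids(name for name, _, _ in rows)
office = assign_ids(number for _, number, _ in rows)
job_title = assign_ids(title for _, _, title in rows)

# Junction rows pair the surrogate keys looked up for each source row.
employee2office = [(employee[n], office[o]) for n, o, _ in rows]
employee2jobtitle = [(employee[n], job_title[t]) for n, _, t in rows]

print(employee2office)
print(employee2jobtitle)
```

If the same employee/office pair could appear twice in the file, the junction rows would need de-duplicating before insert, because both junction tables use a composite primary key.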

Here is the MySQL DDL used to create the database and tables:

create database MyOffice2;

use MyOffice2;

CREATE TABLE Employee (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  name CHAR(50) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE Office (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  office_number INT NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE JobTitle (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  title CHAR(30) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE Employee2JobTitle (
  employee_id MEDIUMINT NOT NULL,
  job_title_id MEDIUMINT NOT NULL,
  FOREIGN KEY (employee_id) REFERENCES Employee(id),
  FOREIGN KEY (job_title_id) REFERENCES JobTitle(id),
  PRIMARY KEY (employee_id, job_title_id)
) ENGINE=InnoDB;

CREATE TABLE Employee2Office (
  employee_id MEDIUMINT NOT NULL,
  office_id MEDIUMINT NOT NULL,
  FOREIGN KEY (employee_id) REFERENCES Employee(id),
  FOREIGN KEY (office_id) REFERENCES Office(id),
  PRIMARY KEY (employee_id, office_id)
) ENGINE=InnoDB;
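With foreign keys left enabled, load order matters: each parent row (Employee, Office, JobTitle) must exist before the junction row that references it. The sketch below illustrates that ordering with a look-up-or-insert helper. It is not a Kettle transformation and not MySQL: the schema is adapted to SQLite types so it runs with Python's sqlite3 module, and `get_or_create` and `load` are hypothetical names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Employee (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL);
CREATE TABLE Office (id INTEGER PRIMARY KEY AUTOINCREMENT, office_number INTEGER NOT NULL);
CREATE TABLE JobTitle (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT NOT NULL);
CREATE TABLE Employee2JobTitle (
  employee_id INTEGER NOT NULL REFERENCES Employee(id),
  job_title_id INTEGER NOT NULL REFERENCES JobTitle(id),
  PRIMARY KEY (employee_id, job_title_id));
CREATE TABLE Employee2Office (
  employee_id INTEGER NOT NULL REFERENCES Employee(id),
  office_id INTEGER NOT NULL REFERENCES Office(id),
  PRIMARY KEY (employee_id, office_id));
"""

def get_or_create(conn, table, column, value):
    """Return the id of the row holding `value`, inserting it if missing.

    `table` and `column` come only from the constants below, never from input.
    """
    found = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    if found:
        return found[0]
    cursor = conn.execute(f"INSERT INTO {table} ({column}) VALUES (?)", (value,))
    return cursor.lastrowid

def load(conn, rows):
    """Load all rows in one transaction: parents first, then junction rows."""
    with conn:  # any constraint failure rolls back the entire load
        for name, office_number, title in rows:
            emp_id = get_or_create(conn, "Employee", "name", name)
            office_id = get_or_create(conn, "Office", "office_number", office_number)
            title_id = get_or_create(conn, "JobTitle", "title", title)
            conn.execute("INSERT INTO Employee2Office VALUES (?, ?)", (emp_id, office_id))
            conn.execute("INSERT INTO Employee2JobTitle VALUES (?, ?)", (emp_id, title_id))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # constraints are never switched off
conn.executescript(SCHEMA)
load(conn, [
    ("John Smith", 501, "Engineer"),
    ("John Smith", 601, "Senior Engineer"),
    ("John Smith", 701, "Manager"),
    ("Alex Button", 601, "Senior Assistant"),
    ("Alex Button", 454, "Manager"),
])

employees = conn.execute("SELECT id, name FROM Employee ORDER BY id").fetchall()
office_links = conn.execute(
    "SELECT employee_id, office_id FROM Employee2Office"
    " ORDER BY employee_id, office_id"
).fetchall()
title_links = conn.execute(
    "SELECT employee_id, job_title_id FROM Employee2JobTitle"
    " ORDER BY employee_id, job_title_id"
).fetchall()
print(employees, office_links, title_links)
```

A per-row lookup is simple but slow for 100,000+ records; caching the looked-up IDs in memory or loading the three parent tables in bulk first would be the usual refinements.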

My Notes in Response to Selected Answer:

PREP:

(a) Using the sample data, create a CSV by replacing each <TAB> with a comma.
(b) Install MySQL and create sample database using the MySQL DDL sample
(c) Install Kettle (it's Java based and will run on anything that runs Java)
(d) Download KTR file
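Step (a) can be scripted rather than done by hand. This is a small sketch using Python's standard csv module; `tsv_to_csv` is a hypothetical helper, and the in-memory strings stand in for the real input and output files. Going through the csv module instead of a plain text replace means a field that already contains a comma gets quoted.

```python
import csv
import io

def tsv_to_csv(src, dst):
    """Copy tab-separated rows from `src` to `dst` as comma-separated rows."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(reader)

tsv_text = (
    "EmployeeName\tOfficeHistory\tJobLevelHistory\n"
    "John Smith\t501\tEngineer\n"
)
out = io.StringIO()
tsv_to_csv(io.StringIO(tsv_text, newline=""), out)
csv_text = out.getvalue()
print(csv_text)
```

For real files, open both with `newline=""` as the csv module documentation recommends, and pass the file objects in place of the StringIO buffers.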

Step-by-step data flow: (my notes)

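Open the KTR file in Kettle, then double-click "CSV file input" and browse to the CSV file you created. The delimiter should already be set to comma. Then click "OK".
Double-click "Insert Employee", select the DB connector, and follow the instructions in Creating a New Database Connection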
