将多个CSV文件加载到MySQL中 [英] Loading multiple CSV files into MySQL

查看:118
本文介绍了将多个CSV文件加载到MySQL中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在为我的团队进行指标项目.我必须将几个不同的报告加载到中央存储库中,然后根据这些数据创建表和报告.

I am working on a metrics project for my team. I have to load several different reports into a central repository and then create tables and reports off of this data.

数据来源为:

  1. CSV文件
  2. PDFs
  3. 临时/手动数据.

我在玩Talend和MySQL.对于如何加载CSV文件,我有些困惑.我是否应该具有目录集合以及一个或多个计划的任务来加载文件?

I was playing with Talend and MySQL. I am a little confused as to how to load the CSV files. Should I have a collection of directories and one or more scheduled tasks that load the files?

另一个想法是编写一个自定义文件处理器,该处理器将根据命名约定加载文件.你觉得呢?你有没有什么想法?

Another thought was to write a custom file processor that will load the file based on naming convention. What are your thoughts?

推荐答案

"PDF"非常复杂.因为它是pdf ...临时/手动数据"需要更多详细信息.
但是,如果我们只关注csv,而您的问题仅在我是对的情况下才与那些人有关,那么我将通过编写一个在mySQL DB中调用SP的应用程序来完成此任务,并将完整路径移交给csv(以及任何其他方法).使用关系数据库基本上是关于存储数据,并基于集合论"快速检索数据,而不是关心数据如何进入系统.

因此,在开始执行任何操作之前,请考虑以下问题:

"PDF" is something complicated. as it is pdf... "ad-hoc/manual data" needs much more details.
But if we focus on csv and your question is related to those guys only if I'm right, I'd do this by writing an app which calls an SP in my mySQL DB, handing over the full path to the csv (and any additional data, such as table's "user friendly name" if needed - or any other meta-data you'd like to store) which executes an import using mySQL Load Data.
The reason is, there can be many rules in the "business logic" after a csv was imported, and it's easier to maintain an app according to changing business requirements, than changing DB behavior all the time, and, if something goes terribly wrong my DB would be safe and only the "import manager app" fails - as I don't have to store neither that nor the csvs on the same system where my DB is.
DBs, relational DBs are basically about storing data, and retrieving data rapidly based on 'set theory', not about taking care of how the data gets into the system.

So think about these questions before you start implementing anything:

  • csv处理后会怎样?可以删除吗?应该移至一个已处理"文件夹?应该保留/留在原样吗?
  • 如果应该保留原样,我应该怎么做才能知道我已经处理了文件? (例如,设置准备存档"标志?触摸上次修改"日期并将其设置为1950.01.01?向文件中添加其他属性?
  • 如果csv导入失败(例如,文件中的无效数据,或者NULL值不应包含NULL的值),我该怎么办?显示错误?将csv标记为不可用?发送电子邮件?要移到"processing_failed"文件夹吗?
  • 如果输入文件夹中的文件数量激增该怎么办?
  • 如果业务逻辑更改或csv格式更改,我如何轻松地更改导入/流程/等?

,依此类推.考虑一下您拥有的所有选项并做出决定.

and so on. Think through all the options you have and decide.

我希望我回答了你的问题;)

I hope I answered your question ;)

这篇关于将多个CSV文件加载到MySQL中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆