NiFi QueryDatabaseTable processor: how to query rows that were modified?


Problem description

I am working on a NiFi data flow where my use case is to fetch MySQL table data and put it into HDFS or the local file system.

I have built a data flow pipeline with these processors: QueryDatabaseTable -> ConvertRecord -> PutFile.

My table schema: id, name, city, Created_date

I receive files at the destination when I insert new records into the table.

However, when I update existing rows, the processor does not fetch those records. It looks like it has some limitation.

My question is: how do I handle this scenario? Should I use a different processor, or do I need to change some property?

Could someone please help? @Bryan Bende

Recommended answer

The QueryDatabaseTable processor needs to be told which columns it can use to identify new data.

A serial id or a created timestamp is not sufficient for this, because neither value changes when an existing row is updated.

Maximum-value Columns:

A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly.
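The incremental fetch described in that property can be illustrated with a small sketch. It is a simplified model of the idea, not NiFi's actual implementation: the `person` table, the `fetch_new_rows` helper, and the sample rows are hypothetical, and SQLite stands in for MySQL. It tracks the largest `created_date` seen so far and only asks for rows above it, which shows why an update that leaves `created_date` untouched is never returned.

```python
import sqlite3


def fetch_new_rows(conn, table, max_col, last_max):
    """Return rows whose max_col is greater than last_max, plus the new maximum.

    Hypothetical helper: a simplified stand-in for an incremental fetch
    driven by a single maximum-value column.
    """
    if last_max is None:
        # First run: nothing has been seen yet, so every row qualifies.
        cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {max_col}")
    else:
        cursor = conn.execute(
            f"SELECT * FROM {table} WHERE {max_col} > ? ORDER BY {max_col}",
            (last_max,),
        )
    rows = cursor.fetchall()
    # Rows are sorted ascending, so the last one carries the new maximum.
    new_max = rows[-1][max_col] if rows else last_max
    return rows, new_max


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT, created_date TEXT)"
)

conn.execute("INSERT INTO person VALUES (1, 'Ann', 'Pune', '2020-01-01 10:00:00')")
first, state = fetch_new_rows(conn, "person", "created_date", None)

# A newly inserted row has a larger created_date, so it is picked up.
conn.execute("INSERT INTO person VALUES (2, 'Bob', 'Delhi', '2020-01-02 10:00:00')")
second, state = fetch_new_rows(conn, "person", "created_date", state)

# An update leaves created_date unchanged, so the row stays below the stored maximum.
conn.execute("UPDATE person SET city = 'Mumbai' WHERE id = 1")
third, state = fetch_new_rows(conn, "person", "created_date", state)

print([row["id"] for row in first], [row["id"] for row in second], third)
```

The third fetch comes back empty even though row 1 changed, which matches the behaviour described in the question.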

Judging by the table schema, there is no way in SQL to tell whether a row was updated.

There are many ways to solve this. In your case, the easiest option might be to rename the column created to modified and set it to now() on every update, or to add a second timestamp column.

For example,

| stamp_updated | timestamp | CURRENT_TIMESTAMP   | on update CURRENT_TIMESTAMP |

is the newly added column. In the processor, you then use the stamp_updated column to identify new data.

Don't forget to set Maximum-value Columns to those columns.
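As a rough sketch of how a stamp_updated column makes updated rows visible to an incremental query: the table name `my_table`, the sample rows, and the stored maximum are assumptions made for illustration, and SQLite stands in for MySQL. Since SQLite has no `ON UPDATE` clause, the sketch sets stamp_updated by hand where MySQL would do it automatically.

```python
import sqlite3

# In MySQL the column could be added like this (table name is hypothetical):
#   ALTER TABLE my_table ADD COLUMN stamp_updated TIMESTAMP
#       NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
# SQLite has no ON UPDATE clause, so this sketch writes the value explicitly.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, city TEXT, "
    "created_date TEXT, stamp_updated TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Pune", "2020-01-01 10:00:00", "2020-01-01 10:00:00"),
        (2, "Bob", "Delhi", "2020-01-02 10:00:00", "2020-01-02 10:00:00"),
    ],
)

# Assume an earlier run already returned both rows and stored this maximum.
last_max = "2020-01-02 10:00:00"

# Update an existing row; in MySQL stamp_updated would be bumped automatically.
conn.execute(
    "UPDATE my_table SET city = 'Mumbai', stamp_updated = '2020-01-03 09:00:00' "
    "WHERE id = 1"
)

# The incremental query on stamp_updated now sees the modified row.
changed = conn.execute(
    "SELECT id, city FROM my_table WHERE stamp_updated > ? ORDER BY stamp_updated",
    (last_max,),
).fetchall()
print(changed)
```

With stamp_updated as the maximum-value column, the updated row is returned on the next fetch, while created_date can stay as it is.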

So what I am basically saying is:

If you cannot tell from SQL yourself that a record is new or changed, NiFi cannot either.
