How can I compare one line in one CSV with all lines in another CSV file?


Question

I have two CSV files:

  1. Identity(no,name,Age), which has 10 rows
  2. Location(Address,no,City), which has 100 rows

I need to extract rows and check the no column in the Identity file against the Location CSV file.

That is, take a single row from the Identity CSV file and check its Identity.no against Location.no in each of the 100 rows of the Location CSV file.

If it matches, then in Identity, Location

Note: I need to take the 1st row from Identity and compare it with the 100 rows in the Location CSV file, then take the 2nd row and compare it with the 100 rows. This continues for all 10 rows in the Identity CSV file.

The overall results should then be converted to JSON and moved into SQL Server.

Is this possible in Apache NiFi?

Any help is appreciated.

Recommended answer

You can do this in NiFi by using the DistributedMapCache feature, which implements a key/value store for lookups. The setup requires a distributed map cache plus two flows: one to populate the cache with your Location (address) records, and one to look up the address by the no field.

  1. The DistributedMapCache is defined by two controller services, a DistributedMapCacheServer and a DistributedMapCacheClientService. If your data set is small, you can just use "localhost" as the server.

Populating the cache requires reading the Location (address) file, splitting it into records, extracting the no key from each record, and putting the key/value pairs into the cache. An approximate flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> PutDistributedMapCache.
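To make the idea concrete outside NiFi, here is a minimal Python sketch of what the populate flow amounts to: read the Location records and store each one under its no value. It is only an analogy for the processor chain, not NiFi code. The sample rows and the name `build_cache` are made up for illustration, and a plain dict stands in for the distributed cache.

```python
import csv
import io

# Hypothetical sample of the Location file (Address,no,City).
LOCATION_CSV = """Address,no,City
12 Main St,1,Chennai
9 Lake Rd,2,Mumbai
"""

def build_cache(location_text):
    """Map each `no` value to its Location row (a dict stands in for the cache)."""
    cache = {}
    for row in csv.DictReader(io.StringIO(location_text)):
        # Key: the no column. Value: the whole record.
        # A later row with the same no would overwrite an earlier one.
        cache[row["no"]] = row
    return cache

cache = build_cache(LOCATION_CSV)
```

Keying by no is what turns "compare one row against all 100 rows" into a single lookup per Identity row.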

Looking up your identity records is fairly similar to the flow above: it requires reading the Identity file, splitting it into records, extracting the no key from each record, and then fetching the matching address record from the cache. The processor flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> FetchDistributedMapCache.
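The lookup side can be sketched the same way in plain Python. Again this is an analogy rather than NiFi code: `location_cache` is a hand-written stand-in for an already populated cache, and the sample Identity rows and the name `lookup_identities` are assumptions for illustration.

```python
import csv
import io

# Hypothetical sample of the Identity file (no,name,Age).
IDENTITY_CSV = """no,name,Age
1,Asha,30
3,Ravi,41
"""

# Stand-in for a cache that was populated from the Location file.
location_cache = {"1": {"Address": "12 Main St", "City": "Chennai"}}

def lookup_identities(identity_text, cache):
    """Return one merged record per Identity row whose no is in the cache."""
    matches = []
    for row in csv.DictReader(io.StringIO(identity_text)):
        location = cache.get(row["no"])
        if location is None:
            continue  # no Location row has this no, so skip it
        matches.append({**row, **location})
    return matches

matched = lookup_identities(IDENTITY_CSV, location_cache)
```

Rows without a match are simply dropped here; in a real flow you would decide separately where unmatched records should go.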

You can convert all or part of the CSV data to JSON with AttributesToJSON, or possibly ExecuteScript.
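As a rough picture of the result, AttributesToJSON builds a JSON object from a flow file's attributes. The Python sketch below imitates that for one merged record; the attribute names and values are hypothetical, and the function name `attributes_to_json` is just a label for this example, not a NiFi API.

```python
import json

# Hypothetical attributes of one flow file after a successful lookup.
# NiFi attribute values are strings, so Age stays a string here.
attributes = {
    "no": "1",
    "name": "Asha",
    "Age": "30",
    "Address": "12 Main St",
    "City": "Chennai",
}

def attributes_to_json(attrs):
    """Serialize one record's attributes as a single JSON object."""
    return json.dumps(attrs, ensure_ascii=False)

payload = attributes_to_json(attributes)
```

Each matched record becomes one JSON document, which is a convenient unit for the later insert into SQL Server.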
