How can I compare one line in one CSV with all lines in another CSV file?
Problem description
I have two CSV files:
- Identity(no,name,Age), which has 10 rows
- Location(Address,no,City), which has 100 rows
I need to extract rows and check the no column of the Identity CSV file against the Location CSV file.
Get a single row from the Identity CSV file and check its Identity.no against Location.no across the 100 rows of the Location CSV file.
If there is a match, then in Identity, Location
Note: I need to take the 1st row from Identity and compare it with the 100 rows in the Location CSV file, then take the 2nd row and compare it with the 100 rows, and so on for all 10 rows in the Identity CSV file.
The overall results should then be converted to JSON and moved into SQL Server.
Is this possible in Apache NiFi?
Any help is appreciated.
Recommended answer
You can do this in NiFi with the DistributedMapCache feature, which implements a key/value store for lookups. The setup requires a distributed map cache plus two flows: one to populate the cache with your Address records, and one to look up the address by the no field.
The DistributedMapCache is defined by two controller services, a DistributedMapCacheServer and a DistributedMapCacheClientService. If your data set is small, you can just use "localhost" as the server.
Populating the cache requires reading the Address file, splitting the records, extracting the no key, and putting key/value pairs into the cache. An approximate flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> PutDistributedMapCache.
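As a rough illustration of what the population flow accomplishes, here is a minimal Python sketch that uses a plain dict in place of the distributed cache. It is not NiFi code: the sample data and the populate_cache helper are hypothetical, and it assumes the file has a header row with the columns Address, no, City.

```python
import csv
import io

# Hypothetical sample standing in for the Location (Address) CSV file.
LOCATION_CSV = """Address,no,City
12 Main St,1,Springfield
9 Oak Ave,2,Shelbyville
"""

def populate_cache(csv_text):
    """Build a key/value map from `no` to the full Location record."""
    cache = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Key on `no`, mirroring the key extracted before the cache put.
        cache[row["no"]] = row
    return cache

cache = populate_cache(LOCATION_CSV)
print(cache["2"]["City"])
```

Because a key holds a single value, this sketch assumes no is unique in the Location file; a later row with the same no would overwrite the earlier one.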
Looking up your identity records is fairly similar to the flow above, in that it requires reading the Identity file, splitting the records, extracting the no key, and then fetching the address record. The processor flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> FetchDistributedMapCache.
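The lookup side can be sketched the same way. The Python below is only an analogy for the fetch step, not NiFi code: location_cache stands in for the already populated cache, and the sample rows and the lookup_identities helper are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for the Identity CSV file.
IDENTITY_CSV = """no,name,Age
1,Alice,30
3,Bob,41
"""

# Stand-in for the populated cache: Location records keyed by `no`.
location_cache = {
    "1": {"Address": "12 Main St", "no": "1", "City": "Springfield"},
    "2": {"Address": "9 Oak Ave", "no": "2", "City": "Shelbyville"},
}

def lookup_identities(identity_text, cache):
    """Return Identity rows merged with their matching Location record."""
    matched = []
    for identity in csv.DictReader(io.StringIO(identity_text)):
        # One keyed lookup per Identity row instead of scanning every Location row.
        location = cache.get(identity["no"])
        if location is not None:
            matched.append({**identity, **location})
    return matched

matches = lookup_identities(IDENTITY_CSV, location_cache)
print(matches)
```

Here only the row with no equal to 1 has a Location entry, so the row for Bob is dropped. Whether unmatched rows should be dropped or routed elsewhere is a design choice the question does not specify.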
You can convert all or part of the result from CSV to JSON with AttributesToJSON, or possibly ExecuteScript.
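For a sense of the target shape, here is a small Python sketch of turning merged records into JSON. The merged list and the to_json helper are hypothetical, and casting no and Age to integers is an assumption about the desired output rather than something the question states.

```python
import json

# Hypothetical merged records, as produced by joining Identity and Location on `no`.
merged = [
    {"no": "1", "name": "Alice", "Age": "30",
     "Address": "12 Main St", "City": "Springfield"},
]

def to_json(records):
    """Serialize records as a JSON array, casting numeric fields to int."""
    typed = [{**r, "no": int(r["no"]), "Age": int(r["Age"])} for r in records]
    return json.dumps(typed, indent=2)

payload = to_json(merged)
print(payload)
```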