Merge multiple columns in bulkloader
Problem description
I'm using App Engine's bulkloader to import a CSV file into my datastore. I've got a number of columns that I want to merge into one. For example, they're all URLs, but not all of them are supplied, and there is an order of precedence, e.g.:
url_main
url_temp
url_test
I want to say: "OK, if url_main exists, use that; otherwise use url_test, and then url_temp."
Is it, therefore, possible to create a custom import transform that references columns and merges them into one based on conditions?
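The precedence rule above is a "first non-empty value wins" lookup. Here is a minimal pure-Python sketch, independent of the bulkloader, assuming each CSV row is available as a dict keyed by column name; `pick_url` and `PRIORITY` are hypothetical names:

```python
# Columns in order of precedence, as described in the question.
PRIORITY = ("url_main", "url_test", "url_temp")

def pick_url(row, priority=PRIORITY):
    """Return the first non-empty value among the priority columns, else None."""
    for column in priority:
        value = row.get(column)
        if value:  # skips missing keys, None and empty strings alike
            return value
    return None

print(pick_url({"url_main": "", "url_test": "http://test.example.com",
                "url_temp": "http://temp.example.com"}))  # http://test.example.com
```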
OK, so after reading https://developers.google.com/appengine/docs/python/tools/uploadingdata#Configuring_the_Bulk_Loader I learnt about import_transform and that it can use custom functions.
With that in mind, this part of the docs pointed me in the right direction:
... a two-argument function with the keyword argument bulkload_state, which on return contains useful information about the entity: bulkload_state.current_entity, which is the current entity being processed; bulkload_state.current_dictionary, the current export dictionary ...
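To make that calling convention concrete, here is a sketch that calls a transform the way the quoted docs describe: the column value positionally and the state as the `bulkload_state` keyword argument. The transform `prefer_url_main` is hypothetical, and the `SimpleNamespace` object is only a stand-in that models `current_dictionary`; the real object is created by the bulkloader and carries more (for example `current_entity`).

```python
from types import SimpleNamespace

def prefer_url_main(value, bulkload_state):
    """Hypothetical transform: use url_main from the row if set, else the mapped value."""
    # `value` is the cell from the property's external_name column;
    # bulkload_state.current_dictionary holds the whole current row.
    row = bulkload_state.current_dictionary
    return row.get("url_main") or value

# Stand-in for the real bulkload_state; only current_dictionary is modelled.
state = SimpleNamespace(current_dictionary={"url_main": "", "url_test": "http://test.example.com"})
print(prefer_url_main("http://test.example.com", bulkload_state=state))  # http://test.example.com
```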
So I created a function that takes two arguments: the first is the value for the current property (the cell from the external_name column), and the second is bulkload_state, which lets me fetch the current row, like so:
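def check_url(value, bulkload_state):
    # `value` comes from the external_name column and is not used here;
    # the whole CSV row is available via bulkload_state.current_dictionary.
    row = bulkload_state.current_dictionary
    # Columns in order of preference.
    fields = ['Final URL', 'URL', 'Temporary URL']
    for field in fields:
        # A blank CSV cell typically arrives as an empty string rather than a
        # missing key, so test for a non-empty value, not just `field in row`.
        if row.get(field):
            return row[field]
    return None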
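If several properties need the same "first non-empty column" behaviour, one option is a factory that builds the transform from a list of column names. This is a sketch under two assumptions I haven't verified: that `import_transform` in bulkloader.yaml accepts a Python expression evaluating to a callable (the built-in helpers are used that way, e.g. `transform.create_foreign_key('Kind')`), and that the bulkloader's detection of the `bulkload_state` keyword works for a closure just as for a module-level function. `first_non_empty` is a hypothetical name.

```python
from types import SimpleNamespace

def first_non_empty(*fields):
    """Hypothetical factory: build a transform returning the first non-empty column."""
    def transform(value, bulkload_state):
        # `value` (the external_name cell) is ignored; only the listed columns matter.
        row = bulkload_state.current_dictionary
        for field in fields:
            candidate = row.get(field)
            if candidate:
                return candidate
        return None
    return transform

check_url = first_non_empty('Final URL', 'URL', 'Temporary URL')

# Local check with a stand-in state object (not the real bulkloader class).
state = SimpleNamespace(current_dictionary={'Final URL': '', 'URL': 'http://a.example.com',
                                            'Temporary URL': 'http://b.example.com'})
print(check_url(None, bulkload_state=state))  # http://a.example.com
```

If the assumptions hold, the YAML entry would become `import_transform: bulkloader_helper.first_non_empty('Final URL', 'URL', 'Temporary URL')`; treat that as untested.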
All this does is grab the current row (bulkload_state.current_dictionary), check the URL fields in order of preference and return the first one that has a value; if none does, it returns None.
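A note on the emptiness check: assuming the bulkloader's CSV import behaves like Python's `csv.DictReader` (which I believe it builds on, but haven't confirmed), every header column is a key in every row and a blank cell arrives as `''`, so a bare `field in row` test is always true for the first column even when it is empty. The sketch below checks this locally; the CSV text and the `SimpleNamespace` stand-in for bulkload_state are made up for illustration.

```python
import csv
import io
from types import SimpleNamespace

def check_url(value, bulkload_state):
    row = bulkload_state.current_dictionary
    for field in ['Final URL', 'URL', 'Temporary URL']:
        if row.get(field):  # skip empty strings as well as missing keys
            return row[field]
    return None

# Two sample rows: one with only 'URL' filled in, one with every URL cell blank.
CSV_TEXT = "Final URL,URL,Temporary URL\n,http://a.example.com,\n,,\n"

results = []
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    # Every header column is a key, even when the cell is blank.
    assert 'Final URL' in row and row['Final URL'] == ''
    state = SimpleNamespace(current_dictionary=row)  # stand-in, not the real class
    results.append(check_url(None, bulkload_state=state))

print(results)  # ['http://a.example.com', None]
```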
In my bulkloader.yaml, I call this function simply by setting:
- property: business_url
  external_name: URL
  import_transform: bulkloader_helper.check_url
Note: the external_name value doesn't really matter, as long as it names a column that exists; I'm not actually using it, because the function reads multiple columns from the row instead.
Simples!