Merge multiple columns in bulkloader

Question

I'm using App Engine's bulkloader to import a CSV file into my datastore. I've got a number of columns that I want to merge into one. For example, they're all URLs, but not all of them are supplied in every row, and there is an order of precedence, e.g.:

url_main
url_temp
url_test

I want to say: "OK, if url_main exists, use that; otherwise use url_test, and failing that, url_temp."

Is it, therefore, possible to create a custom import transform that references columns and merges them into one based on conditions?
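The precedence rule above amounts to "take the first column that has a non-empty value". A minimal sketch of just that rule on a plain dict, assuming each CSV row arrives as a dict keyed by column name (as Python's csv.DictReader produces, with empty cells as empty strings); `pick_url` and the sample data are hypothetical:

```python
import csv
import io

# Columns in order of preference, as stated in the question.
URL_COLUMNS = ["url_main", "url_test", "url_temp"]


def pick_url(row):
    """Return the first non-empty URL column in `row`, or None."""
    for column in URL_COLUMNS:
        value = row.get(column)
        if value:  # skips missing keys and empty cells alike
            return value
    return None


# Hypothetical sample data; DictReader turns empty cells into "".
SAMPLE = (
    "url_main,url_temp,url_test\n"
    ",http://temp.example,http://test.example\n"
    "http://main.example,,\n"
    ",,\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # prints http://test.example, then http://main.example, then None
    print(pick_url(row))
```

Testing for a truthy value rather than for key membership matters here: a CSV row normally contains every column, so `"url_main" in row` would be true even when the cell is blank.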

Solution

OK, so after reading https://developers.google.com/appengine/docs/python/tools/uploadingdata#Configuring_the_Bulk_Loader I learnt about import_transform and that it can call custom functions.

With that in mind, this passage pointed me in the right direction:

... a two-argument function with the keyword argument bulkload_state, which on return contains useful information about the entity: bulkload_state.current_entity, which is the current entity being processed; bulkload_state.current_dictionary, the current export dictionary ...

So I created a function that takes two arguments: the first is the value read for the current property, and the second is bulkload_state, which lets me fetch the current row:

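def check_url(value, bulkload_state):
    # Look at the whole CSV row instead of the single column in `value`.
    row = bulkload_state.current_dictionary
    # Columns in order of preference; the first one with a value wins.
    fields = ['Final URL', 'URL', 'Temporary URL']

    for field in fields:
        # Every column is present in a CSV row, so test for a non-empty
        # value rather than for the key alone.
        if row.get(field):
            return row[field]

    return None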

All this does is grab the current row (bulkload_state.current_dictionary) and return the first URL field it finds, in order of preference; otherwise it just returns None.
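Because check_url only reads bulkload_state.current_dictionary, it can be exercised locally without running the bulkloader. A minimal sketch, assuming a `types.SimpleNamespace` stand-in (the hypothetical `fake_state` helper, not the real bulkload state class) is enough for this function; the row values are made up. It uses the non-empty check, since empty CSV cells typically arrive as empty strings rather than as missing keys:

```python
from types import SimpleNamespace


def check_url(value, bulkload_state):
    """Return the first non-empty URL column of the current row, or None."""
    row = bulkload_state.current_dictionary
    fields = ['Final URL', 'URL', 'Temporary URL']
    for field in fields:
        if row.get(field):
            return row[field]
    return None


def fake_state(row):
    # Hypothetical stand-in for bulkload_state: only current_dictionary is used.
    return SimpleNamespace(current_dictionary=row)


# The first argument is ignored by check_url, so any value works here.
print(check_url('ignored', fake_state(
    {'Final URL': '', 'URL': 'http://a.example', 'Temporary URL': 'http://b.example'})))
print(check_url('ignored', fake_state(
    {'Final URL': '', 'URL': '', 'Temporary URL': ''})))
```

The first call prints http://a.example and the second prints None.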

In my bulkloader.yaml I call this function simply by setting:

- property: business_url
  external_name: URL
  import_transform: bulkloader_helper.check_url

Note: which column external_name names doesn't really matter, as long as it exists in the CSV; the function ignores the value it receives and reads the URL columns from the row instead.
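If more than one property needs the same "first non-empty column" rule, the column list can be passed in instead of hard-coded. This is an assumption-laden variant, not part of the original answer: `first_non_empty` is a hypothetical helper you would add to bulkloader_helper, and it assumes (a) the bulkloader evaluates the import_transform value as a Python expression, as it does for built-ins such as `transform.import_date_time('%Y-%m-%d')`, and (b) it passes bulkload_state to the returned two-argument function just as it does to check_url. Check both against your SDK before relying on it.

```python
from types import SimpleNamespace


def first_non_empty(*fields):
    """Hypothetical factory: build a transform returning the first non-empty column.

    Assumes the bulkloader calls the returned function with
    (value, bulkload_state=...), as described for check_url.
    """
    def transform(value, bulkload_state):
        # `value` (the external_name column) is deliberately ignored.
        row = bulkload_state.current_dictionary
        for field in fields:
            if row.get(field):
                return row[field]
        return None
    return transform


# Local check with a stand-in state object (not the real bulkloader class).
check_url = first_non_empty('Final URL', 'URL', 'Temporary URL')
state = SimpleNamespace(current_dictionary={
    'Final URL': '', 'URL': '', 'Temporary URL': 'http://t.example'})
print(check_url(None, state))  # prints http://t.example
```

Under those assumptions the YAML entry would read `import_transform: bulkloader_helper.first_non_empty('Final URL', 'URL', 'Temporary URL')`. The inner parameter is named bulkload_state on purpose, to match the keyword argument the docs describe.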

Simples!
