How to implement a repository for lazy data loading with Neuraxle?

Question

The Neuraxle documentation shows an example that uses a repository to lazily load data within a pipeline:

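# training_data_repository, BaseRepository, ConvertIDsToLoadedData, DateToCosineEncoder,
# CategoricalEnum, Normalizer, DataShuffler and Model are not defined in this snippet.
from neuraxle.pipeline import Pipeline, MiniBatchSequentialPipeline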
from neuraxle.base import ExecutionContext
from neuraxle.steps.column_transformer import ColumnTransformer
from neuraxle.steps.flow import TrainOnlyWrapper

training_data_ids = training_data_repository.get_all_ids()
context = ExecutionContext('caching_folder').set_service_locator({
    BaseRepository: training_data_repository
})

pipeline = Pipeline([
    ConvertIDsToLoadedData().assert_has_services(BaseRepository),
    ColumnTransformer([
        (range(0, 2), DateToCosineEncoder()),
        (3, CategoricalEnum(categories_count=5, starts_at_zero=True)),
    ]),
    Normalizer(),
    TrainOnlyWrapper(DataShuffler()),
    MiniBatchSequentialPipeline([
        Model()
    ], batch_size=128)
]).with_context(context)
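
The idea behind this snippet is that only lightweight IDs travel through the pipeline until a step resolves the repository from the context and loads the real objects. The plain-Python sketch below (no Neuraxle dependency) illustrates that idea under stated assumptions: `Repository`, `DictRepository`, `ServiceLocator` and `load_batches` are hypothetical names, and the locator simply maps an abstract class to a concrete instance, the way the `set_service_locator({BaseRepository: ...})` call above keys the repository by its base class. It is not Neuraxle's implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Type, TypeVar

T = TypeVar("T")


class Repository(ABC):
    """Hypothetical abstract repository: lists IDs and loads one object per ID."""

    @abstractmethod
    def get_all_ids(self) -> List[int]: ...

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object: ...


class DictRepository(Repository):
    """Hypothetical in-memory repository that counts how many loads happen."""

    def __init__(self, data: Dict[int, object]) -> None:
        self._data = data
        self.load_count = 0

    def get_all_ids(self) -> List[int]:
        return list(self._data)

    def get_data_from_id(self, _id: int) -> object:
        self.load_count += 1
        return self._data[_id]


class ServiceLocator:
    """Hypothetical type-keyed registry, mimicking a service locator."""

    def __init__(self, services: Dict[type, object]) -> None:
        self._services = services

    def get_service(self, abstract_type: Type[T]) -> T:
        service = self._services[abstract_type]
        # Guard against registering an object under the wrong abstract type.
        if not isinstance(service, abstract_type):
            raise TypeError(f"{service!r} is not a {abstract_type.__name__}")
        return service


def load_batches(ids: List[int], locator: ServiceLocator,
                 batch_size: int) -> Iterator[List[object]]:
    """Yield loaded objects one mini-batch at a time; nothing is loaded up front."""
    repo = locator.get_service(Repository)
    for start in range(0, len(ids), batch_size):
        batch_ids = ids[start:start + batch_size]
        yield [repo.get_data_from_id(_id) for _id in batch_ids]


if __name__ == "__main__":
    repo = DictRepository({i: f"sample-{i}" for i in range(5)})
    locator = ServiceLocator({Repository: repo})
    batches = load_batches(repo.get_all_ids(), locator, batch_size=2)
    print(next(batches))    # ['sample-0', 'sample-1']
    print(repo.load_count)  # 2: only the first batch has been loaded
```

Because `load_batches` is a generator, calling it loads nothing; each `next()` pulls only one batch of objects from the repository.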

However, the documentation does not show how to implement the BaseRepository and ConvertIDsToLoadedData classes. What is the best way to implement them? Could anyone give an example?

Recommended answer

I didn't check whether the following code runs, but it should look like this. If you find something to change and have tried running it, please edit this answer:

from abc import ABC, abstractmethod
from typing import Dict, List

from neuraxle.base import BaseStep, ExecutionContext
from neuraxle.data_container import DataContainer


class BaseDataRepository(ABC):

    @abstractmethod
    def get_all_ids(self) -> List[int]:
        pass

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object:
        pass


class InMemoryDataRepository(BaseDataRepository):
    def __init__(self, ids: List[int], data: Dict[int, object]):
        self.ids: List[int] = ids
        self.data: Dict[int, object] = data

    def get_all_ids(self) -> List[int]:
        return list(self.ids)

    def get_data_from_id(self, _id: int) -> object:
        return self.data[_id]


class ConvertIDsToLoadedData(BaseStep):
    # The DataContainer hook to override (and what it must return) varies
    # between Neuraxle versions; check BaseStep in your installed version.
    def _handle_transform(self, data_container: DataContainer, context: ExecutionContext):
        repo: BaseDataRepository = context.get_service(BaseDataRepository)
        ids = data_container.data_inputs

        # Replace the data IDs by their loaded object counterparts:
        data_container.data_inputs = [repo.get_data_from_id(_id) for _id in ids]

        return data_container, context


# Example IDs and objects (hypothetical; use your own data here).
ids = [0, 1]
data = {0: "first sample", 1: "second sample"}

context = ExecutionContext('caching_folder').set_service_locator({
    # Any other class that inherits from BaseDataRepository can be plugged in here
    # once you replace this cheap "InMemory" stub with a real database (e.g. SQL).
    BaseDataRepository: InMemoryDataRepository(ids, data)
})
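
The answer's service-locator setup mentions replacing the in-memory stub with a real database such as SQL. As a hedged sketch of what that replacement could look like, here is a repository with the same two methods (`get_all_ids`, `get_data_from_id`) backed by Python's built-in `sqlite3` module. The class name `SQLiteDataRepository`, the table `samples` and its columns `id`/`payload` are hypothetical; adapt them to your schema. The abstract base is redeclared so the block runs without Neuraxle.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class BaseDataRepository(ABC):
    """Same interface as in the answer above (redeclared to stay standalone)."""

    @abstractmethod
    def get_all_ids(self) -> List[int]: ...

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object: ...


class SQLiteDataRepository(BaseDataRepository):
    """Loads a row only when its ID is requested (hypothetical 'samples' table)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def get_all_ids(self) -> List[int]:
        # Only the IDs are read here; payloads stay in the database.
        rows = self.connection.execute("SELECT id FROM samples ORDER BY id")
        return [row[0] for row in rows]

    def get_data_from_id(self, _id: int) -> object:
        # Parameterized query: fetch a single payload by primary key.
        row = self.connection.execute(
            "SELECT payload FROM samples WHERE id = ?", (_id,)
        ).fetchone()
        if row is None:
            raise KeyError(_id)
        return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO samples (id, payload) VALUES (?, ?)",
        [(1, "first"), (2, "second")],
    )
    repo = SQLiteDataRepository(conn)
    print(repo.get_all_ids())        # [1, 2]
    print(repo.get_data_from_id(2))  # second
```

In a real project you would subclass the answer's `BaseDataRepository` instead of redeclaring it, and register the instance under that key in `set_service_locator`, so `ConvertIDsToLoadedData` does not need to change.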

For updates, see the issue I opened here for this question: https://github.com/Neuraxio/Neuraxle/issues/421