How can I use Guice to inject my API into Dataflow jobs without it needing to be serializable?


Question

This question is a follow-up to the excellent answer to "Is there a way to upload jars for a dataflow job so we don't have to serialize everything?"

That answer made me realize that what I actually want is injection without serialization, so that I can mock and test.

Our current approach requires our APIs and mocks to be serializable. As a result, I have to put static fields in the mock, because the mock gets serialized and deserialized, which creates a new instance, and that new instance is the one Dataflow uses.

A colleague suggested that perhaps this needs to be a sink, and that sinks are treated differently. We may try that later and post an update, but we are not sure right now.

What I want is to replace the APIs with mocks from the top level during testing. Does anyone have an example of this?

Here is our bootstrap code, which does not know whether it is running in production or inside a feature test. We test end-to-end results with no Apache Beam imports in our tests, which means we can swap to a different technology if we want to pivot and still keep all our tests. We also catch far more integration bugs and can refactor without rewriting tests, since the contracts we test are customer-facing ones that we cannot easily change.

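// Guice's own @Inject is assumed here; javax.inject.Inject also works with Guice.
import com.google.inject.Inject;
import org.apache.beam.sdk.Pipeline;

// RosterFileTransform is our own PTransform; its definition is not shown here.
public class App {

    private final Pipeline pipeline;
    private final RosterFileTransform transform;

    @Inject
    public App(Pipeline pipeline, RosterFileTransform transform) {
        this.pipeline = pipeline;
        this.transform = transform;
    }

    public void start() {
        pipeline.apply(transform);
        pipeline.run();
    }
}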

Note that everything we do is based on Guice injection, so the Pipeline may or may not be the direct runner. I may need to modify this class to pass things through, but anything that works for now would be great.

The function that I am trying to get our API (both the mock and the real implementation) into without serialization is this:

private class ValidRecordPublisher extends DoFn<Validated<PractitionerDataRecord>, String> {
    @ProcessElement
    public void processElement(@Element Validated<PractitionerDataRecord> element) {
        // microServiceApi is the dependency I want to inject without serializing it.
        microServiceApi.writeRecord(element.getValue());
    }
}

I am not sure how to pass in microServiceApi in a way that avoids serialization. I would also be fine with delayed creation after deserialization, for example by holding a Guice Provider and calling provider.get(), if there is a solution along those lines.

Recommended answer

We solved this in a way where mocks no longer need static fields or serialization, by having one single class bridge the world of Dataflow (in both prod and test), like so:

NOTE: Our company has some additional magic that passes headers from service to service and through Dataflow. Some of it appears in the code below and can be ignored (i.e. RouterRequest request = Current.request();). Anyone else will have to pass the projectId into getInstance on each call.

public abstract class DataflowClientFactory implements Serializable {
    private static final Logger log = LoggerFactory.getLogger(DataflowClientFactory.class);

    public static final String PROJECT_KEY = "projectKey";
    // Static fields are never serialized, so no transient modifier is needed.
    private static Injector injector;
    private static Module overrides;

    private static int counter = 0;

    public DataflowClientFactory() {
        counter++;
        log.info("creating again(usually due to deserialization). counter="+counter);
    }

    public static void injectOverrides(Module dfOverrides) {
        overrides = dfOverrides;
    }

    private void initialize(String project) {
        // Lock on the class rather than the instance: injector is static, and each
        // deserialized copy of this factory would otherwise lock a different monitor.
        synchronized (DataflowClientFactory.class) {
            if (injector != null) {
                return;
            }

            /*
             * The hardest part is this piece, since it is specific to each Dataflow,
             * so each project subclasses DataflowClientFactory.
             * This solution is the best only given the time crunch; it works
             * decently for end-to-end testing without developers needing fancy
             * wrappers around mocks anymore.
             */
            Module module = loadProjectModule();

            Module modules = Modules.combine(module, new OrderlyDataflowModule(project));
            if (overrides != null) {
                modules = Modules.override(modules).with(overrides);
            }

            injector = Guice.createInjector(modules);
        }
    }

    protected abstract Module loadProjectModule();

    public <T> T getInstance(Class<T> clazz) {
        if(!Current.isContextSet()) {
            throw new IllegalStateException("Someone on the stack is extending DoFn instead of OrderlyDoFn so you need to fix that first");
        }
        RouterRequest request = Current.request();
        String project = (String)request.requestState.get(PROJECT_KEY);

        initialize(project);
        return injector.getInstance(clazz);
    }

}
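
To make the idea easier to try in isolation, here is a minimal stdlib-only sketch of the same pattern. It is not the code above and does not use Guice or Beam: ClientFactory, its bind method, MockMicroServiceApi, and roundTrip are hypothetical names invented for this illustration, and plain static maps stand in for the Guice injector and overrides. The point it shows is that only a small serializable handle travels with the function object, while the API instance itself lives in static, per-JVM state and is looked up after deserialization, so the mock needs neither Serializable nor static fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class LazyFactoryDemo {

    /** Hypothetical API; note that it is NOT Serializable. */
    interface MicroServiceApi {
        void writeRecord(String record);
    }

    /** Test double that records calls in a plain instance field. */
    static class MockMicroServiceApi implements MicroServiceApi {
        final List<String> written = new ArrayList<>();

        @Override
        public void writeRecord(String record) {
            written.add(record);
        }
    }

    /** Serializable handle; the real state lives in static (per-JVM) fields. */
    static class ClientFactory implements Serializable {
        private static final long serialVersionUID = 1L;
        // Stand-in for Guice modules and overrides: one supplier per bound type.
        private static final Map<Class<?>, Supplier<?>> bindings = new ConcurrentHashMap<>();
        // Stand-in for the injector's singleton scope.
        private static final Map<Class<?>, Object> instances = new ConcurrentHashMap<>();

        static <T> void bind(Class<T> type, Supplier<? extends T> supplier) {
            bindings.put(type, supplier);
            instances.remove(type);
        }

        <T> T getInstance(Class<T> type) {
            // Created lazily on first use, i.e. after any deserialization.
            Object instance = instances.computeIfAbsent(type, t -> {
                Supplier<?> supplier = bindings.get(t);
                if (supplier == null) {
                    throw new IllegalStateException("No binding for " + t);
                }
                return supplier.get();
            });
            return type.cast(instance);
        }
    }

    /** Plays the role of the DoFn: serializable, but holds only the factory. */
    static class ValidRecordPublisher implements Serializable {
        private static final long serialVersionUID = 1L;
        private final ClientFactory factory;

        ValidRecordPublisher(ClientFactory factory) {
            this.factory = factory;
        }

        void processElement(String element) {
            factory.getInstance(MicroServiceApi.class).writeRecord(element);
        }
    }

    /** Java-serializes and deserializes a value, producing a fresh copy. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MockMicroServiceApi mock = new MockMicroServiceApi();
        ClientFactory.bind(MicroServiceApi.class, () -> mock);

        // The publisher is copied through serialization; the mock is not.
        ValidRecordPublisher copy = roundTrip(new ValidRecordPublisher(new ClientFactory()));
        copy.processElement("record-1");

        System.out.println(mock.written); // prints [record-1]
    }
}
```

In the Guice version above, the counterpart of bind in a test would be a call to DataflowClientFactory.injectOverrides with a module that binds the API type to the mock instance, made before the pipeline runs. Like the static injector, this only works when the test and the pipeline run in the same JVM, as they do with a local runner.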
