Want to run non-threadsafe library in parallel - can it be done using multiple classloaders?


Problem description


I work on a project where we use a library that is not guaranteed to be thread-safe (and isn't). We call it single-threaded in a Java 8 streams scenario, which works as expected.

We would like to use parallel streams to pick the low-hanging scalability fruit.

Unfortunately this causes the library to fail - most likely because one instance interferes with variables shared with the other instances - hence we need isolation.

I was considering using a separate classloader for each instance (possibly thread-local). To my knowledge this should give me the isolation I need for all practical purposes, but I am unfamiliar with deliberately constructing classloaders for this purpose.

Is this the right approach? How should I do this in order to get proper production quality?


Edit: I was asked for additional information about the situation triggering the question, in order to understand it better. The question is still about the general situation, not fixing the library.

I have full control over the object created by the library (which is https://github.com/veraPDF/) as pulled in by

<dependency>
    <groupId>org.verapdf</groupId>
    <artifactId>validation-model</artifactId>
    <version>1.1.6</version>
</dependency>

using the project maven repository for artifacts.

<repositories>
    <repository>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
        <id>vera-dev</id>
        <name>Vera development</name>
        <url>http://artifactory.openpreservation.org/artifactory/vera-dev</url>
    </repository>
</repositories>

For now it is unfeasible to harden the library.


EDIT: I was asked to show code. Our core adapter is roughly:

public class VeraPDFValidator implements Function<InputStream, byte[]> {
    private String flavorId;
    private Boolean prettyXml;

    public VeraPDFValidator(String flavorId, Boolean prettyXml) {
        this.flavorId = flavorId;
        this.prettyXml = prettyXml;
        VeraGreenfieldFoundryProvider.initialise();
    }

    @Override
    public byte[] apply(InputStream inputStream) {
        try {
            return apply0(inputStream);
        } catch (RuntimeException e) {
            throw e;
        } catch (ModelParsingException | ValidationException | JAXBException | EncryptedPdfException e) {
            throw new RuntimeException("invoking VeraPDF validation", e);
        }
    }

    private byte[] apply0(InputStream inputStream) throws ModelParsingException, ValidationException, JAXBException, EncryptedPdfException {
        PDFAFlavour flavour = PDFAFlavour.byFlavourId(flavorId);
        PDFAValidator validator = Foundries.defaultInstance().createValidator(flavour, false);
        PDFAParser loader = Foundries.defaultInstance().createParser(inputStream, flavour);
        ValidationResult result = validator.validate(loader);

        // do in-memory generation of XML byte array - as we need to pass it to Fedora we need it to fit in memory anyway.

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        XmlSerialiser.toXml(result, baos, prettyXml, false);
        final byte[] byteArray = baos.toByteArray();
        return byteArray;
    }
}

which is a function that maps from an InputStream (providing a PDF-file) to a byte array (representing the XML report output).

(Seeing the code, I've noticed that there is a call to the initializer in the constructor, which may be the culprit here in my particular case. I'd still like a solution to the generic problem.)

Solution

We have faced similar challenges. Issues usually came from static properties which were unintentionally "shared" between the various threads.

Using different classloaders worked for us as long as we could guarantee that the static properties were actually set on classes loaded by our class loader. Java may have a few classes which provide properties or methods which are not isolated among threads or are not thread-safe (System.setProperties() and Security.addProvider() are OK - any canonical documentation on this matter is welcome, by the way).
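
To illustrate why this isolates static state: a static field belongs to a class as defined by one particular class loader, so the same class file defined by two loaders yields two independent copies of its statics. The following is only a minimal sketch, not the code we use. `Counter`, `RedefiningLoader` and `IsolationDemo` are hypothetical names, and the loader re-reads the class bytes through its parent's resources (an assumption that keeps the example self-contained) instead of reading them from a jar.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;

// Hypothetical stand-in for a library class with shared static state.
class Counter {
    static int hits = 0;
    public static int hit() { return ++hits; }
}

// Defines its own copy of exactly one class; everything else goes to the parent.
class RedefiningLoader extends ClassLoader {
    private final String isolatedName;

    RedefiningLoader(String isolatedName, ClassLoader parent) {
        super(parent);
        this.isolatedName = isolatedName;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(isolatedName)) {
            return super.loadClass(name, resolve);
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] bytes = readClassBytes(name);
                c = defineClass(name, bytes, 0, bytes.length);
            }
            if (resolve) resolveClass(c);
            return c;
        }
    }

    // Reads the compiled class file via the parent's resource lookup.
    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) throw new ClassNotFoundException(name);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

class IsolationDemo {
    // Calls Counter.hit() once on the copy owned by a brand-new loader.
    static int hitInFreshLoader() {
        try {
            String name = Counter.class.getName();
            ClassLoader parent = IsolationDemo.class.getClassLoader();
            Class<?> copy = new RedefiningLoader(name, parent).loadClass(name);
            Method hit = copy.getMethod("hit");
            hit.setAccessible(true); // the copy lives in a different runtime package
            return (Integer) hit.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each call to `IsolationDemo.hitInFreshLoader()` sees a counter that starts from zero, regardless of how often `Counter.hit()` was called on the copy loaded by the application class loader. Classes that the custom loader delegates to its parent stay shared, which is the caveat described above.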

A potentially workable and fast solution - that at least can give you a chance to test this theory for your library - is to use a servlet engine such as Jetty or Tomcat.

Build a few WAR files that contain your library and start processes in parallel (one per WAR).

When running code inside a servlet thread, the WebappClassLoaders of these engines attempt to load a class from the parent class loader first (the same one as the engine's) and, if the class is not found there, attempt to load it from the jars/classes packaged with the WAR.

With Jetty you can programmatically hot-deploy WARs to the context of your choice and then, in theory, scale the number of processors (WARs) as required.

We have implemented our own class loader by extending URLClassLoader and have taken inspiration from the Jetty Webapp ClassLoader. It is not as hard a job as it seems.

Our classloader does the exact opposite: it attempts to load a class from the jars local to the 'package' first, then tries to get it from the parent class loader. This guarantees that a library accidentally loaded by the parent classloader is never considered (first). Our 'package' is actually a jar that contains other jars/libraries with a customized manifest file.
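
The lookup order described above can be sketched roughly as follows. This is a hypothetical skeleton under stated assumptions, not our production loader: the name `ChildFirstClassLoader` is invented for illustration, the jars are passed in as plain URLs rather than unpacked from a 'package' jar, and only `java.*` classes are forced to come from the parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical skeleton: searches its own URLs first and only then asks the parent.
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] localJars, ClassLoader parent) {
        super(localJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core classes must always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Local jars first ...
                        c = findClass(name);
                    } catch (ClassNotFoundException notLocal) {
                        // ... and the parent only as a fallback.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) resolveClass(c);
            return c;
        }
    }
}
```

One way to use such a loader is to create one instance per worker thread (for example in a `ThreadLocal`), load the adapter class through it, and instantiate it reflectively. Because `Function`, `InputStream` and `byte[]` come from the JDK and are therefore shared by all loaders, an adapter loaded this way can still be cast to `Function<InputStream, byte[]>` by the calling code. Resource lookups (`getResource`) remain parent-first in this sketch.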

Posting this class loader code "as is" would not make a lot of sense (and would create a few copyright issues). If you want to explore that route further, I can try coming up with a skeleton.

Source of the Jetty WebappClassLoader
