Recursive call to a duplicated Bash script, making it unable to access the assets

Problem description

Edit: This post is now addressed in a new question, as the problem had to be presented slightly differently. It's here: How can I efficiently run XSLT transformations for a large number of files in parallel?

I'm stuck in my attempts at parallelizing a process, and after a decent amount of time spent on it I'd like to ask for some help...

Basically, I have a lot of XML files to transform with a specific XSLT sheet. But the sheet calls a (very slow) API to fetch additional data, and processing the whole batch of XML files in one go would take a (very) long time.

Therefore I split all the files from the original "input" folder into subfolders, each containing around 5000 XML files, and I copied the following Bash script into each subfolder too:

# Transform every XML file in the current folder, one JVM start per file
for f in *.xml
do
  java -jar ../../saxon9he.jar -xsl:../../some-xslt-sheet.xsl -s:"$f"
done

And I call each process, one per folder, from the "root" folder, which contains the "input" folder, the Saxon library and the XSLT sheet:

find input -type d -exec sh {}/script.sh \;

But I get this error:

Unable to access jarfile ../../saxon9he.jar

I suppose it comes from the fact that I'm operating from the "root" folder, while the scripts being called are lower in the directory tree. I could solve the problem (if I'm correct) by copying all the assets into each subfolder, but I find that solution would make my current approach even clumsier.
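The supposition above can be checked with a small sketch: a relative path such as ../../saxon9he.jar is resolved against the working directory of the calling process, not against the folder the script lives in. The temporary directory, the batch1 subfolder and the asset-demo.jar file below are hypothetical stand-ins, not names from the original setup.

```shell
#!/bin/sh
# Shows that a relative path inside a script is resolved against the caller's
# working directory, not against the script's own location.
# All directory and file names here are hypothetical stand-ins.

root=$(mktemp -d)
mkdir -p "$root/input/batch1"
: > "$root/asset-demo.jar"            # stand-in for saxon9he.jar

# The copied script refers to the asset two levels up, as in the question.
cat > "$root/input/batch1/script.sh" <<'EOF'
if [ -e ../../asset-demo.jar ]; then echo found; else echo missing; fi
EOF

cd "$root" || exit 1

# Called from the root folder: ../../ points two levels above the root.
from_root=$(sh input/batch1/script.sh)

# Called after changing into the subfolder: ../../ points at the root.
from_subdir=$(cd input/batch1 && sh script.sh)

echo "from root:   $from_root"
echo "from subdir: $from_subdir"

# Clean up the temporary tree.
cd / && rm -rf "$root"
```

If this matches the real setup, running each copy from inside its own subfolder, for example `(cd input/batch1 && sh script.sh)`, would let the ../../ paths resolve without copying the assets.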

Thanks to anyone who might have an idea and help me understand this!

Recommended answer

Firstly, you really don't want to initialize a new Java VM to run each transformation: this is typically going to take much longer than running the actual transformation. To put this in perspective, for "typical" transformations you will often see a Java initialization time of 3 seconds, a stylesheet compilation time of 300ms, and a transformation time of 10ms. So if you can find a way to initialize Java and compile the stylesheet only once, your total time for 10K documents is going to be about 2 minutes rather than about 10 hours.
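The estimate above can be reproduced as a back-of-the-envelope calculation. This is only a sketch using the "typical" timings quoted in the answer; the variable names are made up for illustration and real timings will vary.

```shell
#!/bin/sh
# Rough check of the quoted timings (all values in milliseconds).
jvm_start_ms=3000      # Java initialization
compile_ms=300         # stylesheet compilation
transform_ms=10        # one transformation
docs=10000

# One JVM per document: every document pays all three costs.
per_doc_total_ms=$(( docs * (jvm_start_ms + compile_ms + transform_ms) ))

# One JVM for the whole batch: start-up and compilation are paid once.
single_jvm_total_ms=$(( jvm_start_ms + compile_ms + docs * transform_ms ))

echo "one JVM per document: $(( per_doc_total_ms / 1000 )) s"
echo "one JVM in total:     $(( single_jvm_total_ms / 1000 )) s"
```

This gives 33100 seconds (a little over 9 hours) against 103 seconds (a little under 2 minutes), in line with the rounded figures in the answer.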

There are various ways to achieve this, but they all involve using something other than a shell script to control the process. The simplest, in my view, is to control it from XSLT itself, by using the collection() function to access all the files in the directory. This has an added bonus, if you're using Saxon-EE, that the files will be processed (parsed) in parallel using all the cores on your machine, which can speed things up by another factor of 4 or so. You just need to add an entry point to the stylesheet, something like:

<xsl:template name="main">
  <xsl:for-each select="collection('file:///my/dir?select=*.xml;recurse=yes')!saxon:discard-document(.)">
    <xsl:result-document href="....">
      <xsl:apply-templates/>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template>

The saxon:discard-document call is optional, but because it makes documents eligible for garbage collection, it means you are less likely to run out of memory.
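With a named entry point like the one above, the whole batch can be started with a single Java invocation. The following is a minimal sketch, assuming the jar and stylesheet names from the question sit in the current directory and that the stylesheet declares the saxon prefix (xmlns:saxon="http://saxon.sf.net/"). Saxon's -it: option names the initial template, so no -s: source file is passed.

```shell
#!/bin/sh
# Single JVM start for the whole batch; file names are taken from the
# question and are assumed to be in the current directory.
saxon_jar=saxon9he.jar
stylesheet=some-xslt-sheet.xsl

# Build the command: -it:main starts at the named template "main".
set -- java -jar "$saxon_jar" "-xsl:$stylesheet" -it:main
cmd_line="$*"
echo "$cmd_line"

# Run it only when Java and the jar are actually available.
if command -v java >/dev/null 2>&1 && [ -f "$saxon_jar" ]; then
  "$@"
else
  echo "skipping: java or $saxon_jar not found"
fi
```

The input directory is not passed on the command line here because the collection() URI in the template already names it.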

Another approach to writing the control loop is to use a specialized shell such as xmlsh.
