Efficiently getting XML into Elasticsearch

Question

Currently I am using Scrapy to parse a large XML file from an FTP server into Elasticsearch. It works, but it seems like quite a heavyweight solution, and it uses a lot of memory too.

I am wondering whether I would be better off writing a plugin for ES instead. I know Logstash can do it, but I can't do inline language detection etc. with that.

A) If I write an actual plugin for ES, I think it has to be in Java to pull in the data. Is there any advantage in this approach, or could I write a separate Python script to push the data in instead? Is there any clear reason for selecting one method over the other (assuming I don't know Java or Python)?
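For comparison, a separate Python script would typically push documents through Elasticsearch's `_bulk` REST endpoint, whose body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The following is a minimal standard-library sketch, not a production client. It assumes Elasticsearch 7.x or later (no mapping types), an unsecured cluster at `http://localhost:9200`, and a hypothetical index name `products`; the helper names `build_bulk_body` and `send_bulk` are illustrative only.

```python
import json
import urllib.request


def build_bulk_body(docs, index):
    """Serialize documents into the NDJSON body expected by the _bulk API."""
    lines = []
    for doc in docs:
        # Action/metadata line: index this document into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST a bulk body to Elasticsearch (assumed local cluster, no auth)."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    docs = [{"title": "Hello"}, {"title": "Bonjour"}]
    # Only builds and prints the payload; call send_bulk(...) against a real cluster.
    print(build_bulk_body(docs, "products"), end="")
```

Sending records in batches like this keeps the script itself small, and a step such as language detection could run on each dict before it is serialized.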

This boils down to:


  • Would memory management be better with an actual ES plugin?
  • Is Java better suited to processing XML than, say, Python?

Any help and advice would be appreciated as I start on this journey.

James

Answer

Converting XML to JSON is really a question of understanding the actual data in the XML: it is not always easy to map onto JSON and usually needs additional logic. For this reason, there are no error-proof XML-to-JSON translators.
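To illustrate the "additional logic": attributes, element text, and repeated child elements have no single natural JSON form, so any converter has to pick conventions. A minimal sketch with the standard library's `xml.etree.ElementTree`, using conventions chosen here purely for illustration (attributes prefixed with `@`, repeated children collected into lists, text stored under `#text` when an element also has attributes or children). Tail text in mixed content is simply dropped, which shows how such a conversion can be lossy; real data usually needs its own rules.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element to a JSON-compatible value using ad-hoc conventions."""
    result = {}
    # Attributes become keys prefixed with "@" (an arbitrary convention).
    for key, value in elem.attrib.items():
        result["@" + key] = value
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated child tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if not result:
        # Leaf element without attributes: represent it by its text alone.
        return text
    if text:
        result["#text"] = text
    # Note: child.tail (mixed-content text) is ignored, so this is lossy.
    return result


xml_data = """<product id="42">
  <name>Widget</name>
  <tag>blue</tag>
  <tag>small</tag>
</product>"""

root = ET.fromstring(xml_data)
print(json.dumps({root.tag: element_to_dict(root)}))
# prints {"product": {"@id": "42", "name": "Widget", "tag": ["blue", "small"]}}
```

Note that a single `<tag>` would come out as a string while two come out as a list, which is exactly the kind of inconsistency an Elasticsearch mapping has to be designed around.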

If you decide to use Python for this, take a look at ElementTree (eTree), lxml, and xmltodict. JSON support is built into Python's standard library.
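Since the original concern is memory use on a large file, it may help to parse incrementally rather than loading the whole document. A minimal sketch with `xml.etree.ElementTree.iterparse`, assuming (hypothetically) that the file is a flat sequence of `<item>` records with simple child elements and no namespaces; each record is turned into a dict and then cleared from the root so memory stays roughly constant. lxml provides an `iterparse` with a similar interface.

```python
import io
import json
import xml.etree.ElementTree as ET


def iter_records(source, record_tag="item"):
    """Yield one dict per <record_tag> element without keeping the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle to it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Flat mapping: child tag -> stripped text (assumes simple records).
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop finished records from the root so they can be garbage-collected.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(b"""<items>
      <item><id>1</id><title>First</title></item>
      <item><id>2</id><title>Second</title></item>
    </items>""")
    for record in iter_records(sample):
        print(json.dumps(record))
```

Because `iter_records` is a generator, its output can be fed in batches to a bulk indexing step without ever materializing the full file.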

If you want to try your luck on the ES side, look at elasticsearch-xml. It may fit your needs if your XML is consistent.

As for Python vs. Java parsing performance: if performance is key for you, you can leverage libraries that are already optimized at a low level (lxml, for example, is built on the C libraries libxml2 and libxslt), but in general, well-written Java code should perform better.
