Best API to watch a folder for new files in Java


Question

I need to watch a specific folder for new files, and whenever a new file arrives, I need to process it and send the processed data to one of the indexing software packages.

All I need to do is watch the folder and, whenever a new file comes in, read its contents. Flume's spooling directory looks like a good fit, but here are the challenges I am thinking about.

1) Each file should be read only once; a file that has already been read should not be read again.
2) Completeness of a file: if a file has not been fully copied (for example, .staging or .tmp files are still there), I should not read it.
3) The input files can be huge, and they are XML. So reading a file in splits does not help my cause; I need to read each file in full and process it.
4) Since the files might be huge, and Flume seems to have some problems with huge files, can it meet my requirements? Or should I look at other file watchers?

Could you please suggest the best option for file watching? Does Flume spooling do all of this?

Recommended answer

I can't say anything about Flume; I am unfamiliar with it.

You can do one of a couple of things.

First, you could copy the files into the directory under a temporary name (like "newfile.copying") and then rename them to just "newfile" after the copy is complete. Then, during your scans, you simply ignore the "*.copying" files.
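As a rough illustration of the copy-then-rename idea, here is a minimal Java sketch. The class name `StagedCopy`, its method names, and the `.copying` suffix constant are made up for this example; only the `java.nio.file` calls are standard. It assumes the producer and the scanner agree on the suffix.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class StagedCopy {
    // Hypothetical suffix marking a file that is still being written.
    static final String COPYING_SUFFIX = ".copying";

    /** Producer side: copy under a temporary name, then rename once the copy is complete. */
    static Path deliver(Path source, Path watchedDir) throws IOException {
        String name = source.getFileName().toString();
        Path tempName = watchedDir.resolve(name + COPYING_SUFFIX);
        Path finalName = watchedDir.resolve(name);
        Files.copy(source, tempName, StandardCopyOption.REPLACE_EXISTING);
        // Both names are in the same directory, so this is a rename, not a second copy.
        // What happens if finalName already exists is implementation specific.
        return Files.move(tempName, finalName, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Scanner side: list regular files, skipping anything still marked as copying. */
    static List<Path> listReadyFiles(Path watchedDir) throws IOException {
        List<Path> ready = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(watchedDir)) {
            for (Path entry : entries) {
                boolean copying = entry.getFileName().toString().endsWith(COPYING_SUFFIX);
                if (Files.isRegularFile(entry) && !copying) {
                    ready.add(entry);
                }
            }
        }
        return ready;
    }
}
```

This only works if you control the process that drops the files; if you don't, the scanner has no marker to rely on.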

Alternatively, you could monitor the sizes of the files as they load; if a file's size has not changed after some time (a few seconds), you can assume the file is done copying and start processing it.
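The size check could look something like the sketch below. `SizeStability` and `isSizeStable` are illustrative names, and the quiet period is whatever you decide "a few seconds" means. Note that this is a heuristic: a stalled copy looks exactly like a finished one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SizeStability {
    /**
     * Returns true if the file's size is the same before and after waiting quietMillis.
     * A true result suggests, but does not prove, that the copy has finished.
     */
    static boolean isSizeStable(Path file, long quietMillis)
            throws IOException, InterruptedException {
        long before = Files.size(file);
        Thread.sleep(quietMillis);
        long after = Files.size(file);
        return before == after;
    }
}
```

You would call something like `isSizeStable(file, 5000)` from your scan loop and only hand the file on for processing once it returns true.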

Finally, you should simply have a "done" directory (on the same drive) and rename the files into that directory when you're done with them.

Another option is to have three directories: "incoming", "working", and "done".

The files are copied into the "incoming" directory. Before you start processing a file, you rename it into the "working" directory. Finally, once processing is complete, you move it out of there into the "done" directory.
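The three-directory flow might be sketched in Java like this. `ThreeStagePipeline`, `processOne`, and the `FileHandler` callback are hypothetical names for this example, and it assumes all three directories sit on the same file system so that `ATOMIC_MOVE` is a plain rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ThreeStagePipeline {
    /** Hypothetical callback for whatever processing the file needs. */
    interface FileHandler {
        void handle(Path file) throws IOException;
    }

    /** Moves one file incoming -> working -> done, processing it in between. */
    static Path processOne(Path incomingFile, Path workingDir, Path doneDir, FileHandler handler)
            throws IOException {
        String name = incomingFile.getFileName().toString();
        // Claim the file: once it is in "working", a scan of "incoming" no longer sees it.
        Path working = Files.move(incomingFile, workingDir.resolve(name),
                StandardCopyOption.ATOMIC_MOVE);
        handler.handle(working);
        // Only reached if processing did not throw; a failed file stays in "working".
        return Files.move(working, doneDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Because a file leaves "incoming" before it is read, a later scan of "incoming" will not pick it up a second time.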

This gives you the ability to recover in case the system gets interrupted. You will "know" which file you were processing last, and you can either reprocess it or handle it however you like.
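One possible way to act on that at startup, sketched under the assumption that you choose to reprocess interrupted files: anything still sitting in "working" is moved back to "incoming". The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class WorkingDirRecovery {
    /** Moves files left in "working" by an interrupted run back into "incoming". */
    static List<Path> requeueInterrupted(Path workingDir, Path incomingDir) throws IOException {
        // Collect first, so the directory is not modified while it is being listed.
        List<Path> leftovers = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(workingDir)) {
            for (Path entry : entries) {
                leftovers.add(entry);
            }
        }
        List<Path> requeued = new ArrayList<>();
        for (Path leftover : leftovers) {
            Path target = incomingDir.resolve(leftover.getFileName());
            requeued.add(Files.move(leftover, target, StandardCopyOption.ATOMIC_MOVE));
        }
        return requeued;
    }
}
```

If reprocessing is not safe for your data, you could instead just log the leftover names and deal with them by hand.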

The rename operations are important because, on the same file system, they are atomic. A file is never in both directories, or under both names, at the same time.
