Reading first N bytes of a file as an InputStream in Java?

Question

For the life of me, I haven't been able to find a question that matches what I'm trying to do, so I'll explain what my use-case is here. If you know of a topic that already covers the answer to this, please feel free to direct me to that one. :)

I have a piece of code that uploads a file to Amazon S3 periodically (every 20 seconds). The file is a log file being written by another process, so this function is effectively a means of tailing the log so that someone can read its contents in semi-real-time without having to have direct access to the machine that the log resides on.

Up until recently, I've simply been using the S3 PutObject method (using a File as input) to do this upload. But in AWS SDK 1.9, this no longer works because the S3 client rejects the request if the content size actually uploaded is greater than the content-length that was promised at the start of the upload. This method reads the size of the file before it starts streaming the data, so given the nature of this application, the file is very likely to have increased in size between that point and the end of the stream. This means that I need to now ensure I only send N bytes of data regardless of how big the file is.

I don't have any need to interpret the bytes in the file in any way, so I'm not concerned about encoding. I can transfer it byte-for-byte. Basically, what I want is a simple method where I can read the file up to the Nth byte, then have it terminate the read even if there's more data in the file past that point. (In other words, insert EOF into the stream at a specific point.)

For example, if my file is 10000 bytes long when I start the upload, but grows to 12000 bytes during the upload, I want to stop uploading at 10000 bytes regardless of that size change. (On a subsequent upload, I would then upload the 12000 bytes or more.)

I haven't found a pre-made way to do this - the best I've found so far appears to be IOUtils.copyLarge(InputStream, OutputStream, offset, length), which can be told to copy a maximum of "length" bytes to the provided OutputStream. However, copyLarge is a blocking method, as is PutObject (which presumably calls a form of read() on its InputStream), so it seems that I couldn't get that to work at all.

I haven't found any methods or pre-built streams that can do this, so it's making me think I'd need to write my own implementation that directly monitors how many bytes have been read. That would probably then work like a BufferedInputStream where the number of bytes read per batch is the lesser of the buffer size or the remaining bytes to be read. (eg. with a buffer size of 3000 bytes, I'd do three batches at 3000 bytes each, followed by a batch with 1000 bytes + EOF.)
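The batching described above can be sketched with plain JDK streams. This is only an illustration of the counting logic, assuming nothing beyond `java.io`; `copyAtMost` is a hypothetical helper name, not an existing library method:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedCopy {

    /**
     * Copies at most {@code limit} bytes from {@code in} to {@code out},
     * stopping early if the source ends first.
     *
     * @return the number of bytes actually copied
     */
    static long copyAtMost(InputStream in, OutputStream out, long limit)
            throws IOException {
        byte[] buf = new byte[3000]; // buffer size from the example above
        long copied = 0;
        while (copied < limit) {
            // read the lesser of the buffer size and the bytes still allowed
            int want = (int) Math.min(buf.length, limit - copied);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // underlying stream ended before the limit
            }
            out.write(buf, 0, n);
            copied += n;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[12000]; // pretend the file grew to 12000 bytes
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyAtMost(new ByteArrayInputStream(data), sink, 10000);
        System.out.println(copied); // 10000
    }
}
```

With the 3000-byte buffer, a 10000-byte limit against a 12000-byte source produces exactly the three full batches plus one 1000-byte batch described above.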

Does anyone know a better way to do this? Thanks.

EDIT Just to clarify, I'm already aware of a couple alternatives, neither of which are ideal:

(1) I could lock the file while uploading it. Doing this could cause data loss or operational problems for the process that's writing the file.

(2) I could create a local copy of the file before uploading it. This could be very inefficient and would take up a lot of unnecessary disk space (this file can grow into the several-gigabyte range, and the machine it's running on may not have that much disk space to spare).

EDIT 2: My final solution, based on a suggestion from a coworker, looks like this:

private void uploadLogFile(final File logFile) {
    if (logFile.exists()) {
        long byteLength = logFile.length();
        try (
            FileInputStream fileStream = new FileInputStream(logFile);
            InputStream limitStream = ByteStreams.limit(fileStream, byteLength);
        ) {
            ObjectMetadata md = new ObjectMetadata();
            md.setContentLength(byteLength);
            // Set other metadata as appropriate.
            PutObjectRequest req = new PutObjectRequest(bucket, key, limitStream, md);
            s3Client.putObject(req);
        } // plus exception handling
    }
}

LimitInputStream was what my coworker suggested, apparently not aware that it had been deprecated. ByteStreams.limit is the current Guava replacement, and it does what I want. Thanks, everyone.

Answer

Complete answer rip & replace:

It is relatively straightforward to wrap an InputStream such as to cap the number of bytes it will deliver before signaling end-of-data. FilterInputStream is targeted at this general kind of job, but since you have to override pretty much every method for this particular job, it just gets in the way.

Here's a rough cut at a solution:

import java.io.IOException;
import java.io.InputStream;

/**
 * An {@code InputStream} wrapper that provides up to a maximum number of
 * bytes from the underlying stream.  Does not support mark/reset, even
 * when the wrapped stream does, and does not perform any buffering.
 */
public class BoundedInputStream extends InputStream {

    /** This stream's underlying {@code InputStream} */
    private final InputStream data;

    /** The maximum number of bytes still available from this stream */ 
    private long bytesRemaining;

    /**
     * Initializes a new {@code BoundedInputStream} with the specified
     * underlying stream and byte limit
     * @param data the {@code InputStream} serving as the source of this
     *        one's data
     * @param maxBytes the maximum number of bytes this stream will deliver
     *        before signaling end-of-data
     */
    public BoundedInputStream(InputStream data, long maxBytes) {
        this.data = data;
        bytesRemaining = Math.max(maxBytes, 0);
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(data.available(), bytesRemaining);
    }

    @Override
    public void close() throws IOException {
        data.close();
    }

    @Override
    public synchronized void mark(int limit) {
        // does nothing
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (bytesRemaining > 0) {
            int nRead = data.read(
                    buf, off, (int) Math.min(len, bytesRemaining));

            // Guard against end-of-stream: read() returns -1 at EOF, and
            // subtracting -1 would incorrectly grow the remaining budget.
            if (nRead > 0) {
                bytesRemaining -= nRead;
            }

            return nRead;
        } else {
            return -1;
        }
    }

    @Override
    public int read(byte[] buf) throws IOException {
        return this.read(buf, 0, buf.length);
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("reset() not supported");
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = data.skip(Math.min(n, bytesRemaining));

        bytesRemaining -= skipped;

        return skipped;
    }

    @Override
    public int read() throws IOException {
        if (bytesRemaining > 0) {
            int c = data.read();

            if (c >= 0) {
                bytesRemaining -= 1;
            }

            return c;
        } else {
            return -1;
        }
    }
}
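A quick check of the wrapper's contract. To keep the snippet self-contained and runnable, it uses a trimmed stand-in with just the two `read` overrides that matter, rather than the full class above:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedInputStreamDemo {

    /** Trimmed stand-in for the BoundedInputStream shown above. */
    static class Bounded extends InputStream {
        private final InputStream data;
        private long bytesRemaining;

        Bounded(InputStream data, long maxBytes) {
            this.data = data;
            this.bytesRemaining = Math.max(maxBytes, 0);
        }

        @Override
        public int read() throws IOException {
            if (bytesRemaining <= 0) return -1;
            int c = data.read();
            if (c >= 0) bytesRemaining--;
            return c;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (bytesRemaining <= 0) return -1;
            int n = data.read(buf, off, (int) Math.min(len, bytesRemaining));
            if (n > 0) bytesRemaining -= n;
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        // The "file" has grown to 12000 bytes, but we cap the stream at 10000.
        InputStream capped =
                new Bounded(new ByteArrayInputStream(new byte[12000]), 10000);

        long total = 0;
        byte[] buf = new byte[3000];
        int n;
        while ((n = capped.read(buf)) != -1) {
            total += n;
        }
        System.out.println(total); // 10000: the reader sees EOF at the cap
    }
}
```

Reading through the capped stream yields batches of 3000, 3000, 3000, and 1000 bytes and then end-of-stream, exactly the behavior the question asks for.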
