Read an h5 file from a remote server

Question

I have a problem: I can't read an h5 file from my server. The server has SSH and is on my local network. I tried two kinds of code:

store1 = pd.HDFStore(os.system("scp newrow_data_copy.h5 lucy@192.168.1.51:media/lucy/hdd1/hdf_row/Archive1"))

The error is "Expected bytes, got int". In addition, os.system reports an error saying it expected a string.
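
For reference, os.system runs a command in a subshell and returns the command's exit status as an integer. It never returns a path or a file, so pd.HDFStore receives an int, which is the likely source of the "Expected bytes, got int" error. Note also that, as written, the scp command copies the local newrow_data_copy.h5 to the server rather than fetching it. A minimal illustration, where "exit 0" is only a placeholder command that succeeds:

```python
import os

# os.system runs a shell command and returns its exit status (an int),
# not the path of any file the command may have produced.
status = os.system("exit 0")  # placeholder command that simply succeeds
print(type(status).__name__, status)

# Passing this int to pd.HDFStore(...) is what the first snippet effectively does.
```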

store1 = pd.HDFStore('//192.168.1.51/media/lucy/hdd1/hdf_row/Archive1/newrow_data_copy.h5', mode='r')

Error: The file doesn't exist. Nevertheless, I see the file on the server.

What's wrong, and what should I do to read an h5 file from a remote server? I can't download it, because the file is too large.

Recommended answer

You are aware that reading a whole remote file is, by definition, downloading, right? Whether you download the file to your working memory or a disk is a whole different issue.

That being said, neither ssh nor scp will help you much unless you're willing to write your own tty emulator, so instead install the paramiko module and use it for all your remote SSH/SFTP needs within Python. In your case, this should do it:

import pandas as pd
import paramiko

ssh = paramiko.SSHClient()  # start the client
ssh.load_system_host_keys()  # load the known host keys from ~/.ssh/known_hosts
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accept unknown host keys automatically
ssh.connect("192.168.1.51", port=22, username="lucy", password="your_password")  # replace the password with yours

sftp = ssh.open_sftp()  # start an SFTP session
# 'open' the remote file; adjust the path based on your home path (or use an absolute path)
target = sftp.open("media/lucy/hdd1/hdf_row/Archive1/newrow_data_copy.h5")
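
The object returned by sftp.open is file-like: it supports read, seek and tell much like a local file opened in binary mode, so you can inspect parts of the remote file without fetching all of it. As a minimal sketch, the hypothetical helper find_hdf5_signature below looks for the 8-byte HDF5 format signature, which the HDF5 file format specification places at offset 0, or at 512, 1024, 2048, ... bytes when the file has a user block. It is demonstrated on in-memory io.BytesIO buffers standing in for the remote handle, and it only uses seek and read, which a paramiko SFTPFile also provides.

```python
import io

# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(fileobj, max_offset=1 << 20):
    """Return the offset of the HDF5 signature in fileobj, or None.

    fileobj only needs seek() and read(), so a remote SFTP file handle can be
    used the same way as a local file opened in "rb" mode. Only 8 bytes are
    read at each candidate offset: 0, 512, 1024, 2048, ... up to max_offset.
    """
    offset = 0
    while offset <= max_offset:
        fileobj.seek(offset)
        chunk = fileobj.read(len(HDF5_SIGNATURE))
        if len(chunk) < len(HDF5_SIGNATURE):
            return None  # hit end of file without finding the signature
        if chunk == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# Local demo: in-memory buffers stand in for the remote file handle.
plain = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 100)
with_user_block = io.BytesIO(b"\x00" * 512 + HDF5_SIGNATURE + b"\x00" * 100)
not_hdf5 = io.BytesIO(b"just some text")

print(find_hdf5_signature(plain))
print(find_hdf5_signature(with_user_block))
print(find_hdf5_signature(not_hdf5))
```

With the handle from the snippet above, find_hdf5_signature(target) should typically return 0 for a file written by pandas/PyTables, since those files usually have no user block.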

UPDATE: That approach only gives you a remote file handle (which you can stream, seek and otherwise use like a local file). Sadly, on second look, HDFStore expects a path to the file and performs all file handling through PyTables, so unless you want to hack PyTables to work with remote data (and you don't), your best bet is to install sshfs, mount the remote file system into your local one, and then let pandas treat the remote files as local ones, something like:

mkdir -p ~/hdf
sshfs lucy@192.168.1.51:media/lucy/hdd1 ~/hdf

And then in Python:

import os
import pandas as pd

store1 = pd.HDFStore(os.path.expanduser("~/hdf/hdf_row/Archive1/newrow_data_copy.h5"), mode="r")
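
Because this approach depends on the sshfs mount actually being in place, it can help to fail with a clear message when it is not, instead of a generic "file doesn't exist" error. As a sketch, the hypothetical helper resolve_mounted_path below expands ~, checks with os.path.ismount that the mount point is really mounted, and only then returns the full local path to hand to pd.HDFStore. The mount point ~/hdf and the relative path are the ones assumed in this answer.

```python
import os


def resolve_mounted_path(mount_point, relative_path):
    """Return the local path of relative_path under mount_point.

    Raises RuntimeError if mount_point is not currently a mount point
    (e.g. sshfs has not been run), and FileNotFoundError if the file
    itself is missing on the mounted file system.
    """
    mount_point = os.path.expanduser(mount_point)
    if not os.path.ismount(mount_point):
        raise RuntimeError(f"{mount_point} is not mounted; run sshfs first")
    full_path = os.path.join(mount_point, relative_path)
    if not os.path.exists(full_path):
        raise FileNotFoundError(full_path)
    return full_path


# Usage with the sshfs mount above (requires pandas and PyTables):
# import pandas as pd
# path = resolve_mounted_path("~/hdf", "hdf_row/Archive1/newrow_data_copy.h5")
# store1 = pd.HDFStore(path, mode="r")
```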

The file won't be copied to your disk up front: sshfs transfers data as PyTables reads it, unless PyTables is instructed to store the file instead of reading it in memory.
