Databricks - mounted S3 - how to get file metadata like last modified date (Python)


Problem description

I have mounted an S3 bucket in Databricks. I can list the files in it and read them with Python:

ACCESS_KEY = "XXXXXXXXXX"
SECRET_KEY = "XXXXXXXXXXXXXX"
ENCODED_SECRET_KEY = SECRET_KEY.replace("/", "%2F")
AWS_BUCKET_NAME = "testbucket"
MOUNT_NAME = "awsmount1"

dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s/data" % MOUNT_NAME))

I want to find the last modified date of the file I am reading. I couldn't find much apart from the Java option in "Databricks read Azure blob last modified date", which covers Azure Blob storage. Is there a Python-native option in Databricks to read file metadata?

Recommended answer

If I understand correctly, you need the last modified date of a mounted file in Azure Databricks, using a native Python SDK.

Here is sample code to get the metadata information from Azure Blob storage:

# Uses the legacy azure-storage-blob SDK (v2.x and earlier), which provides BlockBlobService
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='accountName', account_key='accountKey')
container_name = 'containerName'

# List the blobs in the existing container and print each one's last modified time
for blob in block_blob_service.list_blobs(container_name):
    last_modified = block_blob_service.get_blob_properties(container_name, blob.name).properties.last_modified
    print("\t Blob name: " + blob.name)
    print(last_modified)
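
Since the question is about a path that is already mounted, one route that avoids a storage SDK altogether is the local file API. The following is only a minimal sketch, not part of the original answer. It assumes the cluster exposes DBFS mounts under the `/dbfs` FUSE path, which is not true of every cluster configuration, and the helper name `last_modified_utc` and the file path in the usage comment are hypothetical. The timestamp is whatever the mount reports for the file.

```python
import os
from datetime import datetime, timezone


def last_modified_utc(path):
    """Return a file's modification time as a timezone-aware UTC datetime."""
    mtime_seconds = os.stat(path).st_mtime  # seconds since the Unix epoch
    return datetime.fromtimestamp(mtime_seconds, tz=timezone.utc)


# Hypothetical usage, assuming the mount is visible under /dbfs:
# print(last_modified_utc("/dbfs/mnt/awsmount1/data/some_file.csv"))
```

The function relies only on the standard library, so it can be tried against any local file before pointing it at the mount.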

If you are looking for S3, I would suggest using Boto3. Boto3 returns a datetime object for LastModified when you use the S3 Object resource:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.last_modified
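
For a single known object, the same timestamp is also available from the client API. This is a minimal sketch, assuming credentials are already configured for Boto3. The helper name `get_last_modified` and the bucket and key in the usage comment are placeholders, not names from the original post. `head_object` fetches only the object's metadata, without downloading its body.

```python
def get_last_modified(s3_client, bucket, key):
    """Return the LastModified datetime of one S3 object via a HEAD request."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["LastModified"]  # Boto3 returns a timezone-aware datetime


# Hypothetical usage (bucket and key are placeholders):
# import boto3
# s3 = boto3.client("s3")
# print(get_last_modified(s3, "testbucket", "data/example.csv"))
```

Passing the client in as an argument keeps the helper independent of how the session is created.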

To compare LastModified to today's date (Python 3):

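import boto3
from datetime import datetime, timezone

# Today's date in UTC; Boto3 reports LastModified as a timezone-aware UTC datetime
today = datetime.now(timezone.utc).date()

s3 = boto3.client('s3', region_name='eu-west-1')

# list_objects returns at most 1000 keys per call
objects = s3.list_objects(Bucket='my_bucket')

# "Contents" is absent from the response when no objects match
for o in objects.get("Contents", []):
    # Compare dates only; two full timestamps would almost never be equal
    if o["LastModified"].date() == today:
        print(o["Key"])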
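
A single listing call returns at most 1000 keys, so for larger buckets the comparison can be run through a paginator. The sketch below is an illustration rather than the answerer's code. It uses the `list_objects_v2` paginator, and the function name `keys_modified_on`, its `prefix` parameter, and the values in the usage comment are assumptions.

```python
from datetime import datetime, timezone


def keys_modified_on(s3_client, bucket, day, prefix=""):
    """Yield keys under `prefix` whose LastModified falls on `day` (a UTC date)."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches
        for obj in page.get("Contents", []):
            if obj["LastModified"].astimezone(timezone.utc).date() == day:
                yield obj["Key"]


# Hypothetical usage (bucket name and prefix are placeholders):
# import boto3
# s3 = boto3.client("s3", region_name="eu-west-1")
# today = datetime.now(timezone.utc).date()
# for key in keys_modified_on(s3, "my_bucket", today, prefix="data/"):
#     print(key)
```

Because the function is a generator, keys are produced page by page rather than collected in memory first.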


Hope this helps.
