How to download files from a Google Vault export immediately after creating it with the Python API?

Problem description

Using the Python API, I have created an export. How do I download the .zip file in the export using the same authorized service? When creating the export, I can see the bucketName and objectName values of the cloudStorageSink, but I cannot find any documentation on how to download those objects to my host using the existing service that created the export.

#!/usr/bin/env python
from __future__ import print_function

import datetime
import json
import time

from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools

# If modifying these scopes, delete the file token.json.
SCOPES = 'https://www.googleapis.com/auth/ediscovery'

def list_exports(service, matter_id):
    return service.matters().exports().list(matterId=matter_id).execute()


def get_export_by_id(service, matter_id, export_id):
    return service.matters().exports().get(matterId=matter_id, exportId=export_id).execute()

def get_service():
    '''
    Look for an active credential token, if one does not exist, use credentials.json
    and ask user for permission to access.  Store new token, return the service object
    '''
    store = file.Storage('token.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('credentials.json', SCOPES)
        creds = tools.run_flow(flow, store)
    service = build('vault', 'v1', http=creds.authorize(Http()))

    return service


def create_drive_export(service, matter_id, export_name, num_days):
    """
    once we have a matter_id , we can create an export under it with the relevant files we are looking for.

    """
    # set times for beginning and end of query:
    today = datetime.datetime.now()
    print("creating a drive export at {}".format(today))
    start_time = today - datetime.timedelta(days=num_days)

    drive_query_options = {'includeTeamDrives': True}
    user_list = ['me@gmail.com']
    drive_query = {
        'corpus': 'DRIVE',
        'dataScope': 'ALL_DATA',
        'searchMethod': 'ACCOUNT',
        'accountInfo': {
            'emails': user_list
        },
        'driveOptions': drive_query_options,
        # end time is more recent date, start time is older date
        # strftime zero-pads month and day, giving well-formed RFC 3339 timestamps
        'endTime': today.strftime('%Y-%m-%dT00:00:00Z'),
        'startTime': start_time.strftime('%Y-%m-%dT00:00:00Z'),
        'timeZone': 'Etc/GMT'
    }

    wanted_export = {
        'name': export_name,
        'query': drive_query,
        'exportOptions': {
            'driveOptions': {}
        }
    }

    return service.matters().exports().create(matterId=matter_id, body=wanted_export).execute()


def get_export(service, matter_id, export_id):
    return service.matters().exports().get(matterId=matter_id, exportId=export_id).execute()


def main():
    service = get_service()
    matter_id = '<known_matter_id>'
    timestamp = datetime.datetime.now().strftime("%Y%m%d.%H%M%s")
    export = create_drive_export(service, matter_id, "code_gen_export.{}".format(timestamp), 1)

    # check every 5 seconds until export is done being created:
    while export['status'] == 'IN_PROGRESS':
        export = get_export(service, matter_id, export['id'])
        print('...')
        time.sleep(5)

    # print(json.dumps(export, indent=2))
    print(json.dumps(export['cloudStorageSink']['files'], indent=2))


if __name__ == '__main__':
    main()

Running the above code produces:

creating a drive export at 2018-09-20 17:12:38.026402
...
...
...
...
...
...
[
  {
    "md5Hash": "hash_value",
    "bucketName": "bucket_string",
    "objectName": "object1_string/code_gen_export.20180920.17121537481558-custodian-docid.csv",
    "size": "1684"
  },
  {
    "md5Hash": "hash_value",
    "bucketName": "bucket_string",
    "objectName": "object2_string/code_gen_export.20180920.17121537481558-metadata.xml",
    "size": "10600"
  },
  {
    "md5Hash": "hash_value",
    "bucketName": "bucket_string",
    "objectName": "object3_string/code_gen_export.20180920.17121537481558_0.zip",
    "size": "21599222"
  }
]

Can I download the .zip file using the service object I created in get_service()?

Recommended answer

After a long struggle with the above, I found the right approach with the help of one of Google's API support agents.

Note that you will need to create a new service using:

build('storage', 'v1', credentials=credentials)

where credentials is:

service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE,
        scopes=SCOPES, 
        subject='user@domain.com'
)

(It may be that the same argument you used for your credentials, http=creds.authorize(Http()), will work as well. I did not try that.)
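Whether the existing OAuth token can be reused was not tested here. One thing worth checking first is whether the stored token was granted a Cloud Storage scope at all, since the question's script requests only the ediscovery scope. The sketch below uses a hypothetical helper, missing_scopes (not part of any Google library), and assumes that the read-only scope https://www.googleapis.com/auth/devstorage.read_only is sufficient for downloading export files.

```python
# Hypothetical helper, not part of any Google client library.
VAULT_SCOPE = 'https://www.googleapis.com/auth/ediscovery'
# Assumption: read-only Cloud Storage access is enough to fetch export files.
STORAGE_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'
REQUIRED_SCOPES = [VAULT_SCOPE, STORAGE_SCOPE]


def missing_scopes(granted, required=REQUIRED_SCOPES):
    """Return the required scopes that are absent from `granted`.

    `granted` may be a space-separated string or any iterable of scope URLs.
    """
    if isinstance(granted, str):
        granted = granted.split()
    granted = set(granted)
    return [scope for scope in required if scope not in granted]
```

If the result is non-empty, the token would presumably have to be recreated with both scopes (delete token.json, as the comment in the question's script says, and run the flow again with SCOPES set to the full list) before the existing credentials have a chance of working with the storage service.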

In addition, you will need a byte-stream library such as io, and you will need to import googleapiclient.http as well.

Full code:

import io
from google.oauth2 import service_account
from googleapiclient.discovery import build
import googleapiclient.http


SCOPES = ['https://www.googleapis.com/auth/devstorage.full_control']
SERVICE_ACCOUNT_FILE = 'yourServiceAccountFile.json'
bucket_name = 'yourBucketName'
object_name = 'yourObjectName.zip'

credentials = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE,
        scopes=SCOPES, 
        subject='user@domain.com'
)

service = build('storage', 'v1', credentials=credentials)

req = service.objects().get_media(bucket=bucket_name, object=object_name)

out_file = io.BytesIO()
downloader = googleapiclient.http.MediaIoBaseDownload(out_file, req)

done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download {}%.".format(int(status.progress() * 100)))

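file_name = '/Users/myUser/Downloads/new_file.zip'
# Binary mode is required: getvalue() returns bytes, which a text-mode file rejects on Python 3.
with open(file_name, "wb") as f:
    f.write(out_file.getvalue())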
