Python Google Drive API - list the entire drive file tree


Problem description

I'm building a Python application that uses the Google Drive APIs. So far the development is going well, but I have a problem retrieving the entire Google Drive file tree. I need that for two purposes:

  1. Check whether a path exists: if I want to upload test.txt under root/folder1/folder2, I want to check whether the file already exists and, if so, update it.
  2. Build a visual file explorer. I know that Google provides its own (I can't remember the name right now, but I know it exists), but I want to restrict the file explorer to specific folders.

For now I have a function that fetches the root of Gdrive, and I can build the tree by recursively calling a function that lists the content of a single folder, but it is extremely slow and can potentially make thousands of requests to Google, which is unacceptable.

Here is the function to get the root:

from googleapiclient import errors
from googleapiclient.discovery import build

import driveHelper  # the asker's own module: handles authentication and credential storage


def drive_get_root():
    """Retrieve a root list of File resources.

    Returns:
        List of dictionaries.
    """
    # Build the service; driveHelper takes care of authentication and credential storage.
    drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
    # The result will be a list.
    result = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = drive_service.files().list(**param).execute()
            # Add this page of files to the list.
            result.extend(files['items'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            # Stop paging on error; the break belongs inside the except branch.
            break
    return result

And here is the one to get the files from a folder:

def drive_files_in_folder(folder_id):
    """Return the files belonging to a folder.

    Args:
        folder_id: ID of the folder to get files from.
    """
    # Build the service; driveHelper takes care of authentication and credential storage.
    drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
    # The result will be a list.
    result = []
    # Paging code from Google; it works, so I didn't touch it.
    page_token = None
    while True:
        try:
            param = {}

            if page_token:
                param['pageToken'] = page_token

            children = drive_service.children().list(folderId=folder_id, **param).execute()

            for child in children.get('items', []):
                # drive_get_file is another helper of mine (not shown); one request per child.
                result.append(drive_get_file(child['id']))

            page_token = children.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result

For example, to check whether a file exists I am currently using this:

def drive_path_exist(file_path, items=None):
    """
    Recursive function to check whether the given path exists.

    Returns the file's dictionary if it is found, otherwise False.
    """

    # If no list is passed, start from the root of Gdrive.
    if items is None:
        items = drive_get_root()

    # Split the string to get the first item and check whether it is in the current list.
    parts = file_path.split("/")

    # If there is only one element in the path we are at the actual file name,
    # so if it is in this folder we can return it.
    if len(parts) == 1:
        exist = False
        for elem in items:
            if elem["title"] == parts[0]:
                # Set exist to elem, because elem is a dictionary with all the file info.
                exist = elem
                break

        return exist
    # If we are not at the last element we have to keep searching.
    else:
        exist = False
        for elem in items:
            # Check whether the current item is the folder we are looking for.
            if elem["title"] == parts[0]:
                exist = True
                folder_id = elem["id"]
                # Drop the first element and stop at the first match.
                parts.pop(0)
                break

        if exist:
            # Recursive call: rejoin the remaining path as a string and pass the list
            # returned by drive_files_in_folder as the list to search.
            return drive_path_exist("/".join(parts), drive_files_in_folder(folder_id))
        return False

Any idea how to solve my problem? I saw a few discussions here on Stack Overflow, and in some answers people wrote that this is possible, but of course they didn't say how!

Thanks

Solution

Stop thinking about Drive as being a tree structure. It isn't. "Folders" are simply labels; for example, a file can have multiple parents.

In order to build a representation of a tree in your app, you need to do this ...

  1. Run a Drive List query to retrieve all Folders
  2. Iterate the result array and examine the parents property to build an in-memory hierarchy
  3. Run a second Drive List query to get all non-folders (i.e. files)
  4. For each file returned, place it in your in-memory tree
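
The four steps above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: it assumes the Drive API v2 field names used in the question (`items`, `title`, `parents`, `nextPageToken`) and an already authorised `service` object such as the one returned by `build('drive', 'v2', ...)`. The helper names `list_all`, `build_children_index` and `fetch_index` are made up for illustration.

```python
from collections import defaultdict

FOLDER_MIME = "application/vnd.google-apps.folder"


def list_all(service, query):
    """Return every file resource matching `query`, following page tokens."""
    items, page_token = [], None
    while True:
        params = {"q": query}
        if page_token:
            params["pageToken"] = page_token
        response = service.files().list(**params).execute()
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return items


def build_children_index(folders, files):
    """Map each parent ID to the resources that name it as a parent.

    An item with several parents is listed under each of them, which is
    why this is an index of labels rather than a strict tree.
    """
    children = defaultdict(list)
    for item in list(folders) + list(files):
        for parent in item.get("parents", []):
            children[parent["id"]].append(item)
    return children


def fetch_index(service):
    """Two list queries (plus paging) instead of one request per folder."""
    folders = list_all(service, "mimeType = '%s' and trashed = false" % FOLDER_MIME)
    files = list_all(service, "mimeType != '%s' and trashed = false" % FOLDER_MIME)
    return build_children_index(folders, files)
```

With the index in memory, listing a folder is a dictionary lookup (`index[folder_id]`), so a file explorer restricted to specific folders needs no further requests.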

If you simply want to check if file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.

If it's unique, just do a Files List query for title = 'file-A', then do a Files Get for each parent of each result and see if any of them is called 'folder-B'.
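
A rough sketch of that check, assuming Drive API v2 field names and an authorised `service` object. The function name `file_in_named_folder` is hypothetical, and the quote escaping reflects my understanding of the Drive query syntax rather than anything stated in the answer.

```python
def file_in_named_folder(service, file_title, folder_title):
    """Return the first file called `file_title` with a parent called
    `folder_title`, or None if there is no such file."""
    # Assumption: backslashes and single quotes in the title are escaped with a backslash.
    escaped = file_title.replace("\\", "\\\\").replace("'", "\\'")
    query = "title = '%s' and trashed = false" % escaped
    page_token = None
    while True:
        params = {"q": query}
        if page_token:
            params["pageToken"] = page_token
        response = service.files().list(**params).execute()
        for candidate in response.get("items", []):
            for parent in candidate.get("parents", []):
                # One files.get per parent to read the parent's title.
                folder = service.files().get(fileId=parent["id"]).execute()
                if folder.get("title") == folder_title:
                    return candidate
        page_token = response.get("nextPageToken")
        if not page_token:
            return None
```

This costs one list query plus one get per parent of each candidate, which stays cheap as long as few files share the title.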

If 'folder-B' can exist under both 'folder-C' and 'folder-D', then it's more complex and you'll need to build the in-memory hierarchy from steps 1 and 2 above.
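
When names are not unique, a path can be resolved against the in-memory hierarchy one title at a time. The sketch below assumes `items` is the flat list of v2 file resources (folders and files) already fetched, and that `root_id` is the ID of the starting folder; `resolve_path` is a hypothetical helper name.

```python
from collections import defaultdict


def resolve_path(items, root_id, path):
    """Return every resource reachable from `root_id` by following `path`
    title by title; an empty list means the path does not exist."""
    # Index children by parent ID, as in steps 1 and 2.
    children = defaultdict(list)
    for item in items:
        for parent in item.get("parents", []):
            children[parent["id"]].append(item)

    current_ids = [root_id]
    matches = []
    for name in path.strip("/").split("/"):
        # Keep every child with the right title; de-duplicate by ID because
        # an item with several parents can be reached more than once.
        found = {}
        for parent_id in current_ids:
            for child in children[parent_id]:
                if child["title"] == name:
                    found[child["id"]] = child
        if not found:
            return []
        matches = list(found.values())
        current_ids = list(found)
    return matches
```

Because several folders may share a title, the result is a list: for example, `resolve_path(items, root_id, "folder-C/folder-B/file-A")` returns only the copies of file-A whose folder-B sits under folder-C.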

You don't say if these files and folders are being created by your app, or by the user with the Google Drive Webapp. If your app is the creator of these files/folders there is a trick you can use to restrict your searches to a single root. Say you have

MyDrive/app_root/folder-C/folder-B/file-A

You can make folder-C, folder-B and file-A all children of app_root, in addition to their normal parents.

That way you can constrain all of your queries to include

and 'app_root_id' in parents
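
As an illustration, a query string with that constraint could be assembled like this. It is only a sketch: `query_under_root` is a made-up helper, `app_root_id` stands for the real ID of your app's root folder, and the escaping rule is an assumption about the Drive query syntax.

```python
def query_under_root(title, app_root_id):
    """Build a Drive v2 `q` string matching `title` among the children of app_root."""
    # Assumption: backslashes and single quotes are escaped with a backslash.
    escaped = title.replace("\\", "\\\\").replace("'", "\\'")
    return "title = '%s' and '%s' in parents and trashed = false" % (
        escaped,
        app_root_id,
    )
```

The resulting string would be passed as the `q` parameter of a files list call, for example `drive_service.files().list(q=query_under_root('file-A', app_root_id)).execute()`.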
