Export data from Google App Engine to csv


Problem Description



This old answer points to the Google App Engine documentation (http://code.google.com/appengine/docs/python/tools/uploadingdata.html), but that link is now about backing up your GAE data, not downloading it.

So how do I download all the data into a csv? The data is small, i.e. < 1 GB.

Solution

I tried a couple of different approaches to export to csv, using the steps outlined here and here, but I could not get either to work. So here is what I did (my largest table was about 2 GB). It works relatively quickly even though it seems like a lot of steps, and it beats fighting for hours with random code that Google may have since changed. (A scripted equivalent of the console flow is sketched after the list.)

  1. Go into Cloud Storage and create two new buckets, "data_backup" and "data_export". You can skip this if you already have a bucket to store things in.
  2. Go into "My Console" > Google Datastore > Admin > Open Datastore Admin for the datastore you are trying to convert.
  3. Check off the entity or entities that you want to back up and click "Backup Entities". I did them one at a time, since I only had about 5 tables to export, rather than checking off all 5 at once.
  4. Indicate the Google Storage (gs) bucket you want to store them in.
  5. Now go to Google Big Query (I had never used this before, but it was cake to get going).
  6. Click the little down arrow, select "Create a New Dataset", and give it a name.
  7. Then click the down arrow next to the new dataset you just created and select "Create New Table". Walk through the import steps, selecting "Cloud Datastore Backup" under the Select Data step. Then choose whichever backup you want to import to Big Query so you can export it to csv in the next step.
  8. Once the table imports (which was pretty quick for mine), click the down arrow next to the table name and select "Export". You can export directly to csv, save it to the google storage bucket you created for the export, and then download it from there.
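
As a side note (not part of the original answer): if you would rather script steps 1 and 5-8 than click through the console, here is a minimal sketch using the google-cloud-storage and google-cloud-bigquery Python client libraries, which postdate the console flow above. The project, dataset, table, and bucket names, and the .backup_info path, are all placeholders; swap in your own.

```python
from google.cloud import bigquery, storage

# Placeholder names: swap in your own project, dataset, table, and buckets.
TABLE_ID = "my-project.gae_export.my_entity"
BACKUP_URI = "gs://data_backup/my_entity.backup_info"  # written by the Datastore Admin backup
CSV_URI = "gs://data_export/my_entity-*.csv"           # wildcard lets BigQuery shard big exports

# Step 1: create the two buckets (skip this if they already exist).
storage_client = storage.Client()
for name in ("data_backup", "data_export"):
    storage_client.create_bucket(name)

bq = bigquery.Client()

# Steps 6-7: load the Datastore backup into a new BigQuery table.
load_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.DATASTORE_BACKUP)
bq.load_table_from_uri(BACKUP_URI, TABLE_ID, job_config=load_config).result()

# Step 8: export the table to csv files in the export bucket.
extract_config = bigquery.ExtractJobConfig(destination_format="CSV")
bq.extract_table(TABLE_ID, CSV_URI, job_config=extract_config).result()
```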

Here are a few tips:

  • If your data has nested relationships, you will have to export to JSON rather than CSV (they also offer the Avro format, whatever that is).
  • I used json2csv to convert the exported JSON files that could not be saved as csv directly. It runs a little slowly on big tables but gets the job done.
  • I had to split the 2 GB file into 2 files because of a Python memory error in json2csv (a streaming alternative that sidesteps this is sketched after these tips). I used GSplit to split the files and checked the option under Other Properties > Tags & Headers > "Do not add GSplit tags..." (this makes sure GSplit does not add any data to the split files).
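
Since BigQuery's JSON export is newline-delimited (one record per line), the conversion can also be streamed record by record instead of loading the whole file into memory, which avoids the splitting step entirely. Here is a minimal sketch of that idea, not from the original answer: it assumes the fields you want are top-level (nested values would come out as raw Python reprs), and the file names are hypothetical.

```python
import csv
import json

# Hypothetical paths: point these at your exported file.
IN_PATH = "my_entity_export.json"   # newline-delimited JSON from the BigQuery export
OUT_PATH = "my_entity_export.csv"

# Peek at the first record to derive the csv header.
with open(IN_PATH, encoding="utf-8") as src:
    fieldnames = sorted(json.loads(next(src)).keys())

# Stream one record per line, so memory stays flat even for multi-GB files.
with open(IN_PATH, encoding="utf-8") as src, \
        open(OUT_PATH, "w", newline="", encoding="utf-8") as dst:
    writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for line in src:
        writer.writerow(json.loads(line))
```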

Like I said, this was actually pretty quick even though it is a number of steps. Hope it helps someone avoid spending a bunch of time trying to convert strange backup file formats or running code that may not work anymore.

