Smartest way to store huge amounts of data


Problem description

I want to access the Flickr API with a REST request and download the metadata of approximately 1 million photos (maybe more). I want to store them in a .csv file and then import them into a MySQL database for further processing.

I am wondering what the smartest way to handle such a large amount of data is. What I am not sure about is how to store the records after retrieving them from the website in Python, how to pass them to the .csv file, and from there into the database. That is one big question mark.

What happens now (as I understand it, see the code below) is that a dictionary is created for every photo (250 per requested URL). That way I would end up with as many dictionaries as photos (1 million or more). Is that possible? All these dictionaries are appended to a list. Can I append that many dictionaries to a list? The only reason I want to append the dictionaries to a list is that it seems much easier to save a list, row by row, to a .csv file.
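(For reference, a minimal sketch of writing such dictionaries straight to a .csv file with csv.DictWriter, one row per photo, so the whole list never has to be kept in memory. The file name photos.csv and the example rows are placeholders; the field names match the keys built in the code below.)

import csv

# Field names match the keys built for each photo in the code below.
fieldnames = ["id", "title", "tags", "latitude", "longitude"]

# Placeholder rows; in practice these would come from the Flickr response.
photo_rows = [
    {"id": "123", "title": "example", "tags": "beach", "latitude": "52.5", "longitude": "13.4"},
]

with open("photos.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in photo_rows:
        writer.writerow(row)  # one row per photo, no list of all rows needed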

What you should know is that I am a complete beginner to programming, Python or anything like it. My profession is a completely different one, and I have only just started to learn. If you need any further explanation, please let me know!

# accessing the website
from urllib.request import urlopen
from bs4 import BeautifulSoup

photos = []  # avoid shadowing the built-in name "list"
url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...1b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description"
soup = BeautifulSoup(urlopen(url), "xml")  # soup it up; the response is XML (the "xml" parser requires lxml)
for data in soup.find_all('photo'):
    photo = {  # avoid shadowing the built-in name "dict"
        "id": data.get('id'),
        "title": data.get('title'),
        "tags": data.get('tags'),
        "latitude": data.get('latitude'),
        "longitude": data.get('longitude'),
    }
    print(photo)
    photos.append(photo)  # append inside the loop, once per photo

I am working with Python 3.3. The reason why I do not pass the data directly into the database is that I cannot get the Python connector for MySQL to run on my OS X 10.6.

Any help is very much appreciated. Thank you, folks!

Recommended answer

I recommend using SQLite for prototyping this rather than messing with CSV. SQLite works very well with Python, and you don't have to go through all the headache of setting up a separate database.
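(A minimal sketch using Python's built-in sqlite3 module; the file name photos.db and the column list are assumptions based on the fields collected in the question.)

import sqlite3

# Hypothetical database file and schema, matching the fields collected in the question.
conn = sqlite3.connect("photos.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS photos (
        id TEXT PRIMARY KEY,
        title TEXT,
        tags TEXT,
        latitude REAL,
        longitude REAL
    )
""")
conn.commit()
conn.close()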

Also, I don't think you want to use BeautifulSoup for this, since it doesn't sound like scraping is what you really want. It sounds like you want to access the REST API directly. For that you'll want to use something like the requests library, or better yet one of the Flickr Python bindings.
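(A minimal sketch of calling the API with requests and asking Flickr for JSON instead of XML. The API key is a placeholder, and the pagination values are just examples; exact response handling may differ.)

import requests

# Placeholder API key; per_page/page are Flickr's standard pagination parameters.
params = {
    "method": "flickr.photos.search",
    "api_key": "YOUR_API_KEY",
    "per_page": 250,
    "page": 1,
    "has_geo": 1,
    "extras": "geo,tags,views,description",
    "format": "json",
    "nojsoncallback": 1,
}
resp = requests.get("https://api.flickr.com/services/rest/", params=params)
data = resp.json()
for photo in data["photos"]["photo"]:
    print(photo["id"], photo.get("title"))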

Once you have that up and running, I would write to the DB during each iteration of the loop, saving as you go. That way you're not using tons of memory, and if something crashes you don't lose the data you've pulled so far.
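(A minimal sketch of that pattern, reusing the hypothetical photos.db table and the JSON call from the sketches above; the page range and placeholder API key are assumptions.)

import sqlite3
import requests

# Hypothetical database file; assumes the photos table from the earlier sketch exists.
conn = sqlite3.connect("photos.db")

params = {
    "method": "flickr.photos.search",
    "api_key": "YOUR_API_KEY",  # placeholder
    "per_page": 250,
    "has_geo": 1,
    "extras": "geo,tags",
    "format": "json",
    "nojsoncallback": 1,
}

for page in range(1, 11):  # e.g. the first 10 pages
    params["page"] = page
    data = requests.get("https://api.flickr.com/services/rest/", params=params).json()
    for p in data["photos"]["photo"]:
        conn.execute(
            "INSERT OR REPLACE INTO photos (id, title, tags, latitude, longitude) "
            "VALUES (?, ?, ?, ?, ?)",
            (p["id"], p.get("title"), p.get("tags"), p.get("latitude"), p.get("longitude")),
        )
    conn.commit()  # commit once per page, so a crash loses at most the current page

conn.close()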

