How to read SO's data dump effectively


Question

I currently use Vim to read SO's data dump. However, my MacBook slows down when I scroll down just a few rows. This suggests that there must be more efficient ways to read the data.

I know little about MySQL. The files are in .xml format, and the data is rather hard to read in that form at the moment. It may be more efficient to convert the .xml files to MySQL and then read the data there. The only tool I know for this is the MS db tool, but I would like to know of another one too.

Problems

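  1. Parse the .xml into SQL queries that MySQL understands. For this we need to know the structure of the data.
  2. Run the data in MySQL.
  3. Find a tool similar to the MS db tool with which we can read the data efficiently.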
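For the first step, one way to learn the structure of a dump file without opening it in an editor is to stream it and collect the attribute names found on its row elements. This is a minimal sketch, assuming each file is a single root element (taken here as the table name) that contains `row` elements whose attributes are the columns. The function name `collect_columns` and its `limit` parameter are made up for illustration.

```python
import xml.etree.ElementTree as ET


def collect_columns(source, limit=None):
    """Return (table_name, column_names) by streaming over an XML dump.

    Assumption: one root element wraps <row> elements, and the row
    attributes are the columns. `source` is a file name or file object.
    """
    table = None
    root = None
    columns = []
    seen = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem          # the first element opened is the root
                table = elem.tag
            continue
        if elem.tag == "row":
            for name in elem.attrib:
                if name not in columns:
                    columns.append(name)
            seen += 1
            root.clear()             # drop rows already read to keep memory low
            if limit is not None and seen >= limit:
                break
    return table, columns
```

Passing a `limit` of a few thousand rows is usually enough to see the columns of a large file quickly, although a column that only appears on later rows would be missed.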

How do you read SO's data dump effectively?

-

  1. How can you run the 523 SQL queries in your terminal to create the database? At the moment I have the commands in a text file.
  2. How can you switch [the recovery mode] of the database to simple recovery mode?
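For the first of these two questions, the `mysql` command-line client reads statements from standard input, so a text file of commands can be fed to it in one go. This is a sketch in Python, assuming the client is installed and on the PATH. The helper names, the user name, and the database name are placeholders, not anything from the question.

```python
import subprocess


def build_mysql_command(user, database):
    """Build the argument list for the mysql command-line client.

    --password with no value makes the client prompt for the password.
    """
    return ["mysql", "--user=" + user, "--password", database]


def run_sql_file(path, user, database):
    """Feed every statement in `path` to MySQL, like `mysql ... < path`."""
    with open(path, "rb") as sql_file:
        return subprocess.run(
            build_mysql_command(user, database),
            stdin=sql_file,
            check=True,  # raise CalledProcessError if mysql reports a failure
        )
```

A call such as `run_sql_file("commands.sql", "me", "so")` would then run all statements in the file against the hypothetical database `so`.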

Recommended answer

I wrote my first ever Python program to read the files and output SQL insert statements for use with MySQL (it's ugly, but it worked). You'll need to create the tables by hand first, though.

import sys
import xml.sax
import xml.sax.handler


class SOHandler(xml.sax.handler.ContentHandler):
    """Writes one MySQL insert statement per <row> element."""

    def __init__(self):
        super().__init__()
        self.errParse = 0  # rows skipped because they could not be encoded
        self.table = None
        self.outFile = None
        self.errfile = None

    def startElement(self, name, attributes):
        if name != "row":
            # The enclosing element names the table and the output files.
            self.table = name
            self.outFile = open(name + ".sql", "w", encoding="utf-8")
            self.errfile = open(name + ".err", "w", encoding="utf-8", errors="replace")
            return

        columns = ",".join(attributes.keys())
        # Escape backslashes first, then both kinds of quotes.
        values = ",".join(
            '"{0}"'.format(value.replace('\\', '\\\\').replace('"', '\\"').replace("'", "\\'"))
            for value in attributes.values()
        )
        currentRow = "insert into " + self.table + "(" + columns + ") values (" + values + ");"
        try:
            self.outFile.write(currentRow + "\n")
        except UnicodeEncodeError:
            # Log the row and keep going instead of aborting the whole run.
            self.errParse += 1
            self.errfile.write(currentRow + "\n")
            return
        self.outFile.flush()
        print(currentRow)

    def endDocument(self):
        # Close the output files once the whole dump has been read.
        if self.outFile is not None:
            self.outFile.close()
            self.errfile.close()


if len(sys.argv) < 2:
    print("Give me an xml file argument!")
    sys.exit(1)

parser = xml.sax.make_parser()
handler = SOHandler()
parser.setContentHandler(handler)
parser.parse(sys.argv[1])
print(handler.errParse)  # number of skipped rows
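
The tables that have to exist before the insert statements are run could be drafted along these lines. This is a rough sketch only: it types every column as TEXT, on the assumption that this is acceptable because the script above quotes every value as a string. Real column types, keys, and indexes are left for you to choose. The function name `create_table_sql` and the column names in the usage note are illustrative.

```python
def create_table_sql(table, columns):
    """Return a CREATE TABLE statement with every column typed as TEXT."""
    column_defs = ",\n".join("    `{0}` TEXT".format(name) for name in columns)
    return "CREATE TABLE `{0}` (\n{1}\n);".format(table, column_defs)
```

For example, `create_table_sql("posts", ["Id", "Title"])` yields a statement that creates a `posts` table with two TEXT columns. The table name has to match the root element of the .xml file, because the script uses that name in its insert statements.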
