将MySQL转储导入R(不需要MySQL服务器) [英] Import MySQL dump into R (without requiring MySQL server)

查看:189
本文介绍了将MySQL转储导入R(不需要MySQL服务器)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

包括 RMySQL sqldf 允许与本地或远程数据库服务器连接。我正在创建一个可移植项目,涉及在不总是访问正在运行的服务器,但是总是访问最新的.sql转储的情况下(或在设备上)导入sql数据。数据库。

Packages like RMySQL and sqldf allow one to interface with local or remote database servers. I'm creating a portable project which involves importing sql data in cases (or on devices) which do not always have access to a running server, but which do always have access to the latest .sql dump of the database.

目标似乎足够简单:将.sql转储导入R而不涉及MySQL服务器。想创建一个列表列表,其中元素对应于在.sql转储中定义的任何数据库(可能有多个),这些元素又由这些数据库中的表组成。

The goal seems simple enough: import an .sql dump into R without the involvement of a MySQL server. More specifically, I'd like to create a list of lists in which the elements correspond to any databases defined in the .sql dump (there may be multiple), and those elements in turn consist of the tables in those databases.

为了使这个可重现,让我们来看一个示例sportsdb SQL文件在这里 - 如果你解压缩它叫sportsdb_sample_mysql_20080303.sql。

To make this reproducible, let's take the sample sportsdb SQL file here — if you unzip it it's called sportsdb_sample_mysql_20080303.sql.

一个人会认为sqldf可以这样做:

One would think sqldf might be able to do it:

read.csv.sql('sportsdb_sample_mysql_20080303.sql',sql =SELECT * FROM addresses)
sqliteSendQuery con,statement,bind.data):
语句中的错误:no such table:addresses

转储中的表地址。 此信息 sqldf列表提到同样的错误,但没有解决方案。

This even though there certainly is a table addresses in the dump. This post on the sqldf list mentions the same error, but no solution.

然后在包 ProjectTemplate 中有一个 sql.reader $ c>,看起来很有前途。在此处,您可以在此处,它假定正在运行的数据库服务器并依赖于 RMySQL - 而不是我需要的。

Then there is an sql.reader function in the package ProjectTemplate, which looks promising. Poking around, the source for the function can be found here, and it assumes a running database server and relies on RMySQL — not what I need.

。我们似乎耗尽了选项。任何帮助从hivemind赞赏!

So... we seem to be running out of options. Any help from the hivemind appreciated!

(重申,我不是寻找依赖于访问SQL服务器的解决方案; dbreadTable RMySQL 包。我非常想绕过服务器,并从.sql转储文件直接获取数据。)

(To reiterate, I am not looking for a solution that relies on access to an SQL server; that's easy with dbReadTable from the RMySQL package. I would very much like to bypass the server and get the data straight from the .sql dump file.)

推荐答案

取决于你想从表中提取什么,这里是如何玩弄数据

depending on what you want to extract from the table, here is how you can play around with the data

numLines <- R.utils::countLines("sportsdb_sample_mysql_20080303.sql")
# [1] 81266

linesInDB <- readLines("sportsdb_sample_mysql_20080303.sql",n=60)

可以做一些regex获取表名(在CREATE TABLE之后),列名(在第一个括号之间)和VALUES(CREATE TABLE之后和第二个括号之间的行)

Then you can do some regex to get tables names (after CREATE TABLE), column names (between first brackets) and VALUES (lines after CREATE TABLE and between second brackets)


逆向工程mysqldump输出MySQL Workbench给出从尖点开始的语句包含非UTF8字符错误

编辑:为了回应OP的回答,如果我解释python脚本正确,逐行读取,过滤INSERT INTO行,解析为csv,然后写入文件。这与我原来的建议非常相似。如果文件大小太大,最好使用一些其他R包读取大块文件。

in response to OP's answer, if i interpret the python script correct, it is also reading it line by line, filter for INSERT INTO lines, parse as csv, then write to file. This is very similar to my original suggestion. My version below in R. If the file size is too large, it would be better to read in the file in chunks using some other R package

rm(list=ls())
options(stringsAsFactors=F)
library(utils)
library(stringi)
library(plyr)

mysqldumpfile <- "sportsdb_sample_mysql_20080303.sql"

allLines <- readLines(mysqldumpfile)
insertLines <- allLines[which(stri_detect_fixed(allLines, "INSERT INTO"))]
allwords <- data.frame(stri_extract_all_words(insertLines, " "))
d_ply(allwords, .(X3), function(x) {
    #x <- split(allwords, allwords$X3)[["baseball_offensive_stats"]]
    print(x[1,3])

    #find where the header/data columns start and end
    valuesCol <- which(x[1,]=="VALUES")
    lastCols <- which(apply(x, 2, function(y) all(is.na(y))))
    datLastCol <- head(c(lastCols, ncol(x)+1), 1) - 1

    #format and prepare for write to file
    df <- data.frame(x[,(valuesCol+1):datLastCol])
    df <- setNames(df, x[1,4:(valuesCol-1)])
    #type convert before writing to file otherwise its all strings
    df[] <- apply(df, 2, type.convert)
    #write to file
    write.csv(df, paste0(x[1,3],".csv"), row.names=F)
})

这篇关于将MySQL转储导入R(不需要MySQL服务器)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆