R 和 MySQL - 在使用 dbWriteTable() 写入之前检查表中是否存在一行 [英] R and MySQL - checking if a row exists in a table before writing to it using dbWriteTable()

查看:59
本文介绍了R 和 MySQL - 在使用 dbWriteTable() 写入之前检查表中是否存在一行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 R 包 RMySQL 和 dbConnect 将我的 R 与我公司的 MySQL 数据库连接起来.简而言之,我的 R 脚本连接到 MySQL 数据库,从数据库中提取一个表 (table1),使用该表进行一些分析,然后写入 MySQL 数据库中的另一个表 (table2).

I am using the R packages RMySQL and dbConnect to connect my R with my Company's MySQL database. In short, my R script connects to the MySQL database, pulls one table (table1) from the database, uses that table to do some analyses, and then writes to another table (table2) in the MySQL database.

所以 table2 不是从数据库中提取的,而是需要附加我在 R 中所做的工作.将我在 R 中所做的其他一些分析添加到表中,然后将表重新写入 MySQL数据库,将新信息附加到前一个数据库中.我目前用来写入 table2 的代码行是:

So table2 is not pulled from the database, but needs to be appended with the work I do in R. adds to the table with some other analysis I've done in R, and then re-writes the table to the MySQL database, appending the previous one with the new information. The line of code that I currently use to write to table2 is:

dbWriteTable(con_hub, value = my_R_dataframe, name = "table2", append = TRUE, row.names = FALSE)

在写之前,我需要确保 my_R_dataframe 中没有已经在 table2 数据库表中的行,因为重复处理很麻烦.有人对此有什么建议吗?另外 - 我不希望将 table2 从 MySQL 数据库中提取到 R 中,因为该查询会为我的代码添加相当多的时间.

I need to make sure that there are no rows in my_R_dataframe that are already in the table2 database table before writing this, since duplicates are a pain to handle. Does anybody have any suggestions for this? Also - I would prefer not to have to pull table2 from the MySQL database into R, since that query would add a decent bit of time to my code.

谢谢,

推荐答案

考虑以下两种方法之一:SQL 驱动的重复数据删除或 R 驱动的重复数据删除解决方案.对于前者,您需要使用运行经典的临时临时表 LEFT JOIN...IS NULL/NOT EXISTS/NOT IN SQL 查询.对于后者,您将导入数据框中的所有内容,附加您当前的 df,然后运行 ​​R 的 unique().但是,听起来您不打算使用后者,但我向未来的读者展示.

Consider one of two ways: an SQL-driven de-dupe or R-driven de-dupe solution. For the former, you would need use of a temp, staging table that runs the classic LEFT JOIN...IS NULL/NOT EXISTS/NOT IN SQL query. For latter, you would import all content in a dataframe, append your current df, and run R's unique(). However, it sounds like you do not intend for latter but I show for future readers.

SQL (使用具有目标表精确结构的临时表)

# OVERWRITE TEMP EACH TIME
dbWriteTable(con_hub, value = my_R_dataframe, 
                      name = "table2_temp", 
                      overwrite = TRUE,                        
                      row.names = FALSE)

# RUN LEFT JOIN...IS NULL QUERY (COMPARE COLS --COL1, COL2, COL3-- ADD/REMOVE AS NEEDED)
dbSendQuery(con_hub, paste0("INSERT INTO table2", 
                            " SELECT * FROM table2_temp",
                            " LEFT JOIN table2",
                            "   ON table2_temp.col1 = table2.col1", 
                            "   AND table2_temp.col2 = table2.col2",
                            "   AND table2_temp.col3 = table2.col3",
                            " WHERE table2.col1 IS NULL",
                            "   OR table2.col2 IS NULL",
                            "   OR table2.col3 IS NULL"))

R (读入 table2 数据,考虑是否占用资源过多,如果对所有列进行去重,则优先考虑)

# RETRIEVE table2 DATA
table2df <- dbGetQuery(con_hub, "SELECT * FROM table2")

# APPEND BOTH DATAFRAMES
stackeddf <- rbind(table2df, my_R_dataframe)

# RETURN UNIQUE ROWS
finaldf <- unique(stackeddf)

# OVERWRITE DESTINATION TABLE EACH TIME
dbWriteTable(con_hub, value = finaldf, 
                      name = "table2", 
                      overwrite = TRUE,                         
                      row.names = FALSE)

# CLEAN UP ENVIRON OF UNNEEDED OBJECTS
rm(table2df, stackeddf, finaldf)
gc()

这篇关于R 和 MySQL - 在使用 dbWriteTable() 写入之前检查表中是否存在一行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆