创建连续的整数，然后使用dplyr为存储在sqlserver中的表创建索引 [英] create consecutive integer and then create index to a table stored in sqlserver using dplyr

查看：219 发布时间：2018/8/2 14:03:00 r indexing dplyr

本文介绍了创建连续的整数，然后使用dplyr为存储在sqlserver中的表创建索引的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在对存储在sqlserver中的一些大型表进行一些数据处理，创建索引有时会减少某些R脚本运行所需的时间。我尝试使用 dplyr 的 mutate 函数来创建一个新列（ idx ）连续编号，然后使用 idx 列作为索引。但mutate函数似乎不起作用并且经常给我这个错误：

I am doing some data processing of some large tables stored in sqlserver that creating an index sometimes reduce the time needed for some R script to run. I try to use the mutate function of dplyr to create a new column (idx) with consecutive number, then use that idxcolumn as index. But the mutate function seems not working and constantly give me this error:

> tbl(channel,'tbl_iris') %>% mutate(idx=1:n())
Error in from:to : NA/NaN argument
In addition: Warning message:
In 1:n() : NAs introduced by coercion\

现在我正在做一些似乎这样对我来说非常愚蠢，绕过上面的错误信息：

Right now I am doing something that seems quite stupid to me like this, to "bypass" the above error message:

iris <- tbl(channel,'tbl_iris') %>% 
  collect %>%
  mutate(idx=1:n())

try(db_drop_table(channel,'##iris'))
copy_to(channel,iris,'##iris',temporary=FALSE)
db_create_index(channel,'##iris',columns='idx')

有没有更好的方法呢？谢谢！

Is there any better way of doing this? Thanks!

我试过 mutate（idx = row_number（）） 根据@Phil的建议，它无效并显示以下错误消息：

I tried mutate(idx = row_number()) as suggested by @Phil, it is not working and show the following error message:

> tbl(channel,'##iris') %>%
+   mutate(idx=row_number())
Error: <SQL> 'SELECT  TOP 10 "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", row_number() OVER () AS "idx"
FROM "##iris"'
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]The function 'row_number' must have an OVER clause with ORDER BY. 
> tbl(channel,'##iris') %>%
+   arrange(Species) %>%
+   mutate(idx=row_number())
Error: <SQL> 'SELECT  TOP 10 "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", row_number() OVER (ORDER BY "Species") AS "idx"
FROM (SELECT *
FROM "##iris"
ORDER BY "Species") "kwtundzona"'
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

更新02

我试过的方式由@Moody_Mudskipper建议，似乎有效

Update 02

I tried the way suggested by @Moody_Mudskipper, and it seems working

> try(db_drop_table(channel,'##iris'))
[1] 0
> copy_to(channel,iris,'##iris',temporary=FALSE)
> tbl(channel,'##iris') %>% head(.,1)
# Source:   lazy query [?? x 5]
# Database: Microsoft SQL Server 11.00.6251[dbo@WCDCHCMS9999\CMSAH_DC7_999/data_xx_yyy]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl> <chr>  
1         5.10        3.50         1.40       0.200 setosa 
> 
> DBI::dbSendQuery(channel,"ALTER TABLE ##iris ADD idx INT IDENTITY(1,1) NOT NULL")
<OdbcResult>
  SQL  ALTER TABLE ##iris ADD idx INT IDENTITY(1,1) NOT NULL
  ROWS Fetched: 0 [complete]
       Changed: 0
> db_create_index(channel,'##iris',columns='idx')
[1] 0
Warning message:
In new_result(connection@ptr, statement) : Cancelling previous query
> tbl(channel,'##iris') %>% head(.,5)
# Source:   lazy query [?? x 6]
# Database: Microsoft SQL Server 11.00.6251[dbo@WCDCHCMS9999\CMSAH_DC7_999/data_xx_yyy]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   idx
         <dbl>       <dbl>        <dbl>       <dbl> <chr>   <int>
1         5.10        3.50         1.40       0.200 setosa      1
2         4.90        3.00         1.40       0.200 setosa      2
3         4.70        3.20         1.30       0.200 setosa      3
4         4.60        3.10         1.50       0.200 setosa      4
5         5.00        3.60         1.40       0.200 setosa      5

我将尝试修改我的脚本，看看这是否会提供类似的性能提升与我以前更愚蠢的方法相比。

I will try to modify my script to see if this gives similar performance boost when compared to my previous more silly method.

除了下面显示的错误信息外，我希望事情按计划进行。

Other than the error message shown as below, I hope things are working as planned.

Warning message:
In new_result(connection@ptr, statement) : Cancelling previous query

创建连续的整数，然后使用dplyr为存储在sqlserver中的表创建索引 [英] create consecutive integer and then create index to a table stored in sqlserver using dplyr

问题描述

更新02

Update 02

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

创建连续的整数，然后使用dplyr为存储在sqlserver中的表创建索引 [英] create consecutive integer and then create index to a table stored in sqlserver using dplyr

问题描述

更新02

Update 02

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭