SQLite - WHERE Clause & UDFs

Question

I have the following SQLite table with 198,305 geocoded Portuguese postal codes:

CREATE TABLE "pt_postal" (
  "code" text NOT NULL,
  "geo_latitude" real(9,6) NULL,
  "geo_longitude" real(9,6) NULL
);

CREATE UNIQUE INDEX "pt_postal_code" ON "pt_postal" ("code");
CREATE INDEX "coordinates" ON "pt_postal" ("geo_latitude", "geo_longitude");

I also have the following user-defined function in PHP that returns the distance between two coordinates:

$db->sqliteCreateFunction('geo', function ()
{
    if (count($data = func_get_args()) < 4)
    {
        $data = explode(',', implode(',', $data));
    }

    if (count($data = array_map('deg2rad', array_filter($data, 'is_numeric'))) == 4)
    {
        return round(6378.14 * acos(sin($data[0]) * sin($data[2]) + cos($data[0]) * cos($data[2]) * cos($data[1] - $data[3])), 3);
    }

    return null;
});
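
For readers who want to try the formula outside PHP, here is a minimal sketch of the same spherical law of cosines calculation in Python, registered as an SQLite function through the standard sqlite3 module. It is an illustration, not the poster's code: the clamp on the acos argument and the in-memory database are assumptions added here, and the comma-splitting fallback of the PHP version is left out.

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6378.14  # same radius constant as the PHP UDF


def geo(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (spherical law of cosines), or None."""
    values = (lat1, lon1, lat2, lon2)
    if any(not isinstance(v, (int, float)) for v in values):
        return None  # mirrors the PHP version returning null on bad input
    p1, l1, p2, l2 = map(math.radians, values)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(l1 - l2))
    # Assumption: clamp to [-1, 1] so rounding error cannot push acos
    # outside its domain (Python raises ValueError there).
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return round(EARTH_RADIUS_KM * math.acos(cos_angle), 3)


db = sqlite3.connect(":memory:")
db.create_function("geo", 4, geo)

# One degree of longitude along the equator, roughly 111.3 km with this radius.
one_degree = db.execute("SELECT geo(0, 0, 0, 1)").fetchone()[0]
# Distance from the reference point in the question to itself.
same_point = db.execute(
    "SELECT geo(38.73311, -9.138707, 38.73311, -9.138707)"
).fetchone()[0]
# A NULL argument yields NULL, like the PHP function.
null_input = db.execute("SELECT geo(NULL, 0, 0, 1)").fetchone()[0]
print(one_degree, same_point, null_input)
```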

Only 874 records have a distance from 38.73311, -9.138707 less than or equal to 1 km.

The UDF works flawlessly in SQL queries, but for some reason I cannot use its return value in WHERE clauses. For instance, if I execute the query:

SELECT
    "code",
    geo(38.73311, -9.138707, "geo_latitude", "geo_longitude") AS "distance"
    FROM "pt_postal" WHERE 1 = 1
        AND "geo_latitude" BETWEEN 38.7241268076 AND 38.7420931924
        AND "geo_longitude" BETWEEN -9.15022289523 AND -9.12719110477
        AND "distance" <= 1
    ORDER BY "distance" ASC
LIMIT 2048;

It returns 1035 records ordered by distance in ~0.05 seconds; however, the last record has a "distance" of 1.353 km (which is bigger than the 1 km I defined as the maximum in the last WHERE condition).

If I remove the following clauses:

AND "geo_latitude" BETWEEN 38.7241268076 AND 38.7420931924
AND "geo_longitude" BETWEEN -9.15022289523 AND -9.12719110477

Now the query takes nearly 6 seconds and returns 2048 records (my LIMIT) ordered by distance. It's supposed to take this long, but it should only return the 874 records that have "distance" <= 1.

EXPLAIN QUERY PLAN for the original query returns:

SEARCH TABLE pt_postal USING INDEX coordinates (geo_latitude>? AND geo_latitude<?)
#(~7500 rows)
USE TEMP B-TREE FOR ORDER BY

And without the coordinate boundaries:

SCAN TABLE pt_postal
#(~500000 rows)
USE TEMP B-TREE FOR ORDER BY


What I Would Like to Do

I think I know why this is happening. SQLite is doing:

  1. use index coordinates to filter out the records outside of the boundaries in the WHERE clauses
  2. filter those records by the "distance" <= 1 WHERE clause, but distance is still NULL => 0!
  3. populate "code" and "distance" (by calling the UDF for the first time)
  4. order by the "distance" (which is populated by now)
  5. limit the records

What I would like SQLite to do:

  1. use index coordinates to filter out the records outside of the boundaries in the WHERE clauses
  2. for those records, populate code and distance by calling the UDF
  3. filter the records by the "distance" <= 1 WHERE clause
  4. order by the "distance" (without calling the UDF again)
  5. limit the records

Can anyone explain how I can make SQLite behave the way I want (if that is even possible)?

Just out of curiosity, I tried to benchmark how much slower calling the UDF twice would be:

SELECT
    "code",
    geo(38.73311, -9.138707, "geo_latitude", "geo_longitude") AS "distance"
    FROM "pt_postal" WHERE 1 = 1
        AND "geo_latitude" BETWEEN 38.7241268076 AND 38.7420931924
        AND "geo_longitude" BETWEEN -9.15022289523 AND -9.12719110477
        AND geo(38.73311, -9.138707, "geo_latitude", "geo_longitude") <= 1
    ORDER BY "distance" ASC
LIMIT 2048;

To my surprise, it still runs in the same ~0.06 seconds - and it still (wrongly!) returns the 1035 records.

Seems like the second geo() call is not even being evaluated... But it should, right?
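
One way to check whether the function really runs is to count its invocations. The sketch below does this with Python's sqlite3 module instead of PHP. The five sample rows, the `calls` counter, and the simplified geo() are all made up for illustration, and the exact call count can differ between SQLite versions, so no specific number is claimed here.

```python
import math
import sqlite3

calls = 0  # hypothetical counter, incremented on every geo() invocation


def geo(lat1, lon1, lat2, lon2):
    """Simplified stand-in for the PHP UDF: distance in km, 3 decimals."""
    global calls
    calls += 1
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(l1 - l2))
    return round(6378.14 * math.acos(max(-1.0, min(1.0, c))), 3)


db = sqlite3.connect(":memory:")
db.create_function("geo", 4, geo)
db.execute('CREATE TABLE "pt_postal" ("code" text NOT NULL, '
           '"geo_latitude" real, "geo_longitude" real)')
# Invented rows: A-C lie within 1 km of the reference point, D sits in a
# corner of the bounding box (more than 1 km away), E is far outside the box.
db.executemany('INSERT INTO "pt_postal" VALUES (?, ?, ?)', [
    ("A", 38.73311, -9.138707),
    ("B", 38.7360, -9.1400),
    ("C", 38.7300, -9.1350),
    ("D", 38.7415, -9.1280),
    ("E", 41.1500, -8.6100),
])

query = '''
SELECT "code",
       geo(38.73311, -9.138707, "geo_latitude", "geo_longitude") AS "distance"
  FROM "pt_postal"
 WHERE "geo_latitude" BETWEEN 38.7241268076 AND 38.7420931924
   AND "geo_longitude" BETWEEN -9.15022289523 AND -9.12719110477
   AND geo(38.73311, -9.138707, "geo_latitude", "geo_longitude") <= 1
 ORDER BY "distance" ASC
'''
result = db.execute(query).fetchall()
print(result)                 # rows that passed both the box and the 1 km filter
print("geo() calls:", calls)  # how often SQLite actually invoked the function
```

If the distance filter is being applied, row D should be missing from the result even though it passes the bounding-box test, and the counter should be at least as large as the number of rows inside the box.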

Answer

Basically, I was using sprintf() to see what kind of bounding coordinates were being computed, and since I couldn't run the query anywhere other than PHP (because of the UDF), I was generating another query with prepared statements. The problem was that I wasn't generating the last bound parameter (the kilometers in the distance <= ? clause), and I was fooled by my sprintf() version.

Guess I shouldn't try to code when I'm sleepy. I'm truly sorry for your wasted time, and thank you all!

Just for the sake of completeness, the following returns (correctly!) 873 records, in ~ 0.04 seconds:

SELECT "code",
    geo(38.73311, -9.138707, "geo_latitude", "geo_longitude") AS "distance"
    FROM "pt_postal" WHERE 1 = 1
        AND "geo_latitude" BETWEEN 38.7241268076 AND 38.7420931924
        AND "geo_longitude" BETWEEN -9.15022289523 AND -9.12719110477
        AND "distance" <= 1
    ORDER BY "distance" ASC
LIMIT 2048;
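
Since the bug turned out to be a missing bound parameter, it may help to see every value, including the maximum distance, bound explicitly. This is a sketch in Python's sqlite3 rather than the poster's PHP. The parameter names, the sample rows, and the simplified geo() are assumptions made for the example.

```python
import math
import sqlite3


def geo(lat1, lon1, lat2, lon2):
    """Simplified stand-in for the PHP UDF: distance in km, 3 decimals."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(l1 - l2))
    return round(6378.14 * math.acos(max(-1.0, min(1.0, c))), 3)


db = sqlite3.connect(":memory:")
db.create_function("geo", 4, geo)
db.execute('CREATE TABLE "pt_postal" ("code" text NOT NULL, '
           '"geo_latitude" real, "geo_longitude" real)')
# Invented rows: A-C are within 1 km, D is inside the box but farther away.
db.executemany('INSERT INTO "pt_postal" VALUES (?, ?, ?)', [
    ("A", 38.73311, -9.138707),
    ("B", 38.7360, -9.1400),
    ("C", 38.7300, -9.1350),
    ("D", 38.7415, -9.1280),
])

# Same shape as the final query, with named placeholders (hypothetical names).
query = '''
SELECT "code",
       geo(:lat, :lon, "geo_latitude", "geo_longitude") AS "distance"
  FROM "pt_postal"
 WHERE "geo_latitude" BETWEEN :lat_min AND :lat_max
   AND "geo_longitude" BETWEEN :lon_min AND :lon_max
   AND "distance" <= :max_km
 ORDER BY "distance" ASC
 LIMIT 2048
'''
params = {
    "lat": 38.73311, "lon": -9.138707,
    "lat_min": 38.7241268076, "lat_max": 38.7420931924,
    "lon_min": -9.15022289523, "lon_max": -9.12719110477,
    "max_km": 1,
}
within = db.execute(query, params).fetchall()
print(within)

# Dropping the last parameter: Python's driver rejects the statement outright.
incomplete = {k: v for k, v in params.items() if k != "max_km"}
try:
    db.execute(query, incomplete)
    missing_detected = False
except sqlite3.ProgrammingError:
    missing_detected = True
print("missing parameter detected:", missing_detected)
```

Python's driver refuses to run a statement when a named value is missing. Other drivers may behave differently, so it is worth checking the bound values themselves rather than a separately formatted copy of the query.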
