Why use the SQL Server 2008 geography data type?

Problem description

I am redesigning a customer database and one of the new pieces of information I would like to store along with the standard address fields (Street, City, etc.) is the geographic location of the address. The only use case I have in mind is to allow users to map the coordinates on Google Maps when the address cannot otherwise be found, which often happens when the area is newly developed or is in a remote/rural location.

My first inclination was to store latitude and longitude as decimal values, but then I remembered that SQL Server 2008 R2 has a geography data type. I have absolutely no experience using geography, and from my initial research, it looks to be overkill for my scenario.

For example, to work with latitude and longitude stored as decimal(7,4), I can do this:

insert into Geotest(Latitude, Longitude) values (47.6475, -122.1393)
select Latitude, Longitude from Geotest

but with geography, I would do this:

insert into Geotest(Geolocation) values (geography::Point(47.6475, -122.1393, 4326))
select Geolocation.Lat, Geolocation.Long from Geotest

Although it's not that much more complicated, why add complexity if I don't have to?
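Whichever representation is chosen, two details are easy to get wrong. Plain decimal columns accept any value that fits the declared precision unless a CHECK constraint limits them, and the argument order differs between notations: geography::Point takes latitude first, while Well-Known Text (WKT) lists longitude first. The Python sketch below illustrates both points; the `LatLong` class and its `to_wkt` method are hypothetical helpers, not part of SQL Server or any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatLong:
    """Hypothetical value object for a decimal latitude/longitude pair."""

    latitude: float
    longitude: float

    def __post_init__(self) -> None:
        # The range checks a CHECK constraint on plain decimal columns would need.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")

    def to_wkt(self) -> str:
        # WKT lists x (longitude) before y (latitude): the reverse of the
        # argument order of geography::Point(lat, long, srid).
        return f"POINT({self.longitude} {self.latitude})"


point = LatLong(47.6475, -122.1393)
print(point.to_wkt())  # POINT(-122.1393 47.6475)
```

Swapping the two arguments, e.g. `LatLong(-122.1393, 47.6475)`, raises `ValueError` because -122.1393 is not a valid latitude; two unconstrained decimal columns would store the swapped pair silently.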

Before I abandon the idea of using geography, is there anything I should consider? Would it be faster to search for a location using a spatial index vs. indexing the Latitude and Longitude fields? Are there advantages to using geography that I am not aware of? Or, on the flip side, are there caveats that I should know about which would discourage me from using geography?
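For the spatial-index question, it helps to see what a proximity search looks like when the coordinates stay in two decimal columns. A common pattern is a two-step filter: a bounding-box range test that an ordinary index on Latitude and Longitude can help with, then an exact distance test on the surviving rows. A spatial index on a geography column plays a similar primary-filter role for STDistance queries. The Python sketch below assumes a spherical Earth of radius 6,371 km and ignores circles that touch a pole or cross the 180th meridian; all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed spherical mean radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a sphere of radius EARTH_RADIUS_M."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def bounding_box(lat: float, lon: float, radius_m: float):
    """A lat/long box containing every point within radius_m of (lat, lon).

    Assumes the circle does not reach a pole or cross the 180th meridian.
    """
    ang = radius_m / EARTH_RADIUS_M
    dlat = math.degrees(ang)
    # Widest longitude offset of a small circle at this latitude.
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def within_radius(points, lat: float, lon: float, radius_m: float):
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_m)
    # Step 1: plain range predicates, which an index on Latitude/Longitude can help with.
    candidates = [p for p in points
                  if min_lat <= p[0] <= max_lat and min_lon <= p[1] <= max_lon]
    # Step 2: the exact (more expensive) distance test, applied only to the candidates.
    return [p for p in candidates if haversine_m(lat, lon, p[0], p[1]) <= radius_m]
```

In SQL terms, step 1 corresponds to BETWEEN predicates on the two columns and step 2 to the distance expression. Whether this beats a spatial index depends on data volume and distribution, so it is worth measuring both.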


Update

@Erik Philips brought up the ability to do proximity searches with geography, which is very cool.

On the other hand, a quick test shows that a simple select to get the latitude and longitude is significantly slower when using geography (details below), and a comment on the accepted answer to another SO question on geography has me leery:

@SaphuA You're welcome. As a side note, be VERY careful of using a spatial index on a nullable GEOGRAPHY datatype column. There are some serious performance issues, so make that GEOGRAPHY column non-nullable even if you have to remodel your schema. – Tomas Jun 18 at 11:18

All in all, weighing the likelihood of doing proximity searches vs. the trade-off in performance and complexity, I've decided to forgo the use of geography in this case.


Details of the test I ran:

I created two tables, one using geography and another using decimal(9,6) for latitude and longitude:

CREATE TABLE [dbo].[GeographyTest]
(
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Location] [geography] NOT NULL,
    CONSTRAINT [PK_GeographyTest] PRIMARY KEY CLUSTERED ( [RowId] ASC )
) 

CREATE TABLE [dbo].[LatLongTest]
(
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Latitude] [decimal](9, 6) NULL,
    [Longitude] [decimal](9, 6) NULL,
    CONSTRAINT [PK_LatLongTest] PRIMARY KEY CLUSTERED ([RowId] ASC)
) 

and inserted a single row using the same latitude and longitude values into each table:

insert into GeographyTest(Location) values (geography::Point(47.6475, -122.1393, 4326))
insert into LatLongTest(Latitude, Longitude) values (47.6475, -122.1393)

Finally, running the following code shows that, on my machine, selecting the latitude and longitude is approximately 5 times slower when using geography.

declare @lat float, @long float,
        @d datetime2, @repCount int, @trialCount int, 
        @geographyDuration int, @latlongDuration int,
        @trials int = 3, @reps int = 100000

create table #results 
(
    GeographyDuration int,
    LatLongDuration int
)

set @trialCount = 0

while @trialCount < @trials
begin

    set @repCount = 0
    set @d = sysdatetime()

    while @repCount < @reps
    begin
        select @lat = Location.Lat,  @long = Location.Long from GeographyTest where RowId = 1
        set @repCount = @repCount + 1
    end

    set @geographyDuration = datediff(ms, @d, sysdatetime())

    set @repCount = 0
    set @d = sysdatetime()

    while @repCount < @reps
    begin
        select @lat = Latitude,  @long = Longitude from LatLongTest where RowId = 1
        set @repCount = @repCount + 1
    end

    set @latlongDuration = datediff(ms, @d, sysdatetime())

    insert into #results values(@geographyDuration, @latlongDuration)

    set @trialCount = @trialCount + 1

end

select * 
from #results

select avg(GeographyDuration) as AvgGeographyDuration, avg(LatLongDuration) as AvgLatLongDuration
from #results

drop table #results

Results:

GeographyDuration LatLongDuration
----------------- ---------------
5146              1020
5143              1016
5169              1030

AvgGeographyDuration AvgLatLongDuration
-------------------- ------------------
5152                 1022

What was more surprising is that even when no rows are selected, for example selecting where RowId = 2, which doesn't exist, geography was still slower:

GeographyDuration LatLongDuration
----------------- ---------------
1607              948
1610              946
1607              947

AvgGeographyDuration AvgLatLongDuration
-------------------- ------------------
1608                 947
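The averages and the "approximately 5 times slower" figure can be reproduced from the raw trial numbers, and dividing by the 100,000 repetitions turns them into a per-query cost. The Python sketch below uses only the numbers reported above; integer division mimics T-SQL's AVG over an int column, which returns an int (hence 5152 rather than 5152.67). The `summarize` function is a hypothetical helper.

```python
def summarize(geo_ms, latlong_ms, reps=100_000):
    """Mirror the AVG columns of the test and derive a ratio and a per-query overhead."""
    # T-SQL's AVG over an int column returns an int, so integer division is used here.
    avg_geo = sum(geo_ms) // len(geo_ms)
    avg_ll = sum(latlong_ms) // len(latlong_ms)
    ratio = avg_geo / avg_ll
    # Extra milliseconds per trial, converted to microseconds and spread over the reps.
    extra_us = (avg_geo - avg_ll) * 1000 / reps
    return avg_geo, avg_ll, ratio, extra_us


row_found = summarize([5146, 5143, 5169], [1020, 1016, 1030])  # WHERE RowId = 1
no_row = summarize([1607, 1610, 1607], [948, 946, 947])        # WHERE RowId = 2

for label, (geo, ll, ratio, extra) in [("row found", row_found), ("no row", no_row)]:
    print(f"{label}: {geo} vs {ll} ms, {ratio:.2f}x slower, +{extra:.1f} us per query")
# row found: 5152 vs 1022 ms, 5.04x slower, +41.3 us per query
# no row: 1608 vs 947 ms, 1.70x slower, +6.6 us per query
```

Read this way, geography adds roughly 41 microseconds per single-row SELECT in this test, and about 6.6 microseconds when no row matches. That is a fivefold difference in a tight loop, but the absolute cost per query is small, so whether it matters depends on how many rows a real workload reads.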

Solution

If you plan on doing any spatial computation, EF 5.0 allows LINQ Expressions like:

private Facility GetNearestFacilityToJobsite(DbGeography jobsite)
{   
    var q1 = from f in context.Facilities            
             let distance = f.Geocode.Distance(jobsite)
             where distance < 500 * 1609.344     
             orderby distance 
             select f;   
    return q1.FirstOrDefault();
}

Then there is a very good reason to use Geography.
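For readers not using Entity Framework, the same query shape (keep facilities within 500 miles, order by distance, take the first) can be sketched in plain Python. This is an illustration only: the facility tuples and their coordinates are made up, and it uses a spherical haversine distance, whereas in the EF query the distance is computed by SQL Server on the ellipsoid of the SRID (4326 here), so results will differ slightly. The 1609.344 factor converts miles to metres, the unit SRID 4326 distances are returned in.

```python
import math
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0   # spherical approximation (assumption)
METERS_PER_MILE = 1609.344     # the factor used in the LINQ query above

Facility = Tuple[str, float, float]  # hypothetical (name, latitude, longitude)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a sphere of radius EARTH_RADIUS_M."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def nearest_facility(facilities: Iterable[Facility], lat: float, lon: float,
                     max_miles: float = 500.0) -> Optional[Facility]:
    """Filter by radius, order by distance, take the first (or None if nothing qualifies)."""
    best: Optional[Facility] = None
    best_d = max_miles * METERS_PER_MILE
    for facility in facilities:
        d = haversine_m(lat, lon, facility[1], facility[2])
        if d < best_d:  # strict '<', like `where distance < 500 * 1609.344`
            best, best_d = facility, d
    return best


facilities = [("Seattle", 47.6062, -122.3321),   # approximate coordinates
              ("Redmond", 47.6740, -122.1215),
              ("Portland", 45.5152, -122.6784)]
print(nearest_facility(facilities, 47.6475, -122.1393))
```

A single pass that tracks the closest match avoids sorting every candidate; the LINQ version expresses the same intent declaratively and leaves the execution strategy to SQL Server.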

See also: an explanation of spatial support in Entity Framework.

Updated with: Creating High Performance Spatial Databases.

As I noted on Noel Abrahams' answer:

A note on space: each coordinate is stored as a double-precision floating-point number that is 64 bits (8 bytes) long, and an 8-byte binary value is roughly equivalent to 15 digits of decimal precision. Comparing against a decimal(9,6), which is only 5 bytes, therefore isn't exactly a fair comparison. For a real comparison, the decimal would have to be at least decimal(15,12) (9 bytes) for each of latitude and longitude (18 bytes in total).
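The byte counts in this note follow SQL Server's documented storage sizes for decimal: 5 bytes for precision 1 to 9, 9 bytes for 10 to 19, 13 bytes for 20 to 28 and 17 bytes for 29 to 38, while a double (float(53)) takes 8 bytes. The Python sketch below encodes those bands and also estimates the ground resolution each scale gives. The resolution helper assumes a spherical Earth (about 111.2 km per degree of latitude), and both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical approximation (assumption)
FLOAT53_BYTES = 8             # a double, as used for each coordinate of a geography point


def decimal_storage_bytes(precision: int) -> int:
    """Bytes used by SQL Server decimal(p, s), from its documented precision bands."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 5
    if precision <= 19:
        return 9
    if precision <= 28:
        return 13
    return 17


def resolution_m(scale: int) -> float:
    """Approximate ground length of one unit in the last decimal place of a latitude."""
    metres_per_degree = math.pi / 180 * EARTH_RADIUS_M  # about 111.2 km
    return metres_per_degree * 10 ** -scale


for precision, scale in [(7, 4), (9, 6), (15, 12)]:
    print(f"decimal({precision},{scale}): {2 * decimal_storage_bytes(precision)} bytes per pair, "
          f"~{resolution_m(scale):.3g} m resolution")
print(f"two doubles: {2 * FLOAT53_BYTES} bytes per pair")
```

decimal(15,12) keeps three digits before the decimal point, which is just enough for longitudes up to ±180, and its last digit corresponds to well under a millimetre on the ground; decimal(9,6) already resolves to roughly 0.1 m.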

So comparing storage types:

CREATE TABLE dbo.Geo
(    
geo geography
)
GO

CREATE TABLE dbo.LatLng
(    
    lat decimal(15, 12),   
    lng decimal(15, 12)
)
GO

INSERT dbo.Geo
SELECT geography::Point(12.3456789012345, 12.3456789012345, 4326) 
UNION ALL
SELECT geography::Point(87.6543210987654, 87.6543210987654, 4326) 

GO 10000

INSERT dbo.LatLng
SELECT  12.3456789012345, 12.3456789012345 
UNION
SELECT 87.6543210987654, 87.6543210987654

GO 10000

EXEC sp_spaceused 'dbo.Geo'

EXEC sp_spaceused 'dbo.LatLng'

Result:

name    rows    data
Geo     20000   728 KB
LatLng  20000   560 KB

The geography data type takes up about 30% more space (728 KB vs 560 KB).
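The 30% figure is simply 728 / 560 = 1.3. Dividing by the row count gives an average per row; since sp_spaceused reports space in whole pages, these averages include row and page overhead and are not the size of the columns alone. A short Python check using only the reported numbers:

```python
ROWS = 20_000
GEO_KB, LATLNG_KB = 728, 560   # "data" column reported by sp_spaceused

overhead = GEO_KB / LATLNG_KB - 1
geo_bytes_per_row = GEO_KB * 1024 / ROWS
latlng_bytes_per_row = LATLNG_KB * 1024 / ROWS

print(f"geography uses {overhead:.0%} more data space")
# Page-based figures, so these are averages that include row and page overhead.
print(f"about {geo_bytes_per_row:.1f} vs {latlng_bytes_per_row:.1f} bytes per row")
```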

Additionally, the geography data type is not limited to storing a Point: it can also store LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and GeometryCollection instances, and, starting with SQL Server 2012, CircularString, CompoundCurve, CurvePolygon and more. Storing even the simplest shape beyond a Point (for example a LINESTRING(1 1, 2 2) instance) in decimal Lat/Long columns requires an additional row for each point, a column recording the order of the points, and another column grouping the points into lines. SQL Server also provides methods on the geography data type for calculating area, boundary, length, distance and more.
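To make the row-and-column cost concrete, here is a hedged Python sketch of what a LineString turns into when only decimal latitude/longitude columns are available: one row per vertex, a sequence column to preserve vertex order, and a line id to group vertices. The row layout (line_id, seq, latitude, longitude) and both function names are hypothetical. Converting back to WKT also shows the axis-order detail: WKT lists longitude before latitude.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]          # (latitude, longitude)
Row = Tuple[int, int, float, float]  # hypothetical table row: (line_id, seq, latitude, longitude)


def lines_to_rows(lines: Dict[int, List[Point]]) -> List[Row]:
    """Flatten each line into one row per vertex, keeping the vertex order in `seq`."""
    rows: List[Row] = []
    for line_id, points in lines.items():
        for seq, (lat, lon) in enumerate(points):
            rows.append((line_id, seq, lat, lon))
    return rows


def rows_to_wkt(rows: List[Row], line_id: int) -> str:
    """Rebuild one line as WKT; rows may arrive in any order, so sort by `seq`."""
    vertices = sorted((r for r in rows if r[0] == line_id), key=lambda r: r[1])
    # WKT puts longitude (x) before latitude (y).
    coords = ", ".join(f"{lon:g} {lat:g}" for _, _, lat, lon in vertices)
    return f"LINESTRING({coords})"


rows = lines_to_rows({1: [(1.0, 1.0), (2.0, 2.0)]})
print(rows)                  # [(1, 0, 1.0, 1.0), (1, 1, 2.0, 2.0)]
print(rows_to_wkt(rows, 1))  # LINESTRING(1 1, 2 2)
```

Every query against such a table has to regroup and reorder rows before it can treat them as a shape, which is the bookkeeping a single geography value avoids.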

It seems unwise to store latitude and longitude as decimal in SQL Server.

Update 2

If you plan on doing any calculations such as distance or area, calculating them correctly over the surface of the earth is difficult. Every geography instance stored in SQL Server also carries a spatial reference ID (SRID). Different SRIDs can describe different spatial reference systems and ellipsoids; 4326 is WGS 84, the system used by GPS. This means that calculations in SQL Server are performed over the surface of the earth as modeled by that ellipsoid, rather than as a straight line that could pass through the earth.
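To see why "over the surface" matters, the sketch below compares the great-circle distance with the straight-line (chord) distance between the same two points on a spherical Earth. It only approximates what SQL Server does: for SRID 4326 (WGS 84) the geography type works on an ellipsoid, not a sphere, so its STDistance results will differ slightly from a haversine figure. The function names and the London coordinates are illustrative assumptions; the first point is the sample point from the question.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical approximation (assumption)


def central_angle(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Angle at the earth's centre between two points, in radians (haversine form)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(min(1.0, a)))


def surface_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance measured along the surface."""
    return EARTH_RADIUS_M * central_angle(lat1, lon1, lat2, lon2)


def chord_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Straight-line distance through the earth between the same two points."""
    return 2 * EARTH_RADIUS_M * math.sin(central_angle(lat1, lon1, lat2, lon2) / 2)


sample, london = (47.6475, -122.1393), (51.5074, -0.1278)
print(f"surface: {surface_distance_m(*sample, *london) / 1000:.0f} km")
print(f"chord:   {chord_distance_m(*sample, *london) / 1000:.0f} km")
```

The two figures agree closely for nearby points and diverge as points move apart; for antipodal points the surface distance is half the circumference (about 20,015 km) while the chord is the diameter (about 12,742 km).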
