Hashtable and casing


Problem description

I'm doing a project that makes heavy use of domain names such as "www.yahoo.com." Domain names preserve case but are considered equal if the names are the same and only the case differs.

I know I can store these names as keys in a Hashtable with a case-insensitive comparer and CaseInsensitiveHashCodeProvider. That works fine. The benefit is that I can store the domain name without worrying about case and return the user-supplied case without storing any extra state. However, this comes at a cost, because all string compare operations (EndsWith, etc.) now have to be case-insensitive. If everything were lower case, for example, string compares would be very fast, and if the strings were interned, really fast.

I could store the domain name in all lower case and then store a BitArray that tells me which chars were upper case. However, that seems like a pain and still requires at least 32 bytes for a 255-char domain name, or 3 bytes for a 20-char name.

I could also store both the original-case string and a lower-case version that is used for all compare, EndsWith, hash, etc. operations. However, this doubles the storage needed. String interning can help with duplicates. That is my most attractive option in terms of performance, I think, but I was wondering what others think. Cheers!

--
William Stacey, MVP
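
The BitArray idea in the post can be sketched roughly as follows. This is only an illustration, assuming names are plain ASCII; `CaseMask`, `MaskBytes`, `FromName`, and `Restore` are hypothetical names, not anything from the thread or the framework.

```csharp
using System;
using System.Collections;

static class CaseMask
{
    // Bytes needed to hold one bit per character, rounded up.
    public static int MaskBytes(int length) => (length + 7) / 8;

    // Records which positions of an ASCII name were upper case.
    public static BitArray FromName(string name)
    {
        var mask = new BitArray(name.Length);
        for (int i = 0; i < name.Length; i++)
            mask[i] = name[i] >= 'A' && name[i] <= 'Z';
        return mask;
    }

    // Rebuilds the user-supplied spelling from the lower-case form and the mask.
    public static string Restore(string lower, BitArray mask)
    {
        char[] chars = lower.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
            if (mask[i] && chars[i] >= 'a' && chars[i] <= 'z')
                chars[i] = (char)(chars[i] - 32);
        return new string(chars);
    }
}
```

`CaseMask.MaskBytes(255)` gives 32 and `CaseMask.MaskBytes(20)` gives 3, which matches the figures quoted above. Those numbers count only the bits themselves; a `BitArray` object adds its own overhead on top.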

Recommended answer

Go back a couple of days and look up that IndexOf I helped a user with. You should probably write your own custom string operations for this one. They give you maximum speed, with the trade-off that you won't be culture-aware. In this case, your case-insensitive work does not need to be culture-aware; it simply has to follow the RFCs for domain names, which are fairly strict.

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

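
A custom, culture-unaware comparison of the kind suggested here might look like the sketch below. It is not the IndexOf routine mentioned in the reply, which is not reproduced in this thread; `DnsAscii` and its methods are hypothetical names, and the code assumes only ASCII letters need case folding.

```csharp
using System;

static class DnsAscii
{
    // Folds ASCII 'A'..'Z' to lower case; every other char is left unchanged.
    static char Fold(char c) => (c >= 'A' && c <= 'Z') ? (char)(c | 0x20) : c;

    // Ordinal equality that ignores ASCII case only.
    public static bool EqualsIgnoreCase(string a, string b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (Fold(a[i]) != Fold(b[i])) return false;
        return true;
    }

    // True when name ends with suffix, ignoring ASCII case only.
    public static bool EndsWithIgnoreCase(string name, string suffix)
    {
        int offset = name.Length - suffix.Length;
        if (offset < 0) return false;
        for (int i = 0; i < suffix.Length; i++)
            if (Fold(name[offset + i]) != Fold(suffix[i])) return false;
        return true;
    }
}
```

For example, `DnsAscii.EndsWithIgnoreCase("WWW.Yahoo.COM.", "yahoo.com.")` is true. The framework's `StringComparison.OrdinalIgnoreCase`, as in `name.EndsWith(suffix, StringComparison.OrdinalIgnoreCase)`, is a built-in culture-independent alternative; whether a hand-written loop is actually faster would need measuring.
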
Why do you have to return the domain to the user in the case it was entered? (It would actually make more sense to correct it to lower case, I think, because all domains are in lower case and anything else they enter is probably mistyped.)

You could even correct the domain by doing a reverse DNS lookup and storing the result.

> think, because all domains are in lower case and anything else they enter
> is probably mistyped).

That would be nice, but it is not allowed by RFCs 1034 and 1035. You must preserve the case of domain names and labels.
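
One way to preserve the supplied case while still treating case variants as the same name is a lookup with a case-insensitive comparer, sketched below. It swaps the thread's Hashtable and CaseInsensitiveHashCodeProvider for `Dictionary<string, string>` with `StringComparer.OrdinalIgnoreCase`; `DomainTable` and `GetOrAdd` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

sealed class DomainTable
{
    // Keys compare and hash without regard to case, but each key string
    // is kept exactly as it was first added.
    private readonly Dictionary<string, string> names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Adds the name if no case variant is present; returns the stored spelling.
    public string GetOrAdd(string name)
    {
        if (names.TryGetValue(name, out string stored))
            return stored;
        names[name] = name;
        return name;
    }

    public int Count => names.Count;
}
```

Adding "WWW.Yahoo.com." and then looking up "www.yahoo.com." returns the first spelling, and the table still holds a single entry.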
