Sort by domain name?


Problem description

Hi list,

I have a list of URLs and I want to sort that list by the domain name.

Here, domain name doesn't contain subdomain,
or should I say, domain's part of 'www', mail, news and en should be excluded.

For example, if the list was the following
------------------------------------------------------------
http://mail.google.com
http://reader.google.com
http://mail.yahoo.co.uk
http://google.com
http://mail.yahoo.com
------------------------------------------------------------

the sort's output would be
------------------------------------------------------------
http://google.com
http://mail.google.com
http://reader.google.com
http://mail.yahoo.co.uk
http://mail.yahoo.com
------------------------------------------------------------

As you can see above, I don't want to
Thanks in advance.

Solution

"js" <eb*****@gmail.com> writes:
> Here, domain name doesn't contain subdomain,
> or should I say, domain's part of 'www', mail, news and en should be
> excluded.

It's a little more complicated: you have to treat co.uk about
the same way as .com, and similarly for some other countries
but not all. For example, subdomain.companyname.de versus
subdomain.companyname.com.au or subdomain.companyname.co.uk.
You end up needing a table or special code to say
how to treat various countries.
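
The "table" idea above can be sketched roughly as follows. This is only a minimal illustration, not anything posted in the thread: the set TWO_LABEL_SUFFIXES and the helper names registrable_domain and sort_by_registrable_domain are hypothetical, and the table holds just the two suffixes mentioned here. A real table would be far larger (the Public Suffix List is the usual source).

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately incomplete table of two-label suffixes that
# should be treated "about the same way as .com".
TWO_LABEL_SUFFIXES = {"co.uk", "com.au"}


def registrable_domain(host):
    """Return the name one level below the suffix, e.g. 'yahoo.co.uk'."""
    labels = host.lower().split(".")
    # Keep three labels when the last two form a known suffix, else two.
    keep = 3 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def sort_by_registrable_domain(urls):
    """Sort URLs by registrable domain, then by full host name."""
    def key(url):
        host = urlsplit(url).hostname or ""
        return (registrable_domain(host), host)
    return sorted(urls, key=key)


sites = [
    "http://mail.google.com",
    "http://reader.google.com",
    "http://mail.yahoo.co.uk",
    "http://google.com",
    "http://mail.yahoo.com",
]
print(sort_by_registrable_domain(sites))
```

With this key, subdomain.companyname.de groups under companyname.de while subdomain.companyname.co.uk groups under companyname.co.uk, which is the distinction the table exists to capture.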


>> Here, domain name doesn't contain subdomain, or should I
>> say, domain's part of 'www', mail, news and en should be
>> excluded.
>
> It's a little more complicated, you have to treat co.uk about
> the same way as .com, and similarly for some other countries
> but not all. For example, subdomain.companyname.de versus
> subdomain.companyname.com.au or subdomain.companyname.co.uk.
> You end up needing a table or special code to say how to treat
> various countries.

In addition, you get very different results even for the same "base"
domain name, such as "whitehouse", depending on whether you use the
".gov" or ".com" variant of the TLD. Thus, I'm not sure there's
any way to tell that case apart from the "yahoo.com" vs.
"yahoo.co.uk" case without doing a boatload of WHOIS queries,
which in turn might be misleading anyway.

A first-pass solution might look something like:

##############################################################
>>> sites
['http://mail.google.com', 'http://reader.google.com',
'http://mail.yahoo.co.uk', 'http://google.com',
'http://mail.yahoo.com']
>>> # removeprefix (Python 3.9+) drops the literal 'http://' prefix;
>>> # lstrip('http://') would strip a set of characters instead.
>>> sitebits = [site.lower().removeprefix('http://').split('.')
...             for site in sites]
>>> for site in sitebits: site.reverse()
...
>>> sorted(sitebits)
[['com', 'google'], ['com', 'google', 'mail'], ['com', 'google',
'reader'], ['com', 'yahoo', 'mail'], ['uk', 'co', 'yahoo', 'mail']]
>>> results = ['http://' + '.'.join(reversed(site))
...            for site in sorted(sitebits)]
>>> results
['http://google.com', 'http://mail.google.com',
'http://reader.google.com', 'http://mail.yahoo.com',
'http://mail.yahoo.co.uk']
##############################################################

which can be wrapped up like this:

##############################################################
>>> def sort_by_domain(sites):
...     # removeprefix (Python 3.9+) drops the literal prefix; lstrip would not
...     sitebits = [site.lower().removeprefix('http://').split('.')
...                 for site in sites]
...     for site in sitebits: site.reverse()
...     return ['http://' + '.'.join(reversed(site))
...             for site in sorted(sitebits)]
...
>>> s = sites
>>> sort_by_domain(sites)
['http://google.com', 'http://mail.google.com',
'http://reader.google.com', 'http://mail.yahoo.com',
'http://mail.yahoo.co.uk']
##############################################################

to give you a sorting function. It assumes http rather than
having mixed URL types, such as ftp or mailto. They're easy
enough to strip off as well, but putting them back on becomes a
little more of an exercise.

Just a few ideas,

-tkc
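
One possible way around the strip-and-restore problem mentioned above is to leave each URL string untouched and only derive a sort key from it. The sketch below assumes the standard library's urllib.parse.urlsplit is acceptable for pulling out the host; the names reversed_host_key, sort_by_host and the sample list mixed are made up for illustration.

```python
from urllib.parse import urlsplit


def reversed_host_key(url):
    """Sort key: host labels from the TLD inward, e.g. ['com', 'google', 'mail']."""
    host = urlsplit(url).hostname or ""
    return host.lower().split(".")[::-1]


def sort_by_host(urls):
    # sorted() with a key never modifies the URL strings themselves, so the
    # scheme does not have to be stripped off and glued back on.
    return sorted(urls, key=reversed_host_key)


# Hypothetical sample mixing schemes and paths.
mixed = [
    "ftp://mail.yahoo.co.uk/pub",
    "https://reader.google.com/",
    "http://google.com",
    "http://mail.yahoo.com",
]
print(sort_by_host(mixed))
```

URLs without a network location (mailto:, for instance) have no hostname under urlsplit and would all receive the same empty key here, so they would need separate handling.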




Paul Rubin wrote:
> "js" <eb*****@gmail.com> writes:
>> Here, domain name doesn't contain subdomain,
>> or should I say, domain's part of 'www', mail, news and en should be
>> excluded.
>
> It's a little more complicated, you have to treat co.uk about
> the same way as .com, and similarly for some other countries
> but not all. For example, subdomain.companyname.de versus
> subdomain.companyname.com.au or subdomain.companyname.co.uk.
> You end up needing a table or special code to say
> how to treat various countries.

Plus, how do you order "https:", "ftp", URLs with "www.", "www2.",
named anchors etc.?

Gentle reminder: is this homework? You can expect better responses
if you show you've bootstrapped yourself on the problem to some extent.
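
The thread leaves these ordering questions open. Purely as one illustrative set of answers, the sketch below assumes that a leading "www" or "www<digits>" label is ignored, that the host decides the order, and that scheme, path and fragment only break ties. WWW_LABEL, url_sort_key and the sample list are hypothetical names, and a different policy may suit the actual data better.

```python
import re
from urllib.parse import urlsplit

# Assumption: a leading "www" or "www<digits>" label carries no ordering weight.
WWW_LABEL = re.compile(r"^www\d*$")


def url_sort_key(url):
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    # Drop the www-style label only when a real domain remains behind it.
    if len(labels) > 2 and WWW_LABEL.match(labels[0]):
        labels = labels[1:]
    # Host (TLD first) decides the order; scheme, path and fragment break ties.
    return (labels[::-1], parts.scheme, parts.path, parts.fragment)


# Hypothetical sample covering the cases raised above.
urls = [
    "https://www.google.com/#top",
    "ftp://google.com/",
    "http://www2.google.com/",
    "http://mail.google.com/",
]
print(sorted(urls, key=url_sort_key))
```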

