More complex sorting: How to categorize data and sort the data within categories? (Python)
Question
I have a question about a program that I am trying to modify. The current program:
import re
import sys

def extract_names(filename):
    names = []
    with open(filename) as f:
        text = f.read()
    yearmatch = re.search(r'Popularity\sin\s(\d\d\d\d)', text)
    if not yearmatch:
        sys.stderr.write('unavailable year\n')
        sys.exit(1)
    year = yearmatch.group(1)
    names.append(year)
    # Find every (rank, boyname, girlname) row in the table, as a list of tuples.
    yeartuples = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
    rankednames = {}
    for rank_tuple in yeartuples:
        (rank, boyname, girlname) = rank_tuple
        if boyname not in rankednames:
            rankednames[boyname] = rank
        if girlname not in rankednames:
            rankednames[girlname] = rank
    # Sort names from the lowest rank (largest number) to the highest rank (1).
    sorted_names = sorted(rankednames.keys(), key=lambda x: int(rankednames[x]), reverse=True)
    for name in sorted_names:
        names.append(name + " " + rankednames[name])
    return names[:20]

# Boilerplate from this point
def main():
    args = sys.argv[1:]
    if not args:
        print('usage: [--summaryfile] file [file ...]')
        sys.exit(1)
    summary = False
    if args[0] == '--summaryfile':
        summary = True
        del args[0]
    for filename in args:
        names = extract_names(filename)
        text = '\n'.join(names)
        if summary:
            outf = open(filename + '.summary', 'w')
            outf.write(text + '\n')
            outf.close()
        else:
            print(text)

if __name__ == '__main__':
    main()
The program takes information from a web page that lists the most popular baby names of a given year in a table, builds a list from this data, and prints the names in order from the lowest rank (1000) to the highest rank (1). The modification I am trying to make should sort the names alphabetically by first letter (a first), and within each letter group (all the a's, all the b's, etc.) sort the names in descending order of rank number, so that the lowest-ranked name starting with a is the first name shown. I tried calling re.search for each letter, but I don't think it works as intended that way. The sorting within the letter categories is giving me the most trouble. Are there other approaches or solutions to this?
Recommended answer
In the call to sorted, replace:
key=lambda x: int(rankednames[x]), reverse = True
with:
key=lambda x: (x[0], -int(rankednames[x]))
The general point is that you can always use a tuple to combine two or more different sort keys, with one used first and the other as a "tie-breaker". The specific point is that we can easily simulate reverse=True for the second key because it happens to be an integer and can therefore be negated: this trick wouldn't work for a string key.
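As a minimal sketch of the tuple key in action, the snippet below sorts a small made-up dictionary shaped like rankednames from the question (name mapped to rank stored as a string). The sample names and ranks, and the helper name sort_by_letter_then_rank, are assumptions for illustration only and are not part of the original program.

```python
def sort_by_letter_then_rank(rankednames):
    """Return names grouped by first letter (a first); within each
    letter group the largest rank number (lowest rank) comes first."""
    return sorted(
        rankednames,
        # Primary key: first letter. Tie-breaker: negated rank, so that
        # an ascending sort yields descending rank numbers per letter.
        key=lambda name: (name[0], -int(rankednames[name])),
    )


# Hypothetical sample data: name -> rank as a string, as in the question.
sample = {
    "Aaron": "50",
    "Abby": "205",
    "Adam": "72",
    "Ben": "300",
    "Bella": "12",
    "Carl": "999",
}

for name in sort_by_letter_then_rank(sample):
    print(name + " " + sample[name])
```

Because the sort is ascending on the whole tuple, no reverse=True argument is needed; only the rank component is flipped by the negation.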