More complex sorting: How to categorize data and sort the data within categories? (Python)

Question

I have a question about a program that I am trying to modify. The program I currently have:

import re
import sys


def extract_names(filename):
  names = []
  # Read the whole file; 'with' closes it even if an error occurs.
  with open(filename) as f:
    text = f.read()

  yearmatch = re.search(r'Popularity\sin\s(\d\d\d\d)', text)
  if not yearmatch:
    sys.stderr.write('unavailable year\n')
    sys.exit(1)
  year = yearmatch.group(1)
  names.append(year)

  # Find every (rank, boyname, girlname) table row as a tuple of strings.
  yeartuples = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)

  # Map each name to its rank (kept as a string); the first rank seen wins.
  rankednames = {}
  for rank_tuple in yeartuples:
    (rank, boyname, girlname) = rank_tuple
    if boyname not in rankednames:
      rankednames[boyname] = rank
    if girlname not in rankednames:
      rankednames[girlname] = rank
  # Sort names from the lowest rank (largest number) to the highest rank (1).
  sorted_names = sorted(rankednames.keys(), key=lambda x: int(rankednames[x]), reverse = True)
  for name in sorted_names:
    names.append(name + " " + rankednames[name])
  return names[:20]
# Boilerplate from this point on

def main():

  args = sys.argv[1:]

  if not args:
    print('usage: [--summaryfile] file [file ...]')
    sys.exit(1)

  summary = False
  if args[0] == '--summaryfile':
    summary = True
    del args[0]

  for filename in args:
    names = extract_names(filename)
    text = '\n'.join(names)

    if summary:
      # Write the result next to the input file instead of printing it.
      with open(filename + '.summary', 'w') as outf:
        outf.write(text + '\n')
    else:
      print(text)

if __name__ == '__main__':
  main()

It takes information from a website's table of the most popular baby names for a given year, uses this data to build a list, and prints the baby names in order from the lowest rank (1000) to the highest rank (1). The modification I am trying to make should sort all of the names alphabetically by first letter (a first), but within each letter group (all the a's, all the b's, etc.) I want the names sorted by rank in descending order, so the lowest-ranked name that starts with an a would be the first name to show up. I have tried re.search for each letter, but I don't think it works as intended that way. I am having the most trouble with the sorting within the letter categories. Are there any other approaches/solutions to this?

Recommended answer

In the call to sorted, replace:

key=lambda x: int(rankednames[x]), reverse = True

with:

key=lambda x: (x[0], -int(rankednames[x]))
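To see what this key does, here is a minimal sketch on a small made-up dictionary. The names and ranks are hypothetical sample data, not taken from the real baby-name files; the ranks are stored as strings, as in the question's code.

```python
# Hypothetical sample data: name -> rank (as a string, like the question's dict).
rankednames = {
    "Aaron": "50",
    "Abby": "205",
    "Adam": "72",
    "Bella": "48",
    "Ben": "700",
}

# Primary key: first letter (ascending).
# Secondary key: negated rank, so the largest rank number comes first in each group.
sorted_names = sorted(rankednames, key=lambda x: (x[0], -int(rankednames[x])))

for name in sorted_names:
    print(name + " " + rankednames[name])
# Prints:
# Abby 205
# Adam 72
# Aaron 50
# Ben 700
# Bella 48
```

Note that x[0] is case-sensitive, so this sketch assumes every name starts with a capital letter, which the sample data does.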

The general point is that you can always use a tuple to combine two or more different sort keys with one used first and the other as a "tie-breaker". The specific point is that we can easily simulate reverse=True because the key happens to be an integer and therefore can be negated: this trick wouldn't work for a string key.
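When the secondary key is a string and cannot be negated, one common workaround (not part of the original answer) is to sort twice and rely on the fact that Python's sorted is stable: sort by the secondary key in descending order first, then by the primary key. The records below are hypothetical sample data used only to illustrate the idea.

```python
# Hypothetical data: (letter group, label) pairs where the secondary key is a string.
records = [("a", "pear"), ("b", "kiwi"), ("a", "apple"), ("b", "plum"), ("a", "fig")]

# Pass 1: order by the secondary key (the label), descending.
by_label_desc = sorted(records, key=lambda r: r[1], reverse=True)

# Pass 2: order by the primary key (the letter group), ascending.
# Because the sort is stable, items with the same letter keep their pass-1 order.
result = sorted(by_label_desc, key=lambda r: r[0])

print(result)
# [('a', 'pear'), ('a', 'fig'), ('a', 'apple'), ('b', 'plum'), ('b', 'kiwi')]
```

For the baby-name program itself the tuple key with a negated integer is simpler, since the ranks convert cleanly to int.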
