
Using the collections Module, a Common Built-in Module

Introduction to the collections Module

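Earlier posts introduced Python's built-in data structures, including list, tuple and dict. Building on these built-in types, the collections module provides several additional data types: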

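  • namedtuple: creates a tuple subclass whose elements can be accessed by name
  • deque: a double-ended queue that supports fast appends and pops at either end
  • Counter: a counter, mainly used for counting
  • OrderedDict: an ordered dictionary
  • defaultdict: a dictionary with default values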

The sections below describe each of these data types in detail.

Common Built-in Modules: the Double-Ended Queue deque

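  • The deque structure in the collections module can be seen as an enhanced version of the built-in list, and it offers more powerful methods than a plain queue.
  • deque is short for double-ended queue; it supports insertion and removal at both ends.
  • deque([iterable[, maxlen]]) --> deque object, where maxlen is the maximum length of the deque.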
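
As a minimal sketch of the maxlen behaviour mentioned above (the name `recent` is only illustrative): once a bounded deque is full, appending on one side discards an item from the opposite side.

```python
from collections import deque

# A bounded deque that keeps only the 3 most recent items.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)      # once full, the oldest (leftmost) item is dropped

print(recent)             # deque([2, 3, 4], maxlen=3)

recent.appendleft(99)     # appending on the left drops the rightmost item
print(list(recent))       # [99, 2, 3]
```

This is why a bounded deque is a convenient fit for "last N items" bookkeeping such as recent log lines.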

The deque is used as follows:

>>> from collections import deque
>>> deque=deque((),5)    # note: this rebinds the name deque to the instance and shadows the class
>>> deque.
deque.append(     deque.copy(       deque.extendleft( deque.maxlen      deque.remove(
deque.appendleft( deque.count(      deque.index(      deque.pop(        deque.reverse(
deque.clear(      deque.extend(     deque.insert(     deque.popleft(    deque.rotate(
>>> deque
deque([], maxlen=5)
deque.append(item)    # Add an element to the right side (end) of the deque.
deque.appendleft(item)    # Add an element to the left side (start) of the deque.
deque.clear()            # Remove all elements from the deque.
deque.extend(iterable)  # Extend the right side of the deque with elements from the iterable.
deque.extendleft(iterable)    # Extend the left side with elements from the iterable (they end up in reverse order).
deque.copy()            # Return a shallow copy of the deque.
deque.count(item)        # Return the number of occurrences of item.
deque.index(value, [start, [stop]]) # Return the first index of value; raise ValueError if it is not found.
deque.insert(index, object)     # Insert object before index.
deque.pop()                # Remove and return the rightmost element.
deque.popleft()            # Remove and return the leftmost element.
deque.remove(value)        # Remove the first occurrence of value.
deque.reverse()            # Reverse the deque in place.
deque.rotate(step)        # Rotate step steps to the right; with no argument, rotate 1 step; if step is negative, rotate to the left.
deque.maxlen            # Maximum length of the deque (None if unbounded).

>>> deque
deque([], maxlen=5)
>>> deque.maxlen
5
>>> deque.append('first')
>>> deque
deque(['first'], maxlen=5)
>>> deque.append('second')
>>> deque
deque(['first', 'second'], maxlen=5)
>>> deque.append('third')
>>> deque
deque(['first', 'second', 'third'], maxlen=5)
>>> deque.appendleft('four')
>>> deque
deque(['four', 'first', 'second', 'third'], maxlen=5)
>>> deque.extend(['four','five'])
>>> deque
deque(['first', 'second', 'third', 'four', 'five'], maxlen=5)
>>> deque.extendleft(['four','five'])
>>> deque
deque(['five', 'four', 'first', 'second', 'third'], maxlen=5)
>>> deque1=deque.copy()
>>> type(deque1)
<class 'collections.deque'>
>>> deque1
deque(['five', 'four', 'first', 'second', 'third'], maxlen=5)
>>> deque.extend(('fourth','fifth'))
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)

>>> deque.count('first')
1
>>> deque.count('second')
1
>>> deque.count('third')
1

>>> deque.index('first')
0
>>> deque.index('second')
1
>>> deque.index('third')
2
>>> deque.index('third',0,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'third' is not in deque
>>> deque.index('third',0,3)
2

>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)
>>> deque.reverse()
>>> deque
deque(['fifth', 'fourth', 'third', 'second', 'first'], maxlen=5)
>>> deque.reverse()
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)

>>> deque.rotate()
>>> deque
deque(['fifth', 'first', 'second', 'third', 'fourth'], maxlen=5)
>>> deque.rotate(-1)
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)
>>> deque.rotate(3)
>>> deque
deque(['third', 'fourth', 'fifth', 'first', 'second'], maxlen=5)
>>> deque.rotate(-3)
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)

>>> deque.pop()
'fifth'
>>> deque
deque(['first', 'second', 'third', 'fourth'], maxlen=5)
>>> deque.popleft()
'first'
>>> deque
deque(['second', 'third', 'fourth'], maxlen=5)
>>> deque.remove('fourth')
>>> deque
deque(['second', 'third'], maxlen=5)
>>> len(deque)
2
>>> deque.maxlen
5
>>> deque.remove('third')
>>> deque
deque(['second'], maxlen=5)
>>> len(deque)
1
>>> deque.maxlen
5
 
>>> deque.clear()
>>> deque
deque([], maxlen=5)
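
The session above does not exercise insert, so here is a small sketch of it. The variable names are illustrative, and the IndexError on a full bounded deque is behaviour worth confirming on your own Python version.

```python
from collections import deque

d = deque(['first', 'third'], maxlen=3)
d.insert(1, 'second')          # insert 'second' before index 1
print(d)                       # deque(['first', 'second', 'third'], maxlen=3)

# Inserting into a bounded deque that is already full raises IndexError
# instead of silently discarding an item.
try:
    d.insert(0, 'zero')
    full_insert_failed = False
except IndexError:
    full_insert_failed = True
print(full_insert_failed)      # True
```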

Common Built-in Modules: the Counter

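  • The Counter class is used to track how many times each value occurs. It is a dictionary-like container that stores key-value pairs, with the element as the key and its count as the value.
  • Counter() creates an empty Counter object.
  • Counter(iterable) creates a Counter object from an iterable (list, tuple, dict, string, and so on).
  • Looking up a key that does not exist returns 0 instead of raising KeyError; otherwise its count is returned.
  • most_common([num]) returns all elements as (element, count) pairs in descending order of count; if num is given, only the num most common pairs are returned.
  • elements() returns an iterator in which each element appears as many times as its count. On Python 3.7 and later, elements come out in the order they were first encountered; earlier versions give no guaranteed order.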

Example:

In [1]: list1 = ['a', 'b', 'c', 'd', 'a', 'b', 'a', 'c']                                
                                                                                        
In [2]: list1                                                                           
Out[2]: ['a', 'b', 'c', 'd', 'a', 'b', 'a', 'c']                                        
                                                                                        
In [3]: from collections import Counter as ct                                           
                                                                                        
In [4]: ct(list1)                                                                       
Out[4]: Counter({'a': 3, 'b': 2, 'c': 2, 'd': 1})                                       
                                                                                        
In [5]: a = ct(list1)                                                                   
                                                                                        
In [6]: a                                                                               
Out[6]: Counter({'a': 3, 'b': 2, 'c': 2, 'd': 1})                                       
                                                                                        
In [7]: a.most_common()                                                                 
Out[7]: [('a', 3), ('b', 2), ('c', 2), ('d', 1)]                                        
                                                                                        
In [8]: a.most_common(2)                                                                
Out[8]: [('a', 3), ('b', 2)]                                                            
                                                                                        
In [9]: a.most_common(1)                                                                
Out[9]: [('a', 3)]                                                                      
                                                                                        
In [10]: a.values()                                                                     
Out[10]: dict_values([3, 2, 2, 1])                                                      
                                                                                        
In [11]: a.items()                                                                      
Out[11]: dict_items([('a', 3), ('b', 2), ('c', 2), ('d', 1)])                           

In [12]: a.elements()
Out[12]: <itertools.chain at 0x19918ddfeb8>

In [13]: a.elements
Out[13]: <bound method Counter.elements of Counter({'a': 3, 'b': 2, 'c': 2, 'd': 1})>

In [14]: a['a']
Out[14]: 3

In [15]: a['b']
Out[15]: 2

In [16]: a['e']
Out[16]: 0

In [17]: list(a.elements())
Out[17]: ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']

In [18]: ct.
             clear()       fromkeys()    keys()        pop()         subtract()
             copy()        get()         most_common() popitem()     update()
             elements()    items()       mro()         setdefault()  values()
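
The completion list above includes update() and subtract(), which the session does not demonstrate. A minimal sketch, with made-up data:

```python
from collections import Counter

votes = Counter(['a', 'b', 'a'])
votes.update(['a', 'c'])          # add counts from another iterable
print(votes['a'], votes['c'])     # 3 1

votes.subtract({'a': 1, 'b': 2})  # subtract counts; results may go below zero
print(votes['a'], votes['b'])     # 2 -1

print(votes.most_common(1))       # [('a', 2)]
```

Unlike dict.update, Counter.update adds to existing counts rather than replacing them.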
             

Common Built-in Modules: the Named Tuple namedtuple

When accessing tuple data, each element's value is obtained through its index, so you have to memorize what each index means.

When a tuple has many elements, remembering the meaning of every index is quite hard. This is why the named tuple, namedtuple, exists.
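
To make the contrast concrete, here is a small sketch comparing index access on a plain tuple with name access on a namedtuple. The `Student` type and the `row` data are illustrative only.

```python
from collections import namedtuple

# Plain tuple: the reader has to remember what each index means.
row = ('Manu', 40, 'male')
print(row[1])            # 40

# namedtuple: the same data, but fields are accessed by name.
Student = namedtuple('Student', ['name', 'age', 'sex'])
s = Student(*row)
print(s.age)             # 40
print(s[1] == s.age)     # True (index access still works)
```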

A named tuple is defined as follows:

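collections.namedtuple(typename, field_names, *, verbose=False, rename=False, module=None)
from collections import namedtuple    # import namedtuple
typename: the name of the tuple type being created
field_names: the field names, given as a string separated by whitespace or commas, or as a list: 'x y z', 'x,y,z', ['x','y','z']
            Keywords cannot be used as field names, and a field name cannot start with a digit or an underscore
verbose=False: if verbose is true, the class definition is printed after it is built
            This option is outdated; printing the _source attribute is simpler (both verbose and _source were removed in Python 3.7)
rename=False: whether to rename invalid field names. With rename=True, an invalid field name is automatically replaced by an underscore plus the field's index, e.g. _1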

The named tuple is used as follows:

## Setup: import namedtuple
>>> from collections import namedtuple

## The following 5 forms all define a named tuple called student with three fields: name, age and sex
>>> student=namedtuple('student','name age sex')
>>> student=namedtuple('student','name,age,sex')
>>> student=namedtuple('student','name\tage\tsex')
>>> student=namedtuple('student',['name','age','sex'])
>>> student=namedtuple('student',(['name','age','sex']))
>>> sa=student('Manu',40,'male')
>>> sb=student(name='Danny Green',age=30,sex='male')
>>> sc=student('Tony Parker',36,sex='male')
>>> sa
student(name='Manu', age=40, sex='male')
>>> sb
student(name='Danny Green', age=30, sex='male')
>>> sc
student(name='Tony Parker', age=36, sex='male')
>>> sa.name
'Manu'
>>> sa.age
40
>>> sa.sex
'male'

## Define a named tuple player made up of a player's name, country and jersey number
>>> player=namedtuple('player','name country number')
>>> player
<class '__main__.player'>
>>> manu=player('Manu Ginóbili','阿根廷',20)
>>> manu.name
'Manu Ginóbili'
>>> manu.cou
manu.count(  manu.country
>>> manu.country
'阿根廷'
>>> manu.number
20
>>> Parker=player('Tony Parker','法国',9)
>>> Parker
player(name='Tony Parker', country='法国', number=9)
>>> Parker.name
'Tony Parker'
>>> Parker.count
Parker.count(  Parker.country
>>> Parker.country
'法国'
>>> Parker.number
9
>>> type(Parker)
<class '__main__.player'>

## Using rename
## By default rename=False, i.e. invalid field names are not renamed

## Without rename, using keywords such as def and return as field names raises an error:
>>> with_def_return=namedtuple('player','name def country return number')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\ProgramFiles\Python3.6.2\lib\collections\__init__.py", line 406, in namedtuple
    'keyword: %r' % name)
ValueError: Type names and field names cannot be a keyword: 'def'

>>> with_two_name=namedtuple('player','name country name number')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\ProgramFiles\Python3.6.2\lib\collections\__init__.py", line 413, in namedtuple
    raise ValueError('Encountered duplicate field name: %r' % name)
ValueError: Encountered duplicate field name: 'name'

## With rename=True, keywords such as def and return no longer raise an error; they are replaced by an underscore plus the field's index:
>>> with_def_return=namedtuple('player','name def country return number',rename=True)
>>> with_def_return
<class '__main__.player'>
>>> with_def_return._fields
('name', '_1', 'country', '_3', 'number')

>>> with_two_name=namedtuple('player','name country name number',rename=True)
>>> with_two_name
<class '__main__.player'>
>>> with_two_name._fields
('name', 'country', '_2', 'number')

Some namedtuple Methods

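  • somenamedtuple._fields: a tuple of strings listing the field names.
  • somenamedtuple._make(iterable): a class method that creates a new instance from an existing sequence or iterable.
  • somenamedtuple._asdict(): returns a new dictionary mapping field names to their values (an OrderedDict up to Python 3.7, a regular dict from Python 3.8).
  • somenamedtuple._replace(**kwargs): returns a new named tuple with the specified fields replaced by new values.
  • somenamedtuple._source: a string containing the Python source code of the class (removed in Python 3.7).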
## Use _make to turn a list into a named tuple instance
>>> list1=['Kawhi Leonard','美国',2]
>>> kawhi=player._make(list1)
>>> kawhi
player(name='Kawhi Leonard', country='美国', number=2)
>>> kawhi.name
'Kawhi Leonard'
>>> kawhi.country
'美国'
>>> kawhi.number
2
>>> kawhi._fields
('name', 'country', 'number')
>>> kawhi._asdict()
OrderedDict([('name', 'Kawhi Leonard'), ('country', '美国'), ('number', 2)])

## Use _make to turn a tuple into a named tuple instance
>>> tuple1=('Danny Green','美国',14)
>>> green=player._make(tuple1)
>>> green
player(name='Danny Green', country='美国', number=14)
>>> green.name
'Danny Green'
>>> green.country
'美国'
>>> green.number
14
>>> green._fields
('name', 'country', 'number')
>>> green._asdict()
OrderedDict([('name', 'Danny Green'), ('country', '美国'), ('number', 14)])

## _make cannot turn a dict into a named tuple instance (it iterates over the keys); use the double-star operator ** instead:
>>> p1={'name':'Tim Duncan','country':'USA','number':11}
>>> tim=player._make(p1)
>>> tim   # the converted result is not what we wanted
player(name='name', country='country', number='number')
>>> tim=player(**p1)
>>> tim
player(name='Tim Duncan', country='USA', number=11)

## Use _replace to replace field values; it returns a new named tuple
>>> green
player(name='Danny Green', country='美国', number=14)
>>> green._replace(number=4)
player(name='Danny Green', country='美国', number=4)
>>> green.number
14
>>> new_green=green._replace(number=4)
>>> new_green
player(name='Danny Green', country='美国', number=4)
>>> new_green.number
4

## Use _fields to build a new named tuple
>>> location=namedtuple('location','row column')
>>> location
<class '__main__.location'>
>>> location._fields
('row', 'column')
>>> color=namedtuple('color','red green blue')
>>> color._fields
('red', 'green', 'blue')
>>> pixel=namedtuple('pixel',location._fields+color._fields)
>>> pixel._fields
('row', 'column', 'red', 'green', 'blue')

Common Built-in Modules: the Ordered Dictionary OrderedDict

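Before Python 3.7, the built-in dict did not guarantee any order, because it stores entries by hash. Since Python 3.7, dict preserves insertion order as part of the language.

OrderedDict in the collections module is a dictionary that remembers the order in which keys were inserted; it does not sort its elements on its own. Because it remembers insertion order, it can be combined with sorting to build a sorted dictionary.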
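
One practical difference between the two types is how they compare for equality. The sketch below uses made-up data to show that two OrderedDict objects are equal only if their order matches, while plain dicts ignore order.

```python
from collections import OrderedDict

a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])

print(a == b)               # False: two OrderedDicts compare order-sensitively
print(dict(a) == dict(b))   # True: plain dicts ignore order when comparing
print(list(a))              # ['x', 'y'] (insertion order)
```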

The ordered dictionary OrderedDict is used as follows:

>>> from collections import OrderedDict as od
>>> od.
od.clear(       od.fromkeys(    od.items(       od.move_to_end( od.pop(         od.setdefault(  od.values(
od.copy(        od.get(         od.keys(        od.popitem(     od.update(

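od.fromkeys(iterable)    # create an ordered dict whose keys come from the iterable (values default to None)
od.items()                # return all (key, value) pairs of the ordered dict
od.get(key)                # return the value for key
od.values()                # return all values of the ordered dict
od.keys()                # return all keys of the ordered dict
od.pop(key)                # remove key from the ordered dict and return its value
od.popitem(last=True)    # remove and return a (key, value) pair; it does not take a key argument
                            # with last=True (default): LIFO (last-in, first-out), the most recently added item is removed
                            # with last=False: FIFO (first-in, first-out), the earliest added item is removed
od.copy()                # return a shallow copy of the ordered dict
od.setdefault(key,value)    # return the value for key
                            # if key does not exist, insert it with value
                            # if key does not exist and value is not given, the value is None
od.update(key_value)        # update the ordered dict from a mapping or an iterable of (key, value) pairs
od.clear()                    # remove all items from the ordered dict
od.move_to_end(key,last=True)        # move the key-value pair for key to the end of the ordered dict
                                    # with last=False (default is True), move it to the start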
## A regular dict
>>> dict1 = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
>>> dict1
{'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
## Sort by key
>>> dict2=od(sorted(dict1.items(),key=lambda t:t[0]))
>>> dict2
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
## Sort by value, ascending
>>> dict3=od(sorted(dict1.items(),key=lambda t:t[1]))
>>> dict3
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])
## Sort by value, descending
>>> dict3=od(sorted(dict1.items(),key=lambda t:t[1],reverse=True))
>>> dict3
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
## Sort by the length of the key string, ascending
>>> dict4=od(sorted(dict1.items(),key=lambda t:len(t[0])))
>>> dict4
OrderedDict([('pear', 1), ('apple', 4), ('banana', 3), ('orange', 2)])
## Sort by the length of the key string, descending
>>> dict5=od(sorted(dict1.items(),key=lambda t:len(t[0]),reverse=True))
>>> dict5
OrderedDict([('banana', 3), ('orange', 2), ('apple', 4), ('pear', 1)])

>>> od1 = od([('name','meichaohui'),('lang','python')])
>>> od1
OrderedDict([('name', 'meichaohui'), ('lang', 'python')])
>>> od1['age']=28
>>> od1
OrderedDict([('name', 'meichaohui'), ('lang', 'python'), ('age', 28)])
>>> od2=od.fromkeys('abcdefg')
>>> od2
OrderedDict([('a', None), ('b', None), ('c', None), ('d', None), ('e', None), ('f', None), ('g', None)])
>>> od3=od.fromkeys(['a','b','c','d'])
>>> od3
OrderedDict([('a', None), ('b', None), ('c', None), ('d', None)])
>>> od4=od.fromkeys({"a":1})
>>> od4
OrderedDict([('a', None)])

>>> od3.items()
odict_items([('a', None), ('b', None), ('c', None), ('d', None)])
>>> od4.items()
odict_items([('a', None)])

>>> od1
OrderedDict([('name', 'meichaohui'), ('lang', 'python'), ('age', 28)])
>>> od1.get('name')
'meichaohui'
>>> od1.get('age')
28
>>> od1.get('lang')
'python'

>>> od1.values()
odict_values(['meichaohui', 'python', 28])
>>> od2.values()
odict_values([None, None, None, None, None, None, None])
>>> od2.keys()
odict_keys(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> od1.keys()
odict_keys(['name', 'lang', 'age'])

>>> dict1=od([('a',1),('b',2),('c',3)])
>>> dict1
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> dict1.pop()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Required argument 'key' (pos 1) not found
>>> dict1.pop('b')
2
>>> dict1
OrderedDict([('a', 1), ('c', 3)])
>>> dict1.popitem()
('c', 3)
>>> dict1
OrderedDict([('a', 1)])
>>> dict1.setdefault('b',2)
2
>>> dict1
OrderedDict([('a', 1), ('b', 2)])
>>> dict1.popitem('b')    # 'b' is taken as the last flag (a truthy value), not as a key; the last item is removed
('b', 2)
>>> dict1
OrderedDict([('a', 1)])
>>> dict1.setdefault('b')
>>> dict1
OrderedDict([('a', 1), ('b', None)])
>>> dict1.update('b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
>>> dict1.update('b',1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: update() takes at most 1 positional argument (2 given)
>>> dict1.update(('b',1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
>>> dict1.update([('b',1)])
>>> dict1
OrderedDict([('a', 1), ('b', 1)])
>>> dict1.update([('b',2)])
>>> dict1
OrderedDict([('a', 1), ('b', 2)])
>>> dict1.update({'b':3})
>>> dict1
OrderedDict([('a', 1), ('b', 3)])
>>> dict2=dict1.copy()
>>> dict2
OrderedDict([('a', 1), ('b', 3)])
>>> dict2.clear()
>>> dict2
OrderedDict()

>>> dict1
OrderedDict([('a', 1), ('b', 3)])
>>> dict1['c']=2
>>> dict1
OrderedDict([('a', 1), ('b', 3), ('c', 2)])
>>> dict1['d']=4
>>> dict1
OrderedDict([('a', 1), ('b', 3), ('c', 2), ('d', 4)])
>>> dict1.move_to_end('b')
>>> dict1
OrderedDict([('a', 1), ('c', 2), ('d', 4), ('b', 3)])
>>> dict1.move_to_end('d')
>>> dict1
OrderedDict([('a', 1), ('c', 2), ('b', 3), ('d', 4)])
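
The last=False variants of popitem and move_to_end are described above but not shown in the session. A minimal sketch, using a fresh illustrative dictionary:

```python
from collections import OrderedDict

od = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])

od.move_to_end('c', last=False)   # move 'c' to the front
print(list(od))                   # ['c', 'a', 'b', 'd']

first = od.popitem(last=False)    # FIFO: remove the first item
last = od.popitem()               # LIFO (default): remove the last item
print(first, last)                # ('c', 3) ('d', 4)
print(list(od))                   # ['a', 'b']
```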

Common Built-in Modules: defaultdict, a Dictionary with Default Values

In Python, accessing a key that does not exist in a dictionary raises a KeyError exception.

Example:

In [1]: dict1={'a':1,'b':2}                                                                   
                                                                                              
In [2]: dict1['a']                                                                            
Out[2]: 1                                                                                     
                                                                                              
In [3]: dict1['b']                                                                            
Out[3]: 2                                                                                     
                                                                                              
In [4]: dict1['c']                                                                            
---------------------------------------------------------------------------                   
KeyError                                  Traceback (most recent call last)                   
<ipython-input-4-6bf0c4d0a790> in <module>                                                    
----> 1 dict1['c']                                                                            
                                                                                              
KeyError: 'c'                                                                                 

Accessing dict1['c'] reports that the key 'c' does not exist.

Suppose we have the following passage and want to count how many times each word occurs:

This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.

* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list objects for easier list subclassing
* UserString   wrapper around string objects for easier string subclassing
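  • Without defaultdict, counting with a plain dictionary: the first time a word is seen, the value 1 is stored under its key in counts. This requires an extra conditional check while processing.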

The code is as follows:

## Filename: defaultdict_count_word.py
## Author: meizhaohui

def count_words(article):
    # Replace newlines with spaces, then split into a list of words
    article_list = article.replace('\n',' ').split()
    counts = {}
    for word in article_list:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
    print(counts)
    
    
if __name__ == '__main__':
    article='''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.

* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list objects for easier list subclassing
* UserString   wrapper around string objects for easier string subclassing

'''
    count_words(article)

Run:

$ python defaultdict_count_word.py                                                                                      
{'This': 1, 'module': 1, 'implements': 1, 'specialized': 1, 'container': 2, 'datatypes': 1, 'providing': 1, 'alternative
s': 1, 'to': 2, "Python's": 1, 'general': 1, 'purpose': 1, 'built-in': 1, 'containers,': 1, 'dict,': 1, 'list,': 1, 'set
,': 1, 'and': 2, 'tuple.': 1, '*': 9, 'namedtuple': 1, 'factory': 2, 'function': 2, 'for': 6, 'creating': 2, 'tuple': 1,
 'subclasses': 1, 'with': 2, 'named': 1, 'fields': 1, 'deque': 1, 'list-like': 1, 'fast': 1, 'appends': 1, 'pops': 1, 'o
n': 1, 'either': 1, 'end': 1, 'ChainMap': 1, 'dict-like': 1, 'class': 1, 'a': 2, 'single': 1, 'view': 1, 'of': 1, 'multi
ple': 1, 'mappings': 1, 'Counter': 1, 'dict': 4, 'subclass': 3, 'counting': 1, 'hashable': 1, 'objects': 4, 'OrderedDict
': 1, 'that': 2, 'remembers': 1, 'the': 1, 'order': 1, 'entries': 1, 'were': 1, 'added': 1, 'defaultdict': 1, 'calls': 1
, 'supply': 1, 'missing': 1, 'values': 1, 'UserDict': 1, 'wrapper': 3, 'around': 3, 'dictionary': 1, 'easier': 3, 'subcl
assing': 3, 'UserList': 1, 'list': 2, 'UserString': 1, 'string': 2}                                                     
  • With defaultdict, no check on the key is needed; just add to it directly.

The code is as follows:

## Filename: defaultdict_count_word.py
## Author: meizhaohui

def count_words(article):
    from collections import defaultdict as dt
    # Replace newlines with spaces, then split into a list of words
    article_list = article.replace('\n',' ').split()
    # counts = {}
    counts = dt(int)
    for word in article_list:
        # if word not in counts:
        #     counts[word] = 1
        # else:
        #     counts[word] += 1
        counts[word] += 1
    print(counts)
    
    
if __name__ == '__main__':
    article='''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.

* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list objects for easier list subclassing
* UserString   wrapper around string objects for easier string subclassing

'''
    count_words(article)

Run:

$ python defaultdict_count_word.py
defaultdict(<class 'int'>, {'This': 1, 'module': 1, 'implements': 1, 'specialized': 1, 'container': 2, 'datatypes': 1, 'providing': 1, 'alternatives': 1, 'to': 2, "Python's": 1, 'general': 1, 'purpose': 1, 'built-in': 1, 'containers,': 1, 'dict,': 1, 'list,': 1, 'set,': 1, 'and': 2, 'tuple.': 1, '*': 9, 'namedtuple': 1, 'factory': 2, 'function': 2, 'for': 6, 'creating': 2, 'tuple': 1, 'subclasses': 1, 'with': 2, 'named': 1, 'fields': 1, 'deque': 1, 'list-like': 1, 'fast': 1, 'appends': 1, 'pops': 1, 'on': 1, 'either': 1, 'end': 1, 'ChainMap': 1, 'dict-like': 1, 'class': 1, 'a': 2, 'single': 1, 'view': 1, 'of': 1, 'multiple': 1, 'mappings': 1, 'Counter': 1, 'dict': 4, 'subclass': 3, 'counting': 1, 'hashable': 1, 'objects': 4, 'OrderedDict': 1, 'that': 2, 'remembers': 1, 'the': 1, 'order': 1, 'entries': 1, 'were': 1, 'added': 1, 'defaultdict': 1, 'calls': 1, 'supply': 1, 'missing': 1, 'values': 1, 'UserDict': 1, 'wrapper': 3, 'around': 3, 'dictionary': 1, 'easier': 3, 'subclassing': 3, 'UserList': 1, 'list': 2, 'UserString': 1, 'string': 2})

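In the example above, defaultdict uses int to give missing keys the default value of the int type, 0. The first time counts[word] += 1 runs for a given word, counts[word] is first set to 0 and then incremented to 1; every later occurrence of the word adds 1. With this approach no membership check is needed.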
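
The mechanism can be observed directly. This is a small sketch with illustrative keys; it also shows a side effect worth knowing, namely that merely reading a missing key inserts it.

```python
from collections import defaultdict

counts = defaultdict(int)
print(int())               # 0, the value the factory produces
print('x' in counts)       # False

counts['x'] += 1           # missing key: int() -> 0 is stored, then 1 is added
print(counts['x'])         # 1

# Reading a missing key with [] also inserts it; .get() does not.
_ = counts['y']
print(counts.get('z'))     # None
print(sorted(counts))      # ['x', 'y']
```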

::: tip Note The example above does not handle punctuation any further; it only gives a rough word count. :::

  • defaultdict can use the default value of int, list, dict and similar types as the default for missing keys.
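
As a sketch of the list and dict factories (the word list and variable names are made up for illustration):

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# list factory: group words by their first letter.
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# dict factory: build a nested mapping without creating inner dicts by hand.
lengths = defaultdict(dict)
for w in words:
    lengths[w[0]][w] = len(w)
print(lengths['b'])   # {'banana': 6, 'blueberry': 9}
```
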
Licensed under the GNU General Public License v3.0