Python标准库-数据结构

  |  

摘要: Python 标准库中关于数据结构的组件。

【对算法,数学,计算机感兴趣的同学,欢迎关注我哈,阅读更多原创文章】
我的网站:潮汐朝夕的生活实验室
我的公众号:算法题刷刷
我的知乎:潮汐朝夕
我的github:FennelDumplings
我的leetcode:FennelDumplings


Python 中有几个作为内置类型的标准的数据结构:list, tuple, dict, set,除此之外在标准库中还有很多跟数据结构相关的模块。

参考书(中英文版):

思维导图是 Python 标准库中的数据结构相关模块和主要模块内的主要组件。


总览

enum

enum 模块提供了枚举类型的一个实现。该模块实现的枚举类型具有迭代和比较的能力。它可以用来为值创建定义良好的符号,而不是使用字符串或整数字面值。

collections

collections 模块提供了若干种数据结构的实现。例如: deque 是双端队列,可以在两端压入和弹出。defaultdict 是对缺失键具有默认值的字典。OrderedDict 是记录插入顺序的字典。namedtuple 在元组的基础上,可以通过名称访问元素,通过下标访问的方式也支持。

array

对于大量数据,array 可以比 list 更高效地使用内存,但是只能存一种数据类型。

heapq 和 bisect

将序列中的对象排序是数据处理中的常见操作。对于 list,有 sort 方法。heapq 通过堆的方式维护对象的有序性。bisect 通过插入排序的方式维护对象的有序性。

queue

内建的 list 可以通过 insert 和 pop 方法模拟队列,但它不是线程安全的。queue 模块提供了线程安全的队列,可以实现线程之间的真正有序通信。multiprocessing 模块也有一个队列,用于进程间的真正有序通信。

struct

struct 用于解码其它应用中的数据, 可能来自数据流,也可能来自二进制文件,解码后得到 Python 的类型。

weakref 和 copy

weakrefcopy 是两种与数据结构内存管理相关的模块。对于内部有链接的数据结构,例如图和树,用 weakref 维护引用可以使得 gc 可以在当数据结构对象不需要时取回收它们。copy 用于复制数据结构及其内容,包括用 deepcopy 进行递归地复制。

pprint

调试数据结构有时是很耗时间的,尤其是需要打印较大的字典或序列的时候,pprint 可以将数据结构打印成比较易于阅读的形式,输出到标准输出或者 log 文件。


如果现有组件不满足需求,可以继承某个数据结构然后重写它。或者用抽象基类创建新的容器类型。

下面是几个代码实例,如下:

  • 自定义 __repr__ 后 print 和 pprint 的区别
  • 优先级队列维护自定义对象
  • Type Codes for array Members
  • Python 中常见的抽象基类
  • 多线程从两端消费 deque

多线程从两端消费 deque

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
from collections import deque
import threading
import time

candle = deque(range(100))

def burn(direction, next_source):
while True:
try:
next_ = next_source()
except IndexError:
break
else:
print("{:>8}: {}".format(direction, next_))
time.sleep(0.1)
print("{:>8}: done".format(direction))
return

left = threading.Thread(target=burn, args=("Left", candle.popleft))
right = threading.Thread(target=burn, args=("Right", candle.pop))

left.start()
right.start()
left.join()
right.join()

Python 中常见的抽象基类

Class Base Class(es) API Purpose
Container Basic container features, such as in operator
Hashable Adds support for providing a hash value for the container instance
Iterable Can create an iterator over the container contents
Iterator Iterable Is an iterator over the container contents
Generator Iterator Extends iterators with the generator protocol from PEP 342
Sized Adds methods for containers that know how big they are
Callable For containers that can be invoked as a function
Sequence Sized, Iterable, Container Supports retrieving individual items, iterating, and changing the order of items
MutableSequence Sequence Supports adding and removing items to an instance after it has been created
ByteString Sequence Combined API of bytes and bytearray
Set Sized, Iterable, Container Supports set operations such as intersection and union
MutableSet set Adds methods for manipulating the set contents after it is created
Mapping Sized, Iterable, Container Defines the read-only API used by dict
MutableMapping Mapping Defines the methods for manipulating the contents of a mapping after it is created
MappingView Sized Defines the view API for accessing a mapping from an iterator
ItemsView MappingView, Set Part of the view API
KeysView MappingView, Set Part of the view API
ValuesView MappingView, Set Part of the view API
Awaitable API for objects that can be used in await expressions, such as coroutines
Coroutine Awaitable API for classes that implement the coroutine protocol
AsyncIterable API for iterables compatible with async for , as defined in PEP 492
AsyncIterator AsyncIterable API for asynchronous iterators

Type Codes for array Members

Code Type Minimum Size (Bytes)
b Int 1
B Int 1
h Signed short 2
H Unsigned short 2
i Signed int 2
I Unsigned int 2
l Signed long 4
L Unsigned long 4
q Signed long long 8
Q Unsigned long long 8
f Float 4
d Double float 8

优先级队列维护自定义对象

例子: 多线程消费优先级队列中的任务

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
import functools
import queue
import threading

@functools.total_ordering
class Job:
def __init__(self, priority, description):
self.priority = priority
self.description = description
print('New job:', description)
return

def __eq__(self, other):
try:
return self.priority == other.priority
except AttributeError:
return NotImplemented

def __lt__(self, other):
try:
return self.priority < other.priority
except AttributeError:
return NotImplemented

q = queue.PriorityQueue()

q.put(Job(3, 'Mid-level job'))
q.put(Job(10, 'Low-level job'))
q.put(Job(1, 'Important job'))

def process_job(q):
while True:
next_job = q.get()
print('Processing job:', next_job.description)
q.task_done()

workers = [threading.Thread(target=process_job, args=(q,))
,threading.Thread(target=process_job, args=(q,)),
]

for w in workers:
w.setDaemon(True)
w.start()

q.join()
1
2
3
4
5
6
New job: Mid-level job
New job: Low-level job
New job: Important job
Processing job: Important job
Processing job: Mid-level job
Processing job: Low-level job

自定义 __repr__ 后 print 和 pprint 的区别

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
from pprint import pprint

class node:
def __init__(self, name, contents=[]):
self.name = name
self.contents = contents[:]

def __repr__(self):
return ('node(' + repr(self.name) + ', ' + repr(self.contents) + ')')

trees = [node('node-1')
,node('node-2', [node('node-2-1')])
,node('node-3', [node('node-3-1')])
]

print("---print---")
print(trees)
print("---pprint---")
pprint(trees)
1
2
3
4
5
6
---print---
[node('node-1', []), node('node-2', [node('node-2-1', [])]), node('node-3', [node('node-3-1', [])])]
---pprint---
[node('node-1', []),
node('node-2', [node('node-2-1', [])]),
node('node-3', [node('node-3-1', [])])]

Share