Python标准库-数据结构

字数统计: 1.5k字 | 阅读时长: 7分

2021-07-01

摘要: Python 标准库中关于数据结构的组件。

【对算法，数学，计算机感兴趣的同学，欢迎关注我哈，阅读更多原创文章】
我的网站：潮汐朝夕的生活实验室
我的公众号：算法题刷刷
我的知乎：潮汐朝夕
我的github：FennelDumplings
我的leetcode：FennelDumplings

Python 中有几个作为内置类型的标准的数据结构：list, tuple, dict, set，除此之外在标准库中还有很多跟数据结构相关的模块。

参考书（中英文版）：

思维导图是 Python 标准库中的数据结构相关模块和主要模块内的主要组件。

总览

enum

enum 模块提供了枚举类型的一个实现。该模块实现的枚举类型具有迭代和比较的能力。它可以用来为值创建定义良好的符号，而不是使用字符串或整数字面值。

collections

collections 模块提供了若干种数据结构的实现。例如: deque 是双端队列，可以在两端压入和弹出。defaultdict 是对缺失键具有默认值的字典。OrderedDict 是记录插入顺序的字典。namedtuple 在元组的基础上，可以通过名称访问元素，通过下标访问的方式也支持。

array

对于大量数据，array 可以比 list 更高效地使用内存，但是只能存一种数据类型。

heapq 和 bisect

将序列中的对象排序是数据处理中的常见操作。对于 list，有 sort 方法。heapq 通过堆的方式维护对象的有序性。bisect 通过插入排序的方式维护对象的有序性。

queue

内建的 list 可以通过 insert 和 pop 方法模拟队列，但它不是线程安全的。queue 模块提供了线程安全的队列，可以实现线程之间的真正有序通信。multiprocessing 模块也有一个队列，用于进程间的真正有序通信。

struct

struct 用于解码其它应用中的数据, 可能来自数据流，也可能来自二进制文件，解码后得到 Python 的类型。

Python标准库-二进制数据服务

weakref 和 copy

weakref 和 copy 是两种与数据结构内存管理相关的模块。对于内部有链接的数据结构，例如图和树，用 weakref 维护引用可以使得 gc 可以在当数据结构对象不需要时取回收它们。copy 用于复制数据结构及其内容，包括用 deepcopy 进行递归地复制。

pprint

调试数据结构有时是很耗时间的，尤其是需要打印较大的字典或序列的时候，pprint 可以将数据结构打印成比较易于阅读的形式，输出到标准输出或者 log 文件。

如果现有组件不满足需求，可以继承某个数据结构然后重写它。或者用抽象基类创建新的容器类型。

下面是几个代码实例，如下:

自定义 __repr__ 后 print 和 pprint 的区别
优先级队列维护自定义对象
Type Codes for array Members
Python 中常见的抽象基类
多线程从两端消费 deque

多线程从两端消费 deque

from collections import deque
import threading
import time

candle = deque(range(100))

def burn(direction, next_source):
    while True:
        try:
            next_ = next_source()
        except IndexError:
            break
        else:
            print("{:>8}: {}".format(direction, next_))
            time.sleep(0.1)
    print("{:>8}: done".format(direction))
    return

left = threading.Thread(target=burn, args=("Left", candle.popleft))
right = threading.Thread(target=burn, args=("Right", candle.pop))

left.start()
right.start()
left.join()
right.join()

Python 中常见的抽象基类

Class Base	Class(es)	API Purpose
Container		Basic container features, such as `in` operator
Hashable		Adds support for providing a hash value for the container instance
Iterable		Can create an iterator over the container contents
Iterator	Iterable	Is an iterator over the container contents
Generator	Iterator	Extends iterators with the generator protocol from PEP 342
Sized		Adds methods for containers that know how big they are
Callable		For containers that can be invoked as a function
Sequence	Sized, Iterable, Container	Supports retrieving individual items, iterating, and changing the order of items
MutableSequence	Sequence	Supports adding and removing items to an instance after it has been created
ByteString	Sequence	Combined API of bytes and bytearray
Set	Sized, Iterable, Container	Supports set operations such as intersection and union
MutableSet	set	Adds methods for manipulating the set contents after it is created
Mapping	Sized, Iterable, Container	Defines the read-only API used by dict
MutableMapping	Mapping	Defines the methods for manipulating the contents of a mapping after it is created
MappingView	Sized	Defines the view API for accessing a mapping from an iterator
ItemsView	MappingView, Set	Part of the view API
KeysView	MappingView, Set	Part of the view API
ValuesView	MappingView, Set	Part of the view API
Awaitable		API for objects that can be used in await expressions, such as coroutines
Coroutine	Awaitable	API for classes that implement the coroutine protocol
AsyncIterable		API for iterables compatible with async for , as defined in PEP 492
AsyncIterator	AsyncIterable	API for asynchronous iterators

Type Codes for array Members

Code	Type	Minimum Size (Bytes)
b	Int	1
B	Int	1
h	Signed short	2
H	Unsigned short	2
i	Signed int	2
I	Unsigned int	2
l	Signed long	4
L	Unsigned long	4
q	Signed long long	8
Q	Unsigned long long	8
f	Float	4
d	Double float	8

优先级队列维护自定义对象

例子: 多线程消费优先级队列中的任务

import functools
import queue
import threading

@functools.total_ordering
class Job:
    def __init__(self, priority, description):
        self.priority = priority
        self.description = description
        print('New job:', description)
        return

    def __eq__(self, other):
        try:
            return self.priority == other.priority
        except AttributeError:
            return NotImplemented

    def __lt__(self, other):
        try:
            return self.priority < other.priority
        except AttributeError:
            return NotImplemented

q = queue.PriorityQueue()

q.put(Job(3, 'Mid-level job'))
q.put(Job(10, 'Low-level job'))
q.put(Job(1, 'Important job'))

def process_job(q):
    while True:
        next_job = q.get()
        print('Processing job:', next_job.description)
        q.task_done()

workers = [threading.Thread(target=process_job, args=(q,))
          ,threading.Thread(target=process_job, args=(q,)),
          ]

for w in workers:
    w.setDaemon(True)
    w.start()

q.join()

New job: Mid-level job
New job: Low-level job
New job: Important job
Processing job: Important job
Processing job: Mid-level job
Processing job: Low-level job

自定义 `repr` 后 print 和 pprint 的区别

from pprint import pprint

class node:
    def __init__(self, name, contents=[]):
        self.name = name
        self.contents = contents[:]

    def __repr__(self):
        return ('node(' + repr(self.name) + ', ' + repr(self.contents) + ')')

trees = [node('node-1')
        ,node('node-2', [node('node-2-1')])
        ,node('node-3', [node('node-3-1')])
        ]

print("---print---")
print(trees)
print("---pprint---")
pprint(trees)

---print---
[node('node-1', []), node('node-2', [node('node-2-1', [])]), node('node-3', [node('node-3-1', [])])]
---pprint---
[node('node-1', []),
 node('node-2', [node('node-2-1', [])]),
 node('node-3', [node('node-3-1', [])])]

潮汐朝夕的生活实验室 \Doge 陪伴一个算法工程师的职业生涯

编程Python

Python标准库-数据结构

总览

enum

collections

array

heapq 和 bisect

queue

struct

weakref 和 copy

pprint

多线程从两端消费 deque

Python 中常见的抽象基类

Type Codes for array Members

优先级队列维护自定义对象

自定义 `repr` 后 print 和 pprint 的区别

编程Python

总览

enum

collections

array

heapq 和 bisect

queue

struct

weakref 和 copy

pprint

多线程从两端消费 deque

Python 中常见的抽象基类

Type Codes for array Members

优先级队列维护自定义对象

自定义 __repr__ 后 print 和 pprint 的区别

自定义 `repr` 后 print 和 pprint 的区别