Revisiting Beginning Python (《Python基础教程》), Part 4


Abstract: notes on Beginning Python (《Python基础教程》), part 4

[If you are interested in algorithms, mathematics, or computer science, feel free to follow me and read more original articles]
My website: 潮汐朝夕的生活实验室
My WeChat public account: 算法题刷刷
My Zhihu: 潮汐朝夕
My GitHub: FennelDumplings
My LeetCode: FennelDumplings


Preface

The [Revisiting Beginning Python] series revisits a fairly introductory but very systematic Python book that I read some time ago.

Book information: Beginning Python: From Novice to Professional, by Magnus Lie Hetland (Chinese edition: 《Python基础教程》).

The book covers Python programming from all angles and is organized into five parts:

  • Python basics and core concepts, including lists, tuples, strings, dictionaries, and the various statements
  • Relatively advanced Python topics, including abstraction, exceptions, magic methods, properties, iterators, modules, and file handling
  • Using Python together with databases, networks, the C language, and other tools to bring out Python's full power
  • Testing, packaging, and distributing Python programs
  • The development process of ten Python projects with practical significance

Previous posts in this series


This post collects some easily forgotten points about magic methods, properties, iterators, and modules from the second part of the book. It covers Chapters 9 and 10; the key points recorded for each chapter are as follows:

  • Chap 9 Magic Methods, Properties, and Iterators
    • constructors
      • Overriding Methods in General, and the Constructor in Particular
      • Calling the Unbound Superclass Constructor
      • Using the super Function
    • Item Access
      • The Basic Sequence and Mapping Protocol
      • Subclassing list, dict, and str
    • Properties
      • The property Function
      • Static Methods and Class Methods
      • __getattr__, __setattr__, and Friends
    • Iterators
      • The Iterator Protocol
      • Making Sequences from Iterators
    • Generators
      • Making a Generator
      • A Recursive Generator
      • Generators in General
      • Generator Methods
      • Simulating Generators
    • The Eight Queens
      • Generators and Backtracking
      • The Problem
      • State Representation
      • Finding Conflicts
      • The Base Case
      • The Recursive Case
  • Chap 10 Batteries Included
    • Modules
    • Exploring Modules
      • What’s in a Module?
      • Documentation
      • Use the Source
    • The Standard Library: A Few Favorites
      • sys
      • os
      • fileinput
      • Sets, Heaps, and Deques
      • time
      • random
      • shelve and json
      • re
      • Other Interesting Standard Modules

9. Magic Methods, Properties, and Iterators

new-style classes vs. old-style classes

# new style
class NewStyle(object):
    ...

# old style
class OldStyle:
    ...

If a file begins with __metaclass__ = type, both classes will be new-style.
Metaclasses are the classes of other classes.
There are no old-style classes in Python 3, so there is no need to explicitly subclass object or set the metaclass to type.

(1) Constructors

__init__

Python has a magic method called __del__, also known as the destructor. It is called just before the object is destroyed (garbage-collected), but because you cannot really know when (or if) this happens, I advise you to stay away from __del__ if at all possible.

Overriding Methods in General, and the Constructor in Particular

If a method is called (or an attribute is accessed) on an instance of class B and it is not found, its superclass A will be searched.

If you override the constructor of a class, you need to call the constructor of the superclass; otherwise, the initialization it performs is skipped.

class Bird:
    def __init__(self):
        self.hungry = True

    def eat(self):
        if self.hungry:
            print('Aaaah ...')
            self.hungry = False
        else:
            print('No, thanks!')

class SongBird(Bird):
    def __init__(self):
        # The eat method is inherited, but the hungry attribute is never set, because
        # the constructor is overridden, and the new constructor doesn't contain
        # any initialization code dealing with the hungry attribute.
        self.sound = 'Squawk!'

    def sing(self):
        print(self.sound)
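
A quick session illustrating the problem, assuming the two classes above:

sb = SongBird()
sb.sing()   # prints: Squawk!
sb.eat()    # AttributeError: 'SongBird' object has no attribute 'hungry'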

Calling the Unbound Superclass Constructor

class SongBird(Bird):
    def __init__(self):
        Bird.__init__(self)
        self.sound = 'Squawk!'

    def sing(self):
        print(self.sound)

When you retrieve a method from an instance, the self argument of the method is automatically bound to the instance (a so-called bound method). However, if you retrieve the method directly from the class (such as in Bird.__init__), there is no instance to which to bind. Therefore, you are free to supply any self you want to. Such a method is called unbound.

By supplying the current instance as the self argument to the unbound method, the songbird gets the full treatment from its superclass’s constructor (which means that it has its hungry attribute set).

Using the super Function

If you're not stuck with an old version of Python, the super function is really the way to go. It works only with new-style classes.

It is called with the current class and instance as its arguments, and any method you call on the returned object will be fetched from the superclass rather than the current class.

instead of using Bird in the SongBird constructor, you can use super(SongBird, self). Also, the __init__ method can be called in a normal (bound) fashion.

class SongBird(Bird):
    def __init__(self):
        # In Python 3, super can (and generally should) be
        # called without any arguments and will do its job
        super().__init__()
        self.sound = 'Squawk!'

    def sing(self):
        print(self.sound)

In my opinion, the super function is more intuitive than calling unbound methods on the superclass directly, but that is not its only strength. The super function is actually quite smart, so even if you have multiple superclasses, you only need to use super once (provided that all the superclass constructors also use super). Also, some obscure situations that are tricky when using old-style classes (for example, when two of your superclasses share a superclass) are automatically dealt with by new-style classes and super. You don't need to understand exactly how it works internally, but you should be aware that, in most cases, it is clearly superior to calling the unbound constructors (or other methods) of your superclasses.

What super actually does is return a super object, which will take care of method resolution for you. When you access an attribute on it, it will look through all your superclasses (and super-superclasses, and so forth) until it finds the attribute, or raises an AttributeError.

(2) Item Access

The basic sequence and mapping protocol is pretty simple. However, to implement all the functionality of sequences and mappings, there are many magic methods to implement.

The word protocol is often used in Python to describe the rules governing some form of behavior.

The protocol says something about which methods you should implement and what those methods should do, because polymorphism in Python is based only on the object's behavior.

Where other languages might require an object to belong to a certain class or to implement a certain interface, Python often simply requires it to follow some given protocol.

So, to be a sequence, all you have to do is follow the sequence protocol.

The Basic Sequence and Mapping Protocol

To implement their basic behavior (protocol), you need two magic methods if your objects are immutable, or four if they are mutable.

__len__(self): This method should return the number of items contained in the collection. For a sequence, this would simply be the number of elements. For a mapping, it would be the number of key-value pairs. If __len__ returns zero (and you don't implement __bool__, which overrides this behavior), the object is treated as false in a Boolean context (as with empty lists, tuples, strings, and dictionaries).
__getitem__(self, key): This should return the value corresponding to the given key. For a sequence, the key should be an integer from zero to n-1 (or, it could be negative, as noted later), where n is the length of the sequence. For a mapping, you could really have any kind of keys.
__setitem__(self, key, value): This should store value in a manner associated with key, so it can later be retrieved with __getitem__. Of course, you define this method only for mutable objects.
__delitem__(self, key): This is called when someone uses the del statement on a part of the object and should delete the element associated with key. Again, only mutable objects (and not all of them, only those for which you want to let items be removed) should define this method.

Some extra requirements are imposed on these methods.

  • For a sequence, if the key is a negative integer, it should be used to count from the end. In other words, treat x[-n] the same as x[len(x)-n].
  • If the key is of an inappropriate type (such as a string key used on a sequence), a TypeError may be raised.
  • If the index of a sequence is of the right type, but outside the allowed range, an IndexError should be raised.
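
As a concrete illustration of this protocol, here is a sketch loosely based on the book's infinite arithmetic-sequence example (the names check_index and ArithmeticSequence follow that example, but treat the details as a reconstruction rather than an exact quote):

def check_index(key):
    # An acceptable index must be a non-negative integer;
    # otherwise raise TypeError or IndexError, as the protocol requires.
    if not isinstance(key, int):
        raise TypeError
    if key < 0:
        raise IndexError

class ArithmeticSequence:
    def __init__(self, start=0, step=1):
        self.start = start     # the first value
        self.step = step       # difference between consecutive values
        self.changed = {}      # values changed by the user

    def __getitem__(self, key):
        # Get an item, honoring user-made changes
        check_index(key)
        try:
            return self.changed[key]               # has the item been modified?
        except KeyError:
            return self.start + key * self.step    # otherwise, compute the value

    def __setitem__(self, key, value):
        # Change an item; the change is stored in self.changed
        check_index(key)
        self.changed[key] = value

There is no __len__ here because the sequence is infinite, and no __delitem__ because deleting items is not allowed.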

Subclassing list, dict, and str

While the four methods of the basic sequence/mapping protocol will get you far, sequences may have many other useful magic and ordinary methods, including the __iter__ method, which I describe in the section “Iterators” later in this chapter.

The standard library comes with abstract and concrete base classes in the collections module, but you can also simply subclass the built-in types themselves. So, if you want to implement a sequence type that behaves similarly to the built-in lists, you can simply subclass list.

class CounterList(list):
    def __init__(self, *args):
        super().__init__(*args)
        self.counter = 0

    def __getitem__(self, index):
        # super is used to call the superclass version of the method
        self.counter += 1
        return super().__getitem__(index)
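
A short interactive check of CounterList (the values noted are what the class above should produce):

cl = CounterList(range(10))
cl.reverse()
del cl[3:6]
cl.counter       # 0: no item has been read yet
cl[4] + cl[2]    # each read goes through __getitem__
cl.counter       # 2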

More Magic

For more information about which magic methods are available, see section “Special method names” in the Python Reference Manual.

(3) Properties

In Chapter 7, I mentioned accessor methods (ref: 回炉-Python基础教程-2). Accessors are simply methods with names such as getHeight and setHeight and are used to retrieve or rebind some attribute (which may be private to the class; see the section "Privacy Revisited" in Chapter 7).

Should you wrap all your attributes in accessors? That is a possibility, of course. However, it would be impractical (and kind of silly) if you had a lot of simple attributes, because you would need to write many accessors that did nothing but retrieve or set these attributes, with no useful action taken.
Python can hide your accessors for you, making all of your attributes look alike. Those attributes that are defined through their accessors are often called properties.

Example:

class Rectangle:
    def __init__(self):
        self.width = 0
        self.height = 0

    def set_size(self, size):
        self.width, self.height = size

    def get_size(self):
        return self.width, self.height

The get_size and set_size methods are accessors for a fictitious attribute called size — which is simply the tuple consisting of width and height.

r = Rectangle()
r.width = 10
r.height = 5
print(r.get_size())     # (10, 5)
r.set_size((150, 100))
print(r.width)          # 150

The programmer using this class shouldn’t need to worry about how it is implemented (encapsulation). If you someday wanted to change the implementation so that size was a real attribute and width and height were calculated on the fly, you would need to wrap them in accessors, and any programs using the class would also have to be rewritten.

Python actually has two mechanisms for creating properties. I'll focus on the most recent one, the property function, which works only on new-style classes. Then I'll give you a short description of how to implement properties with magic methods.

The property Function

class Rectangle:
    def __init__(self):
        self.width = 0
        self.height = 0

    def set_size(self, size):
        self.width, self.height = size

    def get_size(self):
        return self.width, self.height

    size = property(get_size, set_size)

A property is created by calling the property function with the accessor functions as arguments (the getter first, then the setter), and the name size is then bound to this property.

After this, you no longer need to worry about how things are implemented but can treat width, height, and size the same way.

r = Rectangle()
r.width = 10
r.height = 5
print(r.size)      # (10, 5)
r.size = 150, 100
print(r.width)     # 150

The size attribute is still subject to the calculations in get_size and set_size, but it looks just like a normal attribute.

In fact, the property function may be called with zero, one, three, or four arguments as well. If called without any arguments, the resulting property is neither readable nor writable. If called with only one argument (a getter method), the property is readable only. The third (optional) argument is a method used to delete the attribute (it takes no arguments). The fourth (optional) argument is a docstring. The parameters are called fget, fset, fdel, and doc — you can use them as keyword arguments if you want a property that, say, is only writable and has a docstring.

How property does its magic

The fact is that property isn't really a function; it's a class whose instances have some magic methods that do all the work. The methods in question are __get__, __set__, and __delete__. Together, these three methods define the so-called descriptor protocol, and an object implementing any of these methods is a descriptor. The special thing about descriptors is how they are accessed. For example, when reading an attribute (specifically, when accessing it on an instance while the attribute is defined in the class), if the attribute is bound to an object that implements __get__, the object won't simply be returned; instead, the __get__ method will be called, and the resulting value will be returned. This is, in fact, the mechanism underlying properties, bound methods, static and class methods (see the following section for more information), and super.

More on descriptors:
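
Below is a minimal sketch of the descriptor protocol, showing why an object with __get__/__set__ behaves specially when stored as a class attribute. The class names (Celsius, Thermometer) are made up for this illustration:

class Celsius:
    # A tiny data descriptor: it intercepts reads and writes of the attribute
    # it is assigned to, storing the actual value in the owner instance's __dict__.
    def __get__(self, instance, owner):
        if instance is None:          # accessed on the class itself
            return self
        return instance.__dict__.get('_celsius', 0.0)

    def __set__(self, instance, value):
        instance.__dict__['_celsius'] = float(value)

class Thermometer:
    temperature = Celsius()           # descriptors must live on the class, not the instance

t = Thermometer()
t.temperature = 21                    # goes through Celsius.__set__
print(t.temperature)                  # 21.0, returned by Celsius.__get__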

Static Methods and Class Methods

Static methods and class methods are created by wrapping methods in objects of the staticmethod and classmethod classes

Static methods are defined without self arguments, and they can be called directly on the class itself. Class methods are defined with a self-like parameter normally called cls. You can call class methods directly on the class object too, but the cls parameter then automatically is bound to the class.

class MyClass:
    def smeth():
        print('This is a static method')
    smeth = staticmethod(smeth)

    def cmeth(cls):
        print('This is a class method of', cls)
    cmeth = classmethod(cmeth)

The technique of wrapping and replacing the methods manually like this is a bit tedious. In Python 2.4, a new syntax was introduced for wrapping methods like this, called decorators. (They actually work with any callable objects as wrappers and can be used on both methods and functions.) You specify one or more decorators (which are applied in reverse order) by listing them above the method (or function), using the @ operator.

class MyClass:
    @staticmethod
    def smeth():
        print('This is a static method')

    @classmethod
    def cmeth(cls):
        print('This is a class method of', cls)

They do have their uses, though, such as factory functions.

MyClass.smeth()
MyClass.cmeth()

Output:

This is a static method
This is a class method of <class '__main__.MyClass'>

You can actually use the decorator syntax with properties as well.
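
For example, the earlier Rectangle can be written with the standard @property decorator and the corresponding setter decorator (this is ordinary Python syntax, shown here as a sketch):

class Rectangle:
    def __init__(self):
        self.width = 0
        self.height = 0

    @property
    def size(self):
        return self.width, self.height

    @size.setter
    def size(self, size):
        self.width, self.height = size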

__getattr__, __setattr__, and Friends

To have code executed when an attribute is accessed, you must use a couple of magic methods. The following four provide all the functionality you need (in old-style classes, you only use the last three):

__getattribute__(self, name): Automatically called when the attribute name is accessed. (This works correctly on new-style classes only.)
__getattr__(self, name): Automatically called when the attribute name is accessed and the object has no such attribute, that is, only when a nonexistent attribute is accessed.
__setattr__(self, name, value): Automatically called when an attempt is made to bind the attribute name to value.
__delattr__(self, name): Automatically called when an attempt is made to delete the attribute name.

You can write code in one of these methods that deals with several properties.

class Rectangle:
    def __init__(self):
        self.width = 0
        self.height = 0

    def __setattr__(self, name, value):
        if name == 'size':
            self.width, self.height = value
        else:
            self.__dict__[name] = value

    def __getattr__(self, name):
        if name == 'size':
            return self.width, self.height
        else:
            raise AttributeError()

  • The __setattr__ method is called even if the attribute in question is not size. Therefore, the method must take both cases into consideration: if the attribute is size, the same operation is performed as before; otherwise, the magic attribute __dict__ is used. It contains a dictionary with all the instance attributes. It is used instead of ordinary attribute assignment to avoid having __setattr__ called again (which would cause the program to loop endlessly).

  • The __getattr__ method is called only if a normal attribute is not found, which means that if the given name is not size, the attribute does not exist, and the method raises an AttributeError. This is important if you want the class to work correctly with built-in functions such as hasattr and getattr. If the name is size, the expression found in the previous implementation is used.

Just as there is an "endless loop" trap associated with __setattr__, there is a trap associated with __getattribute__. Because it intercepts all attribute accesses (in new-style classes), it will intercept accesses to __dict__ as well! The only safe way to access attributes on self inside __getattribute__ is to use the __getattribute__ method of the superclass (using super).
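
A sketch of that rule, using a hypothetical class that counts attribute reads (the class and attribute names are invented for this example):

class LoggingProxy:
    def __init__(self):
        self.accesses = 0

    def __getattribute__(self, name):
        # Reading self.accesses directly here would call __getattribute__ again
        # and recurse forever; going through the superclass is the safe route.
        if name != 'accesses':
            count = super().__getattribute__('accesses')
            super().__setattr__('accesses', count + 1)
        return super().__getattribute__(name)

p = LoggingProxy()
p.x = 1
p.x             # this read is counted
p.accesses      # 1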

(4) Iterators

__iter__ is the basis of the iterator protocol.

The Iterator Protocol

To iterate means to repeat something several times — what you do with loops.

The __iter__ method returns an iterator, which is any object with a method called __next__, which is callable without any arguments. When you call the __next__ method, the iterator should return its “next value.” If the method is called and the iterator has no more values to return, it should raise a StopIteration exception. There is a built-in convenience function called next that you can use, where next(it) is equivalent to it.__next__().

The iterator protocol changed a bit in Python 3. In the old protocol, iterator objects had a method called next rather than __next__.

Example: the sequence of Fibonacci numbers, infinite length

class Fibs:
    def __init__(self):
        self.a = 0
        self.b = 1

    def __next__(self):
        self.a, self.b = self.b, self.a + self.b
        return self.a

    def __iter__(self):
        return self
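
Because Fibs is its own iterator, it can be used directly in a for loop; since the sequence is infinite, you need an explicit break (this is the classic usage from the book):

fibs = Fibs()
for f in fibs:
    if f > 1000:
        print(f)    # 1597, the first Fibonacci number greater than 1000
        break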

In many cases, you would put the __iter__ method in another object, which you would use in the for loop. That would then return your iterator. It is recommended that iterators implement an __iter__ method of their own in addition (returning self, just as I did here), so they themselves can be used directly in for loops.

More formally, an object that implements the __iter__ method is iterable, and the object implementing __next__ is the iterator.

Making Sequences from Iterators

class TestIterator:
    value = 0

    def __next__(self):
        self.value += 1
        if self.value > 10:
            raise StopIteration
        return self.value

    def __iter__(self):
        return self

ti = TestIterator()
list(ti)    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The difference between an iterator and an iterable:
The following kinds of objects can be used directly in a for loop:
one kind is the collection types, such as list, tuple, dict, set, and str;
the other is generators, including generator expressions and generator functions that contain yield.
Objects that can be used directly in a for loop are collectively called iterables (Iterable).

To test whether something is iterable, check it against Iterable;
to test whether it is an iterator, check it against Iterator.

from collections.abc import Iterable, Iterator

isinstance({}, Iterable)                        # True
isinstance(100, Iterable)                       # False
isinstance({}, Iterator)                        # False
isinstance((x for x in range(10)), Iterator)    # True

Anything that can be used in a for loop is an Iterable.
Anything that can be passed to next() is an Iterator.

Collection types such as list, tuple, dict, and str are Iterable but not Iterator; however, you can obtain an Iterator from them with the iter() function.

Generators, on the other hand, can not only be used in for loops but can also be called repeatedly with next() to get the next value, until a StopIteration exception is finally raised to signal that no further values can be produced.
An object that can be called with next() to keep returning the next value is called an iterator (Iterator).

This is because a Python Iterator object represents a data stream: it can be called with next() to return the next piece of data until there is none left, at which point StopIteration is raised. You can think of this stream as an ordered sequence whose length is not known in advance; the next value can only be computed on demand with next(). Iterator computation is therefore lazy: a value is computed only when it is needed.

Python's for loop is essentially implemented by repeatedly calling next().
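
A rough sketch of what a for loop does under the hood (using only the built-in iter and next):

values = [1, 2, 3]

# Roughly equivalent to: for x in values: print(x)
it = iter(values)            # obtain an iterator from the iterable
while True:
    try:
        x = next(it)         # fetch the next value
    except StopIteration:    # no more values, so the loop ends
        break
    print(x)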

(5) Generators

A generator is a kind of iterator that is defined with normal function syntax.

Making a Generator

Any function that contains a yield statement is called a generator.

def flatten(nested):
    for sublist in nested:
        for element in sublist:
            yield element

The difference is that instead of returning one value, as you do with return, you can yield several values, one at a time. Each time a value is yielded (with yield), the function freezes; that is, it stops its execution at exactly that point and waits to be reawakened.

In Python 2.4, a relative of list comprehension (see Chapter 5) was introduced: generator comprehension (or generator expressions). It works the same way as list comprehension, except that a list isn't constructed (and the "body" isn't looped over immediately).
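
A small example of a generator expression (standard syntax; note that the extra parentheses can be dropped when it is the sole argument of a function call):

squares = (i ** 2 for i in range(10))   # nothing is computed yet
next(squares)                           # 0

sum(i ** 2 for i in range(10))          # 285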

A Recursive Generator

The generator I designed in the previous section could deal only with lists nested two levels deep, and to do that it used two for loops.

def flatten(nested):
    try:
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested

When flatten is called, you have two possibilities (as is always the case when dealing with recursion): the base case and the recursive case. In the base case, the function is told to flatten a single element (for example, a number), in which case the for loop raises a TypeError (because you’re trying to iterate over a number), and the generator simply yields the element.

list(flatten([[[1], 2], 3, 4, [5, [6, 7]], 8]))
# [1, 2, 3, 4, 5, 6, 7, 8]

There is one problem with this, however. If nested is a string or string-like object, it is a sequence and will not raise TypeError, yet you do not want to iterate over it.

There are two main reasons why you shouldn't iterate over string-like objects in the flatten function.
First, you want to treat string-like objects as atomic values, not as sequences that should be flattened.
Second, iterating over them would actually lead to infinite recursion, because the first element of a string is another string of length one, and the first element of that string is the string itself!

def flatten(nested):
    try:
        # Don't iterate over string-like objects:
        try:
            nested + ''
        except TypeError:
            pass
        else:
            raise TypeError
        for sublist in nested:
            for element in flatten(sublist):
                yield element
    except TypeError:
        yield nested

If the expression nested + '' raises a TypeError, it is ignored; however, if the expression does not raise a TypeError, the else clause of the inner try statement raises a TypeError of its own. This causes the string-like object to be yielded as is (by the outer except clause).

There is no type checking going on here: I don't test whether nested is a string, only whether it behaves like one (that is, whether it can be concatenated with a string).
A natural alternative to this test would be to use isinstance with some abstract superclass for strings and string-like objects, but unfortunately there is no such standard class. And type checking against str would not even work for UserString.

Generators in General

You've seen that a generator is a function that contains the keyword yield. When it is called, the code in the function body is not executed. Instead, an iterator is returned. Each time a value is requested, the code in the generator is executed until a yield or a return is encountered. A yield means that a value should be yielded. A return means that the generator should stop executing.

In other words, generators consist of two separate components: the generator-function and the generator-iterator.

Generator Methods

We may supply generators with values after they have started running, by using a communications channel between the generator and the “outside world,” with the following two end points:

  • The outside world has access to a method on the generator called send, which works just like next, except that it takes a single argument (the “message” to send—an arbitrary object).
  • Inside the suspended generator, yield may now be used as an expression, rather than a statement. In other words, when the generator is resumed, yield returns a value—the value sent from the outside through send. If next was used, yield returns None.

Note that using send (rather than next) makes sense only after the generator has been suspended (that is, after it has hit the first yield). If you need to give some information to the generator before that, you can simply use the parameters of the generator-function.

If you really want to use send on a newly started generator, you can call it with None as its argument.

Example:

def repeater(value):
    while True:
        new = (yield value)
        if new is not None:
            value = new

r = repeater(42)
next(r)                  # 42
r.send("Hello, world")   # 'Hello, world'

Note the use of parentheses around the yield expression. While not strictly necessary in some cases, it is probably better to be safe than sorry and simply always enclose yield expressions in parentheses if you are using the return value in some way.

Generators also have two other methods.

  • The throw method (called with an exception type, an optional value, and traceback object) is used to raise an exception inside the generator (at the yield expression).
  • The close method (called with no arguments) is used to stop the generator.

The close method (which is also called by the Python garbage collector, when needed) is also based on exceptions. It raises the GeneratorExit exception at the yield point, so if you want to have some cleanup code in your generator, you can wrap your yield in a try/finally statement. If you wish, you can also catch the GeneratorExit exception, but then you must reraise it (possibly after cleaning up a bit), raise another exception, or simply return. Trying to yield a value from a generator after close has been called on it will result in a RuntimeError.
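
A small sketch of close and cleanup (standard generator behavior; the counter generator is made up for this example):

def counter():
    n = 0
    try:
        while True:
            yield n
            n += 1
    finally:
        # runs when close() raises GeneratorExit at the suspended yield
        print('generator closed after', n)

c = counter()
next(c)     # 0
next(c)     # 1
c.close()   # prints: generator closed after 1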

Simulating Generators

If you need to use an older version of Python, generators aren’t available. What follows is a simple recipe for simulating them with normal functions.

def flatten(nested):
    result = []
    try:
        # Don't iterate over string-like objects:
        try:
            nested + ''
        except TypeError:
            pass
        else:
            raise TypeError
        for sublist in nested:
            for element in flatten(sublist):
                result.append(element)
    except TypeError:
        result.append(nested)
    return result

While this may not work with all generators, it works with most. (For example, it fails with infinite generators, which of course can't stuff their values into a list.)

(6) The Eight Queens

Generators and Backtracking

Generators are ideal for complex recursive algorithms that gradually build a result. With generators, all the recursive calls need to do is yield their part. That is what I did with the preceding recursive version of flatten, and you can use the exact same strategy to traverse graphs and tree structures.

The Problem

You have a chessboard and eight queen pieces to place on it. The only requirement is that none of the queens threatens any of the others; that is, you must place them so that no two queens can capture each other. How do you do this? Where should the queens be placed?

This is a typical backtracking problem.

In the problem as stated, you are provided with information that there will be only eight queens, but let’s assume that there can be any number of queens. (This is more similar to real-world backtracking problems.) How do you solve that?

State Representation

A tuple is used: state[i] is the column of the queen placed in row i.
For example, if state[0] == 3, you know that the queen in row one is positioned in column four.
While the search is in progress, the state tuple is shorter than eight elements; it contains only the queens placed so far.

Finding Conflicts

def conflict(state, nextX):
    nextY = len(state)
    for i in range(nextY):
        # Conflict if the new queen shares a column with queen i (difference 0)
        # or sits on one of its diagonals (horizontal distance == vertical distance)
        if abs(state[i] - nextX) in (0, nextY - i):
            return True
    return False

The Base Case

Note that this solution isn’t particularly efficient, so with a very large number of queens, it might be a bit slow.

def queens(num, state):
    if len(state) == num - 1:
        for pos in range(num):
            if not conflict(state, pos):
                yield pos

“If all queens but one have been placed, go through all possible positions for the last one, and return the positions that don’t give rise to any conflicts.” The num parameter is the number of queens in total, and the state parameter is the tuple of positions for the previous queens.

The Recursive Case

def queens(num, state):
    if len(state) == num - 1:
        # Yield a one-element tuple (rather than a bare position) so that the
        # recursive case below can concatenate it with (pos,)
        for pos in range(num):
            if not conflict(state, pos):
                yield (pos,)
    else:
        for pos in range(num):
            if not conflict(state, pos):
                for result in queens(num, state + (pos,)):
                    yield (pos,) + result

The for pos and if not conflict parts of this are identical to what you had before, so you can rewrite this a bit to simplify the code. Let’s add some default arguments as well.

def queens(num=8, state=()):
    for pos in range(num):
        if not conflict(state, pos):
            if len(state) == num - 1:
                yield (pos,)
            else:
                for result in queens(num, state + (pos,)):
                    yield (pos,) + result

The queens generator gives you all the solutions:

list(queens(3)) # []
list(queens(4)) # [(1, 3, 0, 2), (2, 0, 3, 1)]
for solution in queens(5):
    print(solution)

Output:

(0, 2, 4, 1, 3)
(0, 3, 1, 4, 2)
(1, 3, 0, 2, 4)
(1, 4, 2, 0, 3)
(2, 0, 3, 1, 4)
(2, 4, 1, 3, 0)
(3, 0, 2, 4, 1)
(3, 1, 4, 2, 0)
(4, 1, 3, 0, 2)
(4, 2, 0, 3, 1)

Wrapping It Up

def prettyprint(solution):
    def line(pos, length=len(solution)):
        return '. ' * pos + 'X ' + '. ' * (length - pos - 1)
    for pos in solution:
        print(line(pos))

I made a little helper function inside prettyprint. I put it there because I assumed I wouldn't need it anywhere outside.

import random
prettyprint(random.choice(list(queens(8))))

Output:

. . . X . . . .
X . . . . . . .
. . . . X . . .
. . . . . . . X
. . . . . X . .
. . X . . . . .
. . . . . . X .
. X . . . . . .

10. Batteries Included

This chapter:
  1. shows a bit about how modules work and how to explore them to learn what they have to offer
  2. offers an overview of the standard library
  3. focuses on a few selected useful modules

(1) Modules

Any Python program can be imported as a module.

You can tell your interpreter where to look for the module by executing something like the following (the directory is just an example; see the note below):

import sys
sys.path.append('/home/yourusername/python')

In UNIX, you cannot simply append the string '~/python' to sys.path. You must use the full path (such as '/home/yourusername/python') or, if you want to automate it, use os.path.expanduser('~/python').

When you import a module, you may notice the appearance of a new directory, called __pycache__, alongside your source file. (In older versions of Python, you'll see files with the suffix .pyc instead.) This directory contains processed files that Python can handle more efficiently. If you import the same module later, Python will import the processed file rather than your .py file, unless the .py file has changed; in that case, a new processed file is generated. Deleting the __pycache__ directory does no harm; a new one is created as needed.

Importing a module several times has the same effect as importing it once. This is because modules aren't really meant to do things (such as printing text) when they're imported; they are mostly meant to define things, such as variables, functions, and classes. And because you need to define things only once, importing a module several times would be pointless.

This behavior also helps avoid problems with circular imports.

If you insist on reloading your module, you can use the reload function from the importlib module.
If you’ve created an object x by instantiating the class Foo from the module bar and you then reload bar , the object x refers to will not be re-created in any way. x will still be an instance of the old version of Foo (from the old version of bar ). If, instead, you want x to be based on the new Foo from the reloaded module, you will need to create it anew.

Modules (just like classes) keep their namespace around after execution. That means that any classes or functions you define, and any variables you assign a value to, become attributes of the module.

Adding Test Code in a Module (see Chapter 16 for more on testing)

def hello():
    print("Hello, world!")

def test():
    hello()

if __name__ == '__main__':
    test()

Making Your Modules Available

Python Packaging Authority
Python Packaging User Guide, available at packaging.python.org.

1> Putting Your Module in the Right Place

import sys, pprint
pprint.pprint(sys.path)

If you have a data structure that is too big to fit on one line, you can use the pprint function from the pprint module instead of the normal print statement. pprint is a pretty-printing function, which makes a more intelligent printout.

2> Telling the Interpreter Where to Look

Putting your module in the correct place might not be the right solution for you for a number of reasons.

  • You don’t want to clutter the Python interpreter’s directories with your own modules.
  • You don’t have permission to save files in the Python interpreter’s directories.
  • You would like to keep your modules somewhere else.

One way of doing this is to modify sys.path directly, but that is not the common way to do it. The standard method is to include your module directory (or directories) in the environment variable PYTHONPATH.

For an alternative to using the PYTHONPATH environment variable, you might want to consider so-called path configuration files. These are files with the extension .pth, located in certain particular directories and containing names of directories that should be added to sys.path. For details, please consult the standard library documentation for the site module.

Package

To structure your modules, you can group them into packages. A package is basically just another type of module. The interesting thing about them is that they can contain other modules. While a module is stored in a file (with the file name extension .py), a package is a directory.

To make Python treat it as a package, it must contain a file named __init__.py. The contents of this file will be the contents of the package, if you import it as if it were a plain module.

Suppose, for example, that you have a package named drawing containing the modules colors and shapes (that is, a directory drawing/ holding __init__.py, colors.py, and shapes.py). With such a layout, the following imports are all valid:

import drawing              # Imports the drawing package
import drawing.colors       # Imports the colors module
from drawing import shapes  # Imports the shapes module

After the first statement, the contents of the __init__.py file in drawing would be available; the shapes and colors modules, however, would not be. After the second statement, the colors module would be available, but only under its full name, drawing.colors. After the third statement, the shapes module would be available, under its short name (that is, simply shapes).

(2) Exploring Modules

To find out what a module contains, you can use the dir function, which lists all the attributes of an object (and therefore all functions, classes, variables, and so on, of a module).

What’s in a Module?

  • dir
import copy
dir(copy)

Output:

['Error',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_copy_dispatch',
 '_copy_immutable',
 '_deepcopy_atomic',
 '_deepcopy_dict',
 '_deepcopy_dispatch',
 '_deepcopy_list',
 '_deepcopy_method',
 '_deepcopy_tuple',
 '_keep_alive',
 '_reconstruct',
 'copy',
 'deepcopy',
 'dispatch_table',
 'error']
  • __all__
copy.__all__
# ['Error', 'copy', 'deepcopy']
  • help
help(copy.copy)

Output:

Help on function copy in module copy:

copy(x)
    Shallow copy operation on arbitrary Python objects.

    See the module's __doc__ string for more info.

The advantage of using help over just examining the docstring directly like this is that you get more information, such as the function signature (that is, the arguments it takes).

Documentation

It's often much quicker to just examine the module a bit yourself first. For example, you may wonder, "What were the arguments to range again?" Instead of searching through a Python book or the standard Python documentation for a description of range, you can just check it directly.

print(range.__doc__)

Output:

range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).

Use the Source

print(copy.__file__)
# /home/ppp/anaconda3/envs/python-3.6/lib/python3.6/copy.py

Note that some modules don’t have any Python source you can read. They may be built into the interpreter (such as the sys module), or they may be written in the C programming language. Ref: Chap17

(3) The Standard Library: A Few Favorites

sys

Function/Variable   Description
argv                The command-line arguments, including the script name
exit([arg])         Exits the current program, optionally with a given return value or error message
modules             A dictionary mapping module names to loaded modules
path                A list of directory names where modules can be found
platform            A platform identifier, such as sunos5 or win32
stdin               Standard input stream (a file-like object)
stdout              Standard output stream (a file-like object)
stderr              Standard error stream (a file-like object)

os

Function/Variable   Description
environ             Mapping with environment variables
system(command)     Executes an operating system command in a subshell
sep                 Separator used in paths
pathsep             Separator used to separate paths
linesep             Line separator ("\n", "\r", or "\r\n")
urandom(n)          Returns n bytes of cryptographically strong random data

The os.system function is useful for a lot of things, but for the specific task of launching a web browser, there’s an even better solution: the webbrowser module. It contains a function called open , which lets you automatically launch a web browser to open the given URL.

import webbrowser
webbrowser.open("https://chengzhaoxi.xyz")

fileinput

Function                              Description
input([files[, inplace[, backup]]])   Facilitates iteration over lines in multiple input streams
filename()                            Returns the name of the current file
lineno()                              Returns the current (cumulative) line number
filelineno()                          Returns the line number within the current file
isfirstline()                         Checks whether the current line is the first in its file
isstdin()                             Checks whether the last line was from sys.stdin
nextfile()                            Closes the current file and moves to the next
close()                               Closes the sequence
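
A minimal sketch of the typical fileinput pattern: print each line from the files named on the command line (or from standard input), prefixed with its cumulative line number.

import fileinput

for line in fileinput.input():
    # line keeps its trailing newline, so suppress print's own newline
    print('{:4}: {}'.format(fileinput.lineno(), line), end='')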

Sets, Heaps, and Deques

Set

Sets are constructed from a sequence (or some other iterable object), or specified explicitly with curly braces. Note that you can’t specify an empty set with braces, as you then end up with an empty dictionary.

Set operations

union (|)
intersection (&)
issubset (<=)
issuperset (>=)
difference (-)
symmetric_difference (^)
copy()
add()
remove()

Sets are mutable and may therefore not be used as keys in dictionaries. Another problem is that sets themselves may contain only immutable (hashable) values and thus may not contain other sets.

Because sets of sets often occur in practice, this could be a problem. Luckily, there is the frozenset type, which represents immutable (and, therefore, hashable) sets.
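
A small illustration of the problem and the frozenset workaround:

a = set()
b = set()
try:
    a.add(b)                 # fails: sets are mutable, hence unhashable
except TypeError as e:
    print(e)                 # unhashable type: 'set'
a.add(frozenset(b))          # works: frozenset is immutable and hashable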

Heap

There is no separate heap type in Python, only a module with some heap-manipulating functions. The module is called heapq.

  • heappush(heap, x): Pushes x onto the heap
  • heappop(heap): Pops off the smallest element of the heap
  • heapify(heap): Enforces the heap property on an arbitrary list
  • heapreplace(heap, x): Pops off the smallest element and pushes x
  • nlargest(n, iter): Returns the n largest elements of iter
  • nsmallest(n, iter): Returns the n smallest elements of iter

You shouldn't use these functions on just any old list, only on one that has been built through the use of the various heap functions.

The order of the elements in the list matters: the value at position i must be no greater than the values at positions 2*i and 2*i+1 (this is the heap property).
The heappop function pops off the smallest element, which is always found at index 0, and makes sure that the smallest of the remaining elements takes over this position.

The heapify function takes an arbitrary list and makes it a legal heap
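
A short session with heapq (the values noted are what these calls should produce):

from heapq import heappush, heappop, heapify
from random import shuffle

data = list(range(10))
shuffle(data)

heap = []
for n in data:
    heappush(heap, n)   # heap[0] is always the smallest element
heappop(heap)           # 0
heappop(heap)           # 1

other = [5, 8, 0, 3, 6, 7, 9, 1, 4, 2]
heapify(other)          # turns an arbitrary list into a legal heap, in place
heappop(other)          # 0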

Deque

from collections import deque

q = deque(range(5))   # deque([0, 1, 2, 3, 4])
q.append(5)           # deque([0, 1, 2, 3, 4, 5])
q.appendleft(6)       # deque([6, 0, 1, 2, 3, 4, 5])
q.pop()               # returns 5
q.popleft()           # returns 6; q is now deque([0, 1, 2, 3, 4])
q.rotate(3)           # deque([2, 3, 4, 0, 1])
q.rotate(-1)          # deque([3, 4, 0, 1, 2])

time

The fields of a Python date tuple

Field              Value
Year               For example, 2000, 2001, ...
Month              In the range 1-12
Day                In the range 1-31
Hour               In the range 0-23
Minute             In the range 0-59
Second             In the range 0-61 (to allow for leap seconds)
Weekday            In the range 0-6, where Monday is 0
Julian day         In the range 1-366
Daylight savings   0, 1, or -1

Some important functions

Function                     Description
asctime([tuple])             Converts a time tuple to a string
localtime([secs])            Converts seconds since the epoch to a date tuple, in local time
mktime(tuple)                Converts a local-time tuple to seconds since the epoch
sleep(secs)                  Sleeps (does nothing) for secs seconds
strptime(string[, format])   Parses a string into a time tuple
time()                       Returns the current time (seconds since the epoch, UTC)

random

If you need real randomness (for cryptography or anything security-related, for example), you should check out the urandom function of the os module. The class SystemRandom in the random module is based on the same kind of functionality and gives you data that is close to real randomness.

Function                           Description
random()                           Returns a random real number n such that 0 <= n < 1
getrandbits(n)                     Returns n random bits, in the form of an integer
uniform(a, b)                      Returns a random real number n such that a <= n <= b
randrange([start], stop, [step])   Returns a random number from range(start, stop, step)
choice(seq)                        Returns a random element from the sequence seq
shuffle(seq[, random])             Shuffles the sequence seq in place
sample(seq, n)                     Chooses n random, unique elements from the sequence seq

For the statistically inclined, there are other functions similar to uniform that return random numbers sampled according to various other distributions, such as betavariate, exponential, Gaussian, and several others.

shelve and json

The only function of interest in shelve is open. When called (with a file name), it returns a Shelf object, which you can use to store things. Just treat it as a normal dictionary (except that the keys must be strings), and when you’re finished (and want things saved to disk), call its close method.
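
A small sketch of the classic shelve pitfall (the file name test.dat is just an example): reading a key returns a copy, so modifying that copy in place does not change what is stored unless you write it back (or open the shelf with writeback=True).

import shelve

s = shelve.open('test.dat')
s['x'] = ['a', 'b', 'c']
s['x'].append('d')        # modifies a temporary copy only
print(s['x'])             # ['a', 'b', 'c'] - the append was lost
s['x'] = s['x'] + ['d']   # read, modify, and store back explicitly
print(s['x'])             # ['a', 'b', 'c', 'd']
s.close()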

re

Escaping Special Characters

'python\\.org'

Two backslashes are needed: one for the regular expression itself and one to escape the backslash in the Python string literal.
If you are tired of doubling up backslashes, use a raw string, such as r'python\.org'.

Function                               Description
compile(pattern[, flags])              Creates a pattern object from a string containing a regular expression
search(pattern, string[, flags])       Searches string for pattern
match(pattern, string[, flags])        Matches pattern at the beginning of string
split(pattern, string[, maxsplit=0])   Splits string by occurrences of pattern
findall(pattern, string)               Returns a list of all occurrences of pattern in string
sub(pat, repl, string[, count=0])      Substitutes occurrences of pat in string with repl
escape(string)                         Escapes all special regular expression characters in string

The function re.search searches a given string to find the first substring, if any, that matches the given regular expression. If one is found, a MatchObject (evaluating to true) is returned; otherwise, None (evaluating to false) is returned.

Match Objects and Groups

A group is simply a subpattern that has been enclosed in parentheses. The groups are numbered by their left parenthesis. Group zero is the entire pattern

Method                 Description
group([group1, ...])   Retrieves the occurrences of the given subpatterns (groups)
start([group])         Returns the starting position of the occurrence of a given group
end([group])           Returns the ending position (an exclusive limit, as in slices) of the occurrence of a given group
span([group])          Returns both the starting and ending positions of a given group
import re

m = re.match(r"www\.(.*)\..{3}", "www.python.org")
print(type(m))
print(m.group(1))
print(m.start(1))
print(m.end(1))

Output:

<class '_sre.SRE_Match'>
python
4
10

Other Interesting Standard Modules

argparse
cmd
csv
datetime
enum
functools
itertools
logging
statistics
timeit, profile, and trace
