Wednesday, December 2, 2015

Effective Python

The Pylint tool (http://www.pylint.org/) is a popular static analyzer for Python
source code. Pylint provides automated enforcement of the PEP 8 style guide and
detects many other types of common errors in Python programs.

Python supports closures: functions that refer to variables from the scope in which
they were defined. This is why the helper function is able to access the group
argument to sort_priority.
Functions are first-class objects in Python, meaning you can refer to them directly,
assign them to variables, pass them as arguments to other functions, compare them
in expressions and if statements, etc. This is how the sort method can accept a
closure function as the key argument.
Python has specific rules for comparing tuples. It first compares items in index zero,
then index one, then index two, and so on. This is why the return value from the
helper closure causes the sort order to have two distinct groups.
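The three points above can be sketched together along the lines of the book's sort_priority example (the exact listing here is a reconstruction, not quoted from the text):

```python
def sort_priority(values, group):
    # `helper` is a closure: it reads `group` from the enclosing scope.
    def helper(x):
        if x in group:
            # Tuples compare index 0 first, then index 1, so every
            # (0, ...) tuple sorts before every (1, ...) tuple.
            return (0, x)
        return (1, x)
    # `helper` is a first-class object, passed directly as the key.
    values.sort(key=helper)

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)  # [2, 3, 5, 7, 1, 4, 6, 8]
```

The priority items come first (sorted among themselves), then everything else, which is exactly the two-group ordering the tuple comparison rules produce.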


This shows that the system calls will all
run in parallel from multiple Python threads even though they’re limited by the GIL. The
GIL prevents my Python code from running in parallel, but it has no negative effect on
system calls. This works because Python threads release the GIL just before they make
system calls and reacquire the GIL as soon as the system calls are done.
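A minimal sketch of that behavior, assuming the blocking system call is select.select on a socket (the function name slow_systemcall is illustrative):

```python
import select
import socket
from threading import Thread
from time import time

def slow_systemcall():
    # select() is a blocking system call; the thread releases the GIL
    # while the OS waits, so other threads can run in the meantime.
    select.select([socket.socket()], [], [], 0.1)

start = time()
threads = [Thread(target=slow_systemcall) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time() - start
# Five 0.1-second system calls overlap: total is ~0.1s, not ~0.5s.
print('Took %.3f seconds' % elapsed)
```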


Python can work around all these issues with coroutines. Coroutines let you have many
seemingly simultaneous functions in your Python programs. They’re implemented as an
extension to generators (see Item 16: “Consider Generators Instead of Returning Lists”).
The cost of starting a generator coroutine is a function call. Once active, they each use less
than 1 KB of memory until they’re exhausted.
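A small sketch of the generator-coroutine protocol being described — values go in through send() and come back out through yield (the minimize example here is illustrative):

```python
def minimize():
    # A generator coroutine: starting it costs one function call,
    # and its saved state is just this small frame.
    current = yield            # receive the first value
    while True:
        value = yield current  # emit the running minimum, await next
        current = min(value, current)

it = minimize()
next(it)            # prime the coroutine to its first yield
print(it.send(10))  # 10
print(it.send(4))   # 4
print(it.send(22))  # 4
```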


Although it looks simple to the programmer, the multiprocessing module and
ProcessPoolExecutor class do a huge amount of work to make parallelism possible.
In most other languages, the only touch point you need to coordinate two threads is a
single lock or atomic operation. The overhead of using multiprocessing is high
because of all of the serialization and deserialization that must happen between the parent
and child processes.

multiprocessing provides more advanced
facilities for shared memory, cross-process locks, queues, and proxies. But all of these
features are very complex. It’s hard enough to reason about such tools in the memory
space of a single process shared between Python threads.

You can start by using the
ThreadPoolExecutor class to run isolated, high-leverage functions in threads. Later,
you can move to the ProcessPoolExecutor to get a speedup. Finally, once you’ve
completely exhausted the other options, you can consider using the multiprocessing
module directly.
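That progression can be sketched as follows; gcd here is a stand-in CPU-bound function chosen for illustration, and the one-line swap between the two executor classes is the point:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def gcd(pair):
    # Deliberately naive CPU-bound work: scan downward for the
    # greatest common divisor.
    a, b = pair
    for i in range(min(a, b), 0, -1):
        if a % i == 0 and b % i == 0:
            return i

numbers = [(12, 18), (35, 21)]

# Step 1: threads. Same interface, but CPU-bound Python code
# gains nothing here because of the GIL.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(gcd, numbers))
print(results)  # [6, 7]

# Step 2: processes. A one-line change that sidesteps the GIL,
# at the cost of pickling arguments and results between processes.
# (On some platforms this must run under `if __name__ == '__main__':`.)
# with ProcessPoolExecutor(max_workers=2) as pool:
#     results = list(pool.map(gcd, numbers))
```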

Moving CPU bottlenecks to C-extension modules can be an effective way to
improve performance while maximizing your investment in Python code. However,
the cost of doing so is high and may introduce bugs.
The multiprocessing module provides powerful tools that can parallelize
certain types of Python computation with minimal effort.
The power of multiprocessing is best accessed through the
concurrent.futures built-in module and its simple
ProcessPoolExecutor class.

