Advanced Python for Data Scientists
Advanced Python tips you should be using but probably aren’t.
Topics:
- Programming Essentials: Logging, Type hinting, Generators, Argument parsing & Decorators.
- Class Behaviour: Inheritance, Dunders, Operator Overloading, Encapsulation & Abstract Classes.
- Introduction to Design Patterns: Singleton, Factory, Proxy & Composite.
Programming Essentials
Logging
When monitoring & debugging, most self-taught programmers start off by either writing a custom logger module or (God forbid) just printing everything imaginable.
Making use of the built-in logging module and the (pip-installable) coloredlogs package can go a long way with minimal effort.
Recall that there are 5 severity levels when logging:
DEBUG - for developers: troubleshooting and testing.
INFO - information about the current state.
WARNING - the system doesn't crash, but something could cause a problem. Example: RAM limit approaching.
ERROR - some operation could not be completed, but the system can still run.
CRITICAL - the entire system is impaired.
The configured severity level dictates how much the script logs to the console or writes via a handler: messages below that level are dropped. Implementing a logger involves a handler (that writes the log file), a formatter (optional, to format the output) and a logger instance.
import logging, coloredlogs
# configuration
formatter = logging.Formatter("[%(asctime)s] [%(levelname)8s] --- %(message)s (%(filename)s:%(lineno)s)", "%Y-%m-%d %H:%M:%S")
logger = logging.getLogger('My logger.')
handler = logging.FileHandler('mylogs.log')
handler.setLevel(logging.DEBUG)
handler.setFormatter(formatter)
logger.addHandler(handler)
coloredlogs.install(level='DEBUG', logger=logger)
# examples
logger.debug("this is a DEBUG message")
logger.info("this is an INFO message")
logger.warning("this is a WARNING message")
logger.error("this is an ERROR message")
logger.critical("this is a CRITICAL message")
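To see how the severity level filters messages, here is a minimal sketch using only the standard library. The logger name level_demo and the in-memory stream are assumptions made for illustration; they stand in for the file handler above so that the result can be inspected directly.

```python
import io
import logging

# Hypothetical logger name and in-memory stream, used only for this illustration.
stream = io.StringIO()
demo_logger = logging.getLogger("level_demo")
demo_logger.propagate = False  # keep the demo output away from the root logger

demo_handler = logging.StreamHandler(stream)
demo_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
demo_logger.addHandler(demo_handler)

# Only WARNING and above pass the logger's level check.
demo_logger.setLevel(logging.WARNING)
demo_logger.debug("dropped")
demo_logger.info("dropped")
demo_logger.warning("kept")
demo_logger.error("kept")

captured = stream.getvalue().splitlines()
print(captured)  # ['WARNING: kept', 'ERROR: kept']
```

Lowering the level to logging.DEBUG would let all four messages through.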
Type Hinting
Python is dynamically typed. It is often useful, however, to keep track of the types of variables. Type hinting is widely used, but hints are not enforced by the interpreter and can therefore be ignored. Conditional statements can be used to raise exceptions or handle undesired argument or return types at runtime, but this may unnecessarily bloat the code.
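As a rough illustration of that bloat, the sketch below guards a small function with manual isinstance checks. The function scale and its arguments are hypothetical names chosen for this example.

```python
def scale(value: int, factors: list[int]) -> list[int]:
    # Manual runtime checks: the hints above are not enforced by Python itself.
    if not isinstance(value, int):
        raise TypeError(f"value must be int, got {type(value).__name__}")
    if not all(isinstance(f, int) for f in factors):
        raise TypeError("factors must contain only int")
    return [value * f for f in factors]

print(scale(2, [1, 2, 3]))  # [2, 4, 6]

try:
    scale("2", [1, 2, 3])
except TypeError as err:
    message = str(err)
    print(message)  # value must be int, got str
```

Half of the function body is now type checking, which is the overhead a static checker aims to remove.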
mypy is a package designed to address this issue during development. Function argument and return types are annotated in the standard fashion. We then check the script from a terminal using mypy script.py (rather than running it with python script.py): mypy analyses the code statically, without executing it, and reports any place where the types do not match their annotations.
For example, the code below runs without error, yet the function is erroneously passed a string where an int is expected.
def func(arg_1: int, arg_2: list[int]) -> tuple:
    return arg_1, arg_2

func('12', [1, 2, 3])
During development, one could check the script with mypy script.py, which reports that the first argument passed to func is a str where an int is expected.
Generators
Generators use “lazy execution” to avoid computing large iterables until their items are needed. The all-too-familiar range() function behaves in the same lazy way (strictly speaking it returns a lazy sequence object rather than a generator, but the memory argument is identical). That is, it does not compute the full range of items and store them in memory, but rather stores only the start, stop and step values, and produces each item as required.
This can be seen by viewing the size (in bytes) of two range() instantiations: the first containing a single element, the second containing 1 million elements. Compare the size of each range r with the size of the corresponding list l.
import sys

# range items (lazy)
r1 = range(1)
r2 = range(1000000)

# convert to list
l1 = list(r1)
l2 = list(r2)

def ptr(obj, name):
    print(f'object:{name} is {sys.getsizeof(obj):>7} bytes.')

ptr(r1, 'r1')
ptr(r2, 'r2')
ptr(l1, 'l1')
ptr(l2, 'l2')
In many cases, it may be advantageous to write your own generator. This can be achieved with the yield statement in Python. A generator can run indefinitely, making infinite sequences possible. For example, here is a generator that yields successive powers of x without end.
def power_x(x):
    y = 0
    while True:
        yield x**y
        y += 1

value = power_x(4)
print(next(value))
print(next(value))
print(next(value))
print(next(value))
print(next(value))
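Calling next() repeatedly works, but an infinite generator is usually consumed in bounded slices. The sketch below repeats power_x so that it runs on its own, and uses itertools.islice from the standard library to take the first five items. The generator expression at the end is an additional illustration of the same lazy idea.

```python
from itertools import islice

def power_x(x):
    """Yield x**0, x**1, x**2, ... without end (same idea as above)."""
    y = 0
    while True:
        yield x**y
        y += 1

# Take only the first five items; the rest are never computed.
first_five = list(islice(power_x(4), 5))
print(first_five)  # [1, 4, 16, 64, 256]

# A generator expression is lazy too: nothing is squared until sum() pulls items.
total = sum(n * n for n in range(10))
print(total)  # 285
```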
Argument Parsing
We often want to execute a script from the command line, possibly with arguments. sys.argv is the list of arguments passed to a script (where sys.argv[0] is the script name).
Suppose we wish to use optional arguments rather than a simple list of positional arguments. The built-in getopt module can be used to parse options. For example:
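import getopt
import sys

# 'f:' and 'm:' are short options that take a value (-f data.csv);
# the trailing '=' does the same for the long forms (--filename data.csv).
opts, args = getopt.getopt(
    sys.argv[1:],
    'f:m:',
    ['filename=', 'message=']
)
print('opts: ', opts)
print('args: ', args)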
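The standard library also ships argparse, which is a different module from the getopt approach shown above and adds defaults, help text and error messages on top of the parsing. The sketch below is one possible equivalent. The option names mirror the example above, while the default values and the extras argument are assumptions for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description="Illustrative parser (hypothetical options).")
parser.add_argument("-f", "--filename", default="data.csv", help="input file")
parser.add_argument("-m", "--message", default="", help="message to print")
parser.add_argument("extras", nargs="*", help="any remaining positional arguments")

# An explicit list is parsed here so the sketch runs anywhere;
# in a real script, call parser.parse_args() to read sys.argv[1:].
ns = parser.parse_args(["-f", "results.csv", "--message", "hello", "a", "b"])
print(ns.filename, ns.message, ns.extras)  # results.csv hello ['a', 'b']
```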
Decorators
As the name implies, decorators are used to decorate functions (that is, to add functionality around them). They take the form:
def decorator(function):
    def wrapper(*args, **kwargs):
        # add decorations
        rt = function(*args, **kwargs)
        return rt
    return wrapper

@decorator
def function(x):
    # some function
    pass
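Here, the additional logic lives in the wrapper() function defined inside decorator. The @decorator line above the function is shorthand for function = decorator(function). Calling the decorated function is therefore equivalent to calling the following on the original, undecorated function:
decorator(function)(*args, **kwargs)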
For example, we may build a decorator that times a function:
import time

def timer_decorator(function):
    def wrapper(*args, **kwargs):
        start = time.time()
        rt = function(*args, **kwargs)
        end = time.time()
        print(f'function: {function.__name__} was executed in {end-start} seconds.')
        return rt
    return wrapper

@timer_decorator
def funct(x):
    a = 1
    for i in range(1, x):
        a *= i
    return a

funct(10000)
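One side effect of the timer above is that the decorated function takes on the wrapper's metadata (its __name__ becomes 'wrapper'). A common refinement, sketched here, is to apply functools.wraps to the wrapper. The use of time.perf_counter and the name factorial_below are choices made for this example rather than part of the original.

```python
import functools
import time

def timer_decorator(function):
    @functools.wraps(function)  # copy __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # clock intended for measuring intervals
        rt = function(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f'function: {function.__name__} was executed in {elapsed:.6f} seconds.')
        return rt
    return wrapper

@timer_decorator
def factorial_below(x):
    """Multiply the integers 1 .. x-1 together."""
    a = 1
    for i in range(1, x):
        a *= i
    return a

result = factorial_below(6)
print(result)                    # 120
print(factorial_below.__name__)  # factorial_below, not 'wrapper'
```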
Class Behaviour
Inheritance
Inheritance (included for completeness) allows a child class to inherit all the properties of a parent class. For example, a Ferrari class may have all the properties of a Car class, but also contain properties of its own.
# Inheritance
class Car:
    def __init__(self, type, age) -> None:
        self.type = type
        self.age = age

class Ferrari(Car):
    def __init__(self, type, age, speed) -> None:
        # initialise the attributes defined by the parent class
        super().__init__(type, age)
        self.speed = speed
Class Variables
The purpose of OOP is to couple data and operations into objects. A particular class instance contains the specific data that defines that instance. It is also possible to associate data with the class itself. We may wish, for example, to keep track of certain data across all instances of a class.
class Base:
    count = 0

    def __init__(self) -> None:
        Base.count += 1

for _ in range(5):
    Base()
print('Instances: ', Base.count)
Static Methods: the same principle can be applied to methods by defining static methods. These are methods that can be called on the class directly, without instantiation.
class Base:
    @staticmethod
    def mymethod():
        # some function
        pass

Base.mymethod()
Dunders
Dunder (double underscore) methods define built-in behaviour. Standard behaviour can be overridden by including dunder methods in a class definition. The constructor (__init__), destructor (__del__) and string conversion (__str__) are common examples of this.
Importantly, if you want to access the default dunder behaviour, super().__dunder__() can be used. Another under-utilised dunder is the __call__ method, which makes an object callable. More concretely,
class Example:
    def __str__(self) -> str:
        txt = super().__str__()
        txt += ' - added to text...'
        return txt

    def __call__(self, *args, **kwds):
        print(f'\n{self.__class__.__name__} was called!')

ex = Example()
print(ex)
ex()
Operator Overloading
A special case of dunders, operator overloading allows us to define how objects interact with operators (+, -, *, /). Many forms of programming (including my native data science) are mathematical in nature. Overriding the default operator behaviour can be used to specify how class objects should interact. Consider the + operator (modified with the __add__ dunder):
class Vector:
    def __init__(self, x, y) -> None:
        self.x, self.y = x, y

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __str__(self) -> str:
        return ' x:{}, y:{}'.format(self.x, self.y)

v1 = Vector(1, 2)
v2 = Vector(5, 4)
v3 = v1 + v2
print(v1)
print(v2)
print(v3)
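The same approach extends to other operators. The sketch below is an illustrative extension of the Vector class, not part of the original: it adds scalar multiplication (__mul__) and equality (__eq__), and returns NotImplemented for unsupported operands so that Python raises a clear TypeError.

```python
class Vector:
    def __init__(self, x, y) -> None:
        self.x, self.y = x, y

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented  # Python then tries other.__radd__ or raises TypeError
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        # Vector * number scales both components.
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vector(self.x * scalar, self.y * scalar)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self) -> str:
        return f'Vector({self.x}, {self.y})'

print(Vector(1, 2) + Vector(5, 4))                  # Vector(6, 6)
print(Vector(1, 2) * 3)                             # Vector(3, 6)
print(Vector(1, 2) + Vector(5, 4) == Vector(6, 6))  # True
```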
Encapsulation
Sometimes we want to obscure or limit access to some class information. Suppose we have a class Person with a property age. The double underscore naming convention makes a variable private (Python implements this through name mangling rather than true access control):
class Person:
    def __init__(self, age) -> None:
        self.__age = age
As a result, we cannot access the age variable directly from outside the class:
p1 = Person(20)
p1.__age  # raises AttributeError
We may add a property to the class that allows us to interact with this variable.
    @property
    def Age(self):
        return self.__age
We can then add a setter, decorated with @Age.setter (using the same method name), to update the property. Putting it all together yields:
class Person:
    def __init__(self, age) -> None:
        self.__age = age

    @property
    def Age(self):
        return self.__age

    @Age.setter
    def Age(self, value):
        # silently ignore non-positive ages
        if value <= 0:
            return
        self.__age = value

p1 = Person(20)
print(p1.Age)   # 20
p1.Age = -2342  # rejected by the setter
print(p1.Age)   # 20
p1.Age = 2342
print(p1.Age)   # 2342
Notice that we prohibited age from being set to zero or a negative value. In this way we can control and restrict class properties.
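It is worth seeing what the double underscore actually does. The sketch below shows that Python renames the attribute (name mangling) rather than locking it away, so the privacy is a strong convention and not a security boundary.

```python
class Person:
    def __init__(self, age) -> None:
        self.__age = age  # stored on the instance as _Person__age

p = Person(20)

try:
    p.__age
    direct_access = True
except AttributeError:
    direct_access = False
print(direct_access)  # False

# The mangled name is still reachable from outside the class.
print(p._Person__age)             # 20
print('_Person__age' in vars(p))  # True
```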
Abstract Classes
Abstract classes, defined as classes that cannot be instantiated, provide a blueprint for building consistent classes. Suppose we want to define a set of classes that all implement the same set of methods. All subclasses can inherit from an abstract class. That way, if any subclass fails to implement a required method, an exception is raised when that subclass is instantiated.
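from abc import ABCMeta, abstractmethod

class IAbstractClass(metaclass=ABCMeta):
    # abstractstaticmethod is deprecated since Python 3.3;
    # stacking staticmethod on abstractmethod is the current form.
    @staticmethod
    @abstractmethod
    def method_01():
        pass

class child_class(IAbstractClass):
    @staticmethod
    def method_01():
        print('required method implemented!')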
If method_01() were not implemented in child_class, a TypeError would be raised on instantiating child_class.
To improve system reliability, abstract classes are often used when defining design patterns.
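The failure mode is easy to demonstrate. In the sketch below, Shape, Square and Broken are hypothetical names chosen for illustration; Broken omits the required method and so cannot be instantiated.

```python
from abc import ABC, abstractmethod

class Shape(ABC):  # hypothetical interface for illustration
    @abstractmethod
    def area(self) -> float:
        """Return the area of the shape."""

class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2

class Broken(Shape):
    pass  # forgets to implement area()

print(Square(3).area())  # 9

try:
    Broken()
except TypeError as err:
    error_raised = True
    print(type(err).__name__)  # TypeError
else:
    error_raised = False
```

Note that the error appears when Broken() is called, not when the class is defined.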
Introduction to Design Patterns
Good software engineering is about designing reliable systems at runtime. This requires thoughtful design decisions on both the architectural and code level.
Although by no means exhaustive, here are some (Gang of Four) software design patterns to improve your code. Read more about design patterns on Refactoring.Guru.
Singleton
As the name suggests, the Singleton design pattern adds a restriction to a class such that it may only be instantiated once. As Refactoring.Guru articulates wonderfully, this may be desired to ensure a variable is not overwritten or to limit access to shared resources.
The key is the Singleton class variable __instance. The first time the singleton class is instantiated, __instance is set to self. That way, if Singleton is instantiated a second time, an exception will be raised.
from abc import ABCMeta, abstractmethod

class IAbstractClass(metaclass=ABCMeta):
    # abstractstaticmethod is deprecated since Python 3.3
    @staticmethod
    @abstractmethod
    def print_data():
        pass

class Singleton(IAbstractClass):
    __instance = None

    def __init__(self) -> None:
        if Singleton.__instance is not None:
            raise Exception('Singleton cannot be instantiated more than once.')
        super().__init__()
        Singleton.__instance = self

    @staticmethod
    def get_instance():
        if Singleton.__instance is not None:
            return Singleton.__instance
        return 'No instance of Singleton.'

    @staticmethod
    def print_data():
        print(f'{Singleton.__instance}')
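A short usage sketch makes the restriction visible. To keep it self-contained, the class is repeated here in a reduced form (without the abstract base, and with get_instance simply returning the stored instance); that simplification is an assumption of this example.

```python
class Singleton:
    __instance = None

    def __init__(self) -> None:
        if Singleton.__instance is not None:
            raise Exception('Singleton cannot be instantiated more than once.')
        Singleton.__instance = self

    @staticmethod
    def get_instance():
        return Singleton.__instance

first = Singleton()
print(Singleton.get_instance() is first)  # True

try:
    Singleton()  # second instantiation is rejected
except Exception as err:
    second_error = str(err)
    print(second_error)  # Singleton cannot be instantiated more than once.
```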
Factory
Suppose we want to delay choosing which class to instantiate until more information (some calculation, data or user input) is available. The factory method defers that choice, using a Factory class to generate instances of the appropriate subclass.
When implementing a factory design pattern, we implement a Factory class that builds the desired subclass given some data. Following the abstract class methodology, a factory design pattern may be implemented as:
from abc import ABCMeta, abstractmethod

class IAbstractClass(metaclass=ABCMeta):
    # declared as an instance method to match the subclasses below
    @abstractmethod
    def print_class_type(self):
        """Interface Method"""

class subclass_01(IAbstractClass):
    def __init__(self) -> None:
        super().__init__()

    def print_class_type(self):
        print('Class 01')

class subclass_02(IAbstractClass):
    def __init__(self) -> None:
        super().__init__()

    def print_class_type(self):
        print('Class 02')

class Factory:
    @staticmethod
    def build_subclass(class_type):
        # 'a' builds subclass_01; anything else falls back to subclass_02
        if class_type == 'a':
            return subclass_01()
        return subclass_02()

if __name__ == '__main__':
    choice = input('Type of class?\n')
    person = Factory.build_subclass(choice)
    person.print_class_type()
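As the number of subclasses grows, the if chain in the factory can be replaced with a lookup table. The following is one possible variant, not the author's implementation. ClassA, ClassB and RegistryFactory are hypothetical names, and unlike the example above it rejects unknown keys instead of falling back to a default.

```python
class ClassA:  # hypothetical product classes
    def print_class_type(self):
        return 'Class A'

class ClassB:
    def print_class_type(self):
        return 'Class B'

class RegistryFactory:
    # Map each key to the class that should be built for it.
    _registry = {'a': ClassA, 'b': ClassB}

    @staticmethod
    def build(class_type):
        try:
            return RegistryFactory._registry[class_type]()
        except KeyError:
            raise ValueError(f'unknown class type: {class_type!r}') from None

print(RegistryFactory.build('a').print_class_type())  # Class A
print(RegistryFactory.build('b').print_class_type())  # Class B
```

Adding a new product then only requires a new entry in the dictionary.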
Proxy
Much like decorators, the proxy design pattern wraps an additional layer of functionality around a class, typically to increase modularity or to delay the execution of the core class until more resources are available. A proxy is a middle-man that handles access to the core class.
from abc import ABCMeta, abstractmethod

class IAbstractClass(metaclass=ABCMeta):
    @abstractmethod
    def class_method(self):
        """Implement in class."""

class CoreClass(IAbstractClass):
    def class_method(self):
        print('Core class executed...')

class ProxyClass(IAbstractClass):
    def __init__(self) -> None:
        super().__init__()
        self.core = CoreClass()

    def class_method(self):
        # extra behaviour runs first, then the call is forwarded to the core
        print('Proxy class executed...')
        self.core.class_method()
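To illustrate the "delay the execution of the core class" use case, here is a sketch of a lazy proxy that only builds the core object on first use. LazyProxy and the instances_created counter are hypothetical additions for this example, and the abstract base is omitted for brevity.

```python
class CoreClass:
    instances_created = 0  # counter used only to show when construction happens

    def __init__(self) -> None:
        CoreClass.instances_created += 1

    def class_method(self):
        return 'Core class executed...'

class LazyProxy:
    def __init__(self) -> None:
        self._core = None  # nothing built yet

    def class_method(self):
        if self._core is None:
            self._core = CoreClass()  # built on first use only
        return self._core.class_method()

proxy = LazyProxy()
print(CoreClass.instances_created)  # 0
print(proxy.class_method())         # Core class executed...
proxy.class_method()
print(CoreClass.instances_created)  # 1
```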
Composite
As per intuition, a composite is simply a collection of child objects that together form a larger object. Mirroring much of how the corporate world is organised, it is intuitive and straightforward to implement.
from abc import ABCMeta, abstractmethod

class IAbstractModule(metaclass=ABCMeta):
    @abstractmethod
    def __init__(self, nodes) -> None:
        """Implement in child class"""

    @abstractmethod
    def print_data(self):
        """Implement in child class"""

class child_01(IAbstractModule):
    def __init__(self, nodes) -> None:
        self.nodes = nodes

    def print_data(self):
        print('child 01 has {} nodes.'.format(self.nodes))

class child_02(IAbstractModule):
    def __init__(self, nodes) -> None:
        self.nodes = nodes

    def print_data(self):
        print('child 02 has {} nodes.'.format(self.nodes))

class ParentClass(IAbstractModule):
    def __init__(self, nodes) -> None:
        # super().__init__(nodes)
        self.nodes = nodes       # running total, including children
        self.base_nodes = nodes  # nodes owned by the parent itself
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        self.nodes += child.nodes

    def print_data(self):
        print('\n' + '...' * 10)
        print('Core nodes: {}'.format(self.base_nodes))
        for chd in self.children:
            chd.print_data()
        print('Total nodes: {}'.format(self.nodes))
        print('...' * 10 + '\n')

# implement
c1 = child_01(21)
c2 = child_02(43)
p1 = ParentClass(6)
p1.add_child(c1)
p1.add_child(c2)
p1.print_data()
Fin 👋.