Advanced Python for Data Scientists

Zach Wolpe
9 min read · Feb 2, 2023

Advanced Python tips you should be using but probably aren’t.

Topics:

  1. Programming Essentials: Logging, Type hinting, Generators, Argument parsing & Decorators.
  2. Class Behaviour: Inheritance, Dunders, Operator Overloading, Encapsulation & Abstract Classes.
  3. Introduction to Design Patterns: Singleton, Factory, Proxy & Composite.

Programming Essentials

Logging

When monitoring & debugging, most self-taught programmers start off by either writing a custom logger module or (God forbid) just printing everything imaginable.

Making use of the built-in logging module and the (pip-installable) coloredlogs package can go a long way with minimal effort.

Recall that there are 5 severity levels when logging:

DEBUG    - for devs: troubleshooting and testing.
INFO     - information about the current state.
WARNING  - the system doesn't crash, but something could cause a problem. Example: RAM limit approaching.
ERROR    - some operation failed, but the system can still run.
CRITICAL - the entire system is impaired.

Changing the severity level dictates how much the script logs to the console or writes to a handler. Implementing a logger involves a handler (which sends records to a destination such as a log file); a formatter (optional, to format the output); and a logger instance.

import logging, coloredlogs

# configuration
formatter = logging.Formatter("[%(asctime)s] [%(levelname)8s] --- %(message)s (%(filename)s:%(lineno)s)", "%Y-%m-%d %H:%M:%S")
logger = logging.getLogger('My logger.')
handler = logging.FileHandler('mylogs.log')
handler.setLevel(logging.DEBUG)
handler.setFormatter(formatter)
logger.addHandler(handler)
coloredlogs.install(level='DEBUG', logger=logger)

# examples
logger.debug("this is a DEBUG message")
logger.info("this is an INFO message")
logger.warning("this is a WARNING message")
logger.error("this is an ERROR message")
logger.critical("this is a CRITICAL message")
Custom logger output.
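To see how the severity level filters messages, here is a minimal sketch that uses only the standard library. The logger name severity_demo and the in-memory stream are assumptions made for illustration; they are not part of the configuration above.

```python
import io
import logging

# Capture output in memory so the example leaves no files behind.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

demo_logger = logging.getLogger("severity_demo")  # hypothetical logger name
demo_logger.addHandler(handler)
demo_logger.propagate = False          # keep records away from the root logger
demo_logger.setLevel(logging.WARNING)  # drop anything below WARNING

demo_logger.debug("not shown")
demo_logger.info("not shown either")
demo_logger.warning("disk almost full")
demo_logger.error("write failed")

captured = stream.getvalue().splitlines()
print(captured)  # ['WARNING: disk almost full', 'ERROR: write failed']
```

Lowering the level to logging.DEBUG would let all four messages through.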

Type Hinting

Python is dynamically typed. It is often useful, however, to keep track of variable types. Type hints are widely used, but the interpreter does not enforce them, so they can be silently ignored. Conditional statements can check for undesired types or return values at runtime and raise exceptions, but this may unnecessarily bloat the code.

mypy is a package designed to address this issue during development. Argument and return types are annotated in the standard fashion. We then check the script from a terminal using mypy script.py (rather than running it with python script.py), which statically verifies that types match their annotations without executing the code.

For example, the code below runs without error, yet the function is erroneously passed a string where an int is expected.

def func(arg_1: int, arg_2: list[int]) -> tuple:
    return arg_1, arg_2

func('12', [1,2,3])
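As a rough illustration of the point above, the sketch below shows that the hints are ignored at runtime, and what a hand-written guard looks like. func_checked is a hypothetical variant added purely for comparison; it is exactly the kind of bloat a static checker lets us avoid.

```python
def func(arg_1: int, arg_2: list[int]) -> tuple:
    return arg_1, arg_2

# The interpreter ignores the hints: a str slips through unchecked.
unchecked = func('12', [1, 2, 3])
print(unchecked)  # ('12', [1, 2, 3])

def func_checked(arg_1: int, arg_2: list[int]) -> tuple:
    # Hypothetical runtime guard, written by hand.
    if not isinstance(arg_1, int):
        raise TypeError(f'arg_1 must be int, got {type(arg_1).__name__}')
    return arg_1, arg_2

message = None
try:
    func_checked('12', [1, 2, 3])
except TypeError as err:
    message = str(err)
    print(message)  # arg_1 must be int, got str
```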

During development, one could run mypy script.py, which catches the error: it reports that argument 1 to func has an incompatible type (str where int is expected).

Generators

Generators use “lazy evaluation” to avoid computing the elements of a large iterable until they are needed. The all-too-familiar range() behaves in a similar way: strictly speaking it is a lazy sequence rather than a generator, but it likewise does not compute the full range of items and store them in memory. It stores only the start, stop and step values and computes each element as required.

This can be seen by viewing the size (bytes) of two range() instantiations: the first containing a single element, the second containing 1 million elements. Compare the size of each range object r with the size of the corresponding list l.

import sys

# range objects (lazy sequences)
r1 = range(1)
r2 = range(1000000)

# convert to list
l1 = list(r1)
l2 = list(r2)

def ptr(obj, name):
    print(f'object:{name} is {sys.getsizeof(obj):>7} bytes.')

ptr(r1, 'r1')
ptr(r2, 'r2')
ptr(l1, 'l1')
ptr(l2, 'l2')
r1 and r2 are exactly the same size because only the range object is stored in memory. l2 is, however, MUCH larger than l1 because the entire list is stored in memory.

In many cases, it may be advantageous to write your own generator. This can be achieved with the yield statement in Python. A generator can run indefinitely, making infinite sequences possible. For example, here is a generator that returns the powers of x into perpetuity.

def power_x(x):
    y = 0
    while True:
        yield x**y
        y += 1

value = power_x(4)
print(next(value))
print(next(value))
print(next(value))
print(next(value))
print(next(value))
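Calling next() by hand gets tedious. One way to take a fixed number of items from an infinite generator is itertools.islice, sketched below alongside a generator expression (both are additions for illustration, not part of the original example).

```python
from itertools import islice

def power_x(x):
    y = 0
    while True:
        yield x**y
        y += 1

# islice stops after 5 items, so the infinite loop is never exhausted.
first_five = list(islice(power_x(4), 5))
print(first_five)  # [1, 4, 16, 64, 256]

# A generator expression is lazy too: nothing is squared until sum() asks.
total = sum(n * n for n in range(10))
print(total)  # 285
```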

Argument Parsing

We often want to execute a script from the command line and we may wish to include arguments. sys.argv returns a list of arguments passed to a script (where sys.argv[0] is the filename).

Suppose we wish to use optional arguments and not a simple list of positional arguments. The built-in getopt module can be used to include optional arguments. For example:

import getopt
import sys

opts, args = getopt.getopt(
    sys.argv[1:],
    'f:m:',                     # short options: -f <value>, -m <value>
    ['filename=', 'message=']   # long options: --filename <value>, --message <value>
)

print('opt: ', opts)
print('args: ', args)
“f:” means that the option “f” takes a parameter (marked by the colon). We may omit the colon if the option does not take a parameter. Long options use a trailing “=” in the same way.
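For anything beyond a couple of options, the standard-library argparse module is a common alternative to getopt. The sketch below mirrors the same two options; the default file name, the extra --verbose flag and the hard-coded argument list are assumptions made so the example runs anywhere.

```python
import argparse

parser = argparse.ArgumentParser(description='Hypothetical demo script.')
parser.add_argument('-f', '--filename', default='data.csv',
                    help='input file (default is a made-up name)')
parser.add_argument('-m', '--message', default='',
                    help='free-text message')
parser.add_argument('-v', '--verbose', action='store_true',
                    help='flag that takes no parameter')

# An explicit list stands in for sys.argv[1:] so the sketch runs anywhere.
args = parser.parse_args(['-f', 'input.csv', '--message', 'hello', '-v'])
print(args.filename, args.message, args.verbose)  # input.csv hello True
```

In a real script, calling parser.parse_args() with no arguments reads from sys.argv, and argparse generates a --help message for free.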

Decorators

As the name implies, decorators are used to decorate functions (add additional functionality). They take the form:

def decorator(function):
    def wrapper(*args, **kwargs):
        # add decorations
        rt = function(*args, **kwargs)
        return rt
    return wrapper

@decorator
def function(x):
    # some function
    pass

Where some additional logic can be contained in the wrapper() function. The @decorator line above the function is equivalent to reassigning the name after the definition:

function = decorator(function)

so that every later call to function(*args, **kwargs) actually runs wrapper(*args, **kwargs).

For example, we may build a decorator that times a function:

import time

def timer_decorator(function):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # high-resolution clock, suited to timing
        rt = function(*args, **kwargs)
        end = time.perf_counter()
        print(f'function: {function.__name__} was executed in {end-start} seconds.')
        return rt
    return wrapper


@timer_decorator
def funct(x):
    # product of 1 * 2 * ... * (x-1)
    a = 1
    for i in range(1, x):
        a *= i
    return a

funct(10000)
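One side effect worth knowing: a decorated function takes on the wrapper's name and docstring unless we copy them across. The sketch below compares a plain decorator with one that uses functools.wraps; the names plain, preserving, add and mul are hypothetical.

```python
import functools

def plain(function):
    def wrapper(*args, **kwargs):
        return function(*args, **kwargs)
    return wrapper

def preserving(function):
    @functools.wraps(function)  # copy __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        return function(*args, **kwargs)
    return wrapper

@plain
def add(a, b):
    """Add two numbers."""
    return a + b

@preserving
def mul(a, b):
    """Multiply two numbers."""
    return a * b

print(add.__name__, add.__doc__)  # wrapper None
print(mul.__name__, mul.__doc__)  # mul Multiply two numbers.
```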

Class Behaviour

Inheritance

Inheritance (included for completeness) allows a child class to inherit all the properties of a parent class. For example, a Ferrari class may have all the properties of a Car class, but also contain properties of its own.

# Inheritance
class Car:
    def __init__(self, type, age) -> None:
        self.type = type
        self.age = age

class Ferrari(Car):
    def __init__(self, type, age, speed) -> None:
        super().__init__(type, age)  # let Car set the shared attributes
        self.speed = speed
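A quick way to check what the child actually inherits is sketched below. The describe() method and the example values are hypothetical additions for illustration.

```python
class Car:
    def __init__(self, type, age) -> None:
        self.type = type
        self.age = age

    def describe(self) -> str:  # hypothetical method, added for illustration
        return f'{self.type}, {self.age} years old'

class Ferrari(Car):
    def __init__(self, type, age, speed) -> None:
        super().__init__(type, age)
        self.speed = speed

f = Ferrari('sports', 3, 320)
print(f.describe())         # sports, 3 years old (method inherited from Car)
print(isinstance(f, Car))   # True
print(f.speed)              # 320
```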

Class Variables

The purpose of object-oriented programming (OOP) is to couple data and operations into objects (classes). A particular class instance contains specific data that defines the instance. It is also possible to associate data with the class itself. We may wish to keep track of certain data with respect to all instances of a class.

class Base:
    count = 0
    def __init__(self) -> None:
        Base.count += 1

for _ in range(5):
    Base()
print('Instances: ', Base.count)
Here the data (the count variable) is associated with the class, NOT the instance. Each time the Base class is instantiated, count is increased. Note that we call “Base.count += 1” and not “self.count += 1”, as the latter would create a separate count attribute on the particular instance.
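The difference between the two spellings is easy to verify. In this sketch, Shared and Shadowed are hypothetical classes that differ only in which name they increment.

```python
class Shared:
    count = 0
    def __init__(self) -> None:
        Shared.count += 1      # updates the class attribute

class Shadowed:
    count = 0
    def __init__(self) -> None:
        self.count += 1        # reads the class value, writes an instance attribute

for _ in range(3):
    Shared()
    last = Shadowed()

print(Shared.count)    # 3
print(Shadowed.count)  # 0
print(last.count)      # 1
```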

Static Methods: the same principle can be applied to methods by defining static methods. These are methods that can be called on the class directly without instantiation.

class Base:

    @staticmethod
    def mymethod():
        # some function
        pass

Base.mymethod()

Dunders

Dunder (double underscore) methods define built-in behaviour. Standard behaviour can be overridden by including dunder methods in a class definition. The constructor (__init__), destructor (__del__) & string type cast (__str__) are common examples of this.

Importantly, if you want to access the default dunder behaviour, super().__dunder__() can be used. Another under-utilised dunder is the __call__ method, which makes an object callable. More concretely,

class Example:
    def __str__(self) -> str:
        txt = super().__str__()   # default representation from object
        txt += ' - added to text...'
        return txt

    def __call__(self, *args, **kwds):
        print(f'\n{self.__class__.__name__} was called!')

ex = Example()
print(ex)
ex()

Operator Overloading

A special case of dunders, operator overloading allows us to define how objects interact with operators (+, -, *, /). Many forms of programming (including my native data science) are mathematical in nature. Overriding default operator behaviour can be used to specify how class objects should interact. Consider the + operator (modified with the __add__ dunder):

class Vector:
    def __init__(self, x, y) -> None:
        self.x, self.y = x, y

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __str__(self) -> str:
        return ' x:{}, y:{}'.format(self.x, self.y)

v1 = Vector(1,2)
v2 = Vector(5,4)
v3 = v1 + v2
print(v1)
print(v2)
print(v3)
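The same idea extends to other operators. The sketch below adds a hypothetical scalar __mul__ and an __eq__ to the Vector class, and returns NotImplemented for unsupported operands so that Python raises its usual TypeError. These methods are illustrative additions, not part of the class above.

```python
class Vector:
    def __init__(self, x, y) -> None:
        self.x, self.y = x, y

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented   # lets Python raise a clear TypeError
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):      # hypothetical addition: vector * number
        return Vector(self.x * scalar, self.y * scalar)

    def __eq__(self, other):
        return isinstance(other, Vector) and (self.x, self.y) == (other.x, other.y)

print(Vector(1, 2) + Vector(5, 4) == Vector(6, 6))  # True
print(Vector(1, 2) * 3 == Vector(3, 6))             # True

add_failed = False
try:
    Vector(1, 2) + 1
except TypeError:
    add_failed = True
    print('cannot add a Vector and an int')
```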

Encapsulation

Sometimes we want to obscure or limit access to some class information. Suppose we have a class Person with a property age. The double underscore naming convention makes a variable private (Python implements this through name mangling rather than strict access control):

class Person:
    def __init__(self, age) -> None:
        self.__age = age

As a result, we cannot access the age variable directly from outside the class:

p1 = Person(20)
p1.__age  # raises AttributeError
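It helps to know what the double underscore actually does. As a small sketch: Python renames the attribute to _Person__age behind the scenes, so the protection is a strong convention rather than a hard guarantee.

```python
class Person:
    def __init__(self, age) -> None:
        self.__age = age

p1 = Person(20)

blocked = False
try:
    p1.__age
except AttributeError:
    blocked = True
    print('p1.__age is not accessible from outside the class')

# The attribute still exists under a mangled name.
print(p1._Person__age)             # 20
print('_Person__age' in vars(p1))  # True
```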

We may add a property to the class that allows us to read this variable.

    @property
    def Age(self):
        return self.__age

We can then add an @Age.setter method (using the same method name) to update the property. Putting it all together yields:


class Person:
    def __init__(self, age) -> None:
        self.__age = age

    @property
    def Age(self):
        return self.__age

    @Age.setter
    def Age(self, value):
        # silently ignore non-positive ages
        if value > 0:
            self.__age = value


p1 = Person(20)
print(p1.Age)   # 20

p1.Age = -2342
print(p1.Age)   # still 20, the update was rejected

p1.Age = 2342
print(p1.Age)   # 2342

Notice that we prohibited age from being set to zero or a negative value. In this way we can control and restrict class properties.

Abstract Classes

Abstract classes, defined as classes that cannot be instantiated, provide a blueprint for building consistent classes. Suppose we want to define a set of classes that all implement the same set of methods. All subclasses can inherit from an abstract class. That way, if any subclass fails to implement a required method, an exception will be raised.

from abc import ABCMeta, abstractmethod

class IAbstractClass(metaclass=ABCMeta):
    # abstractstaticmethod is deprecated since Python 3.3;
    # stacking @staticmethod on @abstractmethod is the current spelling.
    @staticmethod
    @abstractmethod
    def method_01():
        pass

class child_class(IAbstractClass):
    @staticmethod
    def method_01():
        print('required method implemented!')

If method_01() was not implemented in child_class, a TypeError would be raised as soon as child_class was instantiated.
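To make the failure mode concrete, here is a minimal sketch with hypothetical Shape, Square and Broken classes. Note that the error surfaces when the incomplete subclass is instantiated, not when it is defined.

```python
from abc import ABC, abstractmethod

class Shape(ABC):                 # hypothetical blueprint
    @abstractmethod
    def area(self) -> float:
        """Every concrete shape must implement this."""

class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side
    def area(self) -> float:
        return self.side ** 2

class Broken(Shape):              # forgets to implement area()
    pass

print(Square(3).area())  # 9

error_raised = False
try:
    Broken()
except TypeError as err:
    error_raised = True
    print('TypeError:', err)
```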

To improve system reliability, abstract classes are often used when defining design patterns.

Introduction to Design Patterns

Good software engineering is about designing systems that are reliable at runtime. This requires thoughtful design decisions at both the architectural and the code level.

Although by no means exhaustive, here are some (Gang of Four) software design patterns to improve your code. Read more about design patterns on Refactoring.Guru.

Singleton

As the name suggests, the Singleton design pattern adds a restriction to a class such that it may only be instantiated once. As Refactoring.Guru articulates wonderfully, this may be desired to ensure a variable is not overwritten or to limit access to shared resources.

The key is the Singleton class variable: __instance. The first time a singleton class is instantiated it is set to self. That way, if Singleton is instantiated a second time, an exception will be raised.

from abc import ABCMeta, abstractmethod

class IAbstractClass(metaclass=ABCMeta):

    @staticmethod
    @abstractmethod
    def print_data():
        pass

class Singleton(IAbstractClass):
    __instance = None

    def __init__(self) -> None:
        if Singleton.__instance is not None:
            raise Exception('Singleton cannot be instantiated more than once.')
        super().__init__()
        Singleton.__instance = self

    @staticmethod
    def get_instance():
        if Singleton.__instance is not None:
            return Singleton.__instance
        return 'No instance of Singleton.'

    @staticmethod
    def print_data():
        print(f'{Singleton.__instance}')
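A short usage sketch shows the restriction in action. To stay self-contained it uses a trimmed-down Singleton without the abstract base, and a single-underscore _instance; both simplifications are assumptions made for brevity.

```python
class Singleton:
    _instance = None   # single underscore purely to keep the sketch short

    def __init__(self) -> None:
        if Singleton._instance is not None:
            raise Exception('Singleton cannot be instantiated more than once.')
        Singleton._instance = self

    @staticmethod
    def get_instance():
        return Singleton._instance

first = Singleton()
print(Singleton.get_instance() is first)  # True

second_error = None
try:
    Singleton()
except Exception as err:
    second_error = str(err)
    print(second_error)  # Singleton cannot be instantiated more than once.
```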

Factory

Suppose we want to delay choosing which class to instantiate until more information (some calculation, data or user input) is available. The factory pattern defers that choice to a Factory class, which generates the appropriate subclass.

When implementing a factory design pattern, we implement a Factory class that builds the desired subclass given some data. Following the abstract class methodology, a factory design pattern may be implemented as:

from abc import ABCMeta, abstractmethod

class IAbstractClass(metaclass=ABCMeta):
    @abstractmethod
    def print_class_type(self):
        """Interface Method"""

class subclass_01(IAbstractClass):
    def __init__(self) -> None:
        super().__init__()
    def print_class_type(self):
        print('Class 01')

class subclass_02(IAbstractClass):
    def __init__(self) -> None:
        super().__init__()
    def print_class_type(self):
        print('Class 02')

class Factory:
    @staticmethod
    def build_subclass(class_type):
        # 'a' selects subclass_01; anything else falls back to subclass_02
        if class_type == 'a':
            return subclass_01()
        return subclass_02()

if __name__ == '__main__':
    choice = input('Type of class?\n')
    instance = Factory.build_subclass(choice)
    instance.print_class_type()
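As the number of subclasses grows, the if/else chain can be swapped for a lookup table. The sketch below is one possible variation, not the only way to build a factory; ShapeFactory, Circle and Square are hypothetical names.

```python
class Circle:
    def describe(self) -> str:
        return 'circle'

class Square:
    def describe(self) -> str:
        return 'square'

class ShapeFactory:
    # Maps a key to a class; new types are registered here, not in an if/else chain.
    _registry = {'c': Circle, 's': Square}

    @staticmethod
    def build(kind: str):
        try:
            return ShapeFactory._registry[kind]()
        except KeyError:
            raise ValueError(f'unknown kind: {kind!r}') from None

built = [ShapeFactory.build(k).describe() for k in ('c', 's')]
print(built)  # ['circle', 'square']
```

Unlike the fallback in the example above, this version rejects unknown keys with a ValueError instead of quietly returning a default.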

Proxy

Much like decorators, the proxy design pattern wraps an additional layer of functionality around a class, typically to increase modularity or to delay the execution of the core class until more resources are available. A proxy is a middleman that handles access to the core class.

from abc import ABCMeta, abstractmethod

class IAbstractClass(metaclass=ABCMeta):

    @abstractmethod
    def class_method(self):
        """Implement in class."""

class CoreClass(IAbstractClass):
    def class_method(self):
        print('Core class executed...')

class ProxyClass(IAbstractClass):
    def __init__(self) -> None:
        super().__init__()
        self.core = CoreClass()
    def class_method(self):
        # extra behaviour runs first, then the call is forwarded to the core
        print('Proxy class executed...')
        self.core.class_method()
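The "delay execution" use case can be sketched with a lazy proxy that only builds the core object on first use. LazyProxy and the instances counter are hypothetical additions for illustration.

```python
class CoreClass:
    instances = 0                      # counts how many cores were built
    def __init__(self) -> None:
        CoreClass.instances += 1
    def class_method(self) -> str:
        return 'Core class executed...'

class LazyProxy:
    def __init__(self) -> None:
        self._core = None              # nothing is built yet
    def class_method(self) -> str:
        if self._core is None:         # build the core on first use only
            self._core = CoreClass()
        return self._core.class_method()

proxy = LazyProxy()
before = CoreClass.instances
print(before)                # 0
print(proxy.class_method())  # Core class executed...
proxy.class_method()
after = CoreClass.instances
print(after)                 # 1
```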

Composite

As the name suggests, a composite is simply a collection of child objects that together form a larger object. Mirroring how much of the corporate world is organised, it is intuitive and straightforward to implement.


from abc import ABCMeta, abstractmethod

class IAbstractModule(metaclass=ABCMeta):
    @abstractmethod
    def __init__(self, nodes) -> None:
        """Implement in child class"""

    @abstractmethod
    def print_data(self):
        """Implement in child class"""

class child_01(IAbstractModule):
    def __init__(self, nodes) -> None:
        self.nodes = nodes

    def print_data(self):
        print('child 01 has {} nodes.'.format(self.nodes))

class child_02(IAbstractModule):
    def __init__(self, nodes) -> None:
        self.nodes = nodes

    def print_data(self):
        print('child 02 has {} nodes.'.format(self.nodes))

class ParentClass(IAbstractModule):
    def __init__(self, nodes) -> None:
        self.nodes = nodes        # running total, including children
        self.base_nodes = nodes   # the parent's own nodes
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        self.nodes += child.nodes

    def print_data(self):
        print('\n' + '...' * 10)
        print('Core nodes: {}'.format(self.base_nodes))
        for chd in self.children:
            chd.print_data()
        print('Total nodes: {}'.format(self.nodes))
        print('...' * 10 + '\n')

# implement
c1 = child_01(21)
c2 = child_02(43)
p1 = ParentClass(6)
p1.add_child(c1)
p1.add_child(c2)
p1.print_data()
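The pattern becomes more interesting once composites can contain other composites. A minimal sketch, assuming hypothetical Leaf and Group classes that share a total() method, computes the node count recursively instead of keeping a running total:

```python
class Leaf:
    def __init__(self, nodes: int) -> None:
        self.nodes = nodes
    def total(self) -> int:
        return self.nodes

class Group:
    def __init__(self, nodes: int) -> None:
        self.nodes = nodes
        self.children = []
    def add_child(self, child) -> None:
        self.children.append(child)
    def total(self) -> int:
        # Recurse: each child reports its own total, leaf or group alike.
        return self.nodes + sum(c.total() for c in self.children)

team = Group(6)
team.add_child(Leaf(21))
team.add_child(Leaf(43))

company = Group(1)
company.add_child(team)       # a group nested inside another group
company.add_child(Leaf(4))

print(team.total())     # 70
print(company.total())  # 75
```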

Fin 👋.
