Posting this for easy future reference (job interviews and the like).
Some central principles to be aware of
1. lines of source code is an **awful** metric — Shorter and more compressed/minified code does not make the code “cleaner”. “Clean” code is code that a human can get the gist of within 15 minutes.
2. code is meant to be read — Everything we write as programmers is eventually executed as machine code (or interpreter byte code), which is a nightmare for humans to read. Higher level languages are, by design, meant to be much easier to read. I don’t obsess about condensing / minifying / inlining code. Good code is code that is easy to read.
3. white space is king — Have you ever been reading a really in-depth book and gotten to a page with absolutely no paragraph breaks at all? (i’m looking at you, glass bead game.) My brain almost lets out an audible sigh when I get to pages that are just walls of text. Humans like white space. It breaks up the information into manageable chunks for our eyes/brains.
4. up/down is faster than left/right — Humans scan through ‘documents’ fastest when our eyes do not have to fully read each line left-to-right. You can recognise parts of a file based on the ‘horizontal shape’ and a few key variable names. This is always faster than fully reading a line left-right.
5. lowest level dependency at the top of the file — if there is a function definition at the top of a file I’ve written, it’s used by multiple functions / classes in the rest of the file. it’s probably important you see this thing as you’ll be seeing it in a few different places. so, it goes at the top. the first time you open the file you know where the definition for this very common function is.
6. document everything, assume nothing — doc strings, doc strings, doc strings. other people weren’t there when I wrote a function. They don’t know why it was written that way. code should ideally be self-documenting in terms of *how* it runs. the *why* it was written that way can only ever be captured by comments. so I add comments and doc strings.
7. prefer functional over imperative / procedural — when writing code, we are usually applying functions to data in some specific order. Prefer map and/or filter operations to “do stuff” to “things”. This tends to make the “doing stuff” code smaller and more independent, which makes it easier to unit test! You can also get parallelism for (almost) free via `multiprocessing.Pool`.
8. avoid list comprehensions, for loops and while loops — These are all eagerly evaluated: they run to completion before anything else can happen. I prefer the lazily evaluated map/filter operations mentioned above. If necessary, I write generator functions (co-routines) to prepare data for a map operation; generator functions are lazily evaluated too.
9. class inheritance — python is not java. leave the misery and pain of class inheritance to java people.
Here are the steps I use to write a script
1. Boilerplate – the shebang, `if __name__ == "__main__":` guard and a main function.
2. Create a logging set-up, decide on whether context managed loggers or decorator loggers are more appropriate.
3. Start writing things under the main function
4. Keep writing under main until patterns appear in the code (doing the same thing three times etc)
5. Create functions for these patterns. Start with a debug logging call explaining what this function is about to do, then paste the code, then copy paste the debugging line and change it to say it’s completed.
6. Rinse and repeat 4 and 5.
7. Undo a lot of the rinsing and repeating from 4 and 5 when I realise I’ve over complicated parts of it. No, we don’t need a config class with classmethods.
8. Fix a completely unrelated customer vps instance failure.
9. Write an email to customer(s) about the unrelated vps instance failure, write an internal slack message about the failure, revert the dodgy feature that someone managed to get through the CI process without CI failing.
10. Realise that half of the functions for this script belong in their own module. Create a new module. Realise that the other half belong in another module. Create the other new module. Put the commonly used utility functions into a `common.py` module.
11. Read about a new framework, think about rewriting the entire codebase to use the new framework. Read the documentation for a while. It does look interesting…
12. Release the code from 10. Frameworks are never worth it.
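The decorator loggers mentioned in step 2 come from a personal package, so the real implementation isn’t shown here. A minimal sketch of what such a decorator *might* look like (the signature and behaviour below are my assumptions, chosen to match the log output later in this post):

```python
import functools
import logging
from datetime import datetime

logging.basicConfig(level=logging.DEBUG)


def logger_decorator(logger, message, info_level=logging.INFO, timer_enabled=False):
    """Hypothetical sketch: log start/finish (and optionally elapsed time)
    around the wrapped function call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.log(info_level, f"Started ==> {message}")
            start = datetime.now()
            result = func(*args, **kwargs)
            logger.log(info_level, f"Finished ==> {message}")
            if timer_enabled:
                logger.log(info_level, f"Time ==> {datetime.now() - start}")
            return result
        return wrapper
    return decorator


@logger_decorator(logging.getLogger(__name__), "Add two numbers.", timer_enabled=True)
def add(a, b):
    return a + b


add(1, 2)  # logs Started / Finished / Time lines, returns 3
```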
Some key ‘building block’ elements to be aware of
"""
Comments
"""
# inline code comment for something.
# whitespace is king -- new lines before/after makes it easier to read.
# prefer to have full sentences on their own line where possible,
# splitting the right hand side of a comment line at natural punctuation.
variable_name = do_something(
    some_value,
)
# =====================================================================
# Block comments
# =====================================================================
#
# These are useful when a long winded explanation is required in the
# middle of the codebase, e.g. when something needs an esoteric approach
# or 'looks weird' at first glance. These are the *WHY* documentation
# comments.
#
# NOTE: @dijksterhuis: newline before and after comment blocks makes
# code easier to read (whitespace is king).
#
# i also tag comments with `NOTE: @author` to make TODO/FIXME/HACK
# lines both easy to grep and also to provide attribution without
# relying on `git blame`
#
# =====================================================================
#######################################################################
# Section -- below this line is x, y, z.
#######################################################################
# when these ^ section comments start appearing, it is time to refactor.
# the file has so much code that different sections need to be explicitly
# highlighted --> move sections to their own modules!
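Tagged comments like the `NOTE: @author` lines above are trivial to find with standard grep. An illustrative command (not from the original post):

```shell
# create a sample file containing a tagged comment
printf '# NOTE: @dijksterhuis: example tagged comment\n' > example.py

# tagged comment lines are then easy to find across a whole codebase
grep -rn "NOTE: @dijksterhuis" --include="*.py" .
```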
"""
Consistent multi-line hangs
"""
# multiple imports from the same module spread over multiple lines
# this makes it easier to temporarily disable/enable imports by adding
# a comment character at the start of the relevant line
from some_package.module import (
    some_function,
    another_function,
)
# example commenting out a specific import and replacing it with another
from some_package.module import (
    some_function,
    # another_function,
    yet_another_function_v2,
)
# this applies to normal function calls as well
_logging_level = os.environ.get(
    "LOG_LEVEL",
    default="INFO",
)
_logging_level = os.environ.get(
    "LOG_LEVEL",
    # default="INFO",
    default="SOME_SPECIAL_DEBUG_VALUE_I_JUST_ADDED_LOCALLY",
)
# it also applies to function calls that only have one argument.
# next week they might have a second argument ;)
variable_name = do_something(
    some_value,
)
variable_name = do_something(
    some_value,
    kwarg_because_edge_case_bug_we_didnt_know_about=True,
    another_kwarg_because_we_also_forgot_defaults=72,
)
# consistency in the code style is more important because it helps
# make the code easier to read. our brains don't have to constantly
# switch between one-line and multi-line styles
# there are exceptions to this rule, such as:
# - one-liner if / for statements
# - generator function execution
def gen_fnc(xs):
    if len(xs) == 0: return False
    for x in xs: yield x[0]

list(gen_fnc([[1], [2], [3]]))
"""
try-except-else in function definitions
"""
def read_a_file_as_str(file_path: Path) -> str:
    """
    1. white space is king.

       new lines before and after the try-except-else blocks.
       the exception to this rule is a function like `write_over_existing_file`
       in the example script below. adding white space before and after makes
       that function *harder* to read quickly because there's already a lot of
       white space! too much white space can be a bad thing!

    2. return in the `else` clause and **NOT** in the `try` clause.

       the last thing we do is return the value from the function.
       the return statement belongs at the END of the function definition,
       not at the start.

    3. Always account for unknown exceptions.

       I am not a god. I am not able to account for every single way
       this function will fail in the future.
       Unknown exception handling is saying this to future developers:

       > something happened here that I, the original developer, did not
       > foresee when i first wrote this. please look into it!
    """

    try:
        with open(file_path, "r") as f:
            data = f.read()

    except FileNotFoundError as err:
        logger.exception(
            f"Could not find file to read: path={file_path}",
            exc_info=err,
        )
        raise err

    except Exception as err:
        logger.exception(
            "An unknown error occurred! Please investigate this!",
            exc_info=err,
        )
        raise err

    else:
        return data
Here is an example script that does not do very much of anything.
#!/usr/bin/env python3
"""
This script is to demonstrate my natural python coding style.
It has no utility in the real world.

Always add a module header docstring for autogenerated documentation
with sphinx. Often a PITA to set up, but useful if someone uses the
codebase downstream.
"""

import os
import logging
import itertools

from dijksterhuis.logging.loggers import (
    initialise_global_logger,
    logger_decorator,
)

_logging_level = os.environ.get(
    "LOG_LEVEL",
    default="INFO",
)

logger = initialise_global_logger(
    logging_level=_logging_level,
)
def gen_take_first_elem_from_nested_arr(xs):
    """
    Grab the first item of a nested array.

    NOTE: generator / co-routine functions are always prefixed with gen_*
    """
    for x in xs: yield x[0]
def read_some_file(file_path: str) -> (str, str):
    """
    Docstrings are important because they explain non-code things
    to other human beings.

    NOTE: @dijksterhuis: this function does not have a logging decorator
        to demonstrate the kinds of logging lines i like to see. loglines
        are meant to be read by humans when searching for problems.
        the goal is making it easy for a human to understand the current
        application / script state in plain english
    """
    logger.debug(f"Attempting to read some file with path: {file_path}")

    try:
        with open(file_path, "r") as f:
            file_contents = f.read()

    except FileNotFoundError as err:
        logger.exception(
            f"Could not find file to read with path {file_path}",
            exc_info=err,
        )
        raise err

    except Exception as err:
        logger.exception(
            f"An unknown exception occurred for file path: {file_path}",
            exc_info=err,
        )
        raise err

    else:
        logger.info(f"Read some file with path: {file_path}")
        return (file_path, file_contents)
@logger_decorator(
    logger,
    "Make file contents happy instead of sad.",
    info_level=logging.INFO,
    timer_enabled=True,
)
def make_file_happy(file_path: str, file_contents: str) -> (str, str):
    """
    Make the file happy instead of sad.

    NOTE 1: @dijksterhuis: this function will be called with a starmap
    NOTE 2: @dijksterhuis: this isn't expected to fail unless inputs to
        the function are of the wrong type, so no exception handling
        until it becomes clear we need it
    """
    happy_file_contents = file_contents.replace("sad", "happy")

    # return modified inputs -- makes later map functions easier
    return file_path, happy_file_contents
@logger_decorator(
    logger,
    "Write over existing file contents.",
    info_level=logging.DEBUG,
)
def write_over_existing_file(file_path: str, file_contents: str) -> bool:
    """
    Write over the file, making it happy.

    NOTE: @dijksterhuis: this function will be called with a starmap,
        but will be called at the end of the graph. so we can
        return a bool instead of passing on the input arguments.
    """
    try:
        with open(file_path, "w+") as f:
            f.write(file_contents)
    except Exception as err:
        raise err
    else:
        return True
def main():

    original_file_paths_to_contents_mappings = [
        ("./sad_file_1.txt", "sad"),
        ("./sad_file_2.txt", "sad"),
        ("./happy_file.txt", "happy"),
    ]

    # execute this now as it will not be part of the lazy-eval graph execution.
    # NOTE: @dijksterhuis: lazy-eval op executions always have a `list` call
    #     or some such wrapping them, which makes them easier to spot when
    #     scanning through a long file. just look for the list(stuff) blocks.
    _ = list(
        itertools.starmap(
            write_over_existing_file,
            original_file_paths_to_contents_mappings,
        )
    )
    logger.info(
        f"Created test files:"
        f" nFiles={len(original_file_paths_to_contents_mappings)}"
    )

    # lazy eval
    # generator arg split over multiple lines to break up the horizontal shape
    # --> makes the extra processing of input data arguments easier to spot
    #     and follow vs one-liners like this:
    #     `map(some_function_name, gen_fnc_with_long_name(xs))`
    paths_to_contents_mappings = map(
        read_some_file,
        gen_take_first_elem_from_nested_arr(
            original_file_paths_to_contents_mappings,
        ),
    )
    logger.info(
        "Created map op: Read files."
    )

    # lazy eval
    # lambda expressions are fine when:
    # - the code is not complex
    # - performance is not a major concern
    # - we aren't using a multiprocessing.Pool (yet)
    sad_paths_to_contents_mappings = filter(
        lambda x: "sad" in x[1],
        paths_to_contents_mappings,
    )
    logger.info(
        "Created filter op: Filtered sad files."
    )

    # lazy eval
    happified_paths_to_contents_mappings = itertools.starmap(
        make_file_happy,
        sad_paths_to_contents_mappings,
    )
    logger.info(
        "Created map op: Make files happy instead of sad."
    )

    # executes due to the list() call
    overwritten_file_paths = list(
        itertools.starmap(
            write_over_existing_file,
            happified_paths_to_contents_mappings,
        )
    )
    logger.info(
        f"Executed graph and wrote files:"
        f" nFiles={len(overwritten_file_paths)}"
        f" filePaths={overwritten_file_paths}"
    )


if __name__ == '__main__':
    main()
Here are the logging outputs when running it
$ python3 ./test.py
2022-08-29 23:57:41,766 ==> root ==> 1282313 MainProcess ==> main ==> INFO ==> Created test files: nFiles=3
2022-08-29 23:57:41,766 ==> root ==> 1282313 MainProcess ==> main ==> INFO ==> Created map op: Read files.
2022-08-29 23:57:41,766 ==> root ==> 1282313 MainProcess ==> main ==> INFO ==> Created map op: Filtered sad files.
2022-08-29 23:57:41,766 ==> root ==> 1282313 MainProcess ==> main ==> INFO ==> Created map op: Make files happy instead of sad.
2022-08-29 23:57:41,766 ==> root ==> 1282313 MainProcess ==> read_some_file ==> INFO ==> Read some file with path: ./sad_file_1.txt
2022-08-29 23:57:41,766 ==> root ==> 1282313 MainProcess ==> read_some_file ==> INFO ==> Read some file with path: ./sad_file_2.txt
2022-08-29 23:57:41,766 ==> root ==> 1282313 MainProcess ==> read_some_file ==> INFO ==> Read some file with path: ./happy_file.txt
2022-08-29 23:57:41,766 ==> root ==> 1282313 MainProcess ==> decorated_logging ==> INFO ==> Started ==> Make file contents happy instead of sad.
2022-08-29 23:57:41,767 ==> root ==> 1282313 MainProcess ==> decorated_logging ==> INFO ==> Finished ==> Make file contents happy instead of sad.
2022-08-29 23:57:41,767 ==> root ==> 1282313 MainProcess ==> decorated_logging ==> INFO ==> Time ==> 0:00:00.000016
2022-08-29 23:57:41,767 ==> root ==> 1282313 MainProcess ==> decorated_logging ==> INFO ==> Started ==> Make file contents happy instead of sad.
2022-08-29 23:57:41,767 ==> root ==> 1282313 MainProcess ==> decorated_logging ==> INFO ==> Finished ==> Make file contents happy instead of sad.
2022-08-29 23:57:41,767 ==> root ==> 1282313 MainProcess ==> decorated_logging ==> INFO ==> Time ==> 0:00:00.000014
2022-08-29 23:57:41,774 ==> root ==> 1282313 MainProcess ==> main ==> INFO ==> Executed graph and wrote files: nFiles=2 filePaths=[('./sad_file_1.txt', 'happy'), ('./sad_file_2.txt', 'happy')]