Background

This is my first openly controversial post, as far as I know. I have hesitated to post about it, or to tell my supervisors about it, because on some level it is embarrassing. This came about when I first tried to write some unit tests for the argument handling in a series of Python modules I wrote. I found that the most popular command-line parsing package, argparse, uses a particular type of structure to store what gets set from the command line: Namespaces. Importantly, argparse is in the standard library.

Now, having read the Zen of Python fairly recently, one line rang a bell: “Namespaces are one honking great idea – let’s do more of those!”. “Ok!”, I thought, and decided to use more of this data structure. I soon learned that what Tim Peters meant by namespaces had nothing to do with this, but at that point, I failed to understand the difference between “_N_amespace” and “_n_amespace”.

In the beginning

All the stories that have villains have a backstory explaining what drove someone to the evil they now practice. So here is mine. For silly little things, the easiest way to store them was a variable. Say I had a couple of accessions to track.

accessions = ["AP001", "AP002"]

Then, the script would get bigger and bigger. Soon, I realized I needed to also store a file path to the accession. A perfect time to use a dict.

accessions_dict = {"AP001": "~/accessions/AP001.fasta",
                   "AP002": "~/accessions/AP002.fasta"}

But then the scope would increase, and I would need to know the length of the record that the accession referred to. I could use a list as the dict value.

accessions_dict = {"AP001": ["~/accessions/AP001.fasta", 580],
                   "AP002": ["~/accessions/AP002.fasta", 444]}

And so on. This is a contrived example, but we all know that things tend to increase in complexity as time goes on.

What I needed was a very flexible object. Something that I could whip up without too much difficulty, that didn’t have a ton of computational overhead. Something that was ideally in the standard library.

So there are two options: write a class for such an object, or keep on fumbling with simple types. But is there a door number 3?
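For comparison, the first option might look something like the minimal sketch below. The class name `Accession` and its fields are made up for illustration, borrowing the values from the contrived example above.

```python
class Accession:
    """Hypothetical hand-written record for one accession."""

    def __init__(self, name, path, length):
        self.name = name
        self.path = path
        self.length = length

    def __repr__(self):
        # Without this, printing the object only shows a memory address
        return "Accession(name={!r}, path={!r}, length={!r})".format(
            self.name, self.path, self.length)


ap001 = Accession("AP001", "~/accessions/AP001.fasta", 580)
```

Nothing hard, but every new field means touching both methods.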

Door #3

Argparse’s Namespaces. They are easy to whip up (no methods), built in, and make for pretty syntax. Here’s an example, in a function for a project that needed to keep track of the attributes of a sequencing experiment:

from argparse import Namespace
seq_params = Namespace(lib_type="paired", encoding="sanger",
                       read_length_mean=100,
                       read_length_stddev=7.5)

Here’s the nice part – you already know how to deal with this type of object, because you use argparse! You can use the handy dot notation to access elements.
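A quick sketch of what that access looks like, reusing the values from the example above. The variable names are just for illustration; `vars()` and the `in` check are ordinary behaviour of `argparse.Namespace`.

```python
from argparse import Namespace

seq_params = Namespace(lib_type="paired", encoding="sanger",
                       read_length_mean=100,
                       read_length_stddev=7.5)

# Dot notation reads a value
mean_len = seq_params.read_length_mean

# Assigning through dot notation updates it in place
seq_params.encoding = "illumina"

# vars() gives a plain dict view when you need to loop over the fields
as_dict = vars(seq_params)

# Membership tests work on attribute names
has_lib_type = "lib_type" in seq_params
```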

What makes the argparse namespace so useful?

  1. easy to access the values stored
  2. flexibility in types
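  3. (safer) mutability – you can change things, and reading an element that was never set raises an AttributeError. Note that assigning to a new name still works, so defining every element on creation of the object is a convention, not something the Namespace enforces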
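The third point is easy to get wrong, so here is a small check worth running yourself. It is only a sketch with made-up field names, and the misspelling `read_lenght_mean` is deliberate.

```python
from argparse import Namespace

params = Namespace(lib_type="paired", read_length_mean=100)

# Existing attributes can be changed
params.read_length_mean = 150

# Reading a name that was never set fails loudly
try:
    params.read_lenght_mean
    missing_raises = False
except AttributeError:
    missing_raises = True

# Assigning to a misspelled name does NOT fail: it creates a new attribute
params.read_lenght_mean = 200
attribute_names = sorted(vars(params))
```

So the safety comes from reads failing, not writes. If you stick to setting everything when the object is created, a typo shows up as an AttributeError the first time you read it.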

They are particularly useful because in a normal script, you are usually only gonna have a single “args” object. This is an important caveat to the heresy I am about to preach.

It does have its quirks. I had an interesting (?) time yesterday wrestling with a cryptic error, TypeError: cannot serialize '_io.TextIOWrapper' object. It was a similar issue to the one found here. After 7 hours, I eventually found out that objects that cannot be pickled cannot be sent to multiprocessing workers. So you cannot send a logger, or any other unserializable object, without subjecting yourself to a long (and in my case, fruitless) trip down the path of pickling objects. But other than that, this approach has led to cleaner code, fewer bugs, and less time spent defining __repr__s and __init__s.
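To make the pickling point concrete, here is a minimal sketch that reproduces the same kind of failure without multiprocessing, by calling `pickle` directly (which is what multiprocessing does to arguments before sending them to workers). The open handle on `os.devnull` is a hypothetical stand-in for whatever stream my object was holding, and the exact wording of the error message varies between Python versions.

```python
import os
import pickle
from argparse import Namespace

# A Namespace of plain values pickles fine
plain = Namespace(lib_type="paired", read_length_mean=100)
round_trip = pickle.loads(pickle.dumps(plain))

# Hypothetical stand-in for the offending object: an open text file handle
with open(os.devnull, "w") as handle:
    with_handle = Namespace(lib_type="paired", log=handle)
    try:
        pickle.dumps(with_handle)
        pickle_error = None
    except TypeError as err:
        pickle_error = err

# One possible workaround: send only the picklable fields to the workers
safe = Namespace(**{k: v for k, v in vars(with_handle).items() if k != "log"})
safe_bytes = pickle.dumps(safe)
```

Stripping the unpicklable field before handing the object off is only one way around it; the worker can then open its own log file if it needs one.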

That’s all.