Parameters implying other parameters

Some options are meant to be used together. This might be bidirectional (e.g. both a sample name and a data file) or it might be one-way (e.g. an alignment file needs a genome file, but not the other way around). In either case, using a specific optional argument requires the use of another “optional” argument.
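One way to keep track of these requirements is to write them down as data, instead of scattering checks through the code. The sketch below is a minimal, hypothetical Python example. The option names (sample_name, data_file, alignment_file, genome_file) are invented to mirror the examples above. A bidirectional pair is listed in both directions, and a one-way requirement is listed once.

```python
# Hypothetical option names, for illustration only.
# Each key requires every option in its set. A bidirectional pair is
# listed in both directions; a one-way requirement appears only once.
REQUIRES = {
    "sample_name": {"data_file"},
    "data_file": {"sample_name"},
    "alignment_file": {"genome_file"},  # genome_file needs nothing in return
}


def missing_requirements(options):
    """Return a list of (option, required_option) pairs that are violated.

    `options` maps option names to values; None means "not given".
    """
    given = {name for name, value in options.items() if value is not None}
    problems = []
    for name in sorted(given):
        for required in sorted(REQUIRES.get(name, set())):
            if required not in given:
                problems.append((name, required))
    return problems
```

For instance, passing an alignment file without a genome file reports the pair ("alignment_file", "genome_file"), while a genome file on its own reports nothing.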

Documenting implications

Wherever the arguments are documented, be clear that using one implies another. For example, from vg giraffe:

  --track-provenance            track how internal intermediate alignment candidates were arrived at
  --track-correctness           track if internal intermediate alignment candidates are correct (implies --track-provenance)
  --track-position              coarsely track linear reference positions of good intermediate alignment candidates (implies --track-provenance)

This removes surprises. I know that if I pass --track-correctness, the --track-provenance tracking will happen as well. Alternatively, this information could go in a docstring:

def filter(alignments, min_mapping_quality, chromosome=None, position=None):
  """Filter alignments for a minimum mapping quality score.

  [...]

  chromosome : str, optional
    Only use alignments to this chromosome.
  position : tuple, optional
    Only use alignments within this range of positions
    (requires "chromosome" to be set).
  """

Here, instead of another option being applied automatically, the code requires chromosome to be manually set alongside position.
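For the automatic style, the implication can be applied once, right after parsing, so the rest of the program only ever checks a single flag. This is a minimal sketch using Python's argparse. The flag names are borrowed from the vg giraffe help text above, but the parser itself is a hypothetical illustration, not vg's implementation.

```python
import argparse


def parse_args(argv):
    """Parse tracking flags, applying the documented implications."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--track-provenance", action="store_true",
                        help="track how intermediate candidates were arrived at")
    parser.add_argument("--track-correctness", action="store_true",
                        help="track if intermediate candidates are correct "
                             "(implies --track-provenance)")
    parser.add_argument("--track-position", action="store_true",
                        help="coarsely track reference positions of good candidates "
                             "(implies --track-provenance)")
    args = parser.parse_args(argv)
    # Apply the implications in exactly one place, matching the help text.
    if args.track_correctness or args.track_position:
        args.track_provenance = True
    return args
```

With this, parse_args(["--track-correctness"]).track_provenance is True, even though --track-provenance was never passed.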

Incorrect usage

But what happens if the user doesn’t follow these guidelines? Either they disregarded the instructions, or there weren’t any warnings at all. What now?

Best option? A program failure, ideally with an error message such as “the position argument requires chromosome to be set”. With a message like that, the user can easily fix their code.
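A minimal sketch of that check, assuming the filter function from earlier (renamed filter_alignments here to avoid shadowing Python's built-in filter) receives alignments as dicts with hypothetical "chrom", "pos" and "mapq" keys. The important part is that the dependency is checked before any work is done, and the message names both arguments.

```python
def filter_alignments(alignments, min_mapping_quality, chromosome=None, position=None):
    """Keep alignments whose mapping quality is at least min_mapping_quality.

    Assumes each alignment is a dict with hypothetical "chrom", "pos" and
    "mapq" keys, and that `position` is an inclusive (start, end) tuple.
    """
    # Fail fast, and say exactly which argument is missing.
    if position is not None and chromosome is None:
        raise ValueError("the position argument requires chromosome to be set")
    kept = []
    for aln in alignments:
        if aln["mapq"] < min_mapping_quality:
            continue
        if chromosome is not None and aln["chrom"] != chromosome:
            continue
        if position is not None:
            start, end = position
            if not start <= aln["pos"] <= end:
                continue
        kept.append(aln)
    return kept
```

Calling filter_alignments(alns, 30, position=(50, 150)) without a chromosome now stops immediately with a ValueError that says what to fix.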

That last paragraph seems sort of obvious, right? But it’s not always done. More times than I care to count, I’ve had to debug problems caused by my own misuse of interdependent arguments, where the error messages were unhelpful at best. Other times the program ran without complaint, but the output wasn’t what I expected.

As a practical example, take another tool from vg, the transcriptome graph maker vg rna. It takes a graph and a list of gene annotations, then modifies the graph to add edges connecting exons. There’s an option to “project” these annotations onto haplotypes other than the one in the annotation file. Then there is a separate option to save those annotations in the graph. These options are rather clearly meant to be used together. Using the former but not the latter is almost certainly a user error. Yet, doing so causes no error. It makes the tool do a bunch of computations to project annotations, but then throws away all of that work, since the user didn’t ask to save it in the graph.

Don’t be like that. Warn your user about what they’re doing.
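Here is a hedged sketch of what such a warning could look like for a case like vg rna’s. The parameter names project_haplotypes and save_annotations are hypothetical stand-ins for the two options described above, not vg’s real flags, and this Python function is an illustration, not vg’s code. Besides warning, it reports whether the projection is worth running at all, since its result would otherwise be thrown away.

```python
import warnings


def check_projection_options(project_haplotypes, save_annotations):
    """Return True if the (hypothetical) projection step should run.

    Warns when projection is requested but its output would be discarded.
    """
    if project_haplotypes and not save_annotations:
        warnings.warn(
            "projecting annotations onto haplotypes has no effect unless "
            "the annotations are also saved in the graph",
            UserWarning,
            stacklevel=2,
        )
        return False  # skip the work: nothing would keep its result
    return bool(project_haplotypes)
```

Raising an error instead of warning would be stricter; which one fits depends on whether the combination could ever be intentional.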

Why is this hard?

In a departure from my usual posts, where I just recommend what to do, here I’ll speculate on why this recommendation is needed at all.

Put simply, cataloguing your own assumptions is hard. You know how this code is intended to be used. You wouldn’t make this sort of oversight. You’d never invoke that function with only one of the paired arguments. The issue is that users don’t know what is “obviously” wrong. Even if a developer could spot the problem at a glance, ideally the end user should be able to fix it themselves before it ever gets to that point.

So: idiot-test your stuff. Think through all the wrong ways it could be used. Even the idiotic ones. Especially the idiotic ones. Then make sure there are useful error messages where needed. Help your users help themselves. Everyone messes up. The key is knowing when you have.

Tags: parameters errors