A recent pull request I made was supposed to just add a small function to a larger codebase. But, while poking around the code, I ran into egregiously non-DRY bits. Being who I am1, I refactored. Today’s rant is about DRY coding. I’m aware this has been done to death on the internet, but, well, it’s the first thing I thought to write.
What is DRY code?
DRY stands for Don’t Repeat Yourself. Its antithesis WET2 stands for Write Everything Twice. Basically, each piece of bespoke code should appear only once. Some examples, from the refactoring I just did:
- Two functions had the exact same set-up and clean-up with slightly different inputs. I factored out the set-up/clean-up into separate helper functions (this involved literally copy-pasting the code from one of the original functions) and then simply called the helpers from each function. (See the first sketch after this list.)
- Many snippets calculated the length represented by a struct by performing the same set of bit operations. I factored out a helper `.length()` (copy-paste). Then I used `.length()` in place of all the identical bit operations.
- Nucleotides were encoded as 4-bit numbers. (ACGT, but also a gap character and the ambiguous nucleotides.) These numbers were simply copied wherever needed, with no true central table. I created an enum `NucCode` to explicitly tie each nucleotide to its code, and then used the `NucCode` enum wherever nucleotide codes were used to check the underlying nucleotides. (The second sketch after this list covers both this and the `.length()` helper.)
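Here is a minimal sketch of that first refactor. Everything in it (`Context`, `setUp`, `cleanUp`, the two `process*` functions) is invented for illustration; only the shape matches what I did.

```cpp
#include <cstdio>
#include <string>

// Invented stand-in for whatever state the real set-up produced.
struct Context {
    std::FILE* handle = nullptr;
};

// The set-up that used to be copy-pasted into both functions.
static Context setUp(const std::string& path) {
    Context ctx;
    ctx.handle = std::fopen(path.c_str(), "rb");
    return ctx;
}

// The matching clean-up, likewise factored out.
static void cleanUp(Context& ctx) {
    if (ctx.handle != nullptr) {
        std::fclose(ctx.handle);
        ctx.handle = nullptr;
    }
}

// Each original function now just calls the helpers.
void processOneWay(const std::string& path) {
    Context ctx = setUp(path);
    // ... work specific to this function ...
    cleanUp(ctx);
}

void processOtherWay(const std::string& path) {
    Context ctx = setUp(path);
    // ... different work, same set-up and clean-up ...
    cleanUp(ctx);
}
```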
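And a minimal sketch covering the other two bullets, assuming an invented packed-field layout and IUPAC-style one-bit-per-base codes; the real codebase’s values and layout may well differ.

```cpp
#include <cstdint>

// Hypothetical 4-bit nucleotide codes (one bit per base, so the
// ambiguity codes are bitwise ORs); the real table may differ.
enum class NucCode : std::uint8_t {
    Gap = 0x0,
    A   = 0x1,
    C   = 0x2,
    G   = 0x4,
    T   = 0x8,
    R   = 0x5,  // A or G
    N   = 0xF,  // fully ambiguous
};

struct Mutation {
    std::uint32_t packed;  // invented layout: low 4 bits hold a
                           // NucCode, the next 12 hold the length

    // The bit math now lives in exactly one place.
    int length() const {
        return static_cast<int>((packed >> 4) & 0xFFFu);
    }

    NucCode code() const {
        return static_cast<NucCode>(packed & 0xFu);
    }
};
```

With something like that in place, every call site reads `curMut.length()` and nucleotide checks compare against `NucCode::A` instead of a bare `0x1`.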
Why DRY?3
Hopefully my examples above had you nodding your head saying, yeah, that all makes sense. Here are some benefits I’ve personally had from DRY code:
- I only have to change the logic in one place. If I’m improving logic, fixing a bug, or just adding a new check, I only have to update one bit of code.
- I get fewer weird errors. When I have copy-pasted code, I’ll think I updated it everywhere, or I’ll think that the code here works similarly to the code there, and then be very confused when those assumptions prove false. If I scan ten bitshifts throughout the file, my eye may skip over the one that’s different. If all those bitshifts instead call a central function, everything is fine and dandy.
- My code becomes more readable. Instead of, e.g.,

```
// Nucleotide length is stored as <complicated bespoke bit math>
int length = <complicated bespoke bit operation>
```

(which is admittedly better documented than the code I usually find), I get

```
int length = curMut.length();
```
The improvement should be obvious here. It’s similar to the readability gain from factoring out helper functions: hide the logic elsewhere, just declare your goal and trust that it happens.
- I have a central thing to show others. For example, if I had to explain those nucleotide codes, I much prefer passing over the enum instead of the hardcoded conversion functions which I started with.
How to DRY your code
So I’ve convinced you, or you were already convinced. Great. Here is the simple method I used to DRY code:
- Look at code. Think about code. What do you do multiple times? Choose a thing.
- Write a well-documented, well-named helper which does the thing.
- Use your helper everywhere the thing is done.
Congrats! Your code is one bit DRY-er. Repeat until readable enough.
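To make the loop concrete, here’s a toy pass through the three steps. The clamping example is invented, not from the PR.

```cpp
// Step 1: the thing done multiple times, before the refactor:
//   double a = x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x);  // scattered everywhere

// Step 2: a well-documented, well-named helper that does the thing.
// Clamp a value into the unit interval [0, 1].
static double clampUnit(double x) {
    if (x < 0.0) return 0.0;
    if (x > 1.0) return 1.0;
    return x;
}

// Step 3: use the helper everywhere the thing is done.
double normalizeOpacity(double raw) { return clampUnit(raw); }
double normalizeVolume(double raw)  { return clampUnit(raw); }
```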
1 what kind of person do you think would write a blog about coding best practices?
2 hah hah aren’t the coders funny
3 I couldn’t resist (this is why I shouldn’t allow myself to have footnotes)