Software Development as an Exercise in Archaeology

09-22-21 Eli Brown

Understanding a legacy codebase is hard. Learn how to work as efficiently as possible in one and even leave the project better than you found it.

Having to dissect and analyze someone else’s work in order to do your own is a pretty common situation in software development. You may be implementing a feature in a complex piece of software where documentation is sparse or nonexistent. Maybe the original authors are AWOL, or there are so many authors that no one knows exactly how anything works. Maybe half the codebase is written in a language you don’t know. And just like that, you’ve become an archaeologist.

Since these situations are almost unavoidable—especially if you often have to touch other people’s code—how should you conduct yourself in order to work as efficiently as possible? And perhaps more importantly: how can you leave the project better than you found it?

Figuring Out Where to Start

The beginning is likely the hardest part of the entire process. If you’re adding features to preexisting software, you already have your piece of the puzzle; you just need to find where it fits. Start by making sure everything works on your machine: build the software and confirm that the build completes and the program runs, then run the test suites and make sure they pass. These are cursory sanity checks, but they’re essential to the integrity of the foundation you’re going to work on.

Then, you need to think a little like an art conservator: ensure that anything you do is reversible and fits seamlessly into the codebase. There’s a reason why conservators use tools and paints that are created to be reversible, and it’s the same reason that you should have some version control available—to prevent irreversible damage. Ensure your work environment is clean and well-prepared. Once this is established, you can take your feature spec and start looking for an entry point.

Here’s a fairly mundane situation: you’re working on a REST-style API running on a server for a social media site. Your job is to add a feature that lets users edit their posts after they’ve been published. There’s no documentation on how the API actually works; all you have is the raw existing code. But if you’ve gotten this far, you likely already have one piece of information you need: how REST APIs work. You know how a REST API is typically set up and how it’s meant to behave. So one way to proceed is to search the codebase for something that looks like an endpoint definition, ideally a group of them, such as the routes that already handle posts. Any IDE or text editor worth its salt will include a search feature you can use. In Atom, my text editor of choice, I’d use the “find in project” tool to search for a string within all the files in a folder. You can apply the same concept to any project.
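As a rough sketch of that search, here is a small Python script that walks a project folder and lists lines that look like route definitions. The two patterns (Express-style `router.post(...)` calls and Flask-style `@app.route(...)` decorators), the file extensions, and the skipped folders are assumptions for illustration; the codebase in front of you may declare its endpoints quite differently.

```python
import re
from pathlib import Path

# Assumed patterns: two common ways web frameworks declare routes.
# Replace them with whatever the codebase you are exploring actually uses.
ROUTE_PATTERNS = [
    re.compile(r"""\b(?:app|router)\.(?:get|post|put|patch|delete)\(\s*['"]"""),
    re.compile(r"""@\w+\.route\(\s*['"]"""),
]

# Folders that usually hold dependencies or metadata rather than project code.
SKIP_DIRS = {"node_modules", ".git"}


def find_endpoint_candidates(root, extensions=(".js", ".ts", ".py")):
    """Return (path, line_number, line) for each line that looks like a route."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        if SKIP_DIRS.intersection(path.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in ROUTE_PATTERNS):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in find_endpoint_candidates("."):
        print(f"{path}:{number}: {line}")
```

Files that produce several hits close together are good candidates for the “group of API endpoints” you are looking for.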

Now, if your sanity checks from earlier were successful and you have the API running on your machine, you have another way of finding the right place. Trigger an existing function, note what it outputs on the machine where the software is running, and then search the files for that text or something like it. That way, you can trace the output back to the method that produced it.

Regex searching comes in handy in situations like these, because it lets you search for function outputs that have a certain “shape” rather than an exact string. One of the main factors to consider when choosing your approach at this step is time. In a large or unfamiliar codebase, using a debugger, printing to the console, or analyzing program output in some other way might be the best course to take. However, if you can guess the general location of your target, or if the codebase is relatively small, it may be faster to visually scan files based on their name and location until you see something that resembles what you want.
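To make the idea of an output’s “shape” concrete, here is a minimal Python sketch. It assumes the changing parts of the output are numbers (IDs, counts), keeps the fixed text, and allows anything where a number appeared. The sample output and source lines are invented for illustration.

```python
import re


def shape_pattern(observed_output):
    """Build a regex from the fixed text of an observed output.

    Runs of digits are treated as the variable parts: wherever the output
    had a number, the pattern accepts any text (a format placeholder, a
    template expression, and so on).
    """
    literal_parts = re.split(r"\d+", observed_output)
    return re.compile(".*?".join(re.escape(part) for part in literal_parts))


if __name__ == "__main__":
    # Hypothetical log line seen while the API was running.
    pattern = shape_pattern("Post 4821 created by user 17")

    # Hypothetical source lines; only the first has the same shape.
    source_lines = [
        "console.log(`Post ${post.id} created by user ${user.id}`);",
        "console.log('Post deleted');",
    ]
    for line in source_lines:
        if pattern.search(line):
            print("candidate:", line)
```

The resulting pattern can also be pasted into an editor’s regex search; `pattern.pattern` holds it as a string.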

Finally, make sure to log code output along the way. Even little things like console.log('I was here') can be useful when trying to determine the flow of a program. Leaving a trail of breadcrumbs for yourself can be an indispensable tool in situations where the path of code execution isn’t immediately clear.
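The same breadcrumb idea can be made a little more systematic. The sketch below, written in Python rather than JavaScript, uses a decorator to log each time a function is entered and left. The decorated functions `load_post` and `edit_post` are hypothetical stand-ins for whatever code you are trying to follow.

```python
import functools
import logging

logger = logging.getLogger("breadcrumbs")


def breadcrumb(func):
    """Log when a function is entered and left, so call order shows in the logs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("-> %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so the exit crumb is never lost.
            logger.debug("<- %s", func.__name__)
    return wrapper


# Hypothetical functions standing in for the code under investigation.
@breadcrumb
def load_post(post_id):
    return {"id": post_id, "body": "hello"}


@breadcrumb
def edit_post(post_id, new_body):
    post = load_post(post_id)
    post["body"] = new_body
    return post


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    edit_post(1, "updated")
```

Because the crumbs go through a named logger, they can be silenced or removed in one place once the flow is understood.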

Finding What You Need

Let’s say those previous steps weren’t enough to fully complete the work. At this stage, you have the API running, you can orient yourself in the codebase, and you understand how the pieces should fit together. You can write a document explaining the process so another developer could understand it. But it seems you’re missing some critical detail. You think you’ve done everything that should be necessary, but your feature addition or fix just isn’t working the way it’s supposed to. If you’re completely lost, faced with inscrutable error messages and failing tests, your best course of action is usually to find someone closer to the metal, so to speak.

If this is client software that you’re working on, try contacting one of their staff developers, or someone who might know people who worked on it before. At this point in your journey, you have all the information you need to pose an intelligent, well-thought-out question. So just bother people (within reason) until you find who and what you need. If no one within reach knows the answer to your question, you may have to take it online to a forum or something similar.

Improving the State of Documentation

During and after the implementation, you should leave behind good documentation. This means writing plenty of comments (where needed) and putting more detailed information in documents separate from the code, such as in a README file. What’s the difference between a mysterious ancient archaeological artifact and a museum exhibit anyway, if not a piece of writing describing what it is?

Documentation doesn’t have to be particularly long or complex, but if you can describe the project in a way that would have helped you do what you did, then it will inevitably help someone who comes after you. The main goal is to leave the project in a better overall state than the one you found it in. You can just write a brief page or so of documentation explaining what the code does so that someone who’s never seen it before could jump right in. Of course, it doesn’t hurt to write something describing what you did so that the process can be expedited in the future.
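As one illustration of a comment that would have helped you, here is a short Python sketch. The 15-minute edit window and the `can_edit` function are invented for the example and are not taken from any real codebase; the point is that the docstring records what is not obvious from the code alone.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical business rule, used only to show what a useful comment records.
EDIT_WINDOW = timedelta(minutes=15)


def can_edit(post_created_at, now=None):
    """Return True if a post is still inside its edit window.

    Notes for the next reader:
    - Timestamps are expected to be timezone-aware UTC; mixing naive and
      aware datetimes raises TypeError.
    - The window is inclusive: a post exactly EDIT_WINDOW old can still
      be edited.
    """
    now = now or datetime.now(timezone.utc)
    return now - post_created_at <= EDIT_WINDOW
```

Two short notes like these cost a minute to write and can save the next person a debugging session.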

Staying Flexible

The most important thing to keep in mind during the archaeology process is this: avoid making assumptions about what the original creators were thinking or what they meant to do. Although instinct and gut feelings have their place, stick to objective analysis to help you focus on the task at hand. A piece of code that seems to have an obvious purpose might be doing something completely different. Try to stay mentally flexible and be meticulous in your analysis; you never know what small thing you might miss by assuming you already know what’s going on. And of course, this process takes practice. As you do this over and over again, you’ll become faster and more accurate, and every codebase you touch will be better for it.