Performance in our applications has been, and always will be, an important and highly discussed topic. We come across countless Medium articles, books, talks, and soapboxes about it. So why spend time reading yet another article on performance?
This article isn’t meant to add another way to fix your problems. Instead, it provides a practical outline for addressing issues mid-project. Sometimes we find issues that just can’t wait until after release (which is really just code for probably never). Let’s solve those problems now!
Premise: The Project
Working with Seer, we created a tool that helps companies chip away at the tons of small, unqualified clicks that hide in and eat up a pay-per-click budget. The tool does this by evaluating a company’s spending and providing an actionable report to help them reduce it. Users upload CSV files of the search terms they are currently spending money on from Google AdWords or their other services. These files have over 15,000 rows of search keywords on average and sometimes up to 100,000. Architecturally, we expected a high volume of requests for the calculation as well as for managing temporary uploads, so we offloaded the work to a job queue using Delayed Job.
With this project as our baseline, let’s dive into identifying when we need to refactor, how we can set ourselves up for success, and how this shift brings clear value to our client and architecture.
Take a Step Back and Dream of Solutions
As a project begins to take shape, the decisions we make at the beginning start to either flourish or show pain points.
After discovering slow page loads, slow database responses, or a general runtime slowdown during development, we always want to take a step back. This lets us mull over the “why” and dream up alternatives without having to codify ideas yet.
For Seer, we initially took advantage of Rails’ ActiveRecord for modeling individual matching keyword phrases in our calculations. This approach gave us quick feedback and early success in working towards a valid report, but its response time began to suffer. The original report algorithm made a few too many database calls. A single database query can be fast; however, at high volume, with network latency and the number of unique queries ActiveRecord was constructing, it became clear the database would be our slowest resource.
So we took a step back and came up with our new approach over lunch at the White Lotus (best burger in Dayton, period). Changing our location and walking around helped shift our mental perspectives in healthy ways.
Over lunch, we discussed how, at its core, our problem was that we were making multiple calls to external resources instead of sending one message to handle the entire process for us. The calculation of wasted keyword spend was a bounded context: a distinct model with its own set of responsibilities that only needed to expose an API. In this article, we are using the formal definition of an API: a message contract between applications that fulfills a feature, not explicitly an HTTP request.
With the mindset that the calculation was not just a model in Rails but its own bounded context, we simplified our approach further by adopting part of the Unix philosophy of “Do One Thing and Do It Well.” We settled on building a Go executable to manage parsing, calculating, and reporting the keyword spend.
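To make “one message” concrete, here is a minimal sketch of how the Rails side might hand an entire upload to an external program in a single call. The article does not say how the two processes actually talked to each other, so everything here is an assumption: the `ExternalReport` class, the CSV-on-stdin and JSON-on-stdout contract, and the `./keyword-report` binary name mentioned in the comment. A tiny Ruby one-liner stands in for the Go executable so the example runs on its own.

```ruby
require "json"
require "open3"
require "rbconfig"

# Hypothetical wrapper: sends the whole CSV to an external program in one call
# and reads the finished report back, instead of querying once per keyword.
class ExternalReport
  class Failed < StandardError; end

  # command is an array such as ["./keyword-report"] (an assumed binary name).
  def initialize(command)
    @command = command
  end

  # Writes the CSV to the program's stdin and parses the JSON it prints on stdout.
  def call(csv_text)
    stdout, status = Open3.capture2(*@command, stdin_data: csv_text)
    raise Failed, "report command exited with #{status.exitstatus}" unless status.success?

    JSON.parse(stdout)
  end
end

# Stand-in for the real executable so the sketch runs anywhere Ruby does:
# it skips the header line, counts the data rows, and prints the count as JSON.
STAND_IN = [
  RbConfig.ruby, "-rjson", "-e",
  "rows = STDIN.read.lines.drop(1); puts JSON.generate('rows' => rows.size)"
].freeze

report = ExternalReport.new(STAND_IN).call("term,cost\nfree shoes,1.50\nshoe memes,0.75\n")
puts report.inspect
```

The caller needs to know only the message contract (CSV in, report out), which is what lets the implementation behind it change languages.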
We chose Go for a few reasons, but primarily because of the following:
- We did not have to pull in a lot of dependencies to quickly move forward because Go is a “batteries included” language.
- We knew Go, as a compiled language, could perform the parsing and calculations faster than Ruby (even if we were to optimize our Ruby implementation).
- We knew how to write Go applications.
- We could provide clear reasoning and community proof that our solution would benefit us.
So we have identified our performance problem, thought of a solution, and provided proof it is likely to work! As we come across similar pain points, we need to take time to step back and dream up solutions. We can’t dismiss any idea as too far-fetched; rather, we should carefully examine the risks and effort for each concept. We can also consider bringing in developers from outside the project to offer a new perspective and help discuss the problem in a simple, clear manner.
Set Up a Strategy Pattern
Armed with a potential solution to our performance bottleneck, we need to introduce another workflow for our application to complete a specific action. When refactoring features that have the same end result but with different internal approaches, we use the Strategy Pattern.
At its simplest, the Strategy Pattern allows our code to choose a specific entity to message and get an identical result.
Using the Strategy Pattern allows us to continue working on other parts of the application and to not wait for a refactor.
Similarly, if the deadline comes up, we are still able to ship the existing feature into production, and we will not risk shipping a broken experience or nothing at all.
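As a rough illustration of the pattern, here is a minimal sketch with two interchangeable strategies behind one caller. The class names echo the two approaches in this article, but the bodies are stand-ins: the row shape, the `WastedSpendReport` caller, and the rule that a term is “wasted” when it has zero conversions are all assumptions made for the example, not the real report logic.

```ruby
# Hypothetical row shape: { term: "free shoes", cost_cents: 150, conversions: 0 }.

# Stand-in for the original implementation.
class ActiveRecordStrategy
  def call(rows)
    wasted = rows.select { |row| row[:conversions].zero? }
    { wasted_terms: wasted.map { |row| row[:term] }.sort,
      wasted_spend_cents: wasted.sum { |row| row[:cost_cents] } }
  end
end

# Stand-in for the replacement: different internals, same report shape.
class GoStrategy
  def call(rows)
    report = { wasted_terms: [], wasted_spend_cents: 0 }
    rows.each do |row|
      next unless row[:conversions].zero?

      report[:wasted_terms] << row[:term]
      report[:wasted_spend_cents] += row[:cost_cents]
    end
    report[:wasted_terms].sort!
    report
  end
end

# The caller only knows that a strategy responds to #call.
class WastedSpendReport
  def initialize(strategy)
    @strategy = strategy
  end

  def generate(rows)
    @strategy.call(rows)
  end
end

rows = [
  { term: "free shoes", cost_cents: 150, conversions: 0 },
  { term: "buy running shoes", cost_cents: 400, conversions: 3 },
  { term: "shoe memes", cost_cents: 75, conversions: 0 }
]

old_report = WastedSpendReport.new(ActiveRecordStrategy.new).generate(rows)
new_report = WastedSpendReport.new(GoStrategy.new).generate(rows)
puts old_report == new_report
```

Because the caller depends only on `#call`, either strategy can ship while the other is still under construction.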
Having a pattern in place to swap different functional pieces is great, but without a mechanism to “flip the switch,” it isn’t all that helpful. There are a lot of ways to determine a Strategy’s state, but the most common is using a feature toggle, as we’ve described in the past. A feature toggle is a dynamic mechanism to enable or disable a feature, ideally without redeploying the entire application. We utilized feature toggles to “toggle” the Strategy between the Go and ActiveRecord instances.
The combination of the Strategy Pattern and Feature Toggles provides us a mechanism to ensure we provide an accurate experience, which at any given time could internally “flip” to using a different workflow that is more performant.
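A toggle can be as small as a lookup that runs each time a report is requested. The sketch below is one hypothetical way to wire it up. The `FeatureToggles` class, the `:go_report` flag name, and the in-memory storage are all assumptions; a real project would more likely keep the flag somewhere that can change without a redeploy, such as a database row or a flag service.

```ruby
# Hypothetical in-memory toggle store.
class FeatureToggles
  def initialize(flags = {})
    @flags = flags
  end

  # Unknown flags count as disabled, so the existing behavior is the default.
  def enabled?(name)
    @flags.fetch(name, false)
  end

  def enable(name)
    @flags[name] = true
  end

  def disable(name)
    @flags[name] = false
  end
end

# Returns the key of the strategy to use for the next report. The lookup runs
# at call time, so flipping the toggle takes effect on the very next request.
def strategy_key(toggles)
  toggles.enabled?(:go_report) ? :go : :active_record
end

toggles = FeatureToggles.new
before_flip = strategy_key(toggles)
toggles.enable(:go_report)
after_flip = strategy_key(toggles)
puts "before: #{before_flip}, after: #{after_flip}"
```

Defaulting to the existing strategy means a missing or misconfigured flag falls back to the code that is already proven in production.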
With this approach in place, we could confidently continue working in parallel, given the two classes provided the same result. This raises the question: how can we always know we are accurately returning the same values between strategies?
Create Contract Tests
Refactoring often saves lines of code, reduces complexity, and improves runtime performance. However, if we don’t also have a thorough set of tests to assert the expected output is maintained, we might actually be breaking the application in unknown ways.
Once we had a clear understanding of what a “correct” report should look like, we created integration tests to validate the contract between the caller and any Strategy.
These tests are higher up on the testing pyramid and do not replace the importance of unit testing the Strategies themselves. Instead, they augment the unit tests to provide us with early and continued confidence that the same inputs provide an expected output.
Similar to our general testing approach in Rails, we used Minitest to provide a set of tests to first ensure the ActiveRecordStrategy had the expected output. Then, we wrote the integration test using the Go Strategy, which failed for quite some time. This test-driven development approach helped us continually have a goal in mind. So when we finished the Go executable, we could see both strategies provide the expected report.
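One way to express such a contract in Minitest is a shared module of tests that each strategy’s test class includes. The sketch below illustrates that idea and is not our actual suite: the `ReportContract` module, the sample rows, the zero-conversions rule, and the stand-in strategy bodies are all hypothetical.

```ruby
require "minitest/autorun"

# Stand-ins for the two implementations; only their output matters here.
class ActiveRecordStrategy
  def call(rows)
    wasted = rows.select { |row| row[:conversions].zero? }
    { wasted_terms: wasted.map { |row| row[:term] }.sort,
      wasted_spend_cents: wasted.sum { |row| row[:cost_cents] } }
  end
end

class GoStrategy
  def call(rows)
    report = { wasted_terms: [], wasted_spend_cents: 0 }
    rows.each do |row|
      next unless row[:conversions].zero?

      report[:wasted_terms] << row[:term]
      report[:wasted_spend_cents] += row[:cost_cents]
    end
    report[:wasted_terms].sort!
    report
  end
end

# The contract: every strategy must pass exactly these tests.
# Including classes supply the object under test through #strategy.
module ReportContract
  ROWS = [
    { term: "free shoes", cost_cents: 150, conversions: 0 },
    { term: "buy running shoes", cost_cents: 400, conversions: 3 },
    { term: "shoe memes", cost_cents: 75, conversions: 0 }
  ].freeze

  def test_lists_terms_without_conversions
    assert_equal ["free shoes", "shoe memes"], strategy.call(ROWS)[:wasted_terms]
  end

  def test_totals_wasted_spend
    assert_equal 225, strategy.call(ROWS)[:wasted_spend_cents]
  end

  def test_empty_upload_produces_empty_report
    assert_equal({ wasted_terms: [], wasted_spend_cents: 0 }, strategy.call([]))
  end
end

class ActiveRecordStrategyContractTest < Minitest::Test
  include ReportContract

  def strategy
    ActiveRecordStrategy.new
  end
end

class GoStrategyContractTest < Minitest::Test
  include ReportContract

  def strategy
    GoStrategy.new
  end
end
```

Adding a third strategy later only takes one more small test class that includes the module, and a strategy that is still in progress simply fails the same tests until it catches up.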
Testing while refactoring provides the necessary proof that we are making accurate changes, but the purpose of our changes was to impact the overall performance. How do we continually prove our work is worth pursuing to stakeholders?
Measure Performance Impact to Prove Your Value
Just as important as ensuring different strategies have the same output, we also need to continue to measure the performance difference. The purpose of our refactor isn’t for us to flex our super awesome programming muscles, but instead, it is to provide better business value.
Claims of “faster” or “more efficient” should only ever be made with concrete evidence.
For Seer, we captured process completion time through logging, so we could flip our strategy type and manually test the process completion time for an identical set of data. This data allowed us to show value both internally to our team and externally to Seer as we continued refactoring. Maintaining transparency throughout the process enabled stakeholders to see progress on the work. We also showed our progress through automated performance tests and benchmarks similar to a website’s performance budget.
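A minimal sketch of that kind of measurement might look like the following. The `timed` helper, the log line format, the placeholder workloads, and the five-second budget are all assumptions made for illustration; they are not the values or code from the project.

```ruby
# Runs the block, writes the elapsed wall-clock time to the log, and returns
# both the block's result and the number of seconds it took.
def timed(label, log: $stdout)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  log.puts format("strategy=%s elapsed=%.3fs", label, elapsed)
  [result, elapsed]
end

# Placeholder workloads standing in for the two strategies on identical data.
slow = lambda do
  sleep 0.05
  :report
end
fast = -> { :report }

slow_result, slow_seconds = timed("active_record") { slow.call }
fast_result, fast_seconds = timed("go") { fast.call }

# A performance budget in the spirit of a website's: fail loudly when exceeded.
BUDGET_SECONDS = 5.0 # hypothetical threshold
raise "go strategy is over budget" if fast_seconds > BUDGET_SECONDS
```

The monotonic clock is used instead of `Time.now` so that system clock adjustments cannot distort the measurement.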
Moving to a Go executable changed the average response time from 550 seconds to 2 seconds—that isn’t a typo. We got a 99.64% decrease in processing time.
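For anyone checking the math, the percentage is the time saved divided by the original time. The helper names below are ours, made up for this example.

```ruby
# Percentage decrease from a "before" value to an "after" value.
def percent_decrease(before, after)
  ((before - after) / before.to_f * 100).round(2)
end

# How many times faster the new value is than the old one.
def speedup(before, after)
  before / after.to_f
end

puts percent_decrease(550, 2) # (550 - 2) / 550 * 100, rounded to two places
puts speedup(550, 2)
```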
There were opportunities to fine-tune the Rails implementation and the algorithm it was using, but Go proved to be the better tool for the job.
As we continue to find performance benefits and measure the improvement of our application, we may begin to see some awesome side effects. For instance, because we reduced the time our application spent querying our database, the number of simultaneous requests could increase and the load our application could take went up! Similarly, the faster report also meant that our loading screen was shown for significantly less time, which created a better user experience moving from the upload to the report page. Ultimately, this one improvement unlocked a lot of improvements across our architecture, and it allowed our application to run on fewer servers, which saved our client money!
Keep in mind that as we iteratively measure the immediate effects of our performance benefits to show value, we can look around for those “extra presents” that may be lying around as side effects. These improvements can be just as valuable and can show stakeholders that enabling the pivot created even more value than originally discussed!
6 Steps for a Mid-Project Pivot
Whenever you find a performance bottleneck, follow this outline, which provides actionable steps for patching the problem:
- Take a step back and dream of solutions. Don’t limit your ideas: problems can be solved using new architectures, languages, or features.
- Provide clear proof, with assumptions, on why this new solution will provide performance gains and business value.
- Set up a Strategy Pattern to manage flipping between the two experiences.
- Create contract tests to ensure accurate changes and provide confidence to developers and stakeholders.
- Continually measure performance impact and report to stakeholders to show progress and help validate the risk they accepted.
- Ship your performant code and bask in your success.