Transforming software testing from a bottleneck to an enabler: our investment in Launchable
Originally posted on the 645 Ventures blog.
The term “bug” as a reference to an issue in a technological system first appeared in 1878 in a letter written by Thomas Edison as he was working on a telegraph system. As electronics became a bigger part of our infrastructure through phones, radios, etc., there was an increased focus on testing them to make sure they didn’t create issues. In the early 1900s, books like “Testing Electrical Apparatus”, “The Testing of Electrical Machinery”, and “Load Testing Devices” were published by companies like General Electric and Western Electro-Mechanical Co.
The origins of software date back to the late 1940s, but “software testing” didn’t formally exist until 1957, when it was separated from the “debugging” of software (i.e. fixing a problem after it was discovered in the wild). Testing would usually fall within the scope of a “quality assurance” (QA) team, which was separate from the engineering team and could also be non-technical. As development cycles and methodologies changed, so did the level of collaboration between developers and the QA team. In a “waterfall” setting, for example, development and QA work are fully separated, and no testing is done until all the development work is finished. At times, testing happens even after the software is delivered to customers, which overlaps with debugging. As companies transitioned to agile, this feedback loop started to become shorter; QA would come in after each sprint, testing tickets separately, and approving or rejecting them for the current release. While better, this still created a massive bottleneck for engineering productivity and product velocity, as testing after code had been merged could require major refactoring or push back a release altogether.
In the early 2000s, more automation came into the testing space. The creation of open source libraries like Selenium replaced the majority of manual tests with automated ones. Now, instead of waiting until the QA cycle began, engineers could test their new code before it was merged in. In 2005, Kohsuke Kawaguchi released an open source project called “Hudson” (later renamed Jenkins), which helped developers easily execute and schedule “builds” of their software. Each build would run all tests automatically to make sure the application still worked properly after a change. Kohsuke has discussed on a podcast the frustration that breaking a build would cause, and how he just wanted to build something that allowed him to catch those issues earlier.
Jenkins is now one of the most popular open source projects ever, with more than 1.2 million nodes, and an iconic product in the Continuous Integration (“CI”) space, which has been growing quickly over the last decade. Cloudbees, the company commercializing Jenkins where Kohsuke served as CTO, was founded in 2010 and was valued at over $1.7B in its June 2021 fundraise. It’s clear that CI is becoming mainstream and gaining adoption in more and more engineering teams. We believe that this trend will only continue as more organizations start to rely on software, as we described in our Engineering Value Chain Revolution investment theme.
Figure: Jenkins node growth from 0 in 2007 to 1.2 million in June 2021.
The growth of DevOps and CI as a quasi-standard industry practice has dramatically increased the productivity of engineering teams. Rather than having to test software manually, teams can now do it programmatically in the background. Kent Beck popularized the term “Continuous Integration” in the context of “Extreme Programming”, which also put forward a few core values around testing:
- All code must have unit tests.
- All code must pass all unit tests before it can be released.
- When a bug is found, tests are created.
As agile development became more prevalent, the execution speed of the test suite became one of the main bottlenecks for an engineering team. Unless there’s a green build (i.e. all tests are successful), developers cannot merge their pull requests and close their tickets. Tests usually run on each change added, so every time a piece of feedback has to be implemented or a design change has to be made, the full test suite needs to be re-run. In addition, each bug requires a new test to be created, further increasing the time it takes to run the suite.
OpenStack has a public CI dashboard; on the left side, you can see how long it takes for a single build to run. Depending on the project, builds can take between 1 and 4 hours. Imagine being a developer on the project and having to wait that long for confirmation that your changes are good to go. When speaking with engineering leaders at large organizations, we have heard of run times ranging from 3 hours to overnight builds. I personally spent a lot of hours over my career improving RSpec and Capybara test suites, and I know how frustrating it is for engineers to do this rather than work on core product challenges.
Continuous Delivery is another important piece of this equation. As software deployment has become more automated (and complex), we have moved from manual releases to continuous delivery pipelines that push software to production once all checks pass. Once again, tests are a circuit breaker for this process, and will prevent code from going live. At times, especially when responding to live incidents, engineering teams resort to running subsets of tests before releasing, but those decisions are usually made qualitatively rather than quantitatively.
The cost of a slow testing suite is clear to engineering leaders. Deploying quality features at a higher velocity than competitors can be a great advantage, because it enables great products to be delivered to customers first. On the other hand, the cost of NOT running extensive tests is even higher, as production outages can lead to serious customer and revenue loss. Teams have found themselves between a rock and a hard place, having to pick between a fast-running test suite and a thorough one.
One of the big advantages of testing is the institutionalization of knowledge around the behavior of your application. As a new engineer being onboarded on a team, I can make a change and then run the test suite to see all the ways in which it might affect the rest of the codebase. Without testing, I’d have to figure out where the code I’m modifying is being called, and what classes might inherit from it. The downside of this, as we mentioned, is that it can take a long time to run all of these tests.
Over the last few months we’ve had a chance to spend time with both Kohsuke Kawaguchi and Harpreet Singh, who worked with Kohsuke as VP Product Management and Design at Cloudbees. They recently started a new company called Launchable, which is creating a “testing intelligence platform”.
Today, CI platforms are all-or-nothing; they either run all tests, or they run none. They don’t have a way to pick which tests will be most predictive based on the changes made to the code. The smarter way to do testing would be to look at your changes, identify the potential areas of risk, and then thoroughly test the at-risk parts in order to avoid delivering flawed software to your customers.
Launchable is building a product that sits at the CI level through an open source CLI and records all test results from each build; this data is then used to train a machine learning model specific to each customer. Developers send the code diff from their commit along with a target confidence threshold, and in return they receive the smallest possible subset of tests that achieves the requested level of confidence, drastically reducing the runtime of their test suite without compromising on stability.
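To make the idea of picking a subset for a target confidence more concrete, here is a minimal sketch in Python. It is not Launchable's algorithm or API: the `CandidateTest` type, the `select_subset` function, the per-test failure probabilities, and the definition of "confidence" (the share of total predicted failure probability covered by the selected tests) are all assumptions made for illustration. The greedy ranking is also a simplification and does not guarantee the smallest possible subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTest:
    """Hypothetical record for one test in the suite."""
    name: str
    failure_probability: float  # assumed model output for the current diff
    duration_seconds: float     # assumed to be strictly positive


def select_subset(tests, target_confidence):
    """Greedily pick tests until the selected share of the total predicted
    failure probability reaches target_confidence (a value in [0, 1])."""
    if not 0.0 <= target_confidence <= 1.0:
        raise ValueError("target_confidence must be between 0 and 1")
    total = sum(t.failure_probability for t in tests)
    if total == 0:
        return []
    # Prefer tests that are likely to fail and cheap to run.
    ranked = sorted(
        tests,
        key=lambda t: t.failure_probability / t.duration_seconds,
        reverse=True,
    )
    selected = []
    covered = 0.0
    for test in ranked:
        if covered / total >= target_confidence:
            break
        selected.append(test)
        covered += test.failure_probability
    return selected


# Illustrative numbers only; they do not come from any real project.
SUITE = [
    CandidateTest("test_checkout", 0.50, 10),
    CandidateTest("test_cart", 0.30, 20),
    CandidateTest("test_search", 0.15, 30),
    CandidateTest("test_profile", 0.05, 600),
]

if __name__ == "__main__":
    picked = select_subset(SUITE, target_confidence=0.9)
    print([t.name for t in picked])
    print(sum(t.duration_seconds for t in picked), "seconds selected")
```

In this made-up suite, the slow test with a low predicted failure probability is the one left out, which is where most of the runtime saving comes from.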
For example, one of Launchable’s customers is on track to cut ~15,600 hours of test time over a year, which is estimated to be worth ~$1M/year to the company. When you look at the cost of engineering salaries, as well as cloud compute to actually run the tests, you can start to understand the magnitude of the problem, and the amount of savings that a product like Launchable can bring to an engineering organization.
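As a rough back-of-the-envelope check on those figures, the short Python snippet below derives the hourly value they imply. Both inputs are the approximate numbers quoted above; the implied hourly rate and the weekly breakdown are our own inference, not figures reported by the customer.

```python
# Approximate figures quoted in the text above.
hours_saved_per_year = 15_600
value_per_year_usd = 1_000_000

# Inferred values (assumptions for illustration, not reported figures).
implied_hourly_value_usd = value_per_year_usd / hours_saved_per_year
hours_saved_per_week = hours_saved_per_year / 52

print(f"Implied value per test hour: ${implied_hourly_value_usd:.2f}")
print(f"Test hours saved per week: {hours_saved_per_week:.0f}")
```

This works out to roughly $64 per test hour and about 300 test hours saved per week, which is plausible as a blend of engineer waiting time and compute cost.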
645 Ventures is excited to announce our lead investment in Launchable’s Series A, joining great partners in the syndicate such as Battery Ventures and Unusual Ventures, and an excellent group of angels including Sri Viswanath, CTO of Atlassian; Sacha Labourey, CEO of Cloudbees; Al Zollar, former Head of IBM Tivoli and Board Member of Red Hat; and more exceptional operators. We believe Harpreet, Kohsuke, and the rest of the Launchable team can build an iconic developer tools company, and we’re excited to be in their corner from the early days of their journey.
© Alessio Fanelli.