Delphi 2007 - Under the hood of Quality Assurance
Now that Delphi 2007, codenamed ‘Spacely’, has been released, I’d like to share what happened in the Quality Assurance team during the Spacely project cycle.
It all started on a dark and stormy night in February, in a small town called Scotts Valley…
OK, more seriously, the CodeGear crew wanted to do something special and different for our next release. During the transition from Borland to CodeGear, we ran a fairly detailed (long!) survey of our existing Delphi customers to better understand their needs, wants and pain points. From the survey feedback it was clear that we had a feature-rich IDE, and that the main issues were related to performance and stability. A feature is only solid and ‘good’ if the IDE can stay up and let you use it reliably! As a result, the theme for our next project, called Highlander at the time, was stability and performance: do as small a set of critical features as possible, and spend significant time making noticeable improvements to the stability and performance of the IDE. To me, noticeable meant ‘statistically significant’.
My team’s challenge was to scientifically validate the statement “our next Delphi release is more stable and performs better than previous Delphi releases”. We went so far as to hire a QA engineer whose specific task was performance and stability testing. Looking back, that was a smart move, since we now have a set of suites that run against previous and current builds and monitor metrics such as IDE start and shutdown time, compile time, project open and close time, build and run time, desktop switching time and many others in a very scientific fashion.
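To make ‘statistically significant’ concrete, here is a minimal sketch of how timing samples from two builds could be compared. It is an illustration only, not the suite described above: the function name `welch_t` and the sample timings are made up, and it uses Welch's unequal-variance t-test as one reasonable choice of method.

```python
import math
import statistics


def welch_t(old, new):
    """Return (t, df) for Welch's unequal-variance t-test on two samples.

    old, new: lists of timings in seconds (at least two values each).
    A large positive t means the old build was slower on average.
    """
    mean_old, mean_new = statistics.mean(old), statistics.mean(new)
    # Sample variance divided by sample size: squared standard error of each mean.
    se2_old = statistics.variance(old) / len(old)
    se2_new = statistics.variance(new) / len(new)
    t = (mean_old - mean_new) / math.sqrt(se2_old + se2_new)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_old + se2_new) ** 2 / (
        se2_old ** 2 / (len(old) - 1) + se2_new ** 2 / (len(new) - 1)
    )
    return t, df


# Hypothetical IDE start-up times (seconds) from repeated runs of two builds.
previous_build = [12.1, 11.8, 12.4, 12.0, 12.2]
current_build = [10.1, 9.9, 10.3, 10.0, 10.2]

t, df = welch_t(previous_build, current_build)
print(f"t = {t:.2f}, degrees of freedom = {df:.1f}")
```

The resulting t value is then compared against a t-distribution table (or a statistics library) at the chosen significance level. Repeating each measurement several times per build is what makes this kind of comparison possible at all.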
Backing up a step - we also made a resolution to fix our test system. By that, I mean the QA team had been switching platforms and frameworks as a result of developing Kylix, then Delphi for .NET, and that had played havoc with our automation tooling. We use a flexible and powerful automation framework affectionately called ‘Zombie’ to drive our IDE automation. (It’s not all we use - we also have various command line based automation tools and frameworks, but the IDE has always been the most challenging to automate.) This year we created a special team called the Automation Mini-Team, which included QA engineers, R&D engineers, managers and an integration engineer, to dramatically improve our tooling. The results really paid off - every build we release to QA is automatically tested by a suite of IDE and command line tests, which then report to a test report server that summarizes the current run vs the last run and provides a graphical chart of test run time and test results. It’s slick, and not only that, it works. Almost too well - this next year, we’ll be looking to take it to the next level: automatic email notification when results deviate statistically. For example, the QA and R&D engineers could get an email notification if a test that used to run at 95% success suddenly performed at 90%, or took twice as long to run, or deviated on any other metric we wanted to monitor, such as memory use. Right now, we have too many results to data mine quickly, so we need to automate that review of test results.
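The kind of check described above could look something like the following sketch. Everything in it is hypothetical (the `RunSummary` type, the `find_deviations` function and the threshold values are assumptions for illustration, not part of Zombie or the test report server), and it uses simple fixed thresholds rather than a real statistical test.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Summary of one test's results for a build (hypothetical structure)."""
    pass_rate: float   # fraction of passing runs, 0.0 to 1.0
    duration_s: float  # wall-clock run time in seconds


def find_deviations(baseline, current, max_pass_drop=0.03, max_slowdown=2.0):
    """Return human-readable alerts where `current` deviates from `baseline`.

    max_pass_drop: largest tolerated drop in pass rate (assumed threshold).
    max_slowdown:  run-time ratio at or above which an alert is raised.
    """
    alerts = []
    drop = baseline.pass_rate - current.pass_rate
    if drop > max_pass_drop:
        alerts.append(
            f"pass rate fell from {baseline.pass_rate:.0%} to {current.pass_rate:.0%}"
        )
    if baseline.duration_s > 0:
        ratio = current.duration_s / baseline.duration_s
        if ratio >= max_slowdown:
            alerts.append(f"run time grew {ratio:.1f}x over the last run")
    return alerts


# A test that used to pass 95% of the time now passes 90% and takes twice as long.
last_run = RunSummary(pass_rate=0.95, duration_s=120.0)
this_run = RunSummary(pass_rate=0.90, duration_s=240.0)

for alert in find_deviations(last_run, this_run):
    print(alert)  # in a real system this would feed an email notification
```

Returning a list of messages keeps the detection logic separate from the delivery mechanism, so the same check could drive email, a dashboard flag, or both.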
One particular R&D engineer, Steve Trefethen, spent a couple of months helping to dramatically improve the Zombie framework. He went so far as to create tools and a process to automatically generate the models required by the automation on a per-build basis. This meant the manual intervention required to maintain our tests went down by a VERY large amount, so more effort could be put into writing and improving tests than into the library of test models! So, a big thanks goes out to Steve for that help. (And for those that don’t know - Steve used to be a QA engineer.)
We hit the quality and testing problem from a number of angles. Besides improvements to the test automation system, we also ran field test surveys to identify pain points on the fly and monitor the IDE ‘feel’. We also nominated some outstanding individuals as ‘Field Test Marshals’. These folks co-ordinated Quality Central reports and feedback and summarized them into ‘Top Ten’ lists that the R&D team referred to. To be fair, not all Top Ten issues could be addressed in the Spacely timeframe, since some required changes that would have violated the ‘no breaking changes’ mantra. But we have those lists now and can focus on them for the breaking release later in the year, and we did fix a number of the issues for Spacely.
I’ll also give a huge thumbs up to VMware - VMware is used extensively in our test automation system, and helps to isolate changes, especially in terms of performance and stability. You can always revert to a known state using VMware, and do it in a few minutes. In addition, we could test rapidly against a number of different operating systems, configurations and languages. We extended our test system with a number of VMware ESX servers - and they became so desirable and useful for testing that they hit maximum capacity after only a few weeks. I expect we’ll be after additional servers in the future - especially as we add more automated tests.
So, lots of great things happened in CodeGear Quality Assurance for Delphi 2007. And more is still planned!