At the Institute for the Study of Violent Groups, we maintain the world’s largest open-source (meaning drawn from publicly available sources; not FOSS or CC, sadly) terrorism database. I’ve been the lead developer here at ISVG since February 2008, and our IT and software infrastructures have seen a LOT of change. One of the greatest parts about the job is that I have essentially carte blanche to guide and refactor the codebase as I see fit, which has been both a blessing and a curse. When I arrived on scene, I was a hot-shot, full-of-myself programmer a couple of years out of school (still working on my MS, actually). I knew I was a pretty decent programmer, but I didn’t realize that I was a very poor software engineer. Now, two and a half years later, I’m both significantly better at all things development-related and significantly more humble! I’ve gotten the opportunity at work to redo our “legacy” software (“legacy” meaning I hacked it together before I knew any better), applying all the lessons I’ve learned. The next few months should be interesting!
I’m going to post several blog entries in the coming months about the various refactoring steps we (a single co-worker and I) take in the process of improving the codebase and our database. To set the stage, it’ll probably help to describe the current state of affairs and the modifications I’ve already made since I started at ISVG.
Since 2001, ISVG has produced a variety of reports and whitepapers, but our primary product has always been our database. I’m not quite clear on the timeline and technical details before 2008, but it started off as an Access database. At some point they experienced significant growing pains while having to support both a growing model and an increasing data entry staff, and the decision was made to migrate to a SQL Server 2000 database. The Microsoft Access-based UI was still used as the entry point, though. Over the course of several years, the Access file grew to the point of being unstable and virtually unusable (again, due to a growing domain model, but primarily due to the number of not-used-but-not-deleted artifacts the developers left behind). It was at this point that I came on board, tasked with building a custom UI for the database that wouldn’t suffer the same woes as the previous incarnation.
Before getting to start on this project, though, I was sidetracked to work on integrating our database with an unstructured knowledgebase we had contracted to create. Between data integration with the knowledgebase and working on integrating our data with various analytical tools, it was November before I really got to start on the new UI. At long last, though, I started digging into the Access module, and my struggles truly began, as the database schema had no documentation, and I was forced to reverse-engineer the entire schema based on bindings within the Access forms. Eventually, though, I forged through, and our database schema (despite still having many legacy artifacts) enjoys complete documentation.
In an effort to transition away from the dying beast that was our Access application as quickly as possible, I originally targeted SQL databinding in WinForms as the technology of choice. After mapping out a fifth of the schema and getting nothing more out of it than a generated data set large enough to consistently crash the Visual Studio 2005 designer, though, I decided we couldn’t take the short route, and set about implementing an OO domain model. Tuxedo (the internal name for our application) was born.
It was at this point that I experienced my greatest professional shame so far: after wrestling for several weeks with NHibernate, with its then-unfamiliar terminology and (for a newbie like me) unforgiving documentation, I decided it would be easier to roll my own DAL/ORM, and set off down that rabbit hole. I made many mistakes along the way (some of which will be the topic of future posts), but eventually arrived at a working implementation that’s a daunting and schizophrenic combination of good design and ugly hacks, all told about 120,000 SLOC (including designer-generated code).
At this point, our domain model consists of ~200 objects, our UI leans on some pretty complicated uses of complex databinding, and our support for entity deletion is buggy at best. The codebase contains almost no abstraction (coming from a C++ background, I overused base classes and dramatically underused interfaces), and it’s very difficult to add new features at any point in the system. These are the problems I’m setting out to solve moving forward, and I hope you’ll come along with me for the ride.