How to craft elegant software: August 2010

I am currently working on a project where I had to implement a data import feature. The goal was to implement a background job running on the server, that would import new (inserted) or already existing (updated) records from Xml files. These files could be retrieved from different data interfaces like : webservice, Ftp, file system, etc.

My first implementation was working fine. A class called ImportDataJob was created that used a class called ImportData to store the data to process and deal with the ImportStatus. A JobState was responsible for providing information about the progress of the process to the external world.

Fairly simple approach. I skip here the description of the different data interface implementations, but let's say they were already quite nicely decoupled form the job, that was able to use either a webservice, or Ftp, or the file system to get the data from. The mapping from the data received as Xml was done in separate data mappers that were used by the job, and every one of them was unit tested to ensure that it was doing its job properly.

So what was the issue then ? Well, it all started when I wanted to write test for the job behavior : test what happens when I don't receive any files, test what happens when I receive files that I can't process, test what happens when I can't send acknowledgment that the files have been processed, etc.

I know, in a perfect world, I would have started by writing the tests first and this would never have occurred, but let's say for the sake of discussion that this was not the case. So I start to try implement some tests, using NUnit and my favourite mocking framework Moq. So I mock my data mappers, I mock my data interface. I setup the mock so it sends no data, and I call my Execute method on my job. Ok, fair enough... I can test this first case. But it get more and more complicated when I start implementing the next tests.

Something was not completely right, and the reason why, was that the ImportDataJob class was just doing too many thinks at the same time. So I split it into several classes, respecting the Single Responsibility Principle.

As you can see, I went from 3 classes and 1 enum to 8 classses, 1 enum and 3 interfaces. ImportDataJob was now using an IDataRetriever to get the data. The data the job is working on was handled by an IDataContext, that could be passed on to different other objects as method parameters, and the import of data itself was clearly separated into two IDataImporter that were dealing with importing inserted data on one side, and updated data on the other. Thanks to these well defined interfaces, the responsibilities were clearly separated and the implementation of each part of the process completely decoupled from the job itself, thanks to the help of my good friend Unity.

I could then unit test each part of the process isolated from the others (have I just been redundant... a unit test is testing in isolation anyway, isn't it ? ...) using mocks for the different components to test the job behavior.

I know it seems quite easy when you see the end result. But it took me quite some time to realize that something was really wrong and that the ImportDataJob class was doing to many things. Why is that ? Probably because I was focused on making it work before making it right, which by the way is the way to go. But again, I think I could have avoided these issues if I had used TDD instead of jumping directly into implementation and writing tests afterward.

Lesson leant ? Well, we'll see if in the future I can keep the discipline to always write tests first

I recently encountered some particularities of the .Net framework when it comes to XDocument class.

First, the ToString() method, gives you the Xml contained in the docuemnt as a string indeed, but it completely forgets the Xml declaration.
Second, the Save method when called on an XmlWriter has a strange behavior.

Here are two extension methods to nicely deal with this issue.

public static class ConversionExtensions {

        public static string ToXml(this XDocument document) {
            if(document.Declaration == null) {
                document.Declaration = new XDeclaration("1.0", "utf-8", "no");
            }

            var sb = new StringBuilder();
            using(var writer = XmlWriter.Create(sb)) {
                document.Save(writer);
            }
            return sb.ToString();
        }

        public static Stream ToStream(this XDocument document) {
            var stream = new MemoryStream();
            using(var writer = XmlWriter.Create(stream, new XmlWriterSettings
            {
                Indent = true
            })) {
                document.Save(writer);
            }
            stream.Position = 0;
            return stream;
        }
    }

Notice how I use the Save() method instead of ToString() on the first extension so the Xml declaration is not omitted.

The important thing in the second extension is to reset the position of the stream to 0, otherwise the consumer of the stream will not see any data in there.

The other important think in both methods, it so wrap the XmlWriter into a using block. Unless you do that, you might end up with no data at all in your return value, since the XmlWriter is buffered. You can alternatively call the Close() method on the writer explicitely.

How to craft elegant software

Monday, August 23, 2010

Decoupling for testability, respecting SRP

Friday, August 20, 2010

Extending XDocument class to provide ToXml() and ToStream()