Date: Thursday, 27 Nov 2008 15:15

Introduction

Like generics, delegates are one of those features that developers use without really understanding them. Initially this wasn't really a problem since delegates were reserved for fairly specific purposes: implementing callbacks and serving as the building block for events (amongst a few other edge cases). However, each version of .NET has seen delegates evolve, first with the introduction of anonymous methods in 2.0 and now with lambda expressions in C# 3.0. With each evolution, delegates have become less of a specific pattern and more of a general-purpose tool. In fact, most libraries written specifically for .NET 3.5 are likely to make heavy use of lambda expressions. As always, our concern isn't just about understanding the code that we use, but also about enriching our own toolset. Seven years ago it wouldn't have been abnormal to see even a complex system make little (or no) use of delegates (except for using the events of built-in controls). Today, however, even the simplest systems rely heavily on them.

Delegates

The best way to learn about all three framework/language features is to start from the original and build our way up, seeing what each evolution adds. Delegates have always been pretty simple to understand, but without any good reason to use them, people never really latched on to the concept. It's always easier to understand something when you can see what problem it solves and how it's used - and examples of delegates always seem contrived.

Delegates are .NET's version of function pointers - with added type safety. If you aren't familiar with C or C++ (or another lower-level language) that might not be very helpful. Essentially they let you pass a method into another method as an argument. Although many developers understand the concept in languages such as JavaScript, the strictness of C#/VB.NET makes it a little more confusing. For example, the following JavaScript code is completely valid (and even common):

function executor(functionToExecute)
{
   functionToExecute(9000);
}
var doSomething = function(count){alert("It's Over " + count);}
executor(doSomething);

Although simplistic, the above code shows how a function can be assigned to a variable (not the return value mind you, the actual function body) and then passed around as a parameter into a function. This parameter can then be executed as you would any other function.

The only difference in .NET is that you can't just assign any function to any variable and pass it into any method. Instead everything has to be properly typed. That's where delegates come in: they let you define the signature for a method – its parameters and return type. So, to build the above code in C#, we use the following:

public delegate void NotifyDelegate(int count);
public class Executor
{
   public static void Execute(NotifyDelegate notifier)
   {
       notifier(9000);
   }
}

public class Program
{
     public static void Main(string[] args)
     {
         NotifyDelegate notifier = Alert;
         Executor.Execute(notifier);
     }
     public static void Alert(int count)
     {
        Console.WriteLine("It's over {0}", count);
     }
}

It seems like a lot more code, but a good chunk of it is simply the requirement for everything in .NET to be in a class. The first, and most important, line is the definition for the delegate itself. Delegates are much like classes, interfaces, structures or enums – they define a type. Here we've defined a type named NotifyDelegate. Any method can be assigned to a variable of type NotifyDelegate provided that method has the same signature: it must return void and take a single parameter of type int. In the above code I explicitly assigned the Alert method to a variable named notifier for demonstration purposes only; the following would have been just as acceptable:

Executor.Execute(Alert);

The point is that a delegate can be used like any other type; the difference is simply that it's assigned a method. The other point to keep in mind is that my example used a static method (Alert). You can use either a static or an instance method – again, the only requirement is that it meets the defined method signature. Within an instance method you have access to all instance members just like any normal method (because it is a normal method).
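To make the instance-method case concrete, here's a minimal sketch. NotifyDelegate is repeated so the snippet stands on its own; Tally and InstanceDelegateDemo are hypothetical names that exist only for this illustration.

```csharp
using System;

// Same shape as the NotifyDelegate above: returns void, takes a single int.
public delegate void NotifyDelegate(int count);

// Hypothetical class, only here to provide an instance method as a delegate target.
public class Tally
{
    public int Total { get; private set; }

    // An ordinary instance method that matches the delegate's signature.
    // It can read and write instance state like any other method.
    public void Add(int count)
    {
        Total += count;
    }
}

public static class InstanceDelegateDemo
{
    public static int Run()
    {
        var tally = new Tally();
        NotifyDelegate notifier = tally.Add; // bound to this particular Tally instance
        notifier(9000);
        notifier(1);
        return tally.Total; // both calls updated the same object
    }
}
```

Calling InstanceDelegateDemo.Run() goes through the delegate twice, and both calls land on the same Tally instance.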

Let's Get More Practical

So we have an idea of what delegates are, but when would we use them? As I mentioned earlier, in their normal form, their usage is reserved for fairly specific cases, most notably callbacks (if you've ever used an asynchronous method you likely had to supply a delegate to be called when the async call completed). However, let's look at a real example in which you might use a delegate. Initially this is going to be a little contrived, but as we move the example to anonymous methods and then lambda expressions, the example will feel more natural. This is a real example I use in my code.

Within our data layer we have a method that expects an array of objects and saves them all within a transaction. The method looks something like this (we use NHibernate, but the implementation details don't matter):

public class GenericStore<T>
{
    public void Save(params T[] entities)
    {
       using (var transaction = BeginTransaction())
       {
          try
          {
             foreach(T entity in entities)
             { 
                Save(entity);
             }
             transaction.Commit();
          }
          catch(Exception)
          {
             transaction.Rollback();
             throw; //rethrow so the caller knows the save failed
          }
       }
    }
}

We use this method, for example, when we want to switch the default group, something like:

public void SwitchDefault(Group newDefault)
{
   var oldDefault = GetDefaultGroup(); //implementation isn't important
   oldDefault.Default = false;
   newDefault.Default = true;
   DataFactory.For<Group>().Save(oldDefault, newDefault);
}

Of course, if our Save method throws an exception, we need to undo our code (rolling back the transaction undoes the database commit, but not the actual in-memory change we made). Here's one way to do that:

public void SwitchDefault(Group newDefault)
{
   var oldDefault = GetDefaultGroup(); //implementation isn't important
   oldDefault.Default = false;
   newDefault.Default = true;
   try
   {
      DataFactory.For<Group>().Save(oldDefault, newDefault);
   }
   catch
   {
      oldDefault.Default = true;
      newDefault.Default = false;
   }
}

This works fine, except we end up with a lot of try/catches all over the place. Instead we use a delegate (again, this is actually more code, but it's just a foundation to progress to anonymous methods). Here's our improved Save method:

public class GenericStore<T>
{
   public delegate void RollbackDelegate(T[] entities);

   public void Save(params T[] entities)
   {
      Save(null, entities);
   }
   public void Save(RollbackDelegate rollback, params T[] entities)
   {
      using (var transaction = BeginTransaction())
      {
         try
         {
            foreach(T entity in entities)
            { 
               Save(entity);
            }
            transaction.Commit();
         }
         catch(Exception)
         {
            transaction.Rollback();
            if (rollback != null)
            { 
               rollback(entities);
            }
         }
      }      
   }
}

The overloaded Save method is provided so that calling code can provide a delegate or not. We can use this code simply by doing:

public void SwitchDefault(Group newDefault)
{
   var oldDefault = GetDefaultGroup(); //implementation isn't important
   oldDefault.Default = false;
   newDefault.Default = true;
   DataFactory.For<Group>().Save(SwitchDefaultRollback, oldDefault, newDefault);
}

public void SwitchDefaultRollback(Group[] groups)
{
    groups[0].Default = true;  //oldDefault becomes the default again
    groups[1].Default = false; //newDefault is no longer the default
}

Whether or not you consider this an improvement over the try/catch solution is largely a matter of taste. I find neither particularly elegant. The problem with the delegate solution is the need for the extra method, and the awkward use of array indexes (we're relying on the Save method to pass the items back in the same order they were passed in).

Anonymous Methods

.NET 2.0 added a fairly significant improvement to delegates: anonymous methods. Delegates still exist and are still the ideal solution for a number of cases. However, for situations such as the one we're developing, they are far from perfect. What we want is a more concise way to create our callback as well as something that'll help us avoid the weird array indexing. Anonymous methods solve both those problems. We'll address each point separately.

Probably the most intimidating aspect of anonymous methods is the way they are declared. Unlike normal methods, anonymous methods are declared within another method, using the delegate keyword:

public void SwitchDefault(Group newDefault)
{
   GenericStore<Group>.RollbackDelegate rollback = delegate(Group[] groups)
   {
      groups[0].Default = true;  //oldDefault becomes the default again
      groups[1].Default = false; //newDefault is no longer the default
   };
   var oldDefault = GetDefaultGroup(); //implementation isn't important
   oldDefault.Default = false;
   newDefault.Default = true;
   DataFactory.For<Group>().Save(rollback, oldDefault, newDefault);
}

Like any nested type, we access our delegate via its full name (GenericStore<Group>.RollbackDelegate). The delegate keyword creates an anonymous method – which behaves like any other method, except it isn't named and exists within a limited scope. Again, I assigned the anonymous method to a variable for demonstrative purposes; in real life you're more likely to do:

public void SwitchDefault(Group newDefault)
{
   var oldDefault = GetDefaultGroup(); //implementation isn't important
   oldDefault.Default = false;
   newDefault.Default = true;
   DataFactory.For<Group>().Save(delegate(Group[] groups)
     {
        groups[0].Default = true;  //oldDefault becomes the default again
        groups[1].Default = false; //newDefault is no longer the default
     }, oldDefault, newDefault);
}

The syntax is more confusing. If you look at it, you'll notice that we're really just passing 3 parameters to our Save method – our anonymous method, oldDefault and newDefault. The syntax will improve considerably when we look at the next evolution. For now, it's important that you understand the concept behind creating an anonymous method.

While the syntax might be the most confusing, the most important aspect of anonymous methods is their scope. Anonymous methods behave like any other code-block. In our above code that means that our anonymous method has access to oldDefault, newDefault and all the other instance members which might be defined (like the GetDefaultGroup method we're calling). That means that we can really simplify our code. First, we'll change our delegate so that it no longer passes back our array of entities:

public delegate void RollbackDelegate();

Along with the corresponding part of our Save method:

transaction.Rollback();
if (rollback != null)
{ 
   rollback();
}

Our calling code now looks like:

public void SwitchDefault(Group newDefault)
{
   var oldDefault = GetDefaultGroup(); //implementation isn't important
   oldDefault.Default = false;
   newDefault.Default = true;
   DataFactory.For<Group>().Save(delegate()
     {
        newDefault.Default = false;
        oldDefault.Default = true;
     }, oldDefault, newDefault);
}

I consider this much cleaner, not only because there's less chance of bugs, but also because the code is far more readable. All ambiguity around what groups[0] and groups[1] referred to has been removed.
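The capture behaviour this relies on is easier to see in isolation. Here's a minimal sketch: RollbackDelegate is repeated so the snippet compiles by itself, and CaptureDemo and its local variable are hypothetical names used only for illustration.

```csharp
using System;

// The parameterless delegate from above, repeated so the snippet stands alone.
public delegate void RollbackDelegate();

public static class CaptureDemo
{
    public static bool Run()
    {
        bool isDefault = true; // a plain local variable of the enclosing method

        // The anonymous method declares no parameters, yet it can read and
        // write isDefault because it is defined inside the same scope.
        RollbackDelegate rollback = delegate()
        {
            isDefault = true;
        };

        isDefault = false; // the change we might have to undo
        rollback();        // invoked later; it sees the same variable, not a copy
        return isDefault;
    }
}
```

CaptureDemo.Run() returns true: the anonymous method changed the local variable itself, which is exactly what lets our rollback touch oldDefault and newDefault without receiving them as parameters.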

Our solution still isn't perfect (the syntax around anonymous delegates is a little messy), but to me it's definitely a step in the right direction. We no longer have to create a full-blown method for each delegate, and having our callback scoped within the method gives us direct access to variables we'll likely need.

Lambda Expressions

While anonymous methods provide a new feature, lambda expressions merely provide an improved syntax. The improvement is rather significant though, which has made anonymous methods even more popular. At one point our delegate was passing along an array as a parameter:

public delegate void RollbackDelegate(T[] entities);

And, to be valid, our anonymous method had to be defined with the same signature:

delegate(Group[] entities){ ... }

Although we've moved beyond the need to pass-back the array, I want to look at lambdas from this point on, as it'll help make things clearer (and often times you'll have delegates with parameters). The lambda version of our above code is:

entities => {....}

Essentially, the => operator (some people call it the wang operator) replaces the need for both the delegate keyword as well as the parameter types. Additionally, if your delegate is a single statement, you can drop the braces { }. If you have multiple parameters, you wrap them in parentheses:

(entities, transaction) => {...}

If you don't have any parameters, like in our example, you use empty parentheses:

() => {...}
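To line the forms up side by side, here's a small sketch. Transformer, Combiner and Producer are hypothetical delegate types invented for this illustration; they aren't part of our Save example.

```csharp
using System;

public delegate int Transformer(int value); // hypothetical: one parameter
public delegate int Combiner(int a, int b); // hypothetical: two parameters
public delegate int Producer();             // hypothetical: no parameters

public static class LambdaFormsDemo
{
    public static int[] Run()
    {
        // The same behaviour written three ways.
        Transformer viaAnonymousMethod = delegate(int value) { return value * 2; };
        Transformer viaStatementLambda = value => { return value * 2; };
        Transformer viaExpressionLambda = value => value * 2;

        // Multiple parameters need parentheses; no parameters needs empty ones.
        Combiner add = (a, b) => a + b;
        Producer answer = () => 42;

        return new[]
        {
            viaAnonymousMethod(21),
            viaStatementLambda(21),
            viaExpressionLambda(21),
            add(40, 2),
            answer()
        };
    }
}
```

Every element of the returned array is 42. The parameter types are never written in the lambdas, because the compiler infers them from the delegate type on the left-hand side.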

It's easy to get mixed up with the syntax, but if you walk backwards through the code, hopefully everything makes sense. Here's what our implementation now looks like:

public void SwitchDefault(Group newDefault)
{
   var oldDefault = GetDefaultGroup(); //implementation isn't important
   oldDefault.Default = false;
   newDefault.Default = true;
   DataFactory.For<Group>().Save(() =>{newDefault.Default = false; oldDefault.Default = true;}, oldDefault, newDefault);
}

Some More Examples

To get a good feel for the syntax, let's look at some other, common, examples. We'll stick to the List<T> class, which exposes a number of methods which expect delegates.

To get the sum of all integers within a list using an anonymous method:

var ids = new List<int>{1,2,3,4,5};
var sum = 0;
ids.ForEach(delegate(int i){ sum += i;});

Using a lambda expression:

var ids = new List<int>{1,2,3,4,5};
var sum = 0;
ids.ForEach(i => sum += i);

To find a specific group by its id:

public Group FindGroup(int id)
{
   var groups = GetAllGroups(); //there might be a more efficient way!
   return groups.Find(delegate(Group g){return g.Id == id;});
}

Using a lambda expression:

public Group FindGroup(int id)
{
   var groups = GetAllGroups(); //there might be a more efficient way!
   return groups.Find(g => g.Id == id);
}

Notice that with a lambda we don't even have to write the return statement. When the body of a lambda is a single expression, the value of that expression is implicitly returned.

Delegates and Generics

The last thing we'll cover is the synergy between delegates and generics. This is something I've covered in depth before, so we'll only briefly discuss it here. The .NET framework comes with a number of built-in delegates (for example, we might not need to define our own RollbackDelegate type as the .NET framework might already have one for the same method signature). It turns out that if you sprinkle some generic goodness on top of delegates, you can easily create a delegate for almost any situation. There are three core generic delegates within the .NET framework: Predicate, Func and Action. Predicate comes in a single form, while Func and Action each come with a number of overloads to cover the most common cases:

delegate bool Predicate<T>(T obj);

delegate TResult Func<TResult>();
delegate TResult Func<T, TResult>(T arg);
delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);
delegate TResult Func<T1, T2, T3, TResult>(T1 arg1, T2 arg2, T3 arg3);

delegate void Action();
delegate void Action<T>(T obj);
delegate void Action<T1, T2>(T1 arg1, T2 arg2);
delegate void Action<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3);

The difference between the three is the type of the return (Predicate always returns bool, Func returns TResult, which is always its last type parameter, and Action returns void). The overloads just let us support multiple parameters.
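Here's a quick sketch of the three built-in delegates being assigned lambdas. BuiltInDelegatesDemo and the variable names are made up for this example; Predicate, Func, Action and List<T>.FindAll are the real framework types and method.

```csharp
using System;
using System.Collections.Generic;

public static class BuiltInDelegatesDemo
{
    public static string Run()
    {
        Predicate<int> isEven = n => n % 2 == 0;             // one argument, returns bool
        Func<int, int, int> multiply = (a, b) => a * b;      // last type argument is the return type
        var log = new List<string>();
        Action<string> record = message => log.Add(message); // returns nothing

        // List<T>.FindAll expects a Predicate<T>.
        List<int> evens = new List<int> { 1, 2, 3, 4 }.FindAll(isEven);

        record("evens: " + evens.Count);
        record("product: " + multiply(6, 7));
        return string.Join("; ", log.ToArray());
    }
}
```

BuiltInDelegatesDemo.Run() returns "evens: 2; product: 42", and no custom delegate type had to be declared anywhere.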

Instead of using this delegate:

public delegate void RollbackDelegate(T[] entities);
...
public void Save(RollbackDelegate rollback, params T[] entities){...}

We could have simply used:

public void Save(Action<T[]> rollback, params T[] entities){...}

And instead of:

public delegate void RollbackDelegate();
...
public void Save(RollbackDelegate rollback, params T[] entities){...}

We could have simply used:

public void Save(Action rollback, params T[] entities){...}
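Put together, the Action-based version can be sketched end to end without a database. Everything here is an assumption made for illustration: SketchStore stands in for GenericStore, its persist callback replaces the real NHibernate and transaction work, and the bool return value is something I added so the outcome is visible.

```csharp
using System;

// Hypothetical stand-in for GenericStore<T>. The persist callback is a made-up
// hook that replaces the real data access so the sketch can run by itself.
public class SketchStore<T>
{
    private readonly Action<T> persist;

    public SketchStore(Action<T> persist)
    {
        this.persist = persist;
    }

    public bool Save(Action rollback, params T[] entities)
    {
        try
        {
            foreach (T entity in entities)
            {
                persist(entity);
            }
            return true;
        }
        catch (Exception)
        {
            // A real store would roll back its transaction here first.
            if (rollback != null)
            {
                rollback();
            }
            return false;
        }
    }
}

public class Group
{
    public bool Default { get; set; }
}

public static class ActionRollbackDemo
{
    public static bool[] Run()
    {
        var oldDefault = new Group { Default = true };
        var newDefault = new Group { Default = false };

        // Every save fails, so the rollback lambda is guaranteed to run.
        var store = new SketchStore<Group>(g => { throw new InvalidOperationException("simulated failure"); });

        oldDefault.Default = false;
        newDefault.Default = true;
        bool saved = store.Save(() => { newDefault.Default = false; oldDefault.Default = true; },
                                oldDefault, newDefault);

        return new[] { saved, oldDefault.Default, newDefault.Default };
    }
}
```

ActionRollbackDemo.Run() returns { false, true, false }: the save failed, and the lambda put both in-memory flags back the way they were.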

Conclusion

Hopefully this helped clarify delegates, anonymous methods and lambdas, both in terms of their crazy syntax as well as how you can use them within your own code. When you combine this with a solid understanding of generics you end up with some powerful and concise code. You also end up with new ways to solve existing problems, which could otherwise be problematic and ugly. Don't be afraid to try using some of these solutions within your code. The best way to learn how the pieces fit together is to actually try to make something work.

Author: "karl" Tags: "Foundations"
Date: Thursday, 14 Aug 2008 15:39

The relative hype around the Foundation ebook has been pretty fun. Today I noticed a very detailed (and positive) review of the book. Which is, of course, flattering.

If there's one thing a few people don't care for though, it's the title. They don't feel that it properly captures the spirit of the book, or that it isn't as marketable as it could be. I think if this was a "real" book, the publisher would have insisted on something different - likely with the words "ALT.NET" in big print.

However, given that it wasn't commercial, I had the luxury of being a little cleverer. The point, which I think most people get, is that this stuff *IS* fundamental. If enterprise developers think "fundamentals" means if-statements, recursion and hash algorithms, then we're in trouble. IoC might not be what they teach in school - but it should be. Or at least it should be the first thing you teach yourself. It reminds me of the funny pro-Google quote:

"Google uses Bayesian filtering the way Microsoft uses the if statement"

If I had to do it again though, I'd probably change the name. My ego can't stand knowing I might have gotten more praise with a better title.

 

Author: "karl" Tags: "Foundations"
Date: Wednesday, 23 Jul 2008 01:49
Like me, you might have been surprised that the foundation series didn't have a chapter on the MVC pattern. I'm no fan of the existing page model (I actually think it's horrible), and I've successfully used MonoRail on a few projects, so it would have made for a good topic. My reasons for not including something on MVC were simple: we were and continue to be flooded with MVC information (as though it's a brand new invention), and I didn't think I could explain MVC using MonoRail effectively (I find it has a steep learning curve). I considered using RoR, but figured that would confuse people even more.

Hopefully though, if you're a fan of the foundation series, you've already downloaded the learning application which puts the theory to practice using ASP.NET's MVC framework. So, what do I think about ASP.NET MVC? Overall I've been very impressed. I can't think of a good reason for starting a new project using the WebForms model - or MonoRail for that matter (sorry). If you're an ASP.NET developer, it's really a no brainer.

I do have two major issues with it though. First, if you come from almost any other MVC framework (MonoRail, Django, RoR, Akelos, etc...) you might be expecting an actual Model framework - instead you get an empty Model folder. In other words, the MVC framework doesn't add anything to the .NET O/R Mapping / DAL story. From Microsoft's point of view this makes sense, since they feel that they are already offering solid solutions - DataSets, SqlDataSources, LINQ to SQL, Entity Framework. Truth be told, this is fine with me, as it lets me use NHibernate. I just think, given what the other MVC frameworks offer, it's a little dishonest - you'll end up disappointed if you're expecting to be able to do this out of the box:

public class Car : ActiveRecord
{
}
....
Car.FindById(1);

My real problem though is simply that neither C# nor VB.NET lends itself all that well to view logic. Jeff Atwood actually just blogged the same criticism. Jeff uses RoR to highlight the problem. I don't fully agree. I won't say that RHTML is great, but I will say that it's far better than C# or VB.NET. I think views need a specialized language - I'm sure that anyone who's done some significant work in either RoR or Django would agree. There are solutions available now - NVelocity and Boo (I assume you could use it with the MVC framework?), but I'm just going to trudge along with C# until IronRuby is a viable solution.

Aside from that, everything is pretty solid - routes work great, helper methods are adequate (they're starting to add more and more), and testing is actually doable - I haven't run into any problems, but from what I've read things aren't 100% perfect yet (either way, it's a huge step up from WebForms).

So, to recap. MVC good. WebForms bad. C# in views less than ideal. Empty Model folder = M. Oh, and download the learning application!

Author: "karl" Tags: "Foundations"
Date: Friday, 18 Jul 2008 13:34

If you're anything like me, you probably learn a lot better by going through code rather than reading books. I'm happy to release the Foundations of Programming Learning Application - it's a complete solution meant to show what was covered in the Foundations series. It's a Visual Studio 2008 solution.

You can download it here. It should require no configuration (my fingers are crossed on that one) and ought to just run out of the box. There are comments sprinkled all over to help explain things or provide some insight. No doubt there'll be typos, since I'm nothing without Word.

(you can grab the free ebook from: http://codebetter.com/blogs/karlseguin/archive/2008/06/24/foundations-of-programming-ebook.aspx)

What is it?
It's a sample awards website - with categories and nominees. The root container is called a Round - a sample Round would be called 'The 2008 CodeBetter Awards'. A Round has a state (planning, announcements, voting, winners) and a number of Categories (Best Blogger, Best Blog Post, Best Open Source Project, ...) with each category having Nominees (Title, Summary, Link, Author...). The website uses ASP.NET MVC Preview 4 - I don't think you'll need to install anything extra as all the DLLs are included with the project. I'm using an SQLite database with a relative path to the file, so all should work as-is. Dummy data is already loaded.

The web application mostly shows a read-only view of the data. There's also a sample console application that does more administrative stuff (it isn't interactive, it just runs through 4 steps or so). You can run the administrative portion over and over again - the first step is to clean itself up. The admin part basically adds a new round, with categories and nominees.

Of course, there's a project full of unit tests as well.

I tried to keep everything simple and straightforward (which is largely why I didn't want to build a whole web-based admin module and user registration and all that). Like most, I'm pretty new to ASP.NET MVC. Some might think my views have too much code; I think they have the perfect amount. There's extensive use of lambdas, so if you have a hard time reading them, I hope my excessive examples will help illuminate them.

Author: "karl" Tags: "Foundations"
Date: Wednesday, 25 Jun 2008 01:53

I'm excited to finally release the official, and completely free, Foundations of Programming EBook. This essentially contains all 9 Foundation parts including a conclusion and some typical book fluff (table of contents, acknowledgements and so on). A number of spelling errors were corrected, along with some small technical changes and clarifications - largely based on feedback, so thanks to everyone who provided it! Otherwise it's exactly the same as what's been posted here over the past several months.

Download it from http://codebetter.com/files/folders/codebetter_downloads/entry179694.aspx

Download the Learning Application from: http://codebetter.com/blogs/karlseguin/archive/2008/07/18/foundations-of-programming-learning-application.aspx

 Foundations Of Programming 

If the above link fails, you can also get it from http://www.openmymind.net/FoundationsOfProgramming.pdf

Author: "karl" Tags: "Foundations"
Date: Wednesday, 18 Jun 2008 12:32
Few keywords are as simple yet amazingly powerful as virtual in C# (overridable in VB.NET). When you mark a method as virtual you allow an inheriting class to override the behavior. Without this functionality inheritance and polymorphism wouldn't be of much use. A simple example, slightly modified from Programming Ruby (ISBN: 978-0-9745140-5-5), in which a KaraokeSong overrides a Song's to_s (ToString) function, looks like:
class Song
   def to_s
      return sprintf("Song: %s, %s (%d)", @name, @artist, @duration)
   end
end

class KaraokeSong < Song
   def to_s
      return super + " - " + @lyrics
   end
end

The above code shows how the KaraokeSong is able to build on top of the behavior of its base class. Specialization isn't just about data, it's also about behavior!

Even if your Ruby is a little rusty, you might have picked up that the base to_s method isn't marked as virtual. That's because many languages, including Java, make methods virtual by default. This represents a fundamental difference of opinion between the Java language designers and the C#/VB.NET language designers. In C# methods are final by default and developers must explicitly allow overriding (via the virtual keyword). In Java, methods are virtual by default and developers must explicitly disallow overriding (via the final keyword).
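For comparison, here's roughly how the same pair of classes might look in C#. This is my own sketch and not code from Programming Ruby; the Describe and Kind methods and the sample values are made up to contrast a virtual method with a non-virtual one.

```csharp
using System;

public class Song
{
    private readonly string name;

    public Song(string name)
    {
        this.name = name;
    }

    // virtual: inheriting classes are explicitly allowed to override this.
    public virtual string Describe()
    {
        return "Song: " + name;
    }

    // No virtual keyword, so this cannot be overridden (final by default).
    public string Kind()
    {
        return "song";
    }
}

public class KaraokeSong : Song
{
    private readonly string lyrics;

    public KaraokeSong(string name, string lyrics) : base(name)
    {
        this.lyrics = lyrics;
    }

    // Builds on top of the base behavior, like super in the Ruby version.
    public override string Describe()
    {
        return base.Describe() + " - " + lyrics;
    }

    // 'new' only hides the base method; it does not override it.
    public new string Kind()
    {
        return "karaoke";
    }
}

public static class VirtualDemo
{
    public static string[] Run()
    {
        Song song = new KaraokeSong("Yesterday", "la la"); // declared as the base type
        return new[] { song.Describe(), song.Kind() };
    }
}
```

VirtualDemo.Run() returns "Song: Yesterday - la la" and "song". The virtual call reaches KaraokeSong even through a Song variable, while the non-virtual Kind is resolved against the declared type.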

Typically virtual methods are discussed with respect to inheritance of domain models. That is, a KaraokeSong which inherits from a Song, or a Dog which inherits from a Pet. That's a very important concept, but it's already well documented and well understood. Therefore, we'll examine virtual methods for a more technical purpose: proxies.

Proxy Domain Pattern

A proxy is something acting as something else. In legal terms, a proxy is someone given authority to vote or act on behalf of someone else. Such a proxy has the same rights and behaves pretty much like the person being proxied. In the hardware world, a proxy server sits between you and a server you're accessing. The proxy server transparently behaves just like the actual server, but with additional functionality - be it caching, logging or filtering. In software, the proxy design pattern is a class that behaves like another class. For example, if we were building a task tracking system, we might decide to use a proxy to transparently apply authorization on top of a task object:

public class Task
{  
   public static Task FindById(int id)
   {
      return TaskRepository.Create().FindById(id);
   }   

   public virtual void Delete()
   {
      TaskRepository.Create().Delete(this);
   }
}
public class TaskProxy : Task
{
   public override void Delete()
   {
      if (User.Current.CanDeleteTask())
      {
         base.Delete();
      }
      else
      {
         throw new PermissionException(...);
      }
   }
}

Thanks to polymorphism, FindById can return either a Task or a TaskProxy. The calling client doesn't have to know which was returned - it doesn't even have to know that a TaskProxy exists. It just programs against the Task's public API.
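As a rough sketch of that idea, here's a stripped-down version that runs without a repository or a current user. The TaskFinder class, the Deleted flag and the canDelete constructor argument are all assumptions added for illustration; they stand in for TaskRepository and User.Current.CanDeleteTask().

```csharp
using System;

public class Task
{
    public bool Deleted { get; private set; }

    public virtual void Delete()
    {
        Deleted = true; // stand-in for the repository call
    }
}

public class TaskProxy : Task
{
    private readonly bool canDelete; // hypothetical stand-in for User.Current.CanDeleteTask()

    public TaskProxy(bool canDelete)
    {
        this.canDelete = canDelete;
    }

    public override void Delete()
    {
        if (!canDelete)
        {
            throw new UnauthorizedAccessException("not allowed to delete this task");
        }
        base.Delete();
    }
}

public static class TaskFinder
{
    // The declared return type is Task, so callers never need to know
    // whether they were handed a plain Task or a TaskProxy.
    public static Task Find(bool secured, bool canDelete)
    {
        return secured ? new TaskProxy(canDelete) : new Task();
    }
}
```

Calling code simply writes TaskFinder.Find(...).Delete(). When the proxy is in play and the permission check fails, the call throws instead of deleting, and the caller's code doesn't change either way.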

Since a proxy is just a subclass that implements additional behavior, you might be wondering if a Dog is a proxy to a Pet. Proxies tend to implement more technical system functions (logging, caching, authorization, remoting, etc) in a transparent way. In other words, you wouldn't declare a variable as TaskProxy - but you'd likely declare a Dog variable. Because of this, a proxy wouldn't add members (since you aren't programming against its API), whereas a Dog might add a Bark method.

Interception

The reason we're exploring a more technical side of inheritance is because two of the tools we've looked at so far, RhinoMocks and NHibernate, make extensive use of proxies - even though you might not have noticed. RhinoMocks uses proxies to support its core record/playback functionality. NHibernate relies on proxies for its optional lazy-loading capabilities. We'll only look at NHibernate, since it's easier to understand what's going on behind the covers, but the same high level pattern applies to RhinoMocks.

(A side note about NHibernate. It's considered a frictionless or transparent O/R mapper because it doesn't require you to modify your domain classes in order to work. However, if you want to enable lazy loading, all members must be virtual. This is still considered frictionless/transparent since you aren't adding NHibernate specific elements to your classes - such as inheriting from an NHibernate base class or sprinkling NHibernate attributes everywhere.)

Using NHibernate there are two distinct opportunities to leverage lazy loading. The first, and most obvious, is when loading child collections. For example, you may not want to load all of a Model's Upgrades until they are actually needed. Here's what your mapping file might look like:

<class name="Model" table="Models">
   <id name="Id" column="Id" type="int">
      <generator class="native" />
   </id>
   ...
   <bag name="Upgrades" table="Upgrades" lazy="true" >
      <key column="ModelId" />
      <one-to-many class="Upgrade" />
   </bag>      
</class>

By setting the lazy attribute to true on our bag element, we are telling NHibernate to lazily load the Upgrades collection. NHibernate can easily do this since it returns its own collection types (which all implement standard interfaces, such as IList, so to you, it's transparent).

The second, and far more interesting, usage of lazy loading is for individual domain objects. The general idea is that sometimes you'll want whole objects to be lazily initialized. Why? Well, say that a sale has just been made. Sales are associated with both a sales person and a car model:

Sale sale = new Sale();
sale.SalesPerson = session.Get<SalesPerson>(1);
sale.Model = session.Get<Model>(2);
sale.Price = 25000;
session.Save(sale);

Unfortunately, we've had to go to the database twice to load the appropriate SalesPerson and Model - even though we aren't really using them. The truth is all we need is their ID (since that's what gets inserted into our database), which we already have.

By creating a proxy, NHibernate lets us fully lazy-load an object for just this type of circumstance. The first thing to do is change our mapping and enable lazy loading of both Models and SalesPeople:

<class name="Model" table="Models" lazy="true" proxy="Model">...</class>

<class name="SalesPerson" table="SalesPeople" 
      lazy="true" proxy="SalesPerson">...</class>

The proxy attribute tells NHibernate what type should be proxied. This will either be the actual class you are mapping to, or an interface implemented by the class. Since we are using the actual class as our proxy interface, we need to make sure all members are virtual - if we miss any, NHibernate will throw a helpful exception with a list of non-virtual methods. Now we're good to go:

Sale sale = new Sale();
sale.SalesPerson = session.Load<SalesPerson>(1);
sale.Model = session.Load<Model>(2);
sale.Price = 25000;
session.Save(sale);

Notice that we're using Load instead of Get. The difference between the two is that if you're retrieving a class that supports lazy loading, Load will get the proxy, while Get will get the actual object. With this code in place we're no longer hitting the database just to load IDs. Instead, calling session.Load<Model>(2) returns a proxy - dynamically generated by NHibernate. The proxy will have an id of 2, since we supplied it the value, and all other properties will be uninitialized. Any call to another member of our proxy, such as sale.Model.Name, will be transparently intercepted and the object will be just-in-time loaded from the database.
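To make the interception idea a little more concrete, here's a hand-written sketch of a lazy proxy. This is not NHibernate's implementation (NHibernate generates its proxies at runtime); the LazyModelProxy class, its loader delegate and the LoadCount property are all hypothetical names used only for illustration. It does show why members need to be virtual: the proxy overrides them to trigger the load.

```csharp
using System;

public class Model
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

// Hypothetical stand-in for a runtime-generated proxy.
public class LazyModelProxy : Model
{
    private readonly Func<int, Model> _loader; // assumed to hit the database
    private Model _target;

    public LazyModelProxy(int id, Func<int, Model> loader)
    {
        // The id is known up front, so reading it never triggers a load.
        base.Id = id;
        _loader = loader;
    }

    // How many times the real object has been fetched (0 or 1).
    public int LoadCount { get; private set; }

    // Any member other than the id is intercepted and forces the load.
    public override string Name
    {
        get { return Target.Name; }
        set { Target.Name = value; }
    }

    private Model Target
    {
        get
        {
            if (_target == null)
            {
                _target = _loader(base.Id);
                LoadCount++;
            }
            return _target;
        }
    }
}
```

Reading proxy.Id leaves LoadCount at 0, while the first access to proxy.Name calls the loader exactly once.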

Just a note, NHibernate's lazy-load behavior can be hard to spot when debugging code in Visual Studio. That's because VS.NET's watch/local/tooltip actually inspects the object, causing the load to happen right away. The best way to examine what's going on is to add a couple breakpoints around your code and check out the database activity either through NHibernate's log, or SQL profiler.

Hopefully you can imagine how proxies are used by RhinoMocks for recording, replaying and verifying interactions. When you create a partial mock you're really creating a proxy to your actual object. This proxy intercepts all calls and, depending on which state you are in, does its own thing. Of course, for this to work, you must either mock an interface or the virtual members of a class.

In This Chapter

In chapter 6 we briefly covered NHibernate's lazy loading capabilities. In this chapter we expanded on that discussion by looking more deeply at the actual implementation. The use of proxies is common enough that you'll not only frequently run into them, but will also likely have good reason to implement some yourself. I still find myself impressed at the rich functionality provided by RhinoMocks and NHibernate thanks to the proxy design pattern. Of course, everything hinges on you allowing them to override or insert their behavior over your classes. Hopefully this chapter will also make you think about which of your methods should and which shouldn't be virtual. I strongly recommend that you take a look at the following articles/posts to better understand the virtual by default vs final by default points of view:

Author: "karl" Tags: "Foundations"
Date: Friday, 30 May 2008 00:02

Exceptions are such powerful constructs that developers can get a little overwhelmed and far too defensive when dealing with them. This is unfortunate because exceptions actually represent a key opportunity for developers to make their system considerably more robust. In this chapter we'll look at three distinct aspects of exceptions: handling, creating and throwing them. Since exceptions are unavoidable you can neither run nor hide, so you might as well leverage them.

Handling Exceptions

Your strategy for handling exceptions should consist of two golden rules:
1 - Only handle exceptions that you can actually do something about, and
2 - You can't do anything about the vast majority of exceptions

Most new developers do the exact opposite of the first rule, and fight hopelessly against the second. When your application does something deemed exceptionally outside of its normal operation the best thing to do is fail right then and there. If you don't, you won't only lose vital information about your mystery bug, but you risk placing your application in an unknown state, which can result in far worse consequences.

Whenever you find yourself writing a try/catch statement, ask yourself if you can actually do something about a raised exception. If your database goes down, can you actually write code to recover or are you better off displaying a friendly error message to the user and getting a notification about the problem? It's hard to accept at first, but sometimes it's just better to crash, log the error and move on. Even for mission critical systems, if you're making typical use of a database, what can you do if it goes down? This train of thought isn't limited to database issues or even just environmental failures; it also applies to your typical every-day runtime bug. If converting a configuration value to an integer throws a FormatException, does it make sense to continue as if everything's ok? Probably not.

Of course, if you can handle an exception you absolutely ought to - but do make sure to catch only the type of exception you can handle. Catching exceptions and not actually handling them is called exception swallowing (I prefer to call it wishful thinking) and it's bad code. A common example I see has to do with input validation. For example, let's look at how not to handle a categoryId being passed from the QueryString of an ASP.NET page.

int categoryId;
try
{
  categoryId = int.Parse(Request.QueryString["categoryId"]);
}
catch(Exception)
{
  categoryId = 1;
}

The problem with the above code is that regardless of the type of exception thrown, it'll be handled the same way. But does setting the categoryId to a default value of 1 actually handle an OutOfMemoryException? Instead, the above code should catch a specific exception:

int categoryId;
try
{
   categoryId = int.Parse(Request.QueryString["categoryId"]);
}
catch(FormatException)
{
   categoryId = -1;
}

(An even better approach would be to use the int.TryParse function introduced in .NET 2.0 - especially considering that int.Parse can throw two other types of exceptions that we'd want to handle the same way, but that's beside the point.)
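For what it's worth, here's a rough sketch of what the TryParse approach mentioned above might look like. The CategoryParser class, the ParseCategoryId method and its defaultValue parameter are names made up for this illustration; only int.TryParse itself comes from the framework.

```csharp
using System;

public static class CategoryParser
{
    // Hypothetical helper: returns the parsed id, or defaultValue when the
    // input is null, isn't a number, or doesn't fit in an int.
    public static int ParseCategoryId(string raw, int defaultValue)
    {
        int categoryId;
        // int.TryParse signals failure through its return value rather than
        // throwing FormatException, OverflowException or ArgumentNullException.
        if (int.TryParse(raw, out categoryId))
        {
            return categoryId;
        }
        return defaultValue;
    }
}
```

In the ASP.NET page this would be called along the lines of CategoryParser.ParseCategoryId(Request.QueryString["categoryId"], -1), with no try/catch needed.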

Logging

Even though most exceptions are going to go unhandled, you should still log each and every one of them. Ideally you'll centralize your logging - an HttpModule's OnError event is your best choice for an ASP.NET application or web service. I've often seen developers catch exceptions where they occur only to log and rethrow (more on rethrowing in a bit). This causes a lot of unnecessary and repetitive code - better to let exceptions bubble up through your code and log all exceptions at the outer edge of your system. Exactly which logging implementation you use is up to you and will depend on the criticalness of your system. Maybe you'll want to be notified by email as soon as exceptions occur, or maybe you can simply log it to a file or database and either review it daily or have another process send you a daily summary. Many developers leverage rich logging frameworks such as log4net or Microsoft's Logging Application Block.

Cleaning Up

In the previous chapter we talked about deterministic finalization with respect to the lazy nature of the garbage collector. Exceptions prove to be an added complexity as their abrupt nature can cause Dispose not to be called. A failed database call is a classic example:

SqlConnection connection = new SqlConnection(FROM_CONFIGURATION);
SqlCommand command = new SqlCommand("SomeSQL", connection);
connection.Open();
command.ExecuteNonQuery();
command.Dispose();
connection.Dispose();

If ExecuteNonQuery throws an exception, neither our command nor our connection will get disposed of. The solution is to use try/finally:

// Initialized to null so the finally block can safely check them
SqlConnection connection = null;
SqlCommand command = null;
try
{
   connection = new SqlConnection(FROM_CONFIGURATION);
   command = new SqlCommand("SomeSQL", connection);
   connection.Open();
   command.ExecuteNonQuery();
}
finally
{
   if (command != null) { command.Dispose(); }
   if (connection != null) { connection.Dispose(); }
}

or the syntactically nicer using statement (which gets compiled to the same try/finally above):

using (SqlConnection connection = new SqlConnection(FROM_CONFIGURATION))
using (SqlCommand command = new SqlCommand("SomeSQL", connection))
{
   connection.Open();
   command.ExecuteNonQuery();
}

The point is that even if you can't handle an exception, and even though you should centralize all your logging, you still need to be mindful of where exceptions can crop up - especially when it comes to classes that implement IDisposable.

Throwing Exceptions

There isn't one magic rule to throwing exceptions like there is for catching them (again, that rule is don't catch exceptions unless you can actually handle them). Nonetheless throwing exceptions, whether or not they be your own (which we'll cover next), is still pretty simple. First we'll look at the actual mechanics of throwing exceptions, which relies on the throw statement. Then we'll examine when and why you actually want to throw exceptions.

Throwing Mechanics

You can either throw a new exception, or rethrow a caught exception. To throw a new exception, simply create a new exception and throw it.

throw new Exception("something bad happened!");
//or
Exception ex = new Exception("something bad happened");
throw ex;

I added the second example because some developers think exceptions are some special/unique case - but the truth is that they are just like any other object (except they inherit from System.Exception which in turn inherits from System.Object). In fact, just because you create a new exception doesn't mean you have to throw it - although you probably always would.

On occasion you'll need to rethrow an exception because, while you can't handle the exception, you still need to execute some code when an exception occurs. The most common example is having to rollback a transaction on failure:

ITransaction transaction = null;
try
{
  transaction = session.BeginTransaction();
  // do some work
  transaction.Commit();
}
catch
{
  if (transaction != null) { transaction.Rollback(); }
  throw;
}
finally
{
  //cleanup
}

In the above example our vanilla throw statement makes our catch transparent. That is, a handler up the chain of execution won't have any indication that we caught the exception. In most cases, this is what we want - rolling back our transaction really doesn't help anyone else handle the exception. However, there's a way to rethrow an exception which will make it look like the exception occurred within our code:

catch (HibernateException ex)
{
  if (transaction != null) { transaction.Rollback(); }
  throw ex;
}

By explicitly rethrowing the exception, the stack trace is modified so that the rethrowing line appears to be the source. This is almost certainly a bad idea, as vital information is lost. So be careful how you rethrow exceptions - the difference is subtle but important.
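If you want to see the difference for yourself, a small sketch like the following should do it. RethrowDemo and its methods are hypothetical names invented for this example; the NoInlining attribute is only there so that Fail reliably shows up as its own frame in the stack trace.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RethrowDemo
{
    // The real source of the failure.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Fail()
    {
        throw new InvalidOperationException("original failure");
    }

    // Rethrows with a bare "throw;" and returns the trace an outer handler sees.
    public static string TraceAfterThrow()
    {
        try
        {
            try { Fail(); }
            catch (InvalidOperationException) { throw; }
        }
        catch (InvalidOperationException ex)
        {
            return ex.StackTrace;
        }
        return null;
    }

    // Rethrows with "throw ex;", which resets the trace to this method.
    public static string TraceAfterThrowEx()
    {
        try
        {
            try { Fail(); }
            catch (InvalidOperationException ex) { throw ex; }
        }
        catch (InvalidOperationException ex)
        {
            return ex.StackTrace;
        }
        return null;
    }
}
```

The trace returned by TraceAfterThrow still mentions Fail, while the one from TraceAfterThrowEx starts at the rethrowing method and no longer does.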

If you find yourself in a situation where you think you want to rethrow an exception with your handler as the source, a better approach is to use a nested exception:

catch (HibernateException ex)
{
  if (transaction != null) { transaction.Rollback(); }
  throw new Exception("Email already in use", ex);
}

This way the original stack trace is still accessible via the InnerException property exposed by all exceptions.

When To Throw Exceptions

It's important to know how to throw exceptions. A far more interesting topic though is when and why you should throw them. Having someone else's unruly code bring down your application is one thing. Writing your own code that'll do the same thing just seems plain silly. However, a good developer isn't afraid to judiciously use exceptions.

There are actually two levels of thought on how exceptions should be used. The first level, which is universally accepted, is that you shouldn't hesitate to raise an exception whenever a truly exceptional situation occurs. My favorite example is the parsing of configuration files. Many developers generously use default values for any invalid entries. This is ok in some cases, but in others it can put the system in an unreliable or unexpected state. Another example might be a Facebook application that gets an unexpected result from an API call. You could ignore the error, or you could raise an exception, log it (so that you can fix it, since the API might have changed) and present a helpful message to your users.

The other belief is that exceptions shouldn't just be reserved for exceptional situations, but for any situation in which the expected behavior cannot be executed. This approach is related to the design by contract approach - a methodology that I'm adopting more and more every day. Essentially, if the SaveUser method isn't able to save the user, it should throw an exception.

In languages such as C#, VB.NET and Java, which don't support a design by contract mechanism, this approach can have mixed results. A Hashtable returns null when a key isn't found, but a Dictionary throws an exception - the unpredictable behavior sucks (if you're curious why they work differently check out Brad Abrams' blog post). There's also a line between what constitutes control flow and what's considered exceptional. Exceptions shouldn't be used to control if/else-like logic, but the bigger a part they play in a library, the more likely programmers will use them as such (the int.Parse method is a good example of this).
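The Hashtable versus Dictionary difference is easy to demonstrate. The LookupDemo class and its method names below are made up for this sketch; the collection behavior itself is what the framework does.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class LookupDemo
{
    // Hashtable: a missing key quietly yields null.
    public static object FromHashtable(string key)
    {
        Hashtable table = new Hashtable();
        table["a"] = 1;
        return table[key];
    }

    // Dictionary: the indexer throws KeyNotFoundException for a missing key.
    public static bool DictionaryIndexerThrows(string key)
    {
        Dictionary<string, int> dictionary = new Dictionary<string, int>();
        dictionary["a"] = 1;
        try
        {
            int ignored = dictionary[key];
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }

    // TryGetValue lets us check for a key without using exceptions as control flow.
    public static int GetOrDefault(string key, int defaultValue)
    {
        Dictionary<string, int> dictionary = new Dictionary<string, int>();
        dictionary["a"] = 1;
        int value;
        return dictionary.TryGetValue(key, out value) ? value : defaultValue;
    }
}
```

Much like int.TryParse, TryGetValue is the framework's way of offering a non-throwing alternative for the case where a missing key is expected rather than exceptional.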

Generally speaking, I find it easy to decide what should and shouldn't throw an exception. I generally ask myself questions like:
1 - Is this exceptional,
2 - Is this expected,
3 - Can I continue doing something meaningful at this point and
4 - Is this something I should be made aware of so I can fix it, or at least give it a second look

Perhaps the most important thing to do when throwing exceptions, or dealing with exceptions in general, is to think about the user. The vast majority of users are naive compared to programmers and can easily panic when presented with error messages. Jeff Atwood recently blogged about the importance of crashing responsibly:
1 - It is not the user's job to tell you about errors in your software!
2 - Don't expose users to the default screen of death.
3 - Have a detailed public record of your application's errors.

It's probably safe to say that Windows' Blue Screen of Death is exactly the type of error message users shouldn't be exposed to (and don't think just because the bar has been set so low that it's ok to be as lazy).

Creating Custom Exceptions

One of the most overlooked aspects of domain driven design is custom exceptions. Exceptions play a serious part in any business domain, so any serious attempt at modeling a business domain in code must include custom exceptions. This is especially true if you believe that exceptions should be used whenever a method fails to do what it says it will. If a workflow state is invalid it makes sense to throw your own custom WorkflowException exception and even attach some specific information to it which might not only help you identify a potential bug, but can also be used to present meaningful information to the user.

Many of the exceptions I create are nothing more than marker exceptions - that is, they extend the base System.Exception class and don't provide further implementation. I liken this to marker interfaces (or marker attributes), such as the INamingContainer interface. These are particularly useful in allowing you to avoid swallowing exceptions. Take the following code as an example. If the Save() method doesn't throw a custom exception when validation fails, we really have little choice but to swallow all exceptions:

try
{
   user.Save();
}
catch
{
   Error.Text = user.GetErrors();
   Error.Visible = true;
}
//versus
try
{
   user.Save();
}
catch(ValidationException ex)
{
   Error.Text = ex.GetValidationMessage();
   Error.Visible = true;
}

The above example also shows how we can extend exceptions to provide further custom behavior specifically related to our exceptions. This can range from something as simple as an ErrorCode to more complex information, such as a PermissionException which exposes the user's permission and the missing required permission.

Of course, not all exceptions are tied to the domain. It's common to see more operational-oriented exceptions. If you rely on a web service which returns an error code, you may very well wrap that into your own custom exception to halt execution (remember, fail fast) and leverage your logging infrastructure.

Actually creating a custom exception is a two step process. First (and technically this is all you really need) create a class, with a meaningful name, which inherits from System.Exception.

public class UpgradeException : Exception
{
}

You should go the extra step and mark your class with the SerializableAttribute and always provide at least 4 constructors:
1 - public YourException()
2 - public YourException(string message)
3 - public YourException(string message, Exception innerException)
4 - protected YourException(SerializationInfo info, StreamingContext context)

The first three allow your exception to be used in an expected manner. The fourth is used to support serialization in case .NET needs to serialize your exception - which means you should also implement the GetObjectData method. The purpose of supporting serialization is for the case where you have custom properties which you'd like to have survive being serialized/deserialized. Here's the complete example:

using System;
using System.Runtime.Serialization;

[Serializable]
public class UpgradeException : Exception
{
  private int _upgradeId;
  public int UpgradeId { get { return _upgradeId; } }

  public UpgradeException(int upgradeId)
  {
    _upgradeId = upgradeId;
  }
  public UpgradeException(int upgradeId, string message, Exception inner) : base(message, inner)
  {
    _upgradeId = upgradeId;
  }
  public UpgradeException(int upgradeId, string message) : base(message)
  {
    _upgradeId = upgradeId;
  }
  // Called when the exception is being deserialized
  protected UpgradeException(SerializationInfo info, StreamingContext c) : base(info, c)
  {
    if (info != null)
    {
      _upgradeId = info.GetInt32("upgradeId");
    }
  }
  // Stores our custom property alongside the base exception's own data
  public override void GetObjectData(SerializationInfo info, StreamingContext c)
  {
    if (info != null)
    {
      info.AddValue("upgradeId", _upgradeId);
    }
    base.GetObjectData(info, c);
  }
}

Conclusion

It can take quite a fundamental shift in perspective to appreciate everything exceptions have to offer. Exceptions aren't something to be feared or protected against, but rather vital information about the health of your system. Don't swallow exceptions. Don't catch exceptions unless you can actually handle them. Equally important is to make use of built-in, or your own exceptions when unexpected things happen within your code. You may even expand this pattern for any method that fails to do what it says it will. Finally, exceptions are a part of the business you are modeling. As such, exceptions aren't only useful for operational purposes but should also be part of your overall domain model.

Author: "karl" Tags: "Foundations"
Date: Sunday, 04 May 2008 20:28

I've made two additions to Part 7. The first is based on a suggestion by Greg to talk about a common cause of memory leaks - events and delegates. The second is about deterministic finalization.

Memory Leaks with Events
There's one specific situation worth mentioning as a common cause of memory leaks: events. If, in a class, you register for an event, the event source holds a reference to your object. Unless you de-register from the event, your object's lifecycle will ultimately be determined by the event source. In other words, if ClassA (the listener) registers for an event in ClassB (the event source), a reference is created from ClassB to ClassA. Two solutions exist: de-registering from events when you're done (the IDisposable pattern is the ideal solution), or using the WeakEvent Pattern or a simplified version.
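Here's a minimal sketch of the first solution, de-registering in Dispose. Publisher plays the role of ClassB and Listener the role of ClassA; both classes, along with the Changed event and the SubscriberCount property, are hypothetical and exist only to make the reference visible.

```csharp
using System;

// Hypothetical event source (ClassB in the text above).
public class Publisher
{
    public event EventHandler Changed;

    // Number of handlers (and therefore listeners) the source is holding on to.
    public int SubscriberCount
    {
        get { return Changed == null ? 0 : Changed.GetInvocationList().Length; }
    }
}

// Hypothetical listener (ClassA) that cleans up after itself.
public class Listener : IDisposable
{
    private readonly Publisher _publisher;

    public Listener(Publisher publisher)
    {
        _publisher = publisher;
        // From here on, the publisher's delegate references this instance.
        _publisher.Changed += OnChanged;
    }

    private void OnChanged(object sender, EventArgs e)
    {
        // React to the event here.
    }

    public void Dispose()
    {
        // Removing the handler drops the publisher's reference to us,
        // so the garbage collector is free to reclaim this object.
        _publisher.Changed -= OnChanged;
    }
}
```

Wrapping the listener in a using statement guarantees the handler is removed even if an exception is thrown while it's in use.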

 


Deterministic Finalization

Despite the presence of the garbage collector, developers must still take care of managing some of their references. That's because some objects hold on to vital or limited resources, such as file handles or database connections, which should be released as soon as possible. This is problematic since we don't know when the garbage collector will actually run - by nature the garbage collector only runs when memory is in short supply. To compensate, classes which hold on to such resources should make use of the Disposable pattern. All .NET developers are likely familiar with this pattern, along with its actual implementation (the IDisposable interface), so we won't rehash what you already know. With respect to this chapter, it's simply important that you understand the role deterministic finalization plays. It doesn't free the memory used by the object. It releases resources. In the case of database connections for example, it releases the connection back to the pool in order to be reused.

If you forget to call Dispose on an object which implements IDisposable, the garbage collector will eventually clean up for you, provided the class also implements a finalizer (the garbage collector runs finalizers; it never calls Dispose itself). You shouldn't rely on this behavior however, as the problem of limited resources is very real (it's relatively trivial to try it out with a loop that opens connections to a database). You may be wondering why some objects expose both a Close and Dispose method, and which you should call. In all the cases I've seen the two are generally equivalent - so it's really a matter of taste. I would suggest that you take advantage of the using statement and forget about Close. Personally I find it frustrating (and inconsistent) that both are exposed.


Finally, if you're building a class that would benefit from deterministic finalization you'll find that implementing the IDisposable pattern is simple. A straightforward guide is available on MSDN.
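As a rough idea of what that looks like, here's a sketch of the commonly used shape of the pattern. ResourceHolder and its IsDisposed property are hypothetical names, and the comments mark where real cleanup code would go; treat it as a starting point rather than the definitive implementation.

```csharp
using System;

// Hypothetical class wrapping some limited resource.
public class ResourceHolder : IDisposable
{
    private bool _disposed;

    public bool IsDisposed { get { return _disposed; } }

    public void Dispose()
    {
        Dispose(true);
        // Cleanup is done, so the finalizer no longer needs to run.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) { return; } // calling Dispose twice must be harmless
        if (disposing)
        {
            // Release managed resources (other IDisposable members) here.
        }
        // Release unmanaged resources (handles and the like) here.
        _disposed = true;
    }

    // Safety net in case Dispose is never called.
    ~ResourceHolder()
    {
        Dispose(false);
    }
}
```

Consumers would then simply write using (ResourceHolder holder = new ResourceHolder()) { ... } and let the compiler generate the try/finally.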

Author: "karl" Tags: "Foundations"
Date: Monday, 28 Apr 2008 00:58

I'm back. Readers can expect a quality free pdf ebook once the series is complete (end of May at the latest hopefully).

Try as they might, modern programming languages can't fully abstract fundamental aspects of computer systems. This is made evident by the various exceptions thrown by high level languages. For example, it's safe to assume that you've likely faced the following .NET exceptions: NullReferenceException, OutOfMemoryException, StackOverflowException and ThreadAbortException. As important as it is for developers to embrace various high level patterns and techniques, it's equally important to understand the ecosystem in which your program runs. Looking past the layers provided by the C# (or VB.NET) compiler, the CLR and the operating system, we find memory. All programs make extensive use of system memory and interact with it in marvelous ways; it's difficult to be a good programmer without understanding this fundamental interaction.

Much of the confusion about memory stems from the fact that C# and VB.NET are managed languages and that the CLR provides automatic garbage collection. This has caused many developers to erroneously assume that they need not worry about memory.

Memory Allocation

In .NET, as with most languages, every variable you define is either stored on the stack or in the heap. These are two separate spaces allocated in system memory which serve distinct, yet complementary purposes. What goes where is predetermined: value types go on the stack, while all reference types go on the heap. In other words, all the system types, such as char, int, long, byte, enum and any structures (either defined in .NET or defined by you) go on the stack. The only exception to this rule is value types belonging to reference types - for example the Id property of a User class goes on the heap along with the instance of the User class itself.

The Stack

Although we're used to magical garbage collection, values on the stack are automatically managed even in a garbage collectionless world (such as C). That's because whenever you enter a new scope (such as a method or an if statement) values are pushed onto the stack and when you exit the scope the values are popped off. This is why a stack is synonymous with a LIFO - last-in first-out. You can think of it this way: whenever you create a new scope, say a method, a marker is placed on the stack and values are added to it as needed. When you leave that scope, all values are popped off up to and including the method marker. This works with any level of nesting.

Until we look at the interaction between the heap and the stack, the only real way to get in trouble with the stack is with the StackOverflowException. This means that you've used up all the space available on the stack. 99.9% of the time, this indicates an endless recursive call (a function which calls itself ad infinitum). In theory it could be caused by a very very poorly designed system, though I've never seen a non-recursive call use up all the space on the stack.

The Heap

Memory allocation on the heap isn't as straightforward as on the stack. Most heap-based memory allocation occurs whenever we create a new object. The compiler figures out how much memory we'll need (which isn't that difficult, even for objects with nested references), carves up an appropriate chunk of memory and returns a pointer to the allocated memory (more on this in moments). The simplest example is a string: if each character in a string takes up 2 bytes, and we create a new string with the value of "Hello World", then the CLR will need to allocate 22 bytes (11x2) plus whatever overhead is needed.
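The arithmetic above is easy to check. StringSizeDemo is a throwaway class invented for this sketch; it only counts the bytes used by the characters themselves and says nothing about the extra overhead the CLR adds to every string object.

```csharp
using System;
using System.Text;

public static class StringSizeDemo
{
    // Bytes for the characters alone: a .NET char is a 2 byte UTF-16 code unit.
    public static int CharacterBytes(string value)
    {
        return value.Length * sizeof(char);
    }

    // Same figure, computed by asking the UTF-16 encoding directly.
    public static int Utf16Bytes(string value)
    {
        return Encoding.Unicode.GetByteCount(value);
    }
}
```

Both StringSizeDemo.CharacterBytes("Hello World") and StringSizeDemo.Utf16Bytes("Hello World") return 22, matching the 11x2 figure.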

Speaking of strings, you've no doubt heard that strings are immutable - that is, once you've declared a string and assigned it a value, if you modify that string (by changing its value, or concatenating another string onto it), then a new string is created. This can actually have negative performance implications, and so the general recommendation is to use a StringBuilder for any significant string manipulation. The truth though is that any object stored on the heap is immutable with respect to its size, and any change to the underlying size will require a new allocation. The StringBuilder, along with some collections, partially get around this by using internal buffers. Once the buffer fills up though, the same reallocation occurs and some type of growth algorithm is used to determine the new size (the simplest being oldSize * 2). Whenever possible it's a good idea to specify the initial capacity of such objects in order to avoid this type of reallocation (the constructors for both the StringBuilder and the ArrayList (amongst many other collections) allow you to specify an initial capacity).
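As a small sketch of specifying an initial capacity, consider the following. CapacityDemo and its methods are hypothetical, and the 12 characters per id is simply an upper bound assumed for this example (an int needs at most 11 characters, plus one for the separator).

```csharp
using System.Collections.Generic;
using System.Text;

public static class CapacityDemo
{
    // Builds a comma separated list of ids using a pre-sized buffer.
    public static string JoinIds(int[] ids)
    {
        // Assumed upper bound: 11 characters per int plus a comma.
        StringBuilder builder = new StringBuilder(ids.Length * 12);
        for (int i = 0; i < ids.Length; i++)
        {
            if (i > 0) { builder.Append(','); }
            builder.Append(ids[i]);
        }
        return builder.ToString();
    }

    // Copies ids into a list whose backing array never has to grow.
    public static List<int> CopyIds(int[] ids)
    {
        List<int> copy = new List<int>(ids.Length);
        foreach (int id in ids)
        {
            copy.Add(id);
        }
        return copy;
    }
}
```

Over-estimating the capacity wastes a little memory, so this is mostly worth doing when you have a reasonable idea of the final size.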

Garbage collecting the heap is a non-trivial task. Unlike the stack where the last scope can simply be popped off, objects in the heap aren't local to a given scope. Instead, most are deeply nested references of other referenced objects. In languages such as C, whenever a programmer causes memory to be allocated on the heap, he or she must also make sure to remove it from the heap when he's finished with it. In managed languages, the runtime takes care of cleaning up resources (.NET uses a Generational Garbage Collector which is briefly described on Wikipedia).

There are a lot of nasty issues that can sting developers while working with the heap. Memory leaks aren't only possible but very common, memory fragmentation can cause all types of havoc, and various performance issues can arise due to strange allocation behavior or interaction with unmanaged code (which .NET does a lot under the covers).

Pointers

For many developers, learning pointers in school was a painful experience. They represent the very real indirection which exists between code and hardware. Many more developers have never had the experience of learning them - having jumped into programming directly from a language which didn't expose them directly. The truth though is that anyone claiming that C# or Java are pointerless languages is simply wrong. Since pointers are the mechanism by which all languages manage values on the heap, it seems rather silly not to understand how they are used.

Pointers represent the nexus of a system's memory model - that is, pointers are the mechanism by which the stack and the heap work together to provide the memory subsystem required by your program. As we discussed earlier, whenever you instantiate a new object, .NET allocates a chunk of memory on the heap and returns a pointer to the start of this memory block. This is all a pointer is: the starting address for the block of memory containing an object. This address is really nothing more than a unique number, generally represented in hexadecimal format. Therefore, a pointer is nothing more than a unique number that tells .NET where the actual object is in memory. When you assign a reference type to a variable, your variable is actually a pointer to the object. This indirection is transparent in Java or .NET, but not in C or C++ where you can manipulate the memory address directly via pointer arithmetic. In C or C++ you could take a pointer and add 1 to it, hence arbitrarily changing where it points to (and likely crashing your program because of it).

Where it gets interesting is where the pointer is actually stored. They actually follow the same rules outlined above: as integers they are stored on the stack - unless of course they are part of a reference object and then they are on the heap with the rest of their object. It might not be clear yet, but this means that ultimately, all heap objects are rooted on the stack (possibly through numerous levels of references). Let's first look at a simple example.

static void Main(string[] args)
{
  int x = 5;
  string y = "codebetter.com";
}

From the above code, we'll end up with 2 values on the stack, the integer 5 and the pointer to our string, as well as the actual string on the heap. Here's a graphical representation:

[Figure 1: Stack and Heap]

When we exit our main function (forget the fact that the program will stop), our stack pops off all local values, meaning both the x and y values are lost. This is significant because the memory allocated on the heap still contains our string, but we've lost all references to it (there's no pointer pointing back to it). In C or C++ this results in a memory leak - without a reference to our heap address we can't free up the memory. In C# or Java, our trusty garbage collector will detect the unreferenced object and free it up.

We'll look at a more complex example, but aside from having more arrows, it's basically the same.

public class Employee
{
  private int _employeeId;
  private Employee _manager;
  public int EmployeeId
  {
    get { return _employeeId; }
    set { _employeeId = value; }
  }
  public Employee Manager
  {
    get { return _manager; }
    set { _manager = value; }
  }
  public Employee(int employeeId)
  {
    _employeeId = employeeId;
  }
}
public class Test
{
  private Employee _subordinate;
  void DoSomething()
  {
    Employee boss = new Employee(1);
    _subordinate = new Employee(2);
    _subordinate.Manager = boss;
  }
}
[Figure 2: Stack and Heap]

Interestingly, when we leave our method, the boss variable will pop off the stack, but the subordinate, which is defined in a parent scope, won't. This means the garbage collector won't have anything to clean-up because both heap values will still be referenced (one directly from the stack, and the other indirectly from the stack through a referenced object).

As you can see, pointers most definitely play a significant part in both C# and VB.NET. Since pointer arithmetic isn't available in either language, pointers are greatly simplified and hopefully easily understood.

Memory Model in Practice

We'll now look at the actual impact this has on our applications. Keep in mind though that understanding the memory model in play won't only help you avoid pitfalls, but it will also help you write better applications.

Boxing

Boxing occurs when a value type (stored on the stack) is coerced onto the heap. Unboxing happens when these value types are placed back onto the stack. The simplest way to coerce a value type, such as an integer, onto the heap is by casting it:

int x = 5;
object y = x;

A more common scenario where boxing occurs is when you supply a value type to a method that accepts an object. This was common with collections in .NET 1.x before the introduction of generics. The non-generic collection classes mostly work with the object type, so the following code results in boxing and unboxing:

ArrayList userIds = new ArrayList(2);
userIds.Add(1);
userIds.Add(2);
int firstId = (int)userIds[0];

The real benefit of generics is the increase in type-safety, but they also address the performance penalty associated with boxing. In most cases you wouldn't notice this penalty, but in some situations, such as large collections, you very well could. Regardless of whether or not it's something you ought to actually concern yourself with, boxing is a prime example of how the underlying memory system can have an impact on your application.
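
For comparison, here's a sketch of the same idea written against the generic List<int> introduced in .NET 2.0 (the GenericIdsDemo name is hypothetical). The list stores the integers directly, so there's no boxing on Add and no cast on the way out.

```csharp
using System;
using System.Collections.Generic;

public static class GenericIdsDemo
{
    public static int FirstId()
    {
        List<int> userIds = new List<int>(2);
        userIds.Add(1);              // stored as an int, no boxing
        userIds.Add(2);
        // userIds.Add("three");     // would not compile: the type is checked at compile time
        return userIds[0];           // no cast, no unboxing
    }

    public static void Main()
    {
        Console.WriteLine(FirstId());   // 1
    }
}
```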

ByRef

Without a good understanding of pointers, it's virtually impossible to understand passing a value by reference and by value. Developers generally understand the implication of passing a value type, such as an integer, by reference, but few understand why you'd want to pass a reference by reference. ByRef and ByVal affect reference and value types the same - provided you understand that they always work against the underlying value (which in the case of a reference type means they work against the pointer and not the value). Using ByRef is the only common situation where .NET won't automatically resolve the pointer indirection (passing by reference or as an output parameter isn't allowed in Java).

First we'll look at how ByVal/ByRef affects value types. Given the following code:

public static void Main()
{
  int counter1 = 0;
  SeedCounter(counter1);
  Console.WriteLine(counter1);

  int counter2 = 0;
  SeedCounter(ref counter2);
  Console.WriteLine(counter2);
}
private static void SeedCounter(int counter)
{
  counter = 1;
}
private static void SeedCounter(ref int counter)
{
  counter = 1;
}

We can expect an output of 0 followed by 1. The first call does not pass counter1 by reference, meaning a copy of counter1 is passed into SeedCounter and changes made within are local to the function. In other words, we're taking the value on the stack and duplicating it onto another stack location.

In the second case we're actually passing the value by reference which means no copy is created and changes aren't localized to the SeedCounter function.

The behavior with reference types is the exact same, although it might not appear so at first. We'll look at two examples. The first one uses a PayManagement class to change the properties of an Employee. In the code below we see that we have two employees and in both cases we're giving them a $2000 raise. The only difference is that one passes the employee by reference while the other is passed by value. Can you guess the output?

public class Employee
{
  private int _salary;
  public int Salary
  {
    get {return _salary;}
    set {_salary = value;}
  }
  public Employee(int startingSalary)
  {
    _salary = startingSalary;
  }
}
public class PayManagement
{
  public static void GiveRaise(Employee employee, int raise)
  {
    employee.Salary += raise;
  }
  public static void GiveRaise(ref Employee employee, int raise)
  {
    employee.Salary += raise;
  }
}
public static void Main()
{
  Employee employee1 = new Employee(10000);
  PayManagement.GiveRaise(employee1, 2000);
  Console.WriteLine(employee1.Salary);

  Employee employee2 = new Employee(10000);
  PayManagement.GiveRaise(ref employee2, 2000);
  Console.WriteLine(employee2.Salary);
}

In both cases, the output is 12000. At first glance, this seems different than what we just saw with value types. What's happening is that passing a reference type by value does indeed pass a copy of the value, but not the heap value. Instead, we're passing a copy of our pointer. And since a pointer and a copy of the pointer point to the same memory on the heap, a change made by one is reflected in the other.

When you pass a reference type by reference, you're passing the actual pointer as opposed to a copy of the pointer. This raises the question: when would we ever pass a reference type by reference? The only reason to pass by reference is when you want to modify the pointer itself - as in where it points to. This can actually result in nasty side effects - which is why it's a good thing that functions wanting to do so must explicitly declare that they want the parameter passed by reference. Let's look at our second example.

public class Employee
{
  private int _salary;
  public int Salary
  {
    get {return _salary;}
    set {_salary = value;}
  }
  public Employee(int startingSalary)
  {
    _salary = startingSalary;
  }
}
public class PayManagement
{
  public static void Terminate(Employee employee)
  {
    employee = null;
  }
  public static void Terminate(ref Employee employee)
  {
    employee = null;
  }
}
public static void Main()
{
  Employee employee1 = new Employee(10000);
  PayManagement.Terminate(employee1);
  Console.WriteLine(employee1.Salary);

  Employee employee2 = new Employee(10000);
  PayManagement.Terminate(ref employee2);
  Console.WriteLine(employee2.Salary);
}

Try to figure out what will happen and why. I'll give you a hint: an exception will be thrown. If you guessed that the call to employee1.Salary outputted 10000 while the 2nd one threw a NullReferenceException then you're right. In the first case we're simply setting a copy of the original pointer to null - it has no impact whatsoever on what employee1 is pointing to. In the second case, we aren't passing a copy but the same stack value used by employee2. Thus setting the employee to null is the same as writing employee2 = null;.

It's quite uncommon to want to change the address pointed to by a variable from within a separate method - which is why the only time you're likely to see a reference type passed by reference is when you want to return multiple values from a function call (in which case you're better off using an out parameter, or using a purer OO approach). The above example truly highlights the dangers of playing in an environment whose rules aren't fully understood.
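
As a rough sketch of the out parameter alternative (the SalaryParser class and TryParseSalary method are hypothetical names), the method returns a success flag and hands the parsed value back through an out parameter, much like the framework's own int.TryParse:

```csharp
using System;

public static class SalaryParser
{
    // Returns true on success; the parsed value comes back through 'salary'.
    // An out parameter must be assigned before the method returns.
    public static bool TryParseSalary(string input, out int salary)
    {
        if (int.TryParse(input, out salary) && salary >= 0)
        {
            return true;
        }
        salary = 0;
        return false;
    }

    public static void Main()
    {
        int salary;
        if (TryParseSalary("12000", out salary))
        {
            Console.WriteLine(salary);   // 12000
        }
        Console.WriteLine(TryParseSalary("abc", out salary));  // False
    }
}
```

Unlike ref, the caller doesn't have to initialize the variable first, which makes the intent (this is an output, not an input) clear at the call site.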

Managed Memory Leaks

We already saw an example of what a memory leak looks like in C. Basically, if C# didn't have a garbage collector, the following code would leak:

private void DoSomething()
{
  string name = "dune";
}

Our stack value (a pointer) will be popped off, and with it will go the only way we have to reference the memory created to hold our string, leaving us with no method of freeing it up. This isn't a problem in .NET because it does have a garbage collector which tracks unreferenced memory and frees it. However, a type of memory leak is still possible if you hold on to references indefinitely. This is common in large applications with deeply nested references. Such leaks can be hard to identify because the leak might be very small and your application might not run for long enough - even ASP.NET applications tend to be recycled fairly often.
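
A minimal sketch of how such a leak can come about (the AuditLog and Report classes are invented for illustration): a static collection lives as long as the application does, so anything added to it and never removed stays reachable, and the garbage collector can't reclaim it.

```csharp
using System;
using System.Collections.Generic;

public class Report
{
    // 1MB payload, just to make each instance expensive to keep around
    private byte[] _data = new byte[1024 * 1024];
    public int Size { get { return _data.Length; } }
}

public static class AuditLog
{
    // Static, so it is reachable for the lifetime of the application
    private static readonly List<Report> _history = new List<Report>();

    public static int Count { get { return _history.Count; } }

    public static void Track(Report report)
    {
        _history.Add(report);   // the reference is held until explicitly removed
    }

    public static void Clear()
    {
        _history.Clear();       // dropping the references makes the reports collectable
    }
}

public static class LeakDemo
{
    public static void Main()
    {
        for (int i = 0; i < 10; ++i)
        {
            Report report = new Report();
            AuditLog.Track(report);
            // 'report' goes out of scope here, but AuditLog still references the object
        }
        Console.WriteLine(AuditLog.Count);   // 10
        AuditLog.Clear();
        Console.WriteLine(AuditLog.Count);   // 0
    }
}
```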

Ultimately when your program terminates the operating system will reclaim all memory, leaked or otherwise. However, if you start seeing OutOfMemoryException and aren't dealing with abnormally large data, there's a good chance you have a memory leak. .NET ships with tools to help you out, but you'll likely want to take advantage of a commercial memory profiler such as dotTrace or ANTS Profiler. When hunting for memory leaks you'll be looking for your leaked object (which is pretty easy to find by taking 2 snapshots of your memory and comparing them), tracing through all the objects which still hold a reference to it and correcting the issue.

Fragmentation

Another common cause for OutOfMemoryException has to do with memory fragmentation. When memory is allocated on the heap it's always a contiguous block. This means that the available memory must be scanned for a large enough chunk. As your program runs its course, the heap becomes increasingly fragmented (like your hard drive) and you might end up with plenty of space, but spread out in a manner which makes it unusable. Under normal circumstances, the garbage collector will compact the heap as it's freeing memory. As it compacts memory, addresses of objects change, and .NET makes sure to update all your references accordingly. Sometimes though, .NET can't move an object: namely when the object is pinned to a specific memory address.

Pinning

Pinning occurs when an object is locked to a specific address on the heap. Pinned memory cannot be compacted by the garbage collector, resulting in fragmentation. Why do values get pinned? The most common cause is because your code is interacting with unmanaged code. When the .NET garbage collector compacts the heap, it updates all references in managed code, but it has no way to jump into unmanaged code and do the same. Therefore, before calling into unmanaged code, it must first pin objects in memory. Since many methods within the .NET framework rely on unmanaged code, pinning can happen without you knowing about it (the scenario I'm most familiar with is the .NET Socket classes, which rely on unmanaged implementations and pin buffers).

A common way around this type of pinning is to declare large objects, which don't cause as much fragmentation as many small ones. This is even more true considering that large objects are placed in a special heap, called the Large Object Heap (LOH), which isn't compacted at all. For example, rather than creating hundreds of 4KB buffers, you can create 1 large buffer and assign chunks of it yourself. For an example as well as more information on pinning, I suggest you read Greg Young's advanced post on pinning and asynchronous sockets.
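
A minimal sketch of that idea, assuming fixed-size chunks (the BufferPool class is hypothetical and, for brevity, not thread-safe): one large array is allocated once and handed out in slices via ArraySegment<byte>, so there's only ever a single object that can be pinned.

```csharp
using System;
using System.Collections.Generic;

public class BufferPool
{
    private readonly byte[] _block;                 // the single large allocation
    private readonly int _chunkSize;
    private readonly Stack<int> _freeOffsets = new Stack<int>();

    public BufferPool(int chunkSize, int chunkCount)
    {
        _chunkSize = chunkSize;
        _block = new byte[chunkSize * chunkCount];
        for (int i = chunkCount - 1; i >= 0; --i)
        {
            _freeOffsets.Push(i * chunkSize);
        }
    }

    public int Available { get { return _freeOffsets.Count; } }

    public ArraySegment<byte> Rent()
    {
        if (_freeOffsets.Count == 0)
        {
            throw new InvalidOperationException("No free chunks left in the pool.");
        }
        return new ArraySegment<byte>(_block, _freeOffsets.Pop(), _chunkSize);
    }

    public void Return(ArraySegment<byte> chunk)
    {
        _freeOffsets.Push(chunk.Offset);
    }
}

public static class BufferPoolDemo
{
    public static void Main()
    {
        BufferPool pool = new BufferPool(4096, 100);   // 100 chunks of 4KB in one array
        ArraySegment<byte> chunk = pool.Rent();
        Console.WriteLine(chunk.Count);        // 4096
        Console.WriteLine(pool.Available);     // 99
        pool.Return(chunk);
        Console.WriteLine(pool.Available);     // 100
    }
}
```

The chunk's Array, Offset and Count can then be passed to any API that takes a buffer, an offset and a length.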

There's a second reason why an object might be pinned - when you explicitly make it happen. In C# (not in VB.NET) if you compile your assembly with the unsafe option you can pin an object via the fixed statement. While extensive pinning can cause memory pressures on the system, judicious use of the fixed statement can greatly improve performance. Why? Because a pinned object can be manipulated directly with pointer arithmetic - this isn't possible if the object isn't pinned because the garbage collector might relocate your object somewhere else in memory.

Take for example this efficient ASCII string to integer conversion which runs over 6 times faster than using int.Parse.

public unsafe static int Parse(string stringToConvert)
{
  int value = 0;
  int length = stringToConvert.Length;
  fixed(char* characters = stringToConvert)
  {
    for (int i = 0; i < length; ++i)
    {
      value = 10 * value + (characters[i] - 48);
    }
  }
  return value;
}

Unless you're doing something abnormal, there should never be a need to mark your assembly as unsafe and take advantage of the fixed statement. The above code will easily crash (pass null as the string and see what happens), isn't nearly as feature rich as int.Parse, and in the scale of things is extremely risky while providing no benefits.
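
For comparison, here's a sketch of the same digit-by-digit loop in safe code (the SafeParser name is hypothetical; like the original it only handles the digits 0-9, but it adds guards for null and non-digit input). I'm not making any timing claims for this version.

```csharp
using System;

public static class SafeParser
{
    public static int Parse(string stringToConvert)
    {
        if (stringToConvert == null)
        {
            throw new ArgumentNullException("stringToConvert");
        }
        int value = 0;
        for (int i = 0; i < stringToConvert.Length; ++i)
        {
            char c = stringToConvert[i];
            if (c < '0' || c > '9')
            {
                throw new FormatException("Only the digits 0-9 are supported.");
            }
            value = 10 * value + (c - '0');   // '0' is character code 48
        }
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(Parse("1234"));   // 1234
    }
}
```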

Setting things to null

So, should you set your reference types to null when you're done with them? Of course not. Once a variable falls out of scope, it's popped off the stack and the reference is removed. If you can't wait for the scope to exit, you likely need to refactor your code.

Conclusion

Stacks, heaps and pointers can seem overwhelming at first. Within the context of managed languages though, there isn't really much to it. The benefits of understanding these concepts are tangible in day to day programming, and invaluable when unexpected behavior occurs. You can either be the programmer who causes weird NullReferenceExceptions and OutOfMemoryExceptions, or the one that fixes them.

Author: "karl" Tags: "Foundations"
Date: Thursday, 03 Jan 2008 01:12

UPDATED: There's an official free ebook now available here.

 

Tim Barcz was kind enough to compile the foundation series into a single PDF, for your sharing/printing pleasure.

You can grab it here 

I'll be taking a short break from blogging for the next couple weeks, so have fun playing with the new toys.
 

Author: "karl" Tags: "Foundations"
Date: Wednesday, 02 Jan 2008 13:52

In part 3 we took our first stab at bridging the data and object world by hand-writing our own data access layer and mapper. The approach turned out to be rather limited and required quite a bit of repetitive code (although it was useful in demonstrating the basics). Adding more objects and more functionality would bloat our DAL into an enormously unmaintainable violation of DRY (don't repeat yourself). In this section we'll look at an actual O/R Mapping framework to do all the heavy lifting for us. Specifically, we'll look at the popular open-source NHibernate framework (http://www.hibernate.org/343.html).

The single greatest barrier preventing people from adopting domain driven design is the issue of persistence. My own adoption of O/R mappers came with great trepidation and doubt. You'll essentially be asked to trade in your knowledge of a tried and true method for something that seems a little too magical. A leap of faith may be required.

The first thing to come to terms with is that O/R mappers generate your SQL for you. I know, it sounds like it's going to be slow, insecure and inflexible, especially since you probably figured that it'll have to use inline SQL. But if you can push those fears out of your mind for a second, you have to admit that it could save you a lot of time and result in a lot fewer bugs. Remember, we want to focus on building behavior, not worry about plumbing (and if it makes you feel any better, a good O/R mapper will provide simple ways for you to circumvent the automated code generation and execute your own SQL or stored procedures).

Infamous Inline SQL vs Stored Procedure Debate
Over the years, there's been some debate between inline SQL and stored procedures. This debate has been very poorly worded, because when people hear inline SQL, they think of badly written code like:

public int GetUserIdByCredentials(string userName, string password)
{
   string sql = @"SELECT UserId FROM Users 
                  WHERE UserName = '" + userName + "' AND Password = '" + password + "'";
   using (SqlCommand command = new SqlCommand(sql))
   {
      //todo
      return 0;
   }         
}
If you stop and think about it though, and compare apples to apples, I think you'll agree that neither is particularly better than the other. Let me help you out.

Stored Procedures are more Secure
Inline SQL should be written using parameterized queries just like you do with stored procedures. For example, the correct way to write the above code in order to eliminate the possibility of an SQL injection attack is:

public int GetUserIdByCredentials(string userName, string password)
{
   string sql = @"SELECT UserId FROM Users 
                  WHERE UserName = @UserName AND Password = @Password";
   using (SqlCommand command = new SqlCommand(sql))
   {
      command.Parameters.Add("@UserName", SqlDbType.VarChar, 64).Value = userName;
      command.Parameters.Add("@Password", SqlDbType.VarChar, 64).Value = password;
      //todo
      return 0;
   }         
}
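
To see why the concatenated version is dangerous, here's a small sketch that only builds the SQL string and never touches a database (the InjectionDemo class and the sample input are made up). A crafted user name changes the structure of the statement itself:

```csharp
using System;

public static class InjectionDemo
{
    // Builds the query the same way the badly written example does
    public static string BuildConcatenatedSql(string userName, string password)
    {
        return "SELECT UserId FROM Users WHERE UserName = '" + userName +
               "' AND Password = '" + password + "'";
    }

    public static void Main()
    {
        // The quote closes the string literal and "--" comments out the password check
        Console.WriteLine(BuildConcatenatedSql("admin' --", "anything"));
        // SELECT UserId FROM Users WHERE UserName = 'admin' --' AND Password = 'anything'
    }
}
```

With the parameterized version, the same input is sent as a value and compared against the UserName column as-is, so it can't alter the statement.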

Stored procedures provide an abstraction to the underlying schema
Whether you're using inline SQL or stored procedures, what little abstraction you can put in a SELECT statement is the same. If any substantial changes are made, your stored procedures are going to break and there's a good chance you'll need to change the calling code to deal with the issue. O/R mappers, on the other hand, generally provide much better abstraction.

If I make a change, I don't have to recompile the code
Somewhere, somehow, people got it in their head that code compilations should be avoided at all cost (maybe this comes from the days where projects could take hours to compile). If you change a stored procedure, you still have to re-run your unit and integration tests and deploy a change to production. It genuinely scares and puzzles me that developers consider a change to a stored procedure or XML trivial compared to a similar change in code.

Stored Procedures reduce network traffic
Who cares? In most cases your database is sitting on a GigE connection with your servers and you aren't paying for that bandwidth. You're literally talking fractions of milliseconds. On top of that, a well configured O/R mapper can save round-trips via identity map implementations, caching and lazy loading.

Stored procedures are faster
This is the excuse I held onto the longest. Write a reasonable/common SQL statement inline and then write the same thing in a stored procedure and time them. Go ahead. In most cases there's little or no difference. In some cases, stored procedures will be slower because a cached execution plan will not be efficient given a certain parameter. Jeff Atwood called using stored procedures for the sake of better performance a fairly extreme case of premature optimization. He's right. The proper approach is to take the simplest possible approach (let a tool generate your SQL for you), and optimize specific queries when/if bottlenecks are identified.

It took a while, but after a couple years, I realized that the debate between inline and stored procedures was as trivial and meaningless as the one about C# and VB.NET. Of course, since the differences are practically non-existent, why not just use stored procedures? If you aren't willing to adopt an O/R mapper, that's certainly what I would suggest – there's no sense in dynamically creating your own inline SQL. However, O/R mappers, which rely on inline SQL, provide three very important benefits (there are more, but with respect to maintainability, I think these are the most important):

  1. You end up writing a lot less code – which obviously results in a more maintainable system,
  2. You gain a true level of abstraction from the underlying data source – both because you're querying the O/R mapper for your data directly (and it converts that into the appropriate SQL), and because you're providing mapping information between your table schemas and domain objects,
  3. If your impedance mismatch is low, they save you from having to write a lot of repetitive code; however, if your impedance mismatch is high, you'll be able to design your database the way it should be, and your domain layer the way it should be, without having to create an uncomfortable compromise – the O/R mapper will handle the mismatch for you.

In the end, this really comes down to building the simplest solution upfront. After a few iterations, you can spend time profiling your code, and only if you detect an actual problem do you have to address that specific case. It might not sound so much simpler because you have to learn a fairly complex framework upfront, but that's the reality of our profession.

Remember, our goal is to widen our knowledge base by looking at different ways to build systems in order to provide our clients with greater value. While we may be specifically talking about NHibernate, the goal is really to introduce the concept of O/R mappers, and try to correct the blind faith .NET developers have put into stored procedures and ADO.NET.

NHibernate
Of the frameworks and tools we've looked at so far, NHibernate is the most complex. This complexity is certainly something you should take into account when deciding on a persistence solution, but once you do find a project that allows for some R&D time, the payoff will be well worth it in future projects. The nicest thing about NHibernate, and a major design goal of the framework, is that it's completely transparent – your domain objects aren't forced to inherit a specific base class and you don't have to use a bunch of decorator attributes. This makes unit testing your domain layer possible – if you're using a different persistence mechanism, say typed datasets, the tight coupling between domain and data makes it hard/impossible to properly unit test.

At a very high level, you configure NHibernate by telling it how your database (tables and columns) maps to your domain objects, use the NHibernate API and NHibernate Query Language to talk to your database, and let it do the low level ADO.NET and SQL work.

In previous parts we focused on a system for a car dealership – specifically focusing on cars and upgrades. In this part we'll change perspective slightly and look at car sales (sales, models and sales people). The domain model is simple – a SalesPerson has zero or more Sales which reference a specific Model.

I've also included a VS.NET solution that contains sample code and annotations – you can find a link at the end of this article. All you need to do to get it running is create a new database, execute the provided SQL script (a handful of create tables), and configure the connection string. The sample, along with the rest of this article, is meant to help you get started with NHibernate – a topic too often overlooked.

You might also be interested in the excellent NHibernate Reference Manual as well as Manning's NHibernate in Action book.

Configuration
The secret to NHibernate's amazing flexibility lies in its configurability. Initially it can be rather daunting to set it up, but after a couple of projects it becomes rather natural. The first step is to configure NHibernate itself. The simplest such configuration, which must be added to your app.config or web.config, looks like:

<?xml version="1.0" encoding="utf-8" ?> 
<configuration>
  <configSections>
    <section name="hibernate-configuration" type="NHibernate.Cfg.ConfigurationSectionHandler, NHibernate" /> 
  </configSections>
  <hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
    <session-factory>
      <property name="hibernate.dialect">NHibernate.Dialect.MsSql2005Dialect</property> 
      <property name="hibernate.connection.provider">NHibernate.Connection.DriverConnectionProvider</property> 
      <property name="hibernate.connection.connection_string">Server=SERVER;Initial Catalog=DATABASE;User Id=USER;Password=PASSWORD;</property> 
      <mapping assembly="CodeBetter.Foundations" /> 
    </session-factory>
  </hibernate-configuration>
</configuration>

Of the four values, dialect is the most interesting. This tells NHibernate what specific language our database speaks. If, later on, we ask NHibernate to return a paged result of Cars and our dialect is set to SQL Server 2005, NHibernate will issue an SQL SELECT utilizing the ROW_NUMBER() ranking function. However, if the dialect is set to MySQL, NHibernate will issue a SELECT with a LIMIT. In most cases, you'll set this once and forget about it, but it does provide some insight into the capabilities provided by a layer that generates all of your data access code.

In our configuration, we also told NHibernate that our mapping files were located in the CodeBetter.Foundations assembly. Mapping files are embedded XML files which tell NHibernate how each class is persisted. With this information, NHibernate is capable of returning a Car object when you ask for one, as well as saving it. The general convention is to have a mapping file per domain object, and for them to be placed inside a Mappings folder. The mapping file for our Model object, named Model.hbm.xml, looks like:

<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" assembly="CodeBetter.Foundations" namespace="CodeBetter.Foundations">
  <class name="Model" table="Models" lazy="true" proxy="Model">
    <id name="Id" column="Id" type="int" access="field.lowercase-underscore">
      <generator class="native" /> 
    </id>
    <property name="Name" column="Name" type="string" not-null="true" length="64" /> 
    <property name="Description" column="Description" type="string" not-null="true" /> 
    <property name="Price" column="Price" type="double" not-null="true" /> 
  </class>
</hibernate-mapping>

(it's important to make sure the "Build Action" for all mapping files is set to "Embedded Resource")

This file tells NHibernate that the Model class maps to rows in the Models table, and that the 4 properties Id, Name, Description and Price map to the Id, Name, Description and Price columns. The extra information around the Id property specifies that the value is generated by the database (as opposed to NHibernate itself (for clustered solutions), or our own algorithm) and that there's no setter, so it should be accessed by the field with the specified naming convention (we supplied Id as the name, and lowercase-underscore as the naming strategy, so it'll use a field named _id).
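
Here's a sketch of what a Model class matching this mapping might look like (the sample download has the real one; this version is an assumption based on the mapping alone). The Id has no setter and is backed by an _id field, as the access strategy requires, and the members are virtual because the class is mapped with lazy="true" and NHibernate's proxies work by subclassing the entity.

```csharp
using System;

public class Model
{
    private int _id;              // set by NHibernate through field.lowercase-underscore
    private string _name;
    private string _description;
    private double _price;

    public virtual int Id
    {
        get { return _id; }       // no setter: the database generates the value
    }
    public virtual string Name
    {
        get { return _name; }
        set { _name = value; }
    }
    public virtual string Description
    {
        get { return _description; }
        set { _description = value; }
    }
    public virtual double Price
    {
        get { return _price; }
        set { _price = value; }
    }
}

public static class ModelDemo
{
    public static void Main()
    {
        Model model = new Model();
        model.Name = "X149";
        model.Price = 50000.00;
        Console.WriteLine(model.Id);     // 0 until the object has been saved
        Console.WriteLine(model.Name);   // X149
    }
}
```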

With the mapping file set up, we can start interacting with the database:

//Let's add a new car model
Model model = new Model();
model.Name = "Hummbee";
model.Description = "Great handling, built-in GPS to always find your way back home, Hummbee2Hummbe(tm) communication";
model.Price = 50000.00;         
ISession session = _sessionFactory.OpenSession();
session.Save(model);


//Let's discount the x149 (reusing the session opened above)
Model x149 = session.CreateQuery("from Model model where model.Name = ?").SetString(0, "X149").UniqueResult<Model>();
x149.Price -= 5000;
session.Update(x149);

The above example shows how easy it is to persist new objects to the database, retrieve them and update them – all without any ADO.NET or SQL.

You may be wondering where the _sessionFactory object comes from, and exactly what an ISession is. The _sessionFactory (of type ISessionFactory) is a global thread-safe object that you'd likely create on application start. You'll typically need one per database your application is using (which means you'll typically only need one), and its job, like most factories, is to create a preconfigured object: an ISession. The ISession has no ADO.NET equivalent, but it does map loosely to a database connection. However, creating an ISession doesn't necessarily open up a connection. Instead, ISessions smartly manage connections and command objects for you. Unlike connections which should be opened late and closed early, you needn't worry about having ISessions stick around for a while (although they aren't thread-safe). If you're building an ASP.NET application, you could safely open an ISession on BeginRequest and close it on EndRequest (or better yet, lazy-load it in case the specific request doesn't require an ISession).

ITransaction is another piece of the puzzle which is created by calling BeginTransaction on an ISession. It's common for .NET developers to ignore the need for transactions within their applications. This is unfortunate because it can lead to unstable and even unrecoverable states in the data. An ITransaction is used to keep track of the unit of work – tracking what's changed, been added or deleted, figuring out what and how to commit to the database, and providing the capability to rollback should an individual step fail.
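
A minimal sketch of how these pieces might fit together (the Persistence class and its method names are hypothetical; it needs a reference to the NHibernate assembly and a configured database to run). It assumes Configure() picks up the hibernate-configuration section shown earlier, builds the factory once, and wraps a save in a transaction that is rolled back if anything fails.

```csharp
using System;
using NHibernate;
using NHibernate.Cfg;

public static class Persistence
{
    // Expensive to build, so create it once at application start and reuse it.
    // Assumption: Configure() reads the hibernate-configuration section from app.config/web.config.
    public static ISessionFactory CreateSessionFactory()
    {
        return new Configuration().Configure().BuildSessionFactory();
    }

    // Saves a mapped entity as a single unit of work.
    public static void SaveInTransaction(ISessionFactory sessionFactory, object entity)
    {
        using (ISession session = sessionFactory.OpenSession())
        using (ITransaction transaction = session.BeginTransaction())
        {
            try
            {
                session.Save(entity);
                transaction.Commit();     // changes are written to the database here
            }
            catch
            {
                transaction.Rollback();   // leave the data untouched if any step failed
                throw;
            }
        }
    }
}
```

The session is short-lived here only to keep the sketch small; as described above, keeping one open for the length of a request is a common alternative.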

Relationships
In our system, it's important that we track sales – specifically with respect to sales people, so that we can provide some basic reports. We're told that a sale can only ever belong to a single sales person, and thus set up a one to many relationship – that is, a sales person can have multiple sales, and a sale can only belong to a single sales person. In our database, this relationship is represented as a SalesPersonId column in the Sales table (a foreign key). In our domain, the SalesPerson class has a Sales collection and the Sale class has a SalesPerson property (references).

Both ends of the relationship need to be set up in the appropriate mapping file. On the Sale end, which maps a single property, we use a glorified property element called many-to-one:

...
<many-to-one name="SalesPerson" class="SalesPerson" column="SalesPersonId" not-null="true"/>
...

We're specifying the name of the property, the type/class, and the foreign key column name. We're also specifying an extra constraint, that is, when we add a new Sale object, the SalesPerson property can't be null.

The other side of the relationship, the collection of sales a sales person has, is slightly more complicated – namely because NHibernate's terminology isn't standard .NET lingo. To set up a collection we use a set, list, map, bag or array element. Your first inclination might be to use list, but NHibernate requires that you have a column that specifies the index. In other words, the NHibernate team sees a list as a collection where the index is important, and thus must be specified. What most .NET developers think of as a list, NHibernate calls a bag. Confusingly, whether you use a list or a bag element, your domain type must be an IList (or its generic IList<T> equivalent). This is because .NET doesn't have an IBag object. In short, for your everyday collection, you use the bag element and make your property type an IList.

The other interesting collection option is the set. A set is a collection that cannot contain duplicates – a common scenario for enterprise applications (although it is rarely explicitly stated). Oddly, .NET doesn't have a set collection, so NHibernate uses the Iesi.Collections.ISet interface. There are four specific implementations: the ListSet which is really fast for very small collections (10 or fewer items), the SortedSet which can be sorted, the HashedSet which is fast for larger collections and the HybridSet which initially uses a ListSet and automatically switches itself to a HashedSet as your collection grows.

For our system we'll use a bag (even though we can't have duplicate sales, it's just a little more straightforward right now), so we declare our Sales collection as an IList:

private IList<Sale> _sales;
public IList<Sale> Sales
{
   get { return _sales;}
}
And add our element to the SalesPerson mapping file:
...
<bag name="Sales" access="field.lowercase-underscore" table="Sales" inverse="true" cascade="all">
   <key column="SalesPersonId" />
   <one-to-many class="Sale" />
</bag>
...
Again, if you look at each element/attribute, it isn't as complicated as it first might seem. We identify the name of our property, specify the access strategy (we don't have a setter, so tell it to use the field with our naming convention), the table and column holding the foreign key, and the type/class of the items in the collection.

We've also set the cascade attribute to all which means that when we call Update on a sales person, any changes made to his or her sales collection (additions, removals, changes to existing sales) will automatically be persisted. Cascading can be a real time saver as your system grows in complexity.
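
Because the bag is mapped with inverse="true", the foreign key is written from the Sale side, so both ends of the relationship should be kept in sync in the domain. Here's a sketch of how an AddSales method might do that (the sample download has the real classes; the simplified shapes below are assumptions):

```csharp
using System;
using System.Collections.Generic;

public class Sale
{
    private SalesPerson _salesPerson;
    private double _price;

    public Sale(double price)
    {
        _price = price;
    }

    public double Price { get { return _price; } }
    public SalesPerson SalesPerson
    {
        get { return _salesPerson; }
        set { _salesPerson = value; }
    }
}

public class SalesPerson
{
    private IList<Sale> _sales = new List<Sale>();

    public IList<Sale> Sales
    {
        get { return _sales; }
    }

    public void AddSales(Sale sale)
    {
        sale.SalesPerson = this;   // the many-to-one end, which owns the foreign key
        _sales.Add(sale);          // the bag end, persisted through cascade="all"
    }
}

public static class RelationshipDemo
{
    public static void Main()
    {
        SalesPerson person = new SalesPerson();
        Sale sale = new Sale(46000.00);
        person.AddSales(sale);
        Console.WriteLine(person.Sales.Count);                               // 1
        Console.WriteLine(object.ReferenceEquals(sale.SalesPerson, person)); // True
    }
}
```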

Querying
NHibernate supports two different querying approaches: Hibernate Query Language (HQL) and Criteria Queries (you can also query in actual SQL, but lose portability when doing so). HQL is the easier of the two as it looks a lot like SQL – you use from, where, aggregates, order by, group by, etc. However, rather than querying against your tables, you write queries against your domain – which means HQL supports OO principles like inheritance and polymorphism. Both query methods are abstractions on top of SQL, which means you get total portability – all you need to do to target a different database is change your dialect configuration.

HQL works off of the IQuery interface, which is created by calling CreateQuery on your session. With the IQuery you can return individual entities, collections, substitute parameters and more. Here are some examples:

string lastName = "allen";
ISession session = _sessionFactory.OpenSession();

//retrieve a salesperson by last name
IQuery query = session.CreateQuery("from SalesPerson p where p.LastName = 'allen'");
SalesPerson p = query.UniqueResult<SalesPerson>();

//same as above but in 1 line, and with the last name as a variable
SalesPerson samePerson = session.CreateQuery("from SalesPerson p where p.LastName = ?").SetString(0, lastName).UniqueResult<SalesPerson>();

//people with few sales         
IList<SalesPerson> slackers = session.CreateQuery("from SalesPerson person where size(person.Sales) < 5").List<SalesPerson>();
This is just a subset of what can be accomplished with HQL (the downloadable sample has slightly more complicated examples).

Lazy Loading
When we load a sales person, say by doing: SalesPerson person = session.Get<SalesPerson>(1); the Sales collection won't be loaded. That's because, by default, collections are lazily loaded. That is, we won't hit the database until the information is specifically requested (i.e., we access the Sales property). We can override the behavior by setting lazy="false" on the bag element.

The other, more interesting, lazy load strategy implemented by NHibernate is on entities themselves. You'll often want to add a reference to an object without having to load the actual object from the database. For example, when we add a sale to a sales person, we need to specify the model, but don't want to load all the model information – all we really want to do is get the Id so we can store it in the ModelId column of the Sales table. When you use session.Load<T>(id) NHibernate will load a proxy of the actual object (unless you specify lazy="false" in the class element). As far as you're concerned, the proxy behaves exactly like the actual object, but none of the data will be retrieved from the database until the first time you ask for it. This makes it possible to write the following code:

Sale sale = new Sale(session.Load<Model>(1), DateTime.Now, 46000.00);
salesPerson.AddSales(sale);
session.SaveOrUpdate(salesPerson);
without ever having to actually hit the database to load the model.
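
If the idea of a proxy feels a bit magical, here's a rough, hand-written sketch of the principle. This is not how NHibernate builds its proxies (it generates them dynamically); LazyModel, ModelData and FakeDatabaseLoad are names I made up purely for illustration. The point is only that the Id is available right away, while the rest of the data is fetched the first time it's asked for.

```csharp
using System;

// Hypothetical data holder standing in for a row of the Model table.
public class ModelData
{
    public string Name;
    public ModelData(string name) { Name = name; }
}

// Hand-rolled stand-in for a lazy proxy: the Id is known up front,
// everything else is fetched on first access.
public class LazyModel
{
    private readonly int _id;
    private readonly Func<int, ModelData> _loader;
    private ModelData _data;

    public LazyModel(int id, Func<int, ModelData> loader)
    {
        _id = id;
        _loader = loader;
    }

    // Reading the Id never triggers a load.
    public int Id { get { return _id; } }

    // The first access calls the loader; later accesses reuse the result.
    public string Name
    {
        get
        {
            if (_data == null) { _data = _loader(_id); }
            return _data.Name;
        }
    }
}

public static class LazyDemo
{
    // Counts how many times we "hit the database".
    public static int LoadCount = 0;

    public static ModelData FakeDatabaseLoad(int id)
    {
        LoadCount++; // pretend this is a SELECT
        return new ModelData("Model #" + id);
    }
}
```

Creating new LazyModel(1, LazyDemo.FakeDatabaseLoad) and reading its Id leaves LoadCount at 0; only reading Name causes the (pretend) database hit, and only once.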

Download
I've included a download which will hopefully provide a base for you to start playing with NHibernate. The code is well documented - take special care to read the annotations within the mapping files. To get it running:

  1. Create a new database and run the CREATE TABLE commands located in CREATE_TABLES.sql,
  2. Modify the hibernate.connection.connection_string property within the app.config so that it can connect to your newly created database
Once configured, take a look at the Run method within Sample.cs and walk through each call one at a time.

Download Project

Conclusion
We've only touched the tip of what you can do with NHibernate. We haven't looked at its Criteria Queries (which is a query API tied even closer to your domain than HQL), its caching capabilities, filtering of collections, performance optimizations, logging, or native SQL abilities. Beyond NHibernate the tool, hopefully you've learnt more about object relational mapping, and alternative solutions to the limited toolset baked into .NET. It is hard to let go of hand-written SQL statements, but looking beyond the bias of what's comfortable, it's impossible to rationalize doing all that work upfront.

Author: "karl" Tags: "Featured, Foundations"
Date: Friday, 28 Dec 2007 15:12

You may be wondering what happened to part 6. Well, it's still being worked on and should be available early next week.

I wasn't sure if there would be a part 7 and if so what it would be about - but I was heavily considering writing about an ActiveRecord implementation. Turns out that Kent Sharkey beat me to it with his overview of Subsonic on DotNetSlackers.

I want to spend just a couple paragraphs trying to tie in Kent's article with the Foundations series.

In Part 3 we talked about persistence and manually wrote our data access layer and mapping code. As I said at the time, doing that manually is fine for simple cases with few domain objects and straightforward mapping, but can quickly get out of hand. To solve the problem and keep us efficient, we can leverage existing O/R mappers to do all the mundane work. Some O/R mappers, like NHibernate which we'll look at in Part 6, are unbelievably flexible and can be used to address very large systems (as systems grow in size, it's common for their impedance mismatch to grow as well, which makes a flexible tool all the more important). Of course, NHibernate isn't the simplest thing to configure and it has a relatively steep learning curve.

An alternative approach is to use the ActiveRecord (AR) design pattern. AR is still considered an O/R mapper, but it's specifically targeted at more straightforward applications (which likely represent 95% of the work we're all doing). AR is one of the two components that have made Ruby on Rails so popular and productive (the other is their first class MVC pattern implementation). There are two popular .NET implementations that I'm aware of: Subsonic and Castle ActiveRecord. Both the Castle and Subsonic line of products are open source, and come with a full range of products (MVC, ActiveRecord, Scaffolding, utility functions, etc) that integrate nicely with each other. Interestingly, Castle is implemented on top of NHibernate - which gives you a hint about just how flexible NHibernate really is.

With Castle ActiveRecord, you take your domain object and decorate it with attributes. You add the ActiveRecordAttribute to your class, and the PropertyAttribute to your properties. So, if you have a Car class with a Name property which are properly set up, Castle will automatically map to a table named Car and a column named Name.

It turns out that ActiveRecord implementations can be set up within minutes and handle 99% of all your data access needs. They can make you unbelievably productive by providing functionality and performance above that of DataSets, while maintaining your rich domain layer and allowing you to easily write unit tests.

It's still worthwhile to learn a more complex framework like NHibernate for a couple reasons. First, once you're used to it, it too can be set up quickly and efficiently (avoiding upfront learning is a losing strategy). Secondly, NHibernate is a lot closer to important fundamental concepts. With AR implementations, a lot of the details are taken care of for you - which is great in most cases, but doesn't teach you nearly as much. If you spend some time with NHibernate, you'll learn about first and second level caching, identity map, concurrency and locking and various other concepts relating to persistence.

Enjoy Kent's article, and I hope you take the time to download and play with either Subsonic or Castle.

Happy holidays to all our CB readers and your families. 

 

 

Author: "karl" Tags: "Foundations"
Date: Thursday, 20 Dec 2007 13:22

Throughout this series we've talked about the importance of testability and have looked at techniques to make it easier to test our system. It goes without saying that a major benefit of writing tests for our system is the ability to deliver a better product to our client. Although this is true for unit tests as well, the main reason I write unit tests is that nothing comes close to improving the maintainability of a system as much as a properly developed suite of unit tests. You'll often hear unit test advocates speak of how much confidence unit tests give them – and that's what it's really all about. On a project I'm currently working on, we're continuously making changes and tweaks to improve the system (functional improvements, performance, refactorings, you name it). Being that it's a fairly large system, we're sometimes asked to make a change that flat out scares us. Is it doable? Will it have some weird side effect? What bugs will be introduced? Without unit tests, we'd likely refuse to make the higher risk changes. But we know, and our client knows, that it's the high risk changes that have the most potential for success. It turns out that having 700+ unit tests which run within a couple minutes lets us rip components apart, reorganize code, and build features we never thought about a year ago, without worrying too much about it. Because we are confident in the completeness of the unit tests, we know that we aren't likely to introduce bugs into our production environment – our changes might still cause bugs but we'll know about them right away.

Unit tests aren't only about mitigating high-risk changes. In my programming life, I've been responsible for major bugs caused by seemingly low-risk changes as well. The point is that I can make a fundamental or minor change to our system, right click the solution, select "Run Tests" and within 2 minutes know where we stand.

Why wasn't I unit testing 3 years ago?
For those of us who've discovered the joy of unit testing, it's hard to understand why everyone isn't doing it. For those who haven't adopted it, you probably wish we'd shut up about it already. For many years I'd read blogs and speak to colleagues who were really into unit testing, but didn't practice myself. Looking back, here's why it took me a while to get on the bandwagon:

  1. I had misconceptions about the goals of unit testing. As I've already said, unit testing does improve the quality of a system, but it's really all about making it easier to change / maintain the system later on. Furthermore, if you go to the next logical step and adopt Test Driven Development, unit testing really becomes about design. To paraphrase Scott Bellware, TDD isn't about testing because you're not thinking as a tester when doing TDD – you're thinking as a designer.
  2. Like many, I used to think developers shouldn't write tests! I don't know the history behind this belief, but I now think this is just an excuse for lazy programmers. Testing is the process of both finding bugs in a system as well as validating that it works as expected. Maybe developers aren't good at finding bugs in their own code, but they are the best suited to make sure it works the way they intended it to (and clients are best suited to test that it works like it should; if you're interested in finding out more about that, I suggest you research Acceptance Testing and FitNesse (http://fitnesse.org/)). Even though unit testing isn't all that much about testing, developers who don't believe they should test their own code simply aren't accountable.
  3. Testing isn't fun. Sitting in front of a monitor, inputting data and making sure everything's ok sucks. But unit testing is coding, which means there are a lot of challenges and metrics to gauge your success. Sometimes, like coding, it's a little mundane, but all in all it's no different than the other programming you do every day.
  4. It takes time. Advocates will tell you that unit testing doesn't take time, it SAVES time. This is true in that the time you spend writing unit tests is likely to be small compared to the time you save on change requests and bug fixes. That's a little too Ivory Tower for me. In all honesty, unit testing DOES take a lot of time (especially when you just start out). You may very well not have enough time to unit test or your client might not feel the upfront cost is justified. In these situations I suggest you identify the most critical code and test it as thoroughly as possible – even a couple hours spent writing unit tests can have a big impact.

Ultimately, unit testing seemed like a complicated and mysterious thing that was only used in edge cases. The benefits seemed unattainable and timelines didn't seem to allow for it anyways. It turns out it took a lot of practice (I had a hard time learning what to unit test and how to go about it), but the benefits were almost immediately noticeable.

The Tools

With StructureMap already in place from the last part, we need only add 2 frameworks and 1 tool to our unit testing framework: nUnit, RhinoMocks and TestDriven.NET.

TestDriven.NET is an addon for Visual Studio that basically adds a "Run Test" option to our context (right-click) menu. We won't spend any time talking about it. The personal license of TestDriven.NET is only valid for open source and trial users. Don't worry too much if the licensing doesn't suit you, nUnit has its own test runner tool, just not integrated in VS.NET. (http://www.testdriven.net/) (Resharper users can also use its built-in functionality).

nUnit is the testing framework we'll actually use. There are alternatives, such as mbUnit, but I don't know nearly as much about them as I ought to. (http://www.nunit.org/)

RhinoMocks is the mocking framework we'll use. In the last part we manually created our mock – which was both rather limited and time consuming. RhinoMocks will automatically generate a mock class from an interface and allow us to verify and control the interaction with it. (http://www.ayende.com/projects/rhino-mocks.aspx)

nUnit
The first thing to do is to add a reference to the nunit.framework.dll and the Rhino.Mocks.dll. My own preference is to put my unit tests into their own assembly. For example, if my domain layer was located in CodeBetter.Foundations, I'd likely create a new assembly called CodeBetter.Foundations.Tests. This does mean that we won't be able to test private methods (more on this shortly). In .NET 2.0+ we can use the InternalsVisibleToAttribute to allow the Test assembly access to our internal methods (we'd open Properties/AssemblyInfo.cs and add [assembly: InternalsVisibleTo("CodeBetter.Foundations.Tests")]) – which is something I typically do.

There are two things you need to know about nUnit. First, you configure your tests via the use of attributes. The TestFixtureAttribute is applied to the class that contains your tests, setup and teardown methods. The SetUpAttribute is applied to the method you want to have executed before each test – you won't always need this. Similarly, the TearDownAttribute is applied to the method you want executed after each test. Finally, the TestAttribute is applied to your actual unit tests. (There are other attributes, but these 4 are the most important). This is what it might look like:

 
using NUnit.Framework;

[TestFixture]
public class CarTests
{
   [SetUp]
   public void SetUp()
   {
      //todo
   }
   [TearDown]
   public void TearDown()
   {
      //todo
   }
   [Test]
   public void SaveThrowsExceptionWhenInvalid() 
   {
      //todo
   }
   [Test]
   public void SaveCallsDataAccessAndSetsId()
   {
      //todo
   }
   //more tests
}

Notice that each unit test has a very explicit name – it's important to state exactly what the test is going to do, and since your test should never do too much, you'll rarely have obscenely long names.

The second thing to know about nUnit is that you confirm that your test executed as expected via the use of the Assert class and its many methods. I know this is lame, but if we had a method that took a params int[] numbers and returned the sum, our unit test would look like:

[TestFixture]
public class MathUtilityTester
{
  [Test]
  public void MathUtilityReturnsZeroWhenNoParameters()
  {
     Assert.AreEqual(0, MathUtility.Add());
  }
  [Test]
  public void MathUtilityReturnsValueWhenPassedOneValue()
  {
     Assert.AreEqual(10, MathUtility.Add(10));
  }
  [Test]
  public void MathUtilityReturnsValueWhenPassedMultipleValues()
  {
     Assert.AreEqual(29, MathUtility.Add(10,2,17));
  }
  [Test]
  public void MathUtilityWrapsOnOverflow()
  {
     Assert.AreEqual(-2, MathUtility.Add(int.MaxValue, int.MaxValue));
  }
}

You wouldn't know it from the above example, but the Assert class has more than one function, such as Assert.IsFalse, Assert.IsTrue, Assert.IsNull, Assert.IsNotNull, Assert.AreSame, Assert.AreNotEqual, Assert.Greater, Assert.IsInstanceOfType and so on.
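
The MathUtility class itself isn't shown here, so as a reference point, this is a minimal sketch of an Add method that would satisfy the four tests above. The unchecked keyword is my own assumption: it makes the overflow test pass even if the project happens to be compiled with overflow checking turned on.

```csharp
public static class MathUtility
{
    // Sums any number of ints; with no arguments the array is empty and the sum is 0.
    public static int Add(params int[] numbers)
    {
        int sum = 0;
        foreach (int number in numbers)
        {
            // unchecked makes the addition wrap around on overflow
            // instead of throwing an OverflowException.
            sum = unchecked(sum + number);
        }
        return sum;
    }
}
```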

What is a Unit Test
Unit tests are methods that test behavior at a very granular level. Developers new to unit testing often let the scope of their tests grow. Most unit tests follow the same pattern: execute some code from your system and assert that it worked as expected. The goal of a unit test is to validate a specific behavior. You might have noticed that the above two examples use multiple unit tests on the same function. We don't want to write an all encompassing test for Save, but rather want to write a test for each of the behaviors it contains – failing when the object is in an invalid state, calling our data access's Save method and setting the id, and calling the data access's Update method. It's important that our unit tests pinpoint a failure as precisely as possible.

I'm sure some of you will find the 4 tests used to cover the MathUtility.Add method a little excessive. You may think that all 4 tests could be grouped into the same one – and in this minor case I'd say whatever you prefer. However, when I started unit testing, I fell into the bad habit of letting the scope of my unit tests grow. I'd have my test which created an object, executed some of its members and asserted the functionality. But I'd always end up saying, well as long as I'm here, I might as well throw in a couple extra asserts to make sure these fields are set the way they ought to be. This is very dangerous because a change in your code could break numerous unrelated tests – definitely a sign that you've given your tests too little focus.

This brings us back to the topic about testing private methods. If you google you'll find a number of discussions on the topic, but the general consensus seems to be that you shouldn't test private methods. I think the most compelling reason not to test private methods is that our goal is not to test methods or lines of code, but rather to test behavior. This is something you must always remember. If you thoroughly test your code's public interface, then private methods should automatically get tested. Another argument against testing private methods is that it breaks encapsulation. We talked about the importance of information hiding already. Private methods contain implementation detail that we want to be able to change without breaking calling code. If we test private methods directly, implementation changes will likely break our tests, which doesn't bode well for higher maintainability. Here's a question for you: should we care that a change to a private method broke a test?

...

...

Hopefully you're starting to get my drift, the answer is NO. What we care about is whether the behavior/functionality is broken, which our tests against the public API will do.

Mocking
To get started, it's a good idea to test simple pieces of functionality. Before long though, you'll want to test a method that has a dependency on an outside component – such as the database. For example, you might want to complete your test coverage of the Car class by testing the Save method. Since we want to keep our tests as granular as possible (and as light as possible – tests should be quick to run so we can execute them often and get instant feedback) we really don't want to figure out how we'll set up a test database with fake data and make sure it's kept in a predictable state from test to test. In keeping with this spirit, all we want to do is make sure that Save interacts properly with the DAL. Later on we can unit test the DAL on its own. If Save works as expected and the DAL works as expected and they interact properly with each other, we have a good base to move to more traditional testing.

In the previous part we saw the beginnings of testing with mocks. We were using a manually created mock class which had some pretty major limitations. The most significant of which was our inability to confirm that calls to our mock objects were occurring as expected. That, along with ease of use, is exactly the problem RhinoMocks is meant to solve. Using RhinoMocks couldn't be simpler: tell it what you want to mock (an interface or a class – preferably an interface), tell it what method(s) you expect to be called, along with the parameters, execute the call, and have it verify that your expectations were met.

Before we can get started, we need to give RhinoMocks access to our internal types. This is quickly achieved by adding [assembly: InternalsVisibleTo("DynamicProxyGenAssembly2")] to our Properties/AssemblyInfo.cs file.

Now we can start coding by writing a test to cover the update path of our Save method:

using NUnit.Framework;
using Rhino.Mocks;
using StructureMap;

[TestFixture]
public class CarTest
{
  [Test]   
  public void SaveCarCallsUpdateWhenAlreadyExistingCar()
  {     
     MockRepository mocks  = new MockRepository();
     IDataAccess dataAccess = mocks.CreateMock<IDataAccess>();
     ObjectFactory.InjectStub(typeof(IDataAccess), dataAccess);

     Car car = new Car();
     dataAccess.Update(car);
     mocks.ReplayAll();

     car.Id = 32;
     car.Save();

     mocks.VerifyAll();
     ObjectFactory.ResetDefaults();
  }
}

Once a mock object is created, which took 1 line of code to do, we inject it into our dependency injection framework (StructureMap in this case). When a mock is created, it enters record-mode, which means any subsequent operations against it, such as the call to dataAccess.Update(car), are recorded by RhinoMocks. We exit record-mode by calling ReplayAll, which means we are now ready to execute our real code and have it verified against the recorded sequence. When we then call VerifyAll after having called Save on our Car object, RhinoMocks will make sure that our actual call behaved the same as what we expected. In other words, you can think of everything before ReplayAll as stating our expectations, everything after it as our actual test code with VerifyAll doing the final check.

If we were to change the test to something along the lines of (notice the extra dataAccess.Update call):

using NUnit.Framework;
using Rhino.Mocks;
using StructureMap;

[TestFixture]
public class CarTest
{
  [Test]   
  public void SaveCarCallsUpdateWhenAlreadyExistingCar()
  {         
     MockRepository mocks  = new MockRepository();
     IDataAccess dataAccess = mocks.CreateMock<IDataAccess>();
     ObjectFactory.InjectStub(typeof(IDataAccess), dataAccess);

     Car car = new Car();
     dataAccess.Update(car);
     dataAccess.Update(car);
     mocks.ReplayAll();

     car.Id = 32;
     car.Save();

     mocks.VerifyAll();
     ObjectFactory.ResetDefaults();
  }
}

RhinoMocks would cause our test to fail and tell us that it expected two calls to Update but only saw one.
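
If record/replay/verify still feels abstract, it can help to see the idea stripped down to its bare bones. The following is a hand-written toy, not how RhinoMocks is actually implemented; RecordingMock and its members are hypothetical names used only for illustration. Conceptually, calls made before Replay are stored as expectations, calls made after it are stored as what really happened, and Verify compares the two lists.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration of record/replay/verify (hypothetical, not the RhinoMocks API).
public class RecordingMock
{
    private readonly List<string> _expected = new List<string>();
    private readonly List<string> _actual = new List<string>();
    private bool _replaying;

    // In record mode a call is stored as an expectation;
    // in replay mode it is stored as something that really happened.
    public void Call(string description)
    {
        if (_replaying) { _actual.Add(description); }
        else { _expected.Add(description); }
    }

    // Leaves record mode.
    public void Replay() { _replaying = true; }

    // Throws when the actual calls differ from the expected ones.
    public void Verify()
    {
        if (_expected.Count != _actual.Count)
        {
            throw new InvalidOperationException(
                "Expected " + _expected.Count + " call(s) but saw " + _actual.Count);
        }
        for (int i = 0; i < _expected.Count; i++)
        {
            if (_expected[i] != _actual[i])
            {
                throw new InvalidOperationException(
                    "Expected '" + _expected[i] + "' but saw '" + _actual[i] + "'");
            }
        }
    }
}
```

Recording one Update and then making one Update call verifies cleanly; recording two and making only one causes Verify to throw, which mirrors the failing test above.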

For the save behavior, the interaction is slightly more complex – we have to make sure the return value is properly handled by the Save method. Here's the test:

using NUnit.Framework;
using Rhino.Mocks;
using StructureMap;

[TestFixture]
public class CarTest
{
  [Test]   
  public void SaveCarCallsSaveWhenNew()
  {         
     MockRepository mocks  = new MockRepository();
     IDataAccess dataAccess = mocks.CreateMock<IDataAccess>();
     ObjectFactory.InjectStub(typeof(IDataAccess), dataAccess);

     Car car = new Car();
     Expect.Call(dataAccess.Save(car)).Return(389);
     mocks.ReplayAll();

     car.Save();

     mocks.VerifyAll();
     Assert.AreEqual(389, car.Id);
     ObjectFactory.ResetDefaults();
  }
}

Using the Expect.Call method allows us to specify the return value we want. Also notice the Assert.AreEqual we've added – which is the last step in validating the interaction. Hopefully the possibilities of having control over return values (as well as output/ref values) lets you see how easy it is to test for edge cases. Imagine that we changed our Save function to throw an exception if the returned id was invalid. Our test would look like:

using NUnit.Framework;
using Rhino.Mocks;
using StructureMap;

[TestFixture]
public class CarTest
{
  private MockRepository _mocks;
  private IDataAccess _dataAccess;

  [SetUp]
  public void SetUp()
  {
     _mocks = new MockRepository();
     _dataAccess = _mocks.CreateMock<IDataAccess>();
     ObjectFactory.InjectStub(typeof(IDataAccess), _dataAccess);
  }
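  [TearDown]
  public void TearDown()
  {
     //undo InjectStub first so the mock never leaks into other tests
     ObjectFactory.ResetDefaults();
     _mocks.VerifyAll();
  }

  [Test, ExpectedException("CodeBetter.Foundations.PersistenceException")]   
  public void SaveThrowsExceptionWhenReturnedIdIsInvalid()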
  {                  
     Car car = new Car();
     using (_mocks.Record())
     {
        Expect.Call(_dataAccess.Save(car)).Return(0);
     }
     using (_mocks.Playback())
     {
        car.Save();
     }     
  }
}

We've actually changed a lot. First, we've shown how nUnit's ExpectedException attribute can be used to test for an exception. Secondly, we've extracted the repetitive code that creates, sets up and verifies the mock object into the SetUp and TearDown methods. Finally, we used a different, more explicit, RhinoMocks syntax for setting up our record and playback states - I generally prefer this syntax.

More on nUnit and RhinoMocks
So far we've only looked at the basic features offered by nUnit and RhinoMocks, but there's a lot more that can actually be done with them. For example, RhinoMocks can be set up to ignore the order of method calls, instantiate multiple mocks but only replay/verify specific ones, mock some but not other methods of a class (a partial mock), or simply create a stub.

Combined with a utility like NCover (http://www.ncover.com/), you can also get reports on your test coverage. Coverage basically tells you what percentage of an assembly/namespace/class/method was executed by your tests. NCover has a visual code browser that'll highlight any un-executed lines of code in red. Generally speaking, I dislike coverage as a means of measuring the completeness of unit tests. After all, just because you've executed a line of code does not mean you've actually tested it. What I do like NCover for is to highlight any code that has no coverage. In other words, just because a line of code or method has been executed by a test, doesn't mean your test is good. But if a line of code or method hasn't been executed, then you need to look at adding some tests.

We've mentioned Test Driven Development briefly throughout this series. As has already been mentioned, Test Driven Development, or TDD, is about design, not testing. TDD means that you write your test first and then write corresponding code to make your test pass. In TDD we'd write our Save test before having any functionality in the Save method. Of course, our test would fail. We'd then write the specific behavior and test again. The general mantra for developers is red → green → refactor, meaning the first step is to get a failing unit test, then to make it pass, then to refactor the code as required.

In my experience, TDD goes very well with Domain Driven Design, because it really lets us focus on the business rules of the system. If our client says tracking dependencies between upgrades has been a major pain-point for them, then we set off right away with writing tests that'll define the behavior and API of that specific feature. I recommend that you familiarize yourself with unit testing in general before adopting TDD.

 

UI and Database Testing
Unit testing your ASP.NET pages probably isn't worth the effort. The ASP.NET framework is complicated and suffers from very tight coupling. More often than not you'll require an actual HttpContext, which requires quite a bit of work to set up. If you're making heavy use of custom HttpHandlers, you should be able to test those with quite a bit of ease.

 

On the other hand, testing your Data Access Layer is possible and I would recommend it. There may be better methods, but my approach has been to maintain all my CREATE Tables / CREATE Sprocs in text files along with my project, create a test database on the fly, and to use the SetUp and TearDown methods to keep the database in a known state. The topic might be worthy of a future blog post, but for now, I'll leave it up to your creativity.

Conclusion
Unit testing wasn't nearly as difficult as I first thought it was going to be. Sure my initial tests weren't the best – sometimes I would write near-meaningless tests (like testing that a plain-old property was working as it should) and sometimes they were far too complex and well outside of a well-defined scope. But after my first project, I learnt a lot about what did and didn't work. One thing that immediately became clear was how much cleaner my code became. I quickly came to realize that if something was hard to test and I rewrote it to make it more testable, the entire code became more readable, better decoupled and overall easier to work with. The best advice I can give is to start small, experiment with a variety of techniques, don't be afraid to fail and learn from your mistakes. And of course, don't wait until your project is complete to unit test – write them as you go!

Author: "karl" Tags: "Foundations"
Date: Tuesday, 11 Dec 2007 01:22

It's common to hear developers promote layering as a means to provide extensibility. The most common example, and one I used in Part 2 when we looked at interfaces, is the ability to switch out your data access layer in order to connect to a different database. If your projects are anything like mine, you know upfront what database you're going to use and you know you aren't going to have to change it. Sure, you could build that flexibility upfront - just in case - but what about keeping things simple and You Aren't Going To Need It (YAGNI)?

I used to write about the importance of domain layers in order to have re-use across multiple presentation layers: website, windows applications and web services. Ironically, I've rarely had to write multiple front-ends for a given domain layer. I still think layering is important, but my reasoning has changed. I now see layering as a natural by-product of highly cohesive code with at least some thought put into coupling. That is, if you build things right, it should automatically come out layered.

The real reason we're spending a whole part on decoupling (which layering is a high-level implementation of) is because it's a key ingredient in writing testable code. It wasn't until I started unit testing that I realized how tangled and fragile my code was. I quickly became frustrated because method X relied on a functional class Y which needed a database up and running. In order to avoid the headaches I went through, we'll first cover coupling and then look at unit testing in the next part.

(A point about YAGNI. While many developers consider it a hard rule, I rather think of it as a general guideline. There are good reasons why you want to ignore YAGNI, the most obvious is your own experience. If you know that something will be hard to implement later, it might be a good idea to build it now, or at least put hooks in place. This is something I frequently do with caching, building an ICacheProvider and a NullCacheProvider implementation that does nothing, except provide the necessary hooks for a real implementation later on. That said, of the numerous guidelines out there, YAGNI, DRY and Sustainable Pace are easily the three I consider the most important.)
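
To give a rough idea of what those caching hooks could look like, here's a minimal sketch. The member names (Get, Insert, Remove) are my own assumption for the sake of illustration; what matters is that calling code can depend on ICacheProvider today and get a real implementation swapped in later without changing.

```csharp
// Hypothetical caching hook: the members are illustrative, not a fixed API.
public interface ICacheProvider
{
    object Get(string key);
    void Insert(string key, object value);
    void Remove(string key);
}

// Does nothing at all: every lookup is a miss, every write is ignored.
// It exists only so calling code can be written against ICacheProvider now.
public class NullCacheProvider : ICacheProvider
{
    public object Get(string key) { return null; }
    public void Insert(string key, object value) { }
    public void Remove(string key) { }
}
```

Because Get always returns null, callers are forced to handle a cache miss from day one, which is exactly the code path a real cache will also exercise.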

Sneak Peek at Unit Testing
Talking about coupling with respect to unit testing is something of a chicken and egg problem – which to talk about first. I think it's best to move ahead with coupling, providing we cover some basics about unit testing. Most important is that unit tests are all about the unit. You aren't focusing on end-to-end testing but rather on individual behavior. The idea is that if you test each behavior of each method thoroughly and test their interaction with one another, your whole system is solid. This is tricky given that the method you want to unit test might have a dependency on another class which can't be easily executed within the context of a test (such as a database, or a web-browser element). For this reason, unit testing makes use of mock classes – or pretend classes.

Let's look at an example, saving a car's state:

public class Car
{
   private int _id;     
   public void Save()
   {
      if (!IsValid())
      {
         //todo: come up with a better exception
         throw new InvalidOperationException("The car must be in a valid state");
      }
      if (_id == 0)
      {
         _id = DataAccess.CreateInstance().Save(this);
      }
      else
      {
         DataAccess.CreateInstance().Update(this);
      }
   }
   private bool IsValid()
   {
      //todo: make sure the object is in a valid state
      return true;
   }
}

To effectively test the Save method, there are three things we must do:

  1. Make sure the correct exception is thrown when we try to save a car which is in an invalid state,
  2. Make sure the data access' Save method is called when it's a new car, and
  3. Make sure the Update method is called when it's an existing car.

What we don't want to do (which is just as important as what we do want to do) is test the functionality of IsValid or the data access' Save and Update functions (other tests will take care of those). The last point is important – all we want to do is make sure these functions are called with the proper parameters and that their return value (if any) is properly handled. It's hard to wrap your head around mocking without a concrete example, but mocking frameworks will let us intercept the Save and Update calls, ensure that the proper arguments were passed, and force whatever return value we want. Mocking frameworks are quite fun and effective... unless you can't use them because your code is tightly coupled.

Not ALL coupling is bad
In case you forgot from Part 1, coupling is simply what we call it when one class requires another class in order to function. It's essentially a dependency. All but the most basic lines of code are dependent on other classes. Heck, if you write string site = "CodeBetter", you're coupled to the System.String class – if it changes, your code could very well break. The first thing you need to know is that in the vast majority of cases, such as the silly string example, coupling isn't a bad thing. We don't want to create interfaces and providers for each and every one of our classes. It's ok for our Car class to hold a direct reference to the Upgrade class – at this point it'd be overkill to introduce an IUpgrade interface. What isn't ok is any coupling to an external component (database, state server, cache server, web service), any code that requires extensive setup (database schemas) and, as I learnt on my last project, any code that generates random output (password generation, key generators). That might be a somewhat vague description, but after this and the next part, and once you play with unit testing yourself, you'll get a feel for what should and shouldn't be avoided.
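
To make the point about random output a little more concrete, here is a minimal sketch of hiding a password generator behind an interface. Every name in it (IPasswordGenerator, RandomPasswordGenerator, FixedPasswordGenerator, Account) is hypothetical and invented for illustration; none of them come from the series' codebase.

```csharp
using System;

// Hypothetical abstraction over a source of random output.
public interface IPasswordGenerator
{
   string Generate(int length);
}

// Production-style implementation: the result differs on every call, so a test
// cannot assert on it. System.Random is used only to keep the sketch short;
// it is not suitable for generating real passwords.
public class RandomPasswordGenerator : IPasswordGenerator
{
   private static readonly Random _random = new Random();
   private const string Alphabet = "abcdefghijkmnpqrstuvwxyz23456789";

   public string Generate(int length)
   {
      char[] buffer = new char[length];
      for (int i = 0; i < length; i++)
      {
         buffer[i] = Alphabet[_random.Next(Alphabet.Length)];
      }
      return new string(buffer);
   }
}

// Test implementation: always returns the same predictable value.
public class FixedPasswordGenerator : IPasswordGenerator
{
   public string Generate(int length)
   {
      return new string('x', length);
   }
}

public class Account
{
   public string Password { get; private set; }

   // The generator is passed in, so a test can supply the predictable one.
   public void ResetPassword(IPasswordGenerator generator)
   {
      Password = generator.Generate(8);
   }
}
```

A test that passes a FixedPasswordGenerator knows exactly what Password should contain afterwards, which is not possible when the class creates its own random generator internally.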

Since it's always a good idea to decouple your database from your domain, we'll use that as the example throughout this part.

Dependency Injection
In Part 2 we saw how interfaces can help our cause – however, the code provided didn't allow us to dynamically provide a mock implementation of IDataAccess for the DataAccess factory class to return. In order to achieve this, we'll rely on a pattern called Dependency Injection (DI). DI is specifically tailored for this situation because, as the name implies, it's a pattern that turns a hard-coded dependency into something that can be injected at runtime. We'll look at two forms of DI: one which we do manually, and one which leverages a third-party library.

Constructor Injection
The simplest form of DI is constructor injection – that is, injecting dependencies via a class' constructor. First, let's look at our DataAccess interface again and create a fake (or mock) implementation (don't worry, you won't actually have to create mock implementations of each component, but for now it keeps things obvious):

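using System;
using System.Collections.Generic;

internal interface IDataAccess
{
   int Save(Car car);
   void Update(Car car);
}
internal class MockDataAccess : IDataAccess
{
   //in-memory stand-in for the database
   private readonly List<Car> _cars = new List<Car>();
   public int Save(Car car)
   {
      _cars.Add(car);
      //the new count doubles as a fake, non-zero id
      return _cars.Count;
   }
   public void Update(Car car)
   {
      //the list already holds this instance, so just confirm it was saved first
      if (!_cars.Contains(car))
      {
         throw new InvalidOperationException("Cannot update a car that was never saved");
      }
   }
}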

Although our mock's Update function could probably be improved, it'll do for now. Armed with this fake class, only a minor change to the Car class is required:

public class Car
{
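   private int _id;
   private readonly IDataAccess _dataProvider;
   //exposed so that callers (and tests) can read the id assigned by Save
   public int Id
   {
      get { return _id; }
   }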
   public Car() : this(new SqlServerDataAccess())
   {
   }
   internal Car(IDataAccess dataProvider)
   {
      _dataProvider = dataProvider;
   }
   public void Save()
   {
      if (!IsValid())
      {
         //todo: come up with a better exception
         throw new InvalidOperationException("The car must be in a valid state");
      }
      if (_id == 0)
      {
         _id = _dataProvider.Save(this);
      }
      else
      {
         _dataProvider.Update(this);
      }
   }
   private bool IsValid()
   {
      //todo: make sure the object is in a valid state
      return true;
   }
}

Take a good look at the code above and follow it through. Notice that the use of constructor overloading means the introduction of DI doesn't have any impact on existing code – if you choose not to inject an instance of IDataAccess, the default implementation is used for you. On the flip side, if we do want to inject a specific implementation, such as a MockDataAccess instance, we can:

public void AlmostATest()
{
   Car car = new Car(new MockDataAccess());
   car.Save();
   if (car.Id != 1)
   {
      //something went wrong
   }
}
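
The checks listed earlier for Save (was the right method called, with the right argument, and was its return value handled) can be sketched by hand with a recording fake. This is only an illustration under a few assumptions: RecordingDataAccess and SaveChecks are hypothetical names, Car is a trimmed copy of the class above with everything made public for brevity, and the invalid-state check is left out because IsValid always returns true for now.

```csharp
using System;

public interface IDataAccess
{
   int Save(Car car);
   void Update(Car car);
}

// Hand-written recording fake: counts calls, remembers the argument it was given,
// and returns whatever id the test tells it to return.
public class RecordingDataAccess : IDataAccess
{
   public int SaveCalls;
   public int UpdateCalls;
   public Car LastCar;
   public int IdToReturn = 1;

   public int Save(Car car) { SaveCalls++; LastCar = car; return IdToReturn; }
   public void Update(Car car) { UpdateCalls++; LastCar = car; }
}

// Trimmed copy of the article's Car, with an Id getter so the result can be inspected.
public class Car
{
   private int _id;
   private readonly IDataAccess _dataProvider;

   public Car(IDataAccess dataProvider) { _dataProvider = dataProvider; }
   public int Id { get { return _id; } }

   public void Save()
   {
      if (_id == 0) { _id = _dataProvider.Save(this); }
      else { _dataProvider.Update(this); }
   }
}

public static class SaveChecks
{
   public static void Run()
   {
      RecordingDataAccess spy = new RecordingDataAccess();
      spy.IdToReturn = 42;
      Car car = new Car(spy);

      car.Save();   // new car: Save is expected, and its return value becomes the id
      if (spy.SaveCalls != 1 || spy.UpdateCalls != 0) throw new Exception("expected exactly one Save call");
      if (!object.ReferenceEquals(spy.LastCar, car)) throw new Exception("wrong argument passed to Save");
      if (car.Id != 42) throw new Exception("return value of Save was not stored");

      car.Save();   // existing car: Update is expected instead
      if (spy.SaveCalls != 1 || spy.UpdateCalls != 1) throw new Exception("expected exactly one Update call");
   }
}
```

A mocking framework generates this kind of recording class for you; writing one by hand just makes it visible what is being intercepted.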

There are minor variations available – we could have injected an IDataAccess directly into the Save method, or could set the private _dataProvider field via an internal property – which you use is mostly a matter of taste.
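
For comparison, here is a minimal sketch of those two variations side by side. DefaultDataAccess, FakeDataAccess and VariationDemo are hypothetical stand-ins so that the sketch compiles on its own; DefaultDataAccess plays the role of SqlServerDataAccess from the article.

```csharp
using System;

public interface IDataAccess
{
   int Save(Car car);
   void Update(Car car);
}

// Placeholder default implementation (stands in for SqlServerDataAccess).
public class DefaultDataAccess : IDataAccess
{
   public int Save(Car car) { return 1; }
   public void Update(Car car) { }
}

// Fake used by the demo below; it always hands back the id 7.
public class FakeDataAccess : IDataAccess
{
   public int Save(Car car) { return 7; }
   public void Update(Car car) { }
}

public class Car
{
   private int _id;
   private IDataAccess _dataProvider = new DefaultDataAccess();

   public int Id { get { return _id; } }

   // Variation 1: property injection. A test assigns a fake before calling Save().
   internal IDataAccess DataProvider
   {
      set { _dataProvider = value; }
   }

   public void Save()
   {
      Save(_dataProvider);
   }

   // Variation 2: method injection. The dependency is supplied per call.
   internal void Save(IDataAccess dataProvider)
   {
      if (_id == 0) { _id = dataProvider.Save(this); }
      else { dataProvider.Update(this); }
   }
}

public static class VariationDemo
{
   public static int ViaProperty()
   {
      Car car = new Car();
      car.DataProvider = new FakeDataAccess();
      car.Save();
      return car.Id;
   }

   public static int ViaMethod()
   {
      Car car = new Car();
      car.Save(new FakeDataAccess());
      return car.Id;
   }
}
```

Both routes end with the fake being used instead of the default. Because the injection points are internal, tests living in a separate assembly would need access to internals, for example through the InternalsVisibleTo assembly attribute.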

Frameworks
Doing DI manually works great in simple cases, but can become unruly in more complex situations. A recent project I worked on had a number of core components that needed to be injected – one for caching, one for logging, one for database access and another for a web service. Classes got polluted with multiple constructor overloads, and too much thought had to go into setting up classes for unit testing. Since DI is so critical to unit testing, and most unit testers love their open-source tools, it should come as no surprise that a number of frameworks exist to help automate DI. The rest of this article will focus on StructureMap (http://structuremap.sourceforge.net/), a Dependency Injection framework created by fellow CodeBetter blogger Jeremy Miller.

Before using StructureMap you must configure it, either with an XML file (called StructureMap.config) or by adding attributes to your classes. The configuration essentially says "this is the interface I want to program against, and here's the default implementation". The simplest of configurations to get StructureMap up and running would look something like:

<StructureMap>
  <DefaultInstance PluginType="CodeBetter.Foundations.IDataAccess, CodeBetter.Foundations" 
                   PluggedType="CodeBetter.Foundations.SqlServerDataAccess, CodeBetter.Foundations" /> 
</StructureMap>

While I don't want to spend too much time talking about configuration, it's important to note that the XML file must be deployed in the /bin folder of your application. You can automate this in VS.NET by selecting the file, going to its properties and setting the Copy to Output Directory attribute to Copy Always. (There are a variety of more advanced configuration options available. If you're interested in learning more, I suggest the StructureMap website.)

Once configured, we can undo all the changes we made to the Car class to allow constructor injection (remove the _dataProvider field and the constructors). To get the correct IDataAccess implementation, we simply need to ask StructureMap for it. The Car class now looks like:

public class Car
{
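   private int _id;
   //exposed so that callers (and tests) can read the id assigned by Save
   public int Id
   {
      get { return _id; }
   }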
   public void Save()
   {
      if (!IsValid())
      {
         //todo: come up with a better exception
         throw new InvalidOperationException("The car must be in a valid state");
      }
      IDataAccess dataAccess = ObjectFactory.GetInstance<IDataAccess>();
      if (_id == 0)
      {
         _id = dataAccess.Save(this);
      }
      else
      {
         dataAccess.Update(this);
      }
   }
   private bool IsValid()
   {
      //todo: make sure the object is in a valid state
      return true;
   }
}

To use a mock rather than the default implementation, we simply need to inject the mock into StructureMap:

public void AlmostATest()
{
   ObjectFactory.InjectStub(typeof(IDataAccess), new MockDataAccess());
   Car car = new Car();
   car.Save();
   if (car.Id != 1)
   {
      //something went wrong
   }
   ObjectFactory.ResetDefaults();
}

We use InjectStub so that subsequent calls to GetInstance return our mock, and make sure to reset everything to normal via ResetDefaults.
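
To give a feel for what is going on conceptually, here is a toy stand-in for a container. It is not StructureMap's implementation or API: ToyObjectFactory and its RegisterDefault method are invented for illustration, the two data access classes are simplified to a single Name property, and only the idea of configured defaults plus temporary stubs is borrowed from the description above. The sketch also resets inside a finally block, so that a failing test cannot leave a stub behind for the next one.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the article's types, reduced to one property.
public interface IDataAccess
{
   string Name { get; }
}
public class SqlServerDataAccess : IDataAccess { public string Name { get { return "sql"; } } }
public class MockDataAccess : IDataAccess { public string Name { get { return "mock"; } } }

// Toy container: configured defaults, plus temporary overrides for tests.
public static class ToyObjectFactory
{
   private static readonly Dictionary<Type, Func<object>> _defaults = new Dictionary<Type, Func<object>>();
   private static readonly Dictionary<Type, object> _stubs = new Dictionary<Type, object>();

   // Plays the role of the XML configuration: interface -> default implementation.
   public static void RegisterDefault<T>(Func<T> factory) where T : class
   {
      _defaults[typeof(T)] = () => factory();
   }

   // Overrides the default until ResetDefaults is called.
   public static void InjectStub<T>(T stub) where T : class
   {
      _stubs[typeof(T)] = stub;
   }

   public static void ResetDefaults()
   {
      _stubs.Clear();
   }

   public static T GetInstance<T>() where T : class
   {
      object stub;
      if (_stubs.TryGetValue(typeof(T), out stub))
      {
         return (T)stub;
      }
      return (T)_defaults[typeof(T)]();
   }
}

public static class ToyDemo
{
   public static string Run()
   {
      ToyObjectFactory.RegisterDefault<IDataAccess>(() => new SqlServerDataAccess());

      string seen;
      ToyObjectFactory.InjectStub<IDataAccess>(new MockDataAccess());
      try
      {
         seen = ToyObjectFactory.GetInstance<IDataAccess>().Name;   // the stub, while injected
      }
      finally
      {
         ToyObjectFactory.ResetDefaults();   // runs even if the test body throws
      }
      // after the reset, the configured default is returned again
      return seen + "," + ToyObjectFactory.GetInstance<IDataAccess>().Name;
   }
}
```

The same try/finally shape applies to the real test above: putting the ResetDefaults call in a finally block keeps one failing test from affecting the others.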

DI frameworks such as StructureMap are as easy to use as they are useful. With a couple of lines of configuration and some minor changes to our code, we've greatly decreased our coupling, which in turn increases our testability. In the past, I've introduced StructureMap into existing large codebases in a matter of minutes – the impact is minor.

Conclusion
Reducing coupling is one of those things that's pretty easy to do yet yields great results in our quest for greater maintainability. All that's required is a bit of knowledge and discipline – and of course, tools don't hurt either. It should be obvious why you want to decrease the dependency between the components of your code – especially between those components that are responsible for different aspects of the system (UI, Domain and Data being the obvious three). In the next part we'll look at unit testing, which will really leverage the benefits of dependency injection. If you're having problems wrapping your head around DI, take a look at my more detailed article on the subject at http://dotnetslackers.com/articles/designpatterns/IntroducingDependencyInjectionFrameworks.aspx.

Author: "karl" Tags: "Featured, Foundations"
Date: Wednesday, 05 Dec 2007 16:10

Part 3 of the Foundations series has been posted on DotNetSlackers. You can see it at:
http://dotnetslackers.com/articles/net/FoundationsOfProgrammingPersistence.aspx 

This is an introduction to persistence using objects, which will be re-examined in a later part. 

I've always believed that CodeBetter and communities like DNS have a symbiotic relationship, and this is my ongoing effort to help that relationship flourish.

Stay tuned to the blog for more parts.

Author: "karl" Tags: "Foundations"