Date: Mon, 20 May 2013 06:13:15 +0200
- Richard Hale Shaw's Blog
What's New in the .NET 2.0 Base Class Libraries? (Part 1)
During the .NET 2.0 Beta, I started keeping statistics on the differences in the number of public features (classes, members, structs, interfaces, etc.) between the old .NET 1.1 Framework and .NET 2.0. Part of the point was to get a statistical picture of how the two Base Class Libraries (BCL) had changed; the other part was that it would let me drill down into what the differences were.
Motivation? Personal curiosity coupled with a desire to add a lot of interesting meat to a new BootCamp class that I've been working on. (Ok, as of this date, I've mentioned it several times in previous posts, but I still haven't posted the Outline. With a little luck, I'll have it posted this weekend en route to see clients in Europe.)
The BCL is stored in a well-documented directory, and the Common Language Runtime (CLR) loads it from there. (This is "C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727" on my system: it's always under the Windows installation folder, followed by "Microsoft.NET\Framework\" and the Framework version number. The simplest trick is to use the value returned by System.Runtime.InteropServices.RuntimeEnvironment.GetRuntimeDirectory().) You can load the assemblies yourself from that folder: use AssemblyName.GetAssemblyName to obtain the AssemblyName object for each assembly (and to screen out any DLLs that aren't assemblies), then pass the AssemblyName that's returned to Assembly.Load to obtain a reference to the loaded assembly. Then use Assembly.GetExportedTypes to find the public types in a given assembly, and drill down to the rest from there. (This is stock in trade for most .NET developers, and a standard lab for students in my C# BootCamp -- a shameless plug, that.)
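The steps above can be sketched roughly as follows. This is a minimal illustration, not the program behind the statistics in this post: the class name BclSurvey and its two methods are hypothetical, and the code assumes that a DLL which makes AssemblyName.GetAssemblyName throw is simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical helper, named for illustration only.
public static class BclSurvey
{
    // Returns the AssemblyName of every managed assembly among the DLLs in 'folder'.
    public static List<AssemblyName> FindAssemblies(string folder)
    {
        List<AssemblyName> found = new List<AssemblyName>();
        foreach (string file in Directory.GetFiles(folder, "*.dll"))
        {
            try
            {
                found.Add(AssemblyName.GetAssemblyName(file));
            }
            catch (BadImageFormatException)
            {
                // An ordinary (non-assembly) DLL: screen it out.
            }
            catch (FileLoadException)
            {
                // The file could not be read as an assembly: skip it as well.
            }
        }
        return found;
    }

    // Loads the assembly and counts its public (exported) types.
    // Assembly.Load or GetExportedTypes can still throw for some assemblies
    // (for example when a dependency is missing); callers may want to catch that.
    public static int CountPublicTypes(AssemblyName name)
    {
        Assembly assembly = Assembly.Load(name);
        return assembly.GetExportedTypes().Length;
    }
}
```

A caller would pass RuntimeEnvironment.GetRuntimeDirectory() to FindAssemblies and then call CountPublicTypes on each returned name.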
One problem in the process was making sure I'd really loaded assemblies (and as I mentioned, that's taken care of for you when you call AssemblyName.GetAssemblyName: it will throw if you pass it the name of an ordinary non-assembly DLL). But another problem -- raised by my colleague Martin Shoemaker -- is how do I know I have processed all the BCL assemblies, and what about the ones in the Global Assembly Cache (GAC)?
First of all: the ones in the GAC are copies, put there when you install .NET so that the CLR will find them (and find them quickly: the CLR's binding algorithm causes it to look for strong-named assemblies in the GAC before it looks anywhere else, and virtually all the BCL libraries are strong-named). You can easily validate that they're copies by comparing their AssemblyName objects: because they're strong-named, the names will be identical if they match on the usual strong-name criteria (short filename without extension, Culture setting, public key token, and version number).
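To make the four strong-name criteria concrete, here is a hedged sketch of a field-by-field comparison. StrongNameComparer and its members are hypothetical names introduced for illustration; treating a missing culture as neutral and a null token as empty are assumptions of this sketch.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, named for illustration only.
public static class StrongNameComparer
{
    // Compares short name, version, culture, and public key token.
    public static bool SameIdentity(AssemblyName a, AssemblyName b)
    {
        return string.Equals(a.Name, b.Name, StringComparison.OrdinalIgnoreCase)
            && object.Equals(a.Version, b.Version)
            && string.Equals(CultureOf(a), CultureOf(b), StringComparison.OrdinalIgnoreCase)
            && SameBytes(a.GetPublicKeyToken(), b.GetPublicKeyToken());
    }

    // Assumption: a missing culture counts as the neutral (empty) culture name.
    private static string CultureOf(AssemblyName name)
    {
        return name.CultureInfo == null ? string.Empty : name.CultureInfo.Name;
    }

    // Byte-by-byte comparison; null and empty tokens are treated as equal.
    private static bool SameBytes(byte[] x, byte[] y)
    {
        int lengthX = x == null ? 0 : x.Length;
        int lengthY = y == null ? 0 : y.Length;
        if (lengthX != lengthY) return false;
        for (int i = 0; i < lengthX; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }
}
```

Calling SameIdentity with the AssemblyName of a GAC copy and that of its runtime-directory counterpart should return true when the two really are the same strong-named assembly.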
Searching the GAC
To search the GAC, you start with c:\Windows\Assembly -- not through that silly shell extension that appears when you open it with Windows Explorer, but by opening a Command Prompt window there. Where FW1.1 used the GAC folder under it, FW2.0 uses GAC_32 and GAC_MSIL (for non-portable 32-bit and above assemblies, and for portable assemblies, respectively) to store assemblies. (By default, strong-named assemblies you create will be portable, and will go into the GAC_MSIL folder tree.) You can write a program that will recurse those two folders, as all GAC-based assemblies are stored in a hierarchical set of folders with one of those 2 as the root.
For example, suppose you store a strong-named assembly in the GAC under FW2.0, and the assembly filename is HelloServer.DLL with a public key token of fa8b520ddd539743 and a version number of 188.8.131.52. Then the path to that assembly will be:
I.e., GACUtil will create a folder under GAC_MSIL using the short assembly name, and then use the version# and public key token to create a unique folder name under it, storing the assembly there. (In other words, the GAC really uses side-by-side execution just as we do with assemblies outside the GAC.) Of course, I'm not advocating that you store anything in the GAC (let's save that topic for another blog entry -- as a general rule you should only do so if you absolutely have to, but the problem is that Microsoft has done a pretty poor job of explaining when you really have no other choice).
So finding the GAC equivalents of the BCL meant searching the two GAC root folders, GAC_32 and GAC_MSIL, recursively. To find all the DLLs under either of these two folders, you can use a new overload of System.IO.Directory.GetFiles that takes a start folder, a search pattern (e.g., "*.dll"), and a SearchOption value (SearchOption.AllDirectories to recurse); it returns an array of strings, one for each file found that meets the criteria. While 2 of the 3 overloads of GetFiles have been part of .NET for a while, the one that offers recursion is new in FW2.0, and the two older methods have been refactored in this release to internally call the new implementation.
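A minimal sketch of that recursive search, assuming the two GAC roots are passed in by the caller. The class name GacSearch and the choice to skip roots that do not exist are assumptions of this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, named for illustration only.
public static class GacSearch
{
    // Returns every *.dll found under each root folder, including subfolders.
    // Roots that do not exist on this machine are skipped rather than treated as errors.
    public static List<string> FindDlls(params string[] roots)
    {
        List<string> files = new List<string>();
        foreach (string root in roots)
        {
            if (!Directory.Exists(root))
            {
                continue;
            }
            // The three-argument overload is the one that can recurse.
            files.AddRange(Directory.GetFiles(root, "*.dll", SearchOption.AllDirectories));
        }
        return files;
    }
}
```

On a FW2.0 machine the call would look like GacSearch.FindDlls(@"C:\Windows\Assembly\GAC_32", @"C:\Windows\Assembly\GAC_MSIL").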
The VB2005 Approach
(If you're using VB2005, you might instead use the My object services, which generate a call to Microsoft.VisualBasic.FileIO.FileSystem.GetFiles. It effectively does the same thing, except that it returns System.Collections.ObjectModel.ReadOnlyCollection<string> instead of string[], and it can search either files or directories. Out of curiosity, I examined the implementation of FileSystem.GetFiles: it's actually a wrapper around an orchestrated set of calls to Directory.GetFiles to search for files or directories in order to recurse a directory tree. Figuring that most VB2005 developers would use FileSystem.GetFiles, I wrote a small program to test the performance differences between the two: using a series of controlled scenarios, I set up calls to Directory.GetFiles and FileSystem.GetFiles, and used the System.Diagnostics.Stopwatch class (new in FW2.0) to time the two methods. Hands-down, Directory.GetFiles out-performed FileSystem.GetFiles by at least 2-1 -- and sometimes 4-1. If you're concerned about performance above other criteria when you're executing a recursive file search, use Directory.GetFiles.)
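The test program itself isn't shown in this post; the following is only a sketch of how such a timing comparison could be set up with Stopwatch. SearchTimer, the FileSearch delegate, and DirectorySearch are hypothetical names, and the iteration count is left to the caller.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, named for illustration only.
public static class SearchTimer
{
    // A search routine: takes a root folder and returns the number of files found.
    public delegate int FileSearch(string root);

    // Runs 'search' the given number of times and returns the total elapsed time.
    public static TimeSpan Time(FileSearch search, string root, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            search(root);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // The Directory.GetFiles variant of the recursive search.
    public static int DirectorySearch(string root)
    {
        return Directory.GetFiles(root, "*.dll", SearchOption.AllDirectories).Length;
    }
}
```

The VB-side variant would wrap FileSystem.GetFiles(root, Microsoft.VisualBasic.FileIO.SearchOption.SearchAllSubDirectories, "*.dll").Count in the same delegate (this needs a reference to the Microsoft.VisualBasic assembly), so both searches are timed by identical code. Running each search once before timing keeps file-system caching from favoring whichever runs second.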
A Handy CopyToArray Method
There was, fortunately, a serendipitous outcome of testing the FileSystem.GetFiles method. The latter returns ReadOnlyCollection<string>; I needed the results in an array. While classes like ReadOnlyCollection<T> implement a CopyTo method, I can't remember how many times I've had to write code like the following inside of a method that uses a collection object internally:
string[] results = new string[someCollection.Count];
someCollection.CopyTo(results, 0);
I realized it was time to refactor this into a method I could use over-and-over, and wrote the following:
public static T[] CopyToArray<T>( ICollection<T> coll )
{
    T[] results = new T[coll.Count];
    coll.CopyTo( results, 0 );
    return results;
}
The beauty of this Generic method is that it'll work with any type that implements ICollection<T> to copy the contents of the collection out to an array of T. Definitely beats writing the same code over and over.
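As a usage illustration, here is a hedged, self-contained sketch that applies the helper to a ReadOnlyCollection<string>. The helper is repeated so the snippet compiles on its own; the class name CollectionUtil, the Demo method, and the sample file names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical container class, named for illustration only.
public static class CollectionUtil
{
    // Copies any ICollection<T> into a new array of T.
    public static T[] CopyToArray<T>(ICollection<T> coll)
    {
        T[] results = new T[coll.Count];
        coll.CopyTo(results, 0);
        return results;
    }

    // Wraps a list in a ReadOnlyCollection<string> (the type FileSystem.GetFiles
    // returns) and copies it out to a string[].
    public static string[] Demo()
    {
        List<string> names = new List<string>();
        names.Add("mscorlib.dll");
        names.Add("System.dll");
        ReadOnlyCollection<string> readOnly = new ReadOnlyCollection<string>(names);
        return CopyToArray<string>(readOnly);
    }
}
```

Because the parameter is ICollection<T>, the same call works unchanged for List<T>, arrays, and other generic collections.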
Matching the GAC-Based Assemblies
Using Directory.GetFiles, I wrote a method that would take the resulting filenames of GAC-based assemblies (stored in a collection of strings using List<string>), and get an AssemblyName object for each one:
AssemblyName gacImage = AssemblyName.GetAssemblyName(fileName);
Then, I stripped off the path, and used the base filename to search the RuntimeDirectory (again, by calling GetRuntimeDirectory), find a matching filename, and get an AssemblyName object for it as well:
AssemblyName netImage =
    AssemblyName.GetAssemblyName(
        RuntimeEnvironment.GetRuntimeDirectory() + Path.GetFileName(fileName));
Finally, I compared the two AssemblyName objects using a new FW2.0 method, ReferenceMatchesDefinition:
Using this method, I could verify that the two assemblies were identical (if you delve into the help for ReferenceMatchesDefinition, you'll see that the outcome could depend on which parameter was passed first: I ran the program both ways to verify the results).
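Putting the three steps together might look like the sketch below. It is an illustration under assumptions, not the original program: GacMatcher and MatchesRuntimeCopy are hypothetical names, and the method simply returns false when the runtime directory has no file with the same name.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical helper, named for illustration only.
public static class GacMatcher
{
    // True when the GAC file and the same-named file in the runtime directory
    // are reported as the same assembly by ReferenceMatchesDefinition.
    public static bool MatchesRuntimeCopy(string gacFile)
    {
        string runtimeFile = Path.Combine(
            RuntimeEnvironment.GetRuntimeDirectory(), Path.GetFileName(gacFile));
        if (!File.Exists(runtimeFile))
        {
            return false; // no counterpart in the runtime directory
        }

        AssemblyName gacImage = AssemblyName.GetAssemblyName(gacFile);
        AssemblyName netImage = AssemblyName.GetAssemblyName(runtimeFile);

        // The result can depend on which name is passed as the reference,
        // so check both orderings.
        return AssemblyName.ReferenceMatchesDefinition(gacImage, netImage)
            && AssemblyName.ReferenceMatchesDefinition(netImage, gacImage);
    }
}
```

One caveat worth checking against the help for your Framework version: as far as I can tell, current documentation describes ReferenceMatchesDefinition as comparing simple assembly names, so comparing the two FullName strings as well is a stricter test of identity.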
Conversely, I used the same materials to search in the opposite direction: (a) recursively search the 2 GAC folders mentioned above to create a list of AssemblyName objects (List<AssemblyName>), (b) search the RuntimeDirectory folder to find every assembly there, and (c) use each assembly found to search the cached List<AssemblyName> and call ReferenceMatchesDefinition on each one looking for a match.
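Step (c) of the reverse search can be sketched as a linear scan over the cached list. ReverseMatcher and FindMatch are hypothetical names; building the cache (steps a and b) is assumed to have been done already with AssemblyName.GetAssemblyName.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper, named for illustration only.
public static class ReverseMatcher
{
    // Returns the first cached GAC name that matches 'candidate', or null if none does.
    public static AssemblyName FindMatch(AssemblyName candidate, List<AssemblyName> gacCache)
    {
        foreach (AssemblyName cached in gacCache)
        {
            if (AssemblyName.ReferenceMatchesDefinition(candidate, cached))
            {
                return cached;
            }
        }
        return null;
    }
}
```

A null result for a file in the RuntimeDirectory folder means no GAC counterpart was found for it.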
Here are the results.
The following are all assemblies in the FW2.0 RuntimeDirectory folder with counterparts in one of the 2 GAC folders:
And, FYI, the remaining .DLLs in the RuntimeDirectory folder are just ordinary DLLs, with no counterpart in the GAC:
I'll follow with another blog entry about the types in the first list, above.