Introduction
Sunday, April 1, 2001
In this topic, I'll talk about how you collect things in VB.NET. There are significant changes in the way we deal with arrays, dictionaries, and other collections. Much has changed for the better, but if you're looking for VBA.Collection, you might be a little disappointed. I'll talk about approaches to help you out on that.
New classes
You'll find the relevant .NET framework classes in the System.Collections namespace in mscorlib.dll and in System.Collections.Specialized in System.dll. When you create a new project, both should be referenced by default. If you want to try things out, you might want to import the whole bunch into your code file. There's also a very familiar class in the namespace Microsoft.VisualBasic (in the library Microsoft.VisualBasic.dll), but I'll get to that later.
In VB5/6 all we had was Collection, Dictionary, and the humble array. Although Collection's interface was pretty dull, it was, due to the hacks and workarounds possible, the premier workhorse for our collection tasks (you'll find my own general purpose workaround in the VB6Core component). Dictionary was a half-baked copy of Perl's associative array. It was tempting to replace Collection with it (in VBScript, Collection didn't even exist), but it was fundamentally different. Arrays weren't objects; we'll see later how they work now.
Now, all collections are classes. There are several interfaces that they implement, so even though the object browser seems more bloated now, you can handle the stuff easily. When you create your own collections, you'll primarily implement ICollection and IEnumerator. But mostly, you'll just inherit from an existing class (there are also special MustInherit / Abstract classes). It beats the heck out of delegating to Collection and setting the proc ID of NewEnum to negative four. Sorting now means calling a method, perhaps providing a comparer, but you don't need to write your own sort algorithm.
Classes ready to use
So let's see what we got. Here's a short overview of the most important classes; I grouped them by the major characteristics:
- Array: yes, it's a class. You can exchange "standard arrays" (well, arrays) with Array objects. The Array class handles multiple dimensions as well. I'll get to arrays later.
- ArrayList: an Array that implements IList (see below); works closely with arrays; you can access elements by index, and perform sorts. For a simple collection task, an ArrayList is often a good choice. It is more efficient than ordinary arrays, since it can grow, and does so in large steps automatically. ArrayList can also be used as a wrapper for any collection that supports IList - this enables ArrayList's Sort and BinarySearch methods for that class.
- StringCollection: specializes in Strings. Access by index.
- NameValueCollection, SortedList: they have two arrays, one for the keys and one for the values. They are all sorted by the keys. See the help files for the differences.
- Stack, Queue: Stack means first in, last out. Queue means first in, first out. Both have in common that the order of removing elements is predetermined. Use Queue for command queues or for message queues.
- Dictionary, HashTable/CaseInsensitiveHashTable: collections of key/value pairs. The key is the "key" thing here; watch out how you iterate. You can iterate through the values and the keys, but keep in mind that the order in which you walk through is not the order in which items were added. HashTable orders entries according to their hash code, but Dictionary doesn't appear to order them at all.
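To make the overview a bit more concrete, here's a minimal sketch of how a few of these classes behave. The module name CollectionTour and its function names are made up for illustration. Note that the framework spells the class Hashtable (VB doesn't care about case).

```vbnet
Imports System
Imports System.Collections

' Hypothetical module; the names are for illustration only.
Module CollectionTour
    ' ArrayList: access by index, plus a built-in Sort.
    Public Function SortedNames() As ArrayList
        Dim names As New ArrayList()
        names.Add("Sue")
        names.Add("John")
        names.Add("Alice")
        names.Sort()    ' default comparer; no hand-written sort needed
        Return names
    End Function

    ' Stack: the element pushed last comes out first.
    Public Function PopOrder() As String
        Dim s As New Stack()
        s.Push("a")
        s.Push("b")
        s.Push("c")
        Return CStr(s.Pop()) & CStr(s.Pop()) & CStr(s.Pop())
    End Function

    ' Queue: the element enqueued first comes out first.
    Public Function DequeueOrder() As String
        Dim q As New Queue()
        q.Enqueue("a")
        q.Enqueue("b")
        q.Enqueue("c")
        Return CStr(q.Dequeue()) & CStr(q.Dequeue()) & CStr(q.Dequeue())
    End Function

    ' Hashtable: look values up by key; don't rely on iteration order.
    Public Function LookUp(ByVal key As String) As Object
        Dim ages As New Hashtable()
        ages.Add("John", 42)
        ages.Add("Sue", 39)
        Return ages(key)
    End Function
End Module
```

Call the functions from any Sub Main to try them out; everything comes back as Object, so expect a few conversions on the client side.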
Base classes
As I mentioned, there are also a number of abstract (MustInherit) classes you can use. Here are three examples you might find useful:
- CollectionBase: Use it to create strongly typed collections. It comes down to using array lists. Keys are not implemented. This class resides in System.Collections.
- DictionaryBase: With this you can create strongly typed dictionaries. It has keys, but the order of entries is not defined. It implements an IDictionaryEnumerator interface in its respective enumerator class. This class, too, lives in System.Collections.
- NameObjectCollectionBase: lets you work with keys and indices. You can make it strongly typed as well. It gets close to VBA.Collection, though the enumerator you get is obsessed with the keys. Note that keys ("names") need not be unique, so you have to adjust your Add method when you need unique keys. You'll find it in System.Collections.Specialized in System.dll.
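As a rough sketch of the CollectionBase approach, here's a strongly typed list of Strings. The class name NameList is hypothetical; List and OnValidate are the protected members that CollectionBase provides for exactly this purpose.

```vbnet
Imports System
Imports System.Collections

' Hypothetical strongly typed collection of Strings built on CollectionBase.
Public Class NameList
    Inherits CollectionBase

    ' Typed Add; the protected List property does the real work.
    Public Function Add(ByVal value As String) As Integer
        Return List.Add(value)
    End Function

    ' Typed default property, so clients can write names(0).
    Default Public Property Item(ByVal index As Integer) As String
        Get
            Return CStr(List(index))
        End Get
        Set(ByVal value As String)
            List(index) = value
        End Set
    End Property

    ' Reject anything that is not a String, even when a client
    ' sneaks in through the weakly typed IList interface.
    Protected Overrides Sub OnValidate(ByVal value As Object)
        If Not TypeOf value Is String Then
            Throw New ArgumentException("Only Strings are allowed.")
        End If
    End Sub
End Class
```

Count, Clear, RemoveAt, and For-Each support come for free from the base class.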
Interfaces
There are several interfaces used with collections. Mostly, you don't need to implement them, but you should definitely consider using them. Here's a short description of most:
- IList: This interface expresses the idea that things can be accessed/ordered by index. IList derives from ICollection (this means not every implementor of ICollection is necessarily indexed). IList exposes members like Add, Insert, and Item, which have System.Object as their parameter or return types, respectively. However, the methods implementing an interface can be private, so even strongly typed collections can support this interface.
- ICollection: All collections implement this interface. It defines their size, determines whether the collection is read-only, and whether access to the collection is thread-safe (IsSynchronized property). The elements can be copied to an Array via CopyTo. This interface is derived from IEnumerable (see below), so it returns the enumerator as well. Dictionary and HashTable implement IDictionary, which is derived from ICollection. ICollection is useful because of the Count property and the CopyTo method - there are many places in the framework where arrays are expected, and with the information that ICollection provides, data conversion to arrays is painless.
- IEnumerable: This one has exactly one method, GetEnumerator, which returns an object whose class implements IEnumerator (below). GetEnumerator replaces [_NewEnum] in the Collection class in Microsoft.VisualBasic. When you create your own collection, you implement this interface (and create an enumerator class) to make For-Each loops work.
- IEnumerator: This one is implemented by enumerator classes. Any collection will implement IEnumerable.GetEnumerator (above) to return the enumerator to the client; the enumerator must then implement members like Current, MoveNext, and Reset to allow for enumeration. But nothing keeps you from adding a Skip method or a Clone method to your enumerator to allow for more advanced iteration techniques. Dictionary's and HashTable's enumerator classes implement IDictionaryEnumerator (it derives from IEnumerator) instead, which lets the client access both keys and values. However, you can walk through either a Dictionary or a HashTable using an IEnumerator if you call either the Keys or the Values property, which return an ICollection and therefore an IEnumerable (this way you can use a standard enumeration routine if you only need the keys or the values, respectively).
What do we do with all these interfaces? Well, it appears more confusing than it really is. When you create your own collection from scratch, you're just concerned with ICollection and IEnumerator. Otherwise you can extend existing classes. If you just use them, you'll find that they all work similarly, but watch out for enumeration order and keys vs. values.
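Here's a small sketch of why it pays to program against the interfaces rather than the concrete classes. The module and function names are invented for this example; only Count, CopyTo, GetEnumerator, and MoveNext come from the framework.

```vbnet
Imports System
Imports System.Collections

' Hypothetical helpers that rely on nothing but the interfaces.
Module InterfaceDemo
    ' Works for any collection: ICollection supplies Count and CopyTo.
    Public Function ToObjectArray(ByVal col As ICollection) As Object()
        Dim result() As Object = New Object(col.Count - 1) {}
        col.CopyTo(result, 0)
        Return result
    End Function

    ' Works for anything that supports For-Each: IEnumerable is enough.
    Public Function CountItems(ByVal items As IEnumerable) As Integer
        Dim n As Integer = 0
        Dim e As IEnumerator = items.GetEnumerator()
        While e.MoveNext()
            n += 1
        End While
        Return n
    End Function
End Module
```

The same two routines accept a Queue, an ArrayList, or the Keys collection of a Hashtable without any changes.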
Arrays
Nature of arrays
Arrays are reference types deriving from the class System.Array. An array of a reference type stores references, whereas an array of values stores the values directly: there is no boxing when you have an array of, say, Doubles; however, an array of System.Object or of System.ValueType can store references pointing to boxed values. You cannot assign an array of values to a reference of type "Array of System.Object" or even "Array of System.ValueType", because the elements in the array aren't boxed.
For copying an array, use the Shared Copy method of the System.Array class.
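A minimal sketch of the difference between assigning an array reference and copying the array with Array.Copy (the module and function names are hypothetical):

```vbnet
Imports System

' Hypothetical module; only Array.Copy comes from the framework.
Module ArrayCopyDemo
    ' Returns an independent copy of the source array.
    Public Function CopyOf(ByVal source() As Integer) As Integer()
        Dim target() As Integer = New Integer(source.Length - 1) {}
        Array.Copy(source, target, source.Length)
        Return target
    End Function

    ' True if assignment aliases the array while CopyOf does not.
    Public Function AliasDemo() As Boolean
        Dim a() As Integer = {1, 2, 3}
        Dim b() As Integer = a            ' b refers to the same array as a
        Dim c() As Integer = CopyOf(a)    ' c is a separate array
        b(0) = 99                         ' visible through a, not through c
        Return a(0) = 99 AndAlso c(0) = 1
    End Function
End Module
```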
Declarations and initialization
You use arrays differently than before. Now, you can only specify the size of the array, not its lower bound (Option Base is gone, "n To m" is gone, too). When you put a number in the parens after the identifier, it identifies the upper bound of the array (like in VB6 - think end-point-inclusive, which is unusual). ReDim cannot change the number of dimensions in the array. You cannot use ReDim to initially declare an array.
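Since the upper-bound rule trips people up, here's a quick sketch (hypothetical module and function names) showing that the number in the parens is the last valid index, not the element count:

```vbnet
Imports System

' Hypothetical module illustrating upper bounds versus element counts.
Module BoundsDemo
    ' The declared upper bound is 9, so the array holds 10 elements.
    Public Function ElementCount() As Integer
        Dim values(9) As Integer
        Return values.Length
    End Function

    ' For a (1, 2) array the second dimension runs from 0 to 2.
    Public Function LastIndexOfSecondDimension() As Integer
        Dim grid(1, 2) As Integer
        Return grid.GetUpperBound(1)
    End Function
End Module
```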
Declaring array references
There is now a new way to declare an array; the following two lines mean the same:
    Dim animals() As String
    Dim animals As String()
The new syntax lets you think: "String array". Strictly speaking, the preceding two lines of code only declare references, which are initialized to Nothing.
Creating arrays by specifying the upper bound
Specifying the size of an array in the declaration actually means declaring a reference, creating an array of the given size, and assigning it to the reference.
    Dim flowers(9) As CFlower
    Dim flowers As CFlower() = New CFlower(9) {}
You cannot combine declaration and array creation like you can when instantiating a class (using the "As New" syntax). Similarly, the ReDim statement translates like this:
    ReDim flowers(19) As CFlower
    flowers = New CFlower(19) {}
The last line is more in sync with what's really happening, so I find this syntax preferable. ReDim does not resize an existing array, it creates a new one; ReDim Preserve does the same, and also copies the original contents to the new array. The operation assigns the new array to the reference, so although arrays are reference types, the change is not reflected in other references that still point to the old array.
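The following sketch (hypothetical module and function names) shows the effect: after ReDim Preserve, a second reference still sees the old array.

```vbnet
Imports System

' Hypothetical module showing that ReDim replaces the array object.
Module ReDimDemo
    Public Function ReDimReplacesArray() As Boolean
        Dim first() As Integer = {1, 2, 3}
        Dim second() As Integer = first   ' both refer to the same array

        ReDim Preserve first(5)           ' first now refers to a new 6-element array
        first(0) = 42                     ' changes only the new array

        ' second still points to the old, untouched 3-element array;
        ' first got the old contents copied over before the change.
        Return second.Length = 3 AndAlso second(0) = 1 _
            AndAlso first.Length = 6 AndAlso first(1) = 2
    End Function
End Module
```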
The fact that VB uses parens everywhere necessitates the curly braces after the "New" statement, because the array creation expression must be distinguished from a constructor call. If VB.NET allowed (or even required) the "LBound To UBound" syntax, things would be much clearer.
    people = new CPerson[100];       // C# array syntax
    people = New CPerson(0 To 99)    ' I wish ...
    people = New CPerson(99) {}      ' sad VB.NET reality
Creating arrays by inline initialization
Also, arrays can now be initialized right away (although then you don't specify a size, you simply initialize all elements, and the compiler figures out the size needed):
    Dim days() As String = New String() {"Monday", "Tuesday", "Wednesday", _
        "Thursday", "Friday", "Saturday", "Sunday"}
Multidimensional arrays
Multidimensional arrays work like this:
    Dim ints As Integer(,)                                  ' just a reference
    Dim ints(1, 1) As Integer
    Dim ints As Integer(,) = New Integer(1, 1) {}
    Dim ints As Integer(,) = New Integer(,) {{1, 2}, {3, 4}}
Note that you can also explicitly create an empty array (by using the "New" syntax and removing the upper bound literals).
Ragged arrays
Create arrays of arrays this way:
    Dim shorts As Short()()                             ' just a reference
    Dim shorts(1)() As Short                            ' can only dim top rank
    Dim shorts As Short()() = New Short(1)() {}         ' ditto (slots are Nothing)
    Dim shorts As Short()() = New Short()() {New Short() {}, New Short() {}}
Again, you can create empty arrays, but don't confuse this with Nothing references in the second and third lines.
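To illustrate the point about Nothing slots, here's a sketch (hypothetical module and function names) that first checks the freshly created slots and then fills a ragged array row by row:

```vbnet
Imports System

' Hypothetical module illustrating arrays of arrays.
Module RaggedDemo
    ' Dimensioning only the top rank leaves every slot set to Nothing.
    Public Function SlotsAreNothing() As Boolean
        Dim shorts()() As Short = New Short(1)() {}
        Return shorts(0) Is Nothing AndAlso shorts(1) Is Nothing
    End Function

    ' Builds a triangle: row i gets i + 1 elements.
    Public Function BuildTriangle(ByVal rows As Integer) As Integer()()
        Dim result()() As Integer = New Integer(rows - 1)() {}
        Dim i As Integer
        For i = 0 To rows - 1
            result(i) = New Integer(i) {}   ' each row is its own array
        Next i
        Return result
    End Function
End Module
```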
Mixing array types
    Dim chars()(,) As Char                              ' just a reference
    Dim chars(1)(,) As Char
    Dim chars()(,) As Char = New Char(1)(,) {}
    Dim chars()(,) As Char = New Char()(,) {New Char(,) {}, New Char(,) {}}
Enough said.
Array bounds
Anyway, now that all arrays start at zero, I might add that all collections now start at zero as well. In my opinion, the default should be one, for sane people start counting at one, and Basic is a language for sane people. I also think it's gratuitous nonsense to remove user-defined bounds from the language. VB loses some easy flexibility just because of harmonizing with C and Java. There is, however, something to be said in favor of having a consistent starting point for indexing both arrays and collections (as a default or as a convention, anyway) - I just wish we had other choices as well, for those cases where it's convenient and where cross-language interop doesn't matter.
Array covariance
For now, see "Changes in VB.NET".
Arrays in structures
For now, see "Changes in VB.NET".
Using collections
Let's get familiar with the new collections. You'll find that their methods and properties are a lot more powerful than VBA.Collection's gang of four. But, when you browse the help files, you'll also notice that there is a heavy flavour of the dictionary/hashtable/map philosophy: keys everywhere, but not necessarily a defined order of entries.
Let's start with enumerations (other operations should be fairly easy). You know how For-Each works (assuming a magic collection called NCollection):
    Dim col As New NCollection("John Doe", "Sue Doe"), s As String
    For Each s In col
        Console.WriteLine(s)
    Next s
For-Each is a nice feature that other languages lack, but now VBers can get their hands dirty, too, if they like. First, get an enumerator object:
    Dim ce As NCollectionEnumerator
    ce = col.GetEnumerator
    Do While ce.MoveNext
        Console.WriteLine(ce.Current)
    Loop
An interesting aspect is that you call MoveNext before you deal with the very first item. With Do-Loop, you could test at the end of the loop, but it just works differently (just consider an empty collection). Ah, forgot to say, MoveNext returns True as long as the current position is valid.
If you're using Dictionary or HashTable, or anything else that gives disproportionate importance to keys, the enumerator has three more properties: Key, Value, and Entry. The latter is of type DictionaryEntry, which is a structure consisting of Key and Value. It can be convenient to pass that around, but it's shorter to use Key or Value directly; this explains the ostensible redundancy. As I've mentioned, you can also call Keys or Values, and then use the enumerator you get from the interface (ICollection) that these properties return to iterate with a "normal" IEnumerator.
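Here's a sketch of the three ways to walk a Hashtable: through its IDictionaryEnumerator, through the Values collection with a plain IEnumerator, and with For-Each, which hands you DictionaryEntry structures. The module and function names are made up; the values are assumed to be Integers.

```vbnet
Imports System
Imports System.Collections

' Hypothetical module; assumes every value in the table is an Integer.
Module DictionaryWalk
    ' Key and Value are available directly on the dictionary enumerator.
    Public Function SumWithDictionaryEnumerator(ByVal table As Hashtable) As Integer
        Dim total As Integer = 0
        Dim de As IDictionaryEnumerator = table.GetEnumerator()
        While de.MoveNext()
            total += CInt(de.Value)
        End While
        Return total
    End Function

    ' Values returns an ICollection, so a "normal" IEnumerator will do.
    Public Function SumWithValues(ByVal table As Hashtable) As Integer
        Dim total As Integer = 0
        Dim e As IEnumerator = table.Values.GetEnumerator()
        While e.MoveNext()
            total += CInt(e.Current)
        End While
        Return total
    End Function

    ' For-Each over the table itself yields DictionaryEntry structures.
    Public Function SumWithForEach(ByVal table As Hashtable) As Integer
        Dim total As Integer = 0
        Dim entry As DictionaryEntry
        For Each entry In table
            total += CInt(entry.Value)
        Next
        Return total
    End Function
End Module
```

All three visit every entry exactly once, but none of them promises any particular order.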
Rolling your own
The most important thing is implementing a way for clients to enumerate. They can use For-Each loops on collections, but of course you're interested in what goes on behind the scenes. How do For-Each loops work? The loop body is executed exactly once for each entry in the collection; the client need not know how many there are. The power of that control structure also stems from the fact that you automatically get a reference to (or a copy of, depending on the types) each object in the collection. That's more efficient if you need to access several properties of an object (also note that with inheritance on collections and the high-level nature of the framework, there can be many levels of indirection, slowing down member access).
The For-Each control structure calls a method (traditionally named [_NewEnum], EnumObjects, or GetEnumerator) on the collection; this method returns an object called an enumerator. For-Each knows that this method is called GetEnumerator because that name is stipulated by the IEnumerable interface, which all collections implement (by way of implementing ICollection, which derives from it). GetEnumerator creates a new enumerator object; the enumerator object knows about the collection object because it's created by that collection object. Here's an implementation of GetEnumerator:
    Class NCollection
        Implements IEnumerable

        Public Function GetEnumerator() As IEnumerator _
            Implements IEnumerable.GetEnumerator
            Return New NCollectionEnumerator(Me)
        End Function
    End Class
Here, the collection class NCollection implements the IEnumerable interface, which has only one member, GetEnumerator (note it could also implement ICollection). GetEnumerator is typed as IEnumerator, because clients (using For-Each loops) expect it that way. So the object returned must implement that interface. Here, NCollectionEnumerator does that. When the enumerator object is constructed, it's passed a reference to the NCollection instance, so the enumerator knows how to talk to it. Here's NCollectionEnumerator:
    Class NCollectionEnumerator
        Implements IEnumerator

        ' store reference to collection; current position
        Private m_Collection As NCollection
        Private m_Pos As Integer

        ' on init, pass ref to the collection we walk through
        ' make sure init pos is invalid
        Sub New(ByVal col As NCollection)
            m_Collection = col
            m_Pos = -1
        End Sub

        Public ReadOnly Property Current As Object _
            Implements IEnumerator.Current
            Get
                If m_Pos < 0 Or m_Pos > m_Collection.UpperBound Then
                    Throw New InvalidOperationException
                Else
                    Return m_Collection.Item(m_Pos)
                End If
            End Get
        End Property

        Public Function MoveNext() As Boolean Implements IEnumerator.MoveNext
            m_Pos += 1
            Return CType(m_Pos >= 0 And m_Pos < m_Collection.Count, Boolean)
        End Function

        Public Sub Reset() Implements IEnumerator.Reset
            m_Pos = -1
        End Sub
    End Class
Note that our NCollection class is zero-based. When the enumerator is created, the current position must be -1, for enumerators work that way - by convention, the user calls MoveNext first (as far as a convention proposed by the help files of a Beta version goes; one may argue that the rules come with the interface description, but they're not enforced by the language).
A new (and potentially better) Collection
So how do we resurrect VBA.Collection? No, I'm not kidding. It had some characteristics that no class in System.Collections has in that combination:
- It had keys. Keys were optional, but when used had to be unique.
- It had indices. The order of items stayed the same, whether you iterated with For-Next or For-Each.
- When you iterated with For-Each, you got the values, not the keys or beasts called DictionaryEntry.
If you study the classes in System.Collections carefully, you'll find that none has all of them. On the other hand, the new ones have features like inserting or removing entire sections, sorting, or checking for existence that VBA.Collection missed (I hardly need to advocate that).
In any case, it's worth analysing your collection needs and deciding on a case-by-case basis which collection to use, or whether to create a new class. If you only want to map keys and values, use a dictionary. If you don't care about keys, but want to sort a few items, try using an ArrayList, or derive a class from CollectionBase. In any case, time it. After all, this is still a Beta and well, we all need to learn new things.
But there are cases when the distinct features of VBA.Collection are just what you need. So why not use the one in the compatibility namespace? Because it's not good enough. You didn't like it in VB5/6, so if it is to serve you now, you want to extend it. Also, it exposes a weakly (Object) typed Item property; this means that the only way to create a strongly typed collection is by way of delegation. Another issue is that the lower bound for indexing is one, which is a sane choice in itself, but it's incompatible with every other collection in the .NET framework.
Base class: NameObjectCollectionBase
This base knows keys (well, sort of, but eventually it will) and indices. It's got a little problem with the enumerator, though (I'll explain). We'll start with a generic derivation that has "Object"-type elements, one we can immediately use in VB6-style code. You can easily change that to any other type (of course, you could also switch to C++ which has a language feature called "templates" ..., but that's not why you're here). A better strategy, however, is to leave type-specific members (such as "Item" and "Add" [renamed, of course]) with protected access, allowing the creation of strongly typed collections while reusing other implementation details in this class (such as the GetEnumerator override), but that's an exercise for the reader.
So let's do some inheritance:
    Public Class NCollection
        Inherits System.Collections.Specialized.NameObjectCollectionBase

        ' constructors are not inherited
        Sub New()
            MyBase.New
        End Sub
    End Class
NOCB has got many protected methods it expects us to call; so much of the code looks like delegation, but it's not. Whenever you inherit, there are some members that you override; all in all NOCB offers a good balance between code reuse and flexibility. We'll also add new features. For example, now that all collections are zero-based, which we'll stick to in this exercise, you might want to add an UpperBound property that eases use in For-Next loops (many MFC collections have a corresponding method):
    Public ReadOnly Property UpperBound As Integer
        Get
            Return MyBase.Count - 1
        End Get
    End Property
Potentially, you can allow user-defined bounds as well, offsetting the indices behind the scenes.
Let's stick with the properties. Remember how to set the default property in VB6? Still know how you passed Variants to Item? That you had to implement Get/Let/Set properties (if your collection allowed modifying existing variant items)? Here's the new Item property:
    Public Default Overloads Property Item(ByVal index As Integer) As Object
        Get
            Return MyBase.BaseGet(index)
        End Get
        Set
            MyBase.BaseSet(index, value)
        End Set
    End Property

    Public Default Overloads Property Item(ByVal sKey As String) As Object
        Get
            Return MyBase.BaseGet(sKey)
        End Get
        Set
            MyBase.BaseSet(sKey, value)
        End Set
    End Property
Here are some methods. Note that NOCB allows for duplicate keys; we make them unique in the Add method (maybe there are other scenarios where you indeed choose to use duplicate names):
    Public Sub Add(ByVal value As Object, _
                   Optional ByVal sKey As String = Nothing)
        ' keys are optional: only check uniqueness when a key is given,
        ' otherwise a second keyless Add would be rejected
        If sKey Is Nothing OrElse MyBase.BaseGet(sKey) Is Nothing Then
            MyBase.BaseAdd(sKey, value)
        Else
            Throw New ArgumentException
        End If
    End Sub

    Public Overloads Sub Remove(ByVal sKey As String)
        MyBase.BaseRemove(sKey)
    End Sub

    Public Overloads Sub Remove(ByVal index As Integer)
        MyBase.BaseRemove(index)
    End Sub

    Public Sub Clear()
        MyBase.BaseClear
    End Sub

    Public Function ExistsKey(ByVal sKey As String) As Boolean
        Return Not (MyBase.BaseGet(sKey) Is Nothing)
    End Function
The enumerator
We also have to provide an enumerator to use with For-Each. But the one our base class has is Public, so the clients can just use that, right?
But when they iterate with For-Each, they'll find it prints the keys. Again, even NOCB is a collection of the dictionary style, to some extent. So maybe we can override the GetEnumerator method?
In Beta 1, it was not marked as "Overridable"; in Beta 2, we were free to use our own implementation:
    Public Overrides Function GetEnumerator() As IEnumerator
        Return New NCollectionEnumerator(Me)
    End Function
This worked in Beta 2, but the sad news is that in the final version, they've backpedaled - you can't override GetEnumerator if you derive from NameObjectCollectionBase.
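If you want to stay with NameObjectCollectionBase anyway, one possible workaround (a sketch of my own, not something the framework prescribes) is to leave the enumerator alone and expose the values separately. The class name ValueBag and its Values property are hypothetical; BaseAdd and BaseGetAllValues are protected members of the base class.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Specialized

' Hypothetical class; it is not the NCollection developed in the text.
Public Class ValueBag
    Inherits NameObjectCollectionBase

    Public Sub Add(ByVal key As String, ByVal value As Object)
        BaseAdd(key, value)
    End Sub

    ' Snapshot of all values as an array, in index order.
    ' Clients iterate over this instead of over the collection itself.
    Public ReadOnly Property Values() As Object()
        Get
            Return BaseGetAllValues()
        End Get
    End Property
End Class
```

A client then writes For Each v In bag.Values to get the values, while For Each k In bag still yields the keys. The price is a fresh array for every call to Values, so fetch it once before a loop.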
So if you're interested in creating a collection from scratch, check out the Lists topic, or see the Lists project from my Gregor.NET series.