Wednesday, August 23, 2006

Iterators in c# 2.0

In C# 1.1, you can iterate over data structures such as arrays and collections using a foreach loop.
string[] cities = {"New York","Paris","London"};
foreach(string city in cities)
{
Console.WriteLine(city);
}
In fact, you can use any custom data collection in the foreach loop, as long as that collection type implements a GetEnumerator method that returns an IEnumerator interface. Usually you do this by implementing the IEnumerable interface:
public interface IEnumerable
{
IEnumerator GetEnumerator();
}
public interface IEnumerator
{
object Current{get;}
bool MoveNext();
void Reset();
}
Often, the class that is used to iterate over a collection by implementing IEnumerable is provided as a nested class of the collection type to be iterated. This iterator type maintains the state of the iteration. A nested class is often better as an enumerator because it has access to all the private members of its containing class. This is, of course, the Iterator design pattern, which shields iterating clients from the actual implementation details of the underlying data structure, enabling the use of the same client-side iteration logic over multiple data structures, as shown in Figure 1.




Figure 1



Figure 1 Iterator Design Pattern
In addition, because each iterator maintains separate iteration state, multiple clients can execute separate concurrent iterations. Data structures such as the Array and the Queue support iteration out of the box by implementing IEnumerable. The code generated in the foreach loop simply obtains an IEnumerator object by calling the class's GetEnumerator method and uses it in a while loop to iterate over the collection by continually calling its MoveNext method and current property. You can use IEnumerator directly (without resorting to a foreach statement) if you need explicit iteration over the collection.
But there are some problems with this approach. The first is that if the collection contains value types, obtaining the items requires boxing and unboxing them because IEnumerator.Current returns an Object. This results in potential performance degradation and increased pressure on the managed heap. Even if the collection contains reference types, you still incur the penalty of the down-casting from Object. While unfamiliar to most developers, in C# 1.0 you can actually implement the iterator pattern for each loop without implementing IEnumerator or IEnumerable. The compiler will choose to call the strongly typed version, avoiding the casting and boxing. The result is that even in version 1.0 it's possible not to incur the performance penalty.
To better formulate this solution and to make it easier to implement, the Microsoft .NET Framework 2.0 defines the generic, type-safe IEnumerable and IEnumerator interfaces in the System.Collections.Generics namespace:
public interface IEnumerable
{
IEnumerator GetEnumerator();
}
public interface IEnumerator : IDisposable
{
ItemType Current{get;}
bool MoveNext();
}
Besides making use of generics, the new interfaces are slightly different than their predecessors. Unlike IEnumerator, IEnumerator derives from IDisposable and does not have a Reset method.
The second and more difficult problem is implementing the iterator. Although that implementation is straightforward for simple, it is challenging with more advanced data structures, such as binary trees, which require recursive traversal and maintaining iteration state through the recursion. Moreover, if you require various iteration options, such as head-to-tail and tail-to-head on a linked list, the code for the linked list will be bloated with various iterator implementations. This is exactly the problem that C# 2.0 iterators were designed to address. Using iterators, you can have the C# compiler generate the implementation of IEnumerator for you. The C# compiler can automatically generate a nested class to maintain the iteration state. You can use iterators on a generic collection or on a type-specific collection. All you need to do is tell the compiler what to yield in each iteration. As with manually providing an iterator, you need to expose a GetEnumerator method, typically by implementing IEnumerable or IEnumerable.
You tell the compiler what to yield using the new C# yield return statement.

You can also use C# iterators on non-generic collections.
In addition, you can use C# iterators on fully generic collections. When using a generic collection and iterators, the specific type used for IEnumerable in the foreach loop is known to the compiler from the type used when declaring the collection—a string in this case:
LinkedList list = new LinkedList();
/* Some initialization of list, then */
foreach(string item in list)
{
Trace.WriteLine(item);
}
This is similar to any other derivation from a generic interface.
If for some reason you want to stop the iteration midstream, use the yield break statement. For example, the following iterator will only yield the values 1, 2, and 3:
public IEnumerator GetEnumerator()
{
for(int i = 1;i< 5;i++)
{
yield return i;
if(i > 2)
yield break;
}
}
Your collection can easily expose multiple iterators, each used to traverse the collection differently. For example, to traverse the CityCollection class in reverse order, provide a property of type IEnumerable called Reverse:
public class CityCollection
{
string[] m_Cities = {"New York","Paris","London"};
public IEnumerable Reverse
{
get
{
for(int i=m_Cities.Length-1; i>= 0; i--)
yield return m_Cities[i];
}
}
}
Then use the Reverse property in a foreach loop:
CityCollection collection = new CityCollection();
foreach(string city in collection.Reverse)
{
Trace.WriteLine(city);
}
There are some limitations to where and how you can use the yield return statement. A method or a property that has a yield return statement cannot also contain a return statement because that would improperly break the iteration. You cannot use yield return in an anonymous method, nor can you place a yield return statement inside a try statement with a catch block (and also not inside a catch or a finally block).