Sunday, August 22, 2010

Containers with multiple enumerators

Paul Ishenin is, AFAIK, the author of the FPC enumerators support implementation. In his article on the FPC Wiki "FOR-IN LOOP", section "Proposed extensions/Select which enumerator to use", one can read (I hope it's allowed/legal/fair use to reproduce here the part of interest):
It is impossible to choose among different possible enumerators. For example you can traverse a tree using different orders. The well known algorithms are: preorder, postorder, inorder and breadth‑first traversals. Therefore it would be useful to have an ability to choose an enumerator. For example using the following syntax:
 
type
  TTreeEnumeratorType = (tePreOrder, tePostOrder, teInOrder, teBreadthFirst)

procedure TraverseTree(Tree: TTree);
var
  Node: TNode;
begin
  // Variant1. For the class instances we can call the method Tree.GetEnumerator(teInOrder). 
  // For the classes we can call a class method
  for Node in Tree using GetEnumerator(teInOrder) do
    Dosomething(Node);
 
  // Variant2. Or we can call the global function
  for Node in Tree using GetEnumerator(Tree, teInOrder) do
    Dosomething(Node);
 
  // Variant3. In the previous variant 'in Tree' is useless so the next code is a simplified form:
  for Node using GetEnumerator(Tree, teInOrder) do
    Dosomething(Node);
 
  // Variant4. We can try to avoid new context key-word 'using' by calling method:
  for Node in Tree.GetSomeEnumerator(teInOrder) do
    Dosomething(Node);
  // but this brings ambiguity to the compiler since Tree.GetSomeEnumerator(teInOrder) can be translated into
  // Tree.GetSomeEnumerator(teInOrder).GetEnumerator
  // This ambiguity might be resolvable by checking whether the class implements IEnumerator interface
end;
It is true that any type can have only one GetEnumerator method attached to it. But that's IMO not the same as "It is impossible to choose among different possible enumerators."

I guess, i.e. this was not tested and not even tried - it is already possible to have different enumerators on a type. The desired behaviour could be achieved presumably in two ways.

Using IEnumerator?

Iff the current FPC implementation checks for existence of a IEnumerator derived implemented interface of a type (i.e. as opposed to checking for existence of exactly only IEnumerator interface), then it should be possible to have multiple distinct IEnumerator derived interfaces implemented by a type (each one returning a different enumerator object) and then "select" the enumerator like this:

 
  // maybe parenthesis would be required bellow, i.e. (Tree as ITreeInOrder)
  for Node in Tree as ITreeInOrder do
    Dosomething(Node);
 
  for Node in Tree as ITreeBreadthFirst do 
    Dosomething(Node);
 

Using the enumerator operator?

Another possibility is to declare multiple enumerator operators with differently typed operands and then cast the real type to select the specific enumerator:

 
type
  TInOrderTree = class(TTree);
  TBreadthFirstTree = class(TTree);
 
operator Enumerator(Op: TInOrderTree): TTreeEnumerator;
begin
  // implementation
end;
 
operator Enumerator(Op: TBreadthFirstTree): TTreeEnumerator;
begin
  // different implementation
end;
 
//...
  for Node in TInOrderTree(Tree) do
    Dosomething(Node);
 
  for Node in TBreadthFirstTree(Tree) do 
    Dosomething(Node);
 

The choice

The interface based variant seems a bit heavy to me. The casting is maybe somewhat lighter (per opinion), but at the same time a bit dangerous as type casts in FPC are - and have to be for a reason - usually a way how to tell the compiler not to take care of any type checking no more (an oversimplification, I admit, but ~applies to the current task). Both variants do also require declaration of new types, but that's a minor issue.

Let's get back to Paul's original proposal and specifically the Variant4. The observation "but this brings ambiguity to the compiler since Tree.GetSomeEnumerator(teInOrder) can be translated into Tree.GetSomeEnumerator(teInOrder).GetEnumerator" is correct. IMO the compiler must interpret the code exactly in this and only this way, but that's another story. Variant4 is also interesting here as it was chosen to play with in an attempt to get a usable case/implementation/usage pattern.

It's a matter of only personal preferences, but I like distinctly naming the specific "GetEnumerators" more than parameterizing a single one, so the container's client code will be something like:

 
  for Node in Tree.InOrder do
    Dosomething(Node);
 
  for Node in Tree.BreadthFirst do
    Dosomething(Node);
 
It's a few characters shorter and IMO pretty readable also. Conceptually, one should choose which enumerating behaviour should be the default one, supposedly based on the estimated typical usage pattern, and implement that enumerator as the (single) .GetEnumerator method of such type. All the other enumerators, if properly named, would make the intention clear. For example a TSomeMap could have a "default" enumerator returning the key-value pairs and the the specific/special .Key and .Value enumerators:
 
type
  TKey = String;
  TValue = Integer;
  TPair = record
    Key: TKey;
    Value: TValue;
  end;
 
  TMyMap = class(specialize TSomeMap<TKey, TValue>)
  // ...
  end;
 
// ...
var 
  M: TMyMap;
  P: TPair;
  K: TKey;
  V: TValue;
// ...
  for P in M do
    ProcessPair(P);
  for K in M.Key do
    ProcessKey(K);
  for V in M.Value do
    ProcessValue(V);
 

The implementation

Bellow are excerpts from heContnrs.pas implementing both the default (0 .. Count - 1) and the .Reversed (Count - 1 .. 0) enumerators of a vector.

 
//...
interface
 
type
 
  { TheEnumerator }
 
  generic TheEnumerator<TIterator, TValue> = object
  public type
    TGetCurrent = function(const Iterator: TIterator): TValue of object;
    TMoveNext = function(var Iterator: TIterator): Boolean of object;
  private
    FGetCurrent: TGetCurrent;
    FIterator: TIterator;
    FMoveNext: TMoveNext;
    function GetCurrent: TValue;
  public
    procedure Init(const InitialIterator: TIterator; const Mover: TMoveNext; const Getter: TGetCurrent);
    function MoveNext: Boolean;
    property Current: TValue read GetCurrent;
  end;
 
  { TheEnumeratorProvider }
 
  generic TheEnumeratorProvider<TProvidedEnumerator> = object
  public
    FEnumerator: TProvidedEnumerator;
    function GetEnumerator: TProvidedEnumerator;
  end;
 
  { TheVector }
 
  generic TheVector<TItem> = class
  public type
    //...
    TEnumerator = specialize TheEnumerator<Integer, TItem>;
    TReverseEnumeratorProvider = specialize TheEnumeratorProvider<TEnumerator>;
  private type
    PItem = ^TItem;
  private
    //...
    function GetItems(const Index: Integer): TItem;
    function MoveNext(var Index: Integer): Boolean;
    function MovePrev(var Index: Integer): Boolean;
  public
    //...
    function GetEnumerator: TEnumerator;
    function Reversed: TReverseEnumeratorProvider;
  end;
 
implementation
 
uses
  SysUtils;
 
{ TheEnumerator }
 
function TheEnumerator.GetCurrent: TValue;
begin
  Result := FGetCurrent(FIterator);
end;
 
function TheEnumerator.MoveNext: Boolean;
begin
  Result := FMoveNext(FIterator);
end;
 
procedure TheEnumerator.Init(const InitialIterator: TIterator; const Mover: TMoveNext; const Getter: TGetCurrent);
begin
  Assert(Assigned(Mover));
  Assert(Assigned(Getter));
  FIterator := InitialIterator;
  FMoveNext := Mover;
  FGetCurrent := Getter;
end;
 
{ TheEnumeratorProvider }
 
function TheEnumeratorProvider.GetEnumerator: TProvidedEnumerator;
begin
  Result := FEnumerator;
end;
 
{ TheVector }
 
//...
function TheVector.GetItems(const Index: Integer): TItem;
begin
  Assert((Index >= 0) and (Index < Count));
  Result := FData[Index];
end;
 
function TheVector.MoveNext(var Index: Integer): Boolean;
begin
  Inc(Index);
  Result := Index < Count;
end;
 
function TheVector.MovePrev(var Index: Integer): Boolean;
begin
  Dec(Index);
  Result := Index >= 0;
end;
 
function TheVector.GetEnumerator: TEnumerator;
begin
  Result.Init(-1, @MoveNext, @GetItems);
end;
 
function TheVector.Reversed: TReverseEnumeratorProvider;
begin
  Result.FEnumerator.Init(Count, @MovePrev, @GetItems);
end;
 
end.
 

The example

This code ...

 
program project1;
 
{$mode objfpc}{$H+}
 
uses heContnrs;
 
type
  TStrVector = specialize TheVector<String>;
 
var
  V: TStrVector;
  S: String;
 
begin
  {$if declared(HaltOnNotReleased)}
  HaltOnNotReleased := True;
  {$endif}
  V := TStrVector.Create;
  try
    V.Add('foo');
    V.Add('bar');
    V.Add('baz');
    Writeln('for S in V ...');
    for S in V do
      Writeln('S = ''', S, '''');
    Writeln('for S in V.Reversed ...');
    for S in V.Reversed do
      Writeln('S = ''', S, '''');
  finally
    V.Free;
  end;
end.
 
... compiled and executed, produces this output:
$ ./project1
for S in V ...
S = 'foo'
S = 'bar'
S = 'baz'
for S in V.Reversed ...
S = 'baz'
S = 'bar'
S = 'foo'
Heap dump by heaptrc unit
9 memory blocks allocated : 892/912
9 memory blocks freed     : 892/912
0 unfreed memory blocks : 0
True heap size : 360448
True free heap : 360448
$

The credits

The authorship of the original ideas on multiple enumerators, leading to this post and the above implementation, should be fully credited to and only to Paul Ishenin. My best thanks to him for sharing his work.

2 comments:

  1. Nice to see that someone found this feature useful and made such a good testing :)

    ReplyDelete
  2. Many thanks to you Paul for making most of the heLib/heContnrs functionality possible :-)

    ReplyDelete