[Journal - Data Access and Strong Typing]

Data Access and Strong Typing

Sunday, July 31, 2005

There is a gap between the world of SQL data and objects in statically typed languages. Efforts are being made to bridge it, and different approaches are being pursued.

I'm running experiments as well. The idea is to use generic types, dynamically constructed from column types. Working in an interpreter (.NET Console), the goal is to hide the fact that reflection is used to construct types, and to make things seamless at the language level. But before mucking with the language, let's create utility functions as a proof of concept.

The point here is that records have a type, just like in SQL, even though that type changes with the select statement - it depends on the table and the columns selected. The application should be able to use such information for greater type safety.

Type safety, up to a point. An interpreter doesn't perform static type checking - there's no separation between compile-time analysis and runtime (even though it's possible, and often useful, to detect certain errors before any code is executed). Apart from that difference, type checking in a scripting environment can be just as strict as in a compiled language. Using expressive types helps reduce errors.

What's most important is that type coercion not occur - especially anything that involves parsing, like string/integer conversions. Errors should always occur when there's a type mismatch, not just some of the time. That's what I mean when I say that strong typing has value even in a scripting environment.
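Python, for instance, behaves exactly this way: it is dynamically typed but strongly typed, so a string/integer mismatch raises an error on every execution instead of being coerced silently. A small illustration (my example, not tied to the WebEdit environment):

```python
# Strong typing in a dynamic language: a mismatch fails loudly,
# every time, rather than being papered over by coercion.
try:
    bad = "1" + 1      # no implicit string/integer conversion
except TypeError:
    bad = None

# Conversions must be requested explicitly:
good = int("1") + 1
```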

Lab Time

We'll use the code interpreter in WebEdit.NET (.NET Console). The user function can be coded in an editor window and evaluated with code:Vars.Eval(Vars.TextAll). Furthermore, we need the following:

As a prerequisite, the Tuples library must be built as a DLL, copied into WebEdit's "bin" directory, and loaded with the command code:load Tuples.dll.

First, we need a data reader. With the DataDocs AddIn we can make use of data source definitions as well as existing data items. Or, we can just pass an ad-hoc command string to some routine for querying against any data source defined:

dataGetReader("data:item?Documents.All@WebEdit");
dataGetReader("data:sql? select * from Documents @WebEdit");

Of course, you can obtain the data reader any way you like. In this example, I assume a certain data source definition. So, let's have a function:

dataGetReader(sCmd)
{
    dsp = DataDocs.CDataDocsConnector.Instance.DataStorageProvider;
    entry = dsp.CreateHistoryEntry(sCmd);
    storage = dsp.CreateStorage(new CWebEditDocument(), entry);
    cmd = storage.GetDataSelectCommand();
    cmd.Connection.Open();
    cmd.ExecuteReader();
}

Now, we've got a data reader in open state. Later, we need to close it, as well as the connection, but that's not part of this exercise.
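For readers without WebEdit.NET and the DataDocs AddIn, the same starting point can be sketched in Python with the standard sqlite3 module - the table schema and data here are made up for illustration, and the cursor plays the role of the open data reader:

```python
import sqlite3

# An in-memory stand-in for the Documents data source assumed above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (Name TEXT, Path TEXT)")
conn.execute("INSERT INTO Documents VALUES ('readme', '/docs/readme')")

# The cursor is the analog of an open data reader.
cursor = conn.execute("SELECT * FROM Documents")
```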

The data reader has information about the column types. We shall now turn to the generics facilities in .NET 2.0, so that we can turn records into objects of a type specific to the table:

dataGetTupleType(record)
{
    Check.ValueSmallerOrEqual(record.FieldCount, 5, "Generic tuple support is limited to 5 columns max.");
    sbType = new StringBuilder();
    sbType.Append("Tuple<");
    fAfterFirst = false;
    loop(i in record.FieldCount){
        if(fAfterFirst){
            sbType.Append(",");
        }
        sbType.Append(record.GetFieldType(i).FullName);
        fAfterFirst = true;
    }
    sbType.Append(">");
    Reflect.FindType(sbType.ToString());    
}
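Python has no runtime-parameterized generics, but the same move - deriving a record type from the reader's column metadata - can be sketched with collections.namedtuple. The function name and table are illustrative, not part of the WebEdit API:

```python
import sqlite3
from collections import namedtuple

def data_get_tuple_type(cursor):
    # cursor.description carries one entry per column; index 0 is the name.
    fields = [col[0] for col in cursor.description]
    return namedtuple("Row", fields)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (Name TEXT, Path TEXT)")
cursor = conn.execute("SELECT * FROM Documents")
RowType = data_get_tuple_type(cursor)
```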

You can test this function and conveniently trace the type name with the following statements:

tp = dataGetTupleType(reader);
Reflect.GetTypeName(tp, new CCSharpSyntaxProvider(), CCodeOptions.Natural);
// Tuple<string, string>

We'll collect the row objects in a generic list bound to that tuple type:

dataGetListType(tpTuple)
{
    tpGenericList = Reflect.FindType("System.Collections.Generic.List<>");
    typeArgs = Ary.CreateArray(tpTuple.GetType(), tpTuple);
    tpGenericList.MakeGenericType(typeArgs);
}

Finally, the following function instantiates record objects and puts them into a new strongly typed list, given a data reader:

dataGetTupleList(reader)
{
    tpTuple = dataGetTupleType(reader);
    tpList = dataGetListType(tpTuple);
    list = Activator.CreateInstance(tpList);
    while(reader.Read()){
        args = new object[reader.FieldCount];
        loop(i in reader.FieldCount){
            args[i] = reader.GetValue(i);
        }
        rec = Activator.CreateInstance(tpTuple, args);
        list.Add(rec);
    }
    reader.Close();
    list;
}
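Put together, the whole pipeline has a compact Python analog - a sketch under the same assumptions as the earlier snippets (a per-query record type, rows materialized into a list), not a port of the WebEdit functions:

```python
import sqlite3
from collections import namedtuple

def data_get_tuple_list(cursor):
    # Build the per-query record type, then map every row onto it.
    row_type = namedtuple("Row", [col[0] for col in cursor.description])
    return [row_type(*row) for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (Name TEXT, Path TEXT)")
conn.executemany("INSERT INTO Documents VALUES (?, ?)",
                 [("readme", "/docs"), ("todo", "/notes")])
rows = data_get_tuple_list(
    conn.execute("SELECT * FROM Documents ORDER BY Name"))
```

Attribute access (rows[0].Name) then replaces positional index juggling, which is the payoff of giving each query result its own type.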

Language Integration

If we knew the row type at coding time, we could create both the objects and the list more simply, writing:

new Tuple<string, string>(reader.GetValue(0), reader.GetValue(1));
new List<Tuple<string, string>>();

Remember that we're in a scripting scenario here. The helper functions above could just as well live somewhere in the interpreter, and mapping query results to record-specific types could be part of the language:

list = select * from documents;

Since SQL statements are terminated by semicolons, just like C# statements, they could occur anywhere an expression is expected. We'd just need more keywords, but hey, that hasn't stopped me before. How far are you willing to go?

There's one interesting language project for integrating data: the Cω (pronounced "C Omega") research language. It's based on C#, and under the hood makes use of new Whidbey features, such as iterators and generics. Interestingly, it also has tuple types.