Data Types as Measurement Abstractions

Tuesday, March 22, 2005

You'd think that now that high-level languages have won the battle and developers have easy ways to define abstract data types, we could finally remember a few lessons from physics class.

The lesson is that we distinguish quantities from measurement units. For example, speed is defined as distance divided by time, and the units of measurement for speed could be miles per hour, kilometers per hour, meters per second, and so on.

Note that speed is not the same thing as velocity (velocity is a vector: it has a direction as well as a magnitude), although the latter sounds more sophisticated. Such pretentious sophistication is often undermined, however, by classes called "MphVelocity" and the like, which is one of the points this post is trying to make.

Now, if there were only one unit per quantity, we wouldn't have a problem. We could just use ints, doubles, or decimals, and in general follow the misguided approach of java.util.Date. One of the reasons for defining ADTs is of course type safety, and it's a good reason.

The second reason is related to the general mess of measurements: data types can abstract these things, so when you have a TimeSpan object, you don't need to know the units in order to calculate, and you can express that quantity in any suitable unit (not in meters or ounces, though).
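To make that concrete, here is a minimal sketch using Python's datetime.timedelta as a stand-in for a TimeSpan-like type (my choice for illustration, not a .NET example). The variable names are made up; the point is only that the caller never sees the internal unit.

```python
from datetime import timedelta

# Build the quantity from whatever units happen to be convenient.
span = timedelta(hours=1, minutes=30)

# Express the same quantity in any suitable unit.
as_seconds = span.total_seconds()          # 5400.0
as_minutes = span / timedelta(minutes=1)   # 90.0
as_hours = span / timedelta(hours=1)       # 1.5

# Type safety: mixing the quantity with a bare number is rejected.
try:
    span + 5
    mixed_with_int = True
except TypeError:
    mixed_with_int = False
```

Adding a bare 5 fails because nobody can say whether it means seconds, minutes, or ounces.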

System.DateTime and System.TimeSpan are good examples of such types, and for evil .NET clowns like myself, they mark a refreshing contrast to the simple-mindedness of Java [What is it with me today? Why do I bash Java when I'm on vacation?]. These two data types also reinforce the point made here.

Sure, doing calculations in a meaningful way requires defining the right routines: functions, methods, or operators. Functions are best, because there's mostly a many-to-many relation between data and operations. But it all starts with having the right data types.
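A rough sketch of what I mean by free functions, in Python. Distance, Speed, average_speed and distance_covered are hypothetical names invented for this example, and the internal units (meters, meters per second) are an assumption of the sketch.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Distance:
    meters: float


@dataclass(frozen=True)
class Speed:
    meters_per_second: float


def average_speed(distance: Distance, elapsed: timedelta) -> Speed:
    """Combine a distance and a time span into a speed."""
    seconds = elapsed.total_seconds()
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return Speed(distance.meters / seconds)


def distance_covered(speed: Speed, elapsed: timedelta) -> Distance:
    """The inverse relation: a speed and a time span give a distance."""
    return Distance(speed.meters_per_second * elapsed.total_seconds())
```

Neither function obviously belongs to Distance, Speed, or the time span, which is why I would not force them into any one of those classes.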

What is not needed, in my view, is defining classes for individual units, like MphMeasurement or MpsMeasurement. Any unit conversions should be done in the quantity class (Speed, in our case). Contrast such unit conversions (which are the reason for having the class in the first place) with calculations involving other quantities.
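Here is a minimal sketch of such a quantity class in Python. The class layout and method names are my own invention for illustration, and storing meters per second internally is an arbitrary choice that callers never need to know about.

```python
from dataclasses import dataclass

# Conversion factors to the internal unit (meters per second).
MPS_PER_KPH = 1000.0 / 3600.0
MPS_PER_MPH = 1609.344 / 3600.0  # one international mile is 1609.344 m


@dataclass(frozen=True, order=True)
class Speed:
    """One class for the quantity; the units are just ways in and out."""
    meters_per_second: float

    @classmethod
    def from_kph(cls, value: float) -> "Speed":
        return cls(value * MPS_PER_KPH)

    @classmethod
    def from_mph(cls, value: float) -> "Speed":
        return cls(value * MPS_PER_MPH)

    @property
    def kph(self) -> float:
        return self.meters_per_second / MPS_PER_KPH

    @property
    def mph(self) -> float:
        return self.meters_per_second / MPS_PER_MPH

    def __add__(self, other: "Speed") -> "Speed":
        # Only speeds can be added to speeds; bare numbers are rejected.
        if not isinstance(other, Speed):
            return NotImplemented
        return Speed(self.meters_per_second + other.meters_per_second)
```

With this, Speed.from_mph(60).kph gives roughly 96.56, and comparing or adding two speeds works no matter which units they were created from.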

That still leaves us with some questions about input and output. At the end of the day, measurements are just numbers, and for serialization, it is usually best to save a unit descriptor along with the data, and use scientific units. For display, the whole issue is related to localization, and implementing System.IFormattable will take care of that.
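A sketch of both ideas in Python, under some loud assumptions: the JSON layout with "value" and "unit" keys is made up for this example, and Python's __format__ hook only plays the role of IFormattable here. It picks the display unit from the format string and does no real localization.

```python
import json
from dataclasses import dataclass

# Display units expressed per one meter per second (hypothetical table).
UNITS_PER_MPS = {"m/s": 1.0, "km/h": 3.6}


@dataclass(frozen=True)
class Speed:
    meters_per_second: float

    def to_json(self) -> str:
        # Always write the unit descriptor next to the number.
        return json.dumps({"value": self.meters_per_second, "unit": "m/s"})

    @classmethod
    def from_json(cls, text: str) -> "Speed":
        data = json.loads(text)
        unit = data["unit"]
        if unit not in UNITS_PER_MPS:
            raise ValueError(f"unknown unit: {unit!r}")
        return cls(data["value"] / UNITS_PER_MPS[unit])

    def __format__(self, spec: str) -> str:
        # The format spec names the display unit; empty means m/s.
        unit = spec or "m/s"
        if unit not in UNITS_PER_MPS:
            raise ValueError(f"unknown unit: {unit!r}")
        return f"{self.meters_per_second * UNITS_PER_MPS[unit]:.1f} {unit}"
```

Usage would look like format(Speed(10.0), "km/h"), which yields "36.0 km/h", while the stored form stays in meters per second with its descriptor attached.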

One quantity that the software industry has not yet abstracted (at least in common frameworks) is, of all things, memory size. We usually deal in bytes, but sometimes (e.g., for display purposes), other units are used. So that's a good case for defining an ADT.
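Just to show how little it takes, here is a sketch of such a type in Python. MemorySize and all of its members are hypothetical, and using binary units (1 KiB = 1024 bytes) for display is my assumption, not a standard any framework imposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class MemorySize:
    byte_count: int

    @classmethod
    def from_kib(cls, value: float) -> "MemorySize":
        return cls(int(value * 1024))

    @classmethod
    def from_mib(cls, value: float) -> "MemorySize":
        return cls(int(value * 1024 ** 2))

    @property
    def kib(self) -> float:
        return self.byte_count / 1024

    @property
    def mib(self) -> float:
        return self.byte_count / 1024 ** 2

    def __add__(self, other: "MemorySize") -> "MemorySize":
        if not isinstance(other, MemorySize):
            return NotImplemented
        return MemorySize(self.byte_count + other.byte_count)

    def __str__(self) -> str:
        # Pick the largest binary unit that keeps the number >= 1.
        size = float(self.byte_count)
        unit = "B"
        for next_unit in ("KiB", "MiB", "GiB"):
            if size < 1024:
                break
            size /= 1024
            unit = next_unit
        if unit == "B":
            return f"{self.byte_count} B"
        return f"{size:.1f} {unit}"
```

Calculations stay in bytes internally, and only str() decides on a display unit, so str(MemorySize(1536)) comes out as "1.5 KiB".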