Saturday, July 7, 2012

On Types - Part 1

Welcome to the first instalment of my Code Architecture blog, a place for me to document thoughts I've gained while working on multiple projects, where I think app development is headed, and how to get there.
As always, please feel free to add discussions at the bottom of each post or suggestions on topics to cover.

This initial post is going to cover what I consider one of the most fundamental topics - data types. That is, the values you can represent within an application, and how to use them. 

Primitives - Numbers, Letters, ...

These are the most low-level types available - ones that map directly onto binary values in hardware. For example, a single bit gives you a boolean value. This extends to 8-, 16-, 32- and 64-bit values (or larger), which can be interpreted as a number of types - integers, floating-point numbers, characters, enums, ... - but that's about it in terms of how they're stored.
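To make the "same bits, different interpretation" point concrete, here is a small sketch in Python (chosen purely for illustration; the byte values are an arbitrary example, not anything from a real program). It reads the same four bytes as an integer, a float and a list of 8-bit values using the standard `struct` module.

```python
import struct

# The same 32 bits, stored as four bytes (an arbitrary example value).
raw = bytes([0x42, 0x28, 0x00, 0x00])

# Interpreted as a big-endian unsigned 32-bit integer.
as_int = struct.unpack(">I", raw)[0]

# Interpreted as a big-endian IEEE 754 single-precision float.
as_float = struct.unpack(">f", raw)[0]

# Interpreted as four separate 8-bit values.
as_bytes = list(raw)

print(as_int, as_float, as_bytes)
```

Nothing about the stored bits changes between the three readings; only the type the program applies to them does.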

This gets us to the first area for expansion that I see: variable-length primitives. For example, UTF-8 is a common variable-length character encoding, and a few languages have arbitrary-precision integers, though there seems to be no one agreed-upon way of representing arbitrary-bit values using only 2^n-bit numbers.
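A rough sketch of both ideas in Python: the first helper shows how many bytes UTF-8 spends on different characters, and the second pair shows one possible way (a 7-bits-per-byte "varint", in the style of LEB128) to store an arbitrarily large non-negative integer in fixed 8-bit units. The function names are hypothetical, and the varint scheme is just one of several in use, not a standard answer to the problem.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: this is the last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
    return result


# Python's own int is already arbitrary-precision.
big = 2 ** 100
print([utf8_length(c) for c in ["a", "é", "€"]])
print(len(encode_varint(big)), decode_varint(encode_varint(big)) == big)
```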

Complex - Lists (and Strings), Maps ...

There are a few more concepts that appear all over the place while writing software - the most basic is a list (or array, vector, ...). Simply put, it's an ordered collection of primitives, with the most typical example being a list of characters, or a string. These are used everywhere for text, but at a higher level, lists are required to represent anything ordered, and can be combined for higher dimensions - e.g. a list of lists for a grid/table.
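As a minimal illustration in Python (the values are made up for the example), a string can be treated as a list of characters, and a list of lists gives a two-dimensional grid:

```python
# A string behaves like an ordered list of characters.
word = "types"
letters = list(word)          # split into individual characters
rebuilt = "".join(letters)    # and joined back into the same string

# A list of lists represents a grid/table: rows first, then columns.
grid = [
    [1, 2, 3],
    [4, 5, 6],
]
cell = grid[1][2]             # second row, third column

# Transposing the grid shows the two dimensions are both just lists.
columns = [list(col) for col in zip(*grid)]

print(letters, rebuilt, cell, columns)
```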

For unordered items, however, by far the most useful type appears to be the Map, at least where model data for an application is concerned - and in particular, StringMaps, where the values are each keyed by a string. A StringMap (or Dictionary) is the fundamental type in both JavaScript and Python, as well as representing the underlying structure of an object in O-O languages, with field names mapping to their values.
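Python makes the object-as-StringMap relationship easy to see, since an ordinary instance keeps its fields in a dictionary. The `Point` class below is a hypothetical example, and other languages store objects differently under the hood, so treat this as an illustration of the idea rather than a universal rule.

```python
class Point:
    """A plain object with two named fields (hypothetical example)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(3, 4)

# vars() exposes the object's fields as a string-keyed dictionary.
fields = vars(point)

# Writing through the dictionary changes the object, and vice versa.
fields["x"] = 10
point.y = 20

print(fields, point.x)
```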

More types exist (collections, sets, binary trees, ...) but mostly these can be implemented in terms of the above, and are less common in usage - so a modern language mostly just needs primitives, lists, strings and stringmaps for its data model types. As an aside, perhaps for a later post, I feel that (nested) stringmaps are the best approximation so far of how humans index information themselves - e.g. chair = {legs: 4, colour: "black", cushion: true} etc... hence their prevalence and usefulness today.
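Here is a small sketch of both claims in Python. The `maker` entry, the `lookup` helper and the word list are hypothetical additions for the example; only the `legs`, `colour` and `cushion` fields come from the chair above.

```python
# A nested stringmap describing a chair.
chair = {
    "legs": 4,
    "colour": "black",
    "cushion": True,
    "maker": {"name": "Example Co", "country": "NZ"},  # hypothetical nesting
}


def lookup(data, path):
    """Follow a dotted path such as 'maker.name' through nested stringmaps."""
    current = data
    for key in path.split("."):
        current = current[key]
    return current


# A set implemented in terms of a stringmap: only the keys matter.
seen = {}
for word in ["red", "blue", "red"]:
    seen[word] = True
unique_words = list(seen)

print(lookup(chair, "maker.name"), unique_words)
```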

Type extension: Nullability

Given a language with the value families above, we can represent a lot of real-world entities in a program - but there's still a problem in that all values are required. To properly represent optional values, a type system also needs the concept of nullability, to be used when a value is unset.

For an example of what this gives you, consider a recursively defined data structure, like a binary tree. If all nodes require two child nodes, and they require two children, and those four grandchildren require two each, and so on, you simply can't define all the (infinite) objects required; there needs to be a way of optionally not having some. Of course, it's possible to have these without nullability - e.g. add an 'isSet' boolean to every type, and a singleton 'null' value to use when isSet is false (or in the tree example, have 'parent' and 'leaf' types) - but these all degenerate into essentially equivalent scenarios, with the concept of nullability being the simplest generalisation. Linked lists, skip lists, B-trees, tries etc, and anything using them, are all improved by nullability.
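A minimal sketch of the binary tree case in Python, where `None` plays the role of the null value and `Optional` marks the child fields as nullable. The `Node`, `insert` and `in_order` names are hypothetical, and this is a plain unbalanced search tree, used only to show where the nulls end the recursion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None    # None means "no child here"
    right: Optional["Node"] = None


def insert(node: Optional[Node], value: int) -> Node:
    """Insert a value, creating a node wherever an empty slot is found."""
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return node


def in_order(node: Optional[Node]) -> List[int]:
    """Visit left subtree, node, right subtree; None ends the recursion."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


root: Optional[Node] = None
for v in [5, 2, 8, 1]:
    root = insert(root, v)

print(in_order(root))
```

Without the nullable fields, every `Node` would need two further `Node` values at construction time, which is exactly the infinite regress described above.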

Nullability also assists with asynchronous data (later post coming), as a convenient default value for things before they're fetched. Its downside is that it often forces repeated null checks, though one possible way of dealing with those will be coming later, and other techniques exist (@NotNull, or the Elvis operator).
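To illustrate the null-check burden, here is a sketch in Python. Python has no Elvis operator, so `or_else` and `get_path` are hypothetical helpers that behave similarly in spirit; the `profile` data is made up, with `address` left as `None` to stand in for a value that hasn't been fetched yet.

```python
from typing import Any, Optional, TypeVar

T = TypeVar("T")


def or_else(value: Optional[T], default: T) -> T:
    """Return value unless it is None, in which case return the default."""
    return default if value is None else value


def get_path(data: Any, *keys: str) -> Any:
    """Walk nested stringmaps, stopping with None at the first missing step."""
    current = data
    for key in keys:
        if current is None:
            return None
        current = current.get(key)
    return current


profile = {"name": "Sam", "address": None}  # address not fetched yet

# The explicit version: a null check at every step.
address = profile["address"]
if address is not None:
    city_checked = address["city"]
else:
    city_checked = "unknown"

# The same logic with the checks folded into the helpers.
city = or_else(get_path(profile, "address", "city"), "unknown")

print(city_checked, city)
```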

The combination of the types above (primitives, complex, nulls) gives you enough to write a lot of things in a language - but there are still more aspects to types, problems which I see appearing that are still to be solved by different constructs. That said, this post is already pretty long, so tune in next time for types part 2! And as mentioned before, please post any thoughts or requests in the comments section below.
