Hidden Image for Share

Sunday, September 30, 2012

On naming arguments.

I've recently started trying out one of the course offerings over at Coursera.org - a guide to functional programming, using Scala. The approach I take to coding certainly seems functional-based, so it seemed a good way to practice but also learn the approaches taken by a new, familiar-ish language.

The relevance here is that the recent lectures included an introduction to currying, and got me thinking about something which has been growing on me for a while now: the current default convention of ordered function arguments is inferior to using named function arguments.

A brief history of arguments

Firstly, some background. Traditionally, in languages like C, Basic and Java, a function is given a name and an argument list - they can be referred to in the function by name, but this is for convenience - things like bash allow explicit index references, and all callers always specify the arguments by list. The background for this, I believe, is the underlying machine instructions used for function calling - parameters are on a stack, so having them in an ordered list is an easy way for callers and callees to agree where in the stack they are situated.

That said, there appears to be a trend in languages like Python, Dart and Ruby towards passing parameters by name. If you have read my earlier post on types, you might recognise what's going on here - it's a simple change, instead of every function accepting a list<type> as arguments, it now accepts stringmap<type>, where the keys of the map are the (string) names of the arguments.

The next questions to ask are why newer languages have gone this way, is it really better (or should we revert to argument lists), and at what cost? The immediate cost is clear - maps can't easily be serialised to the function stack, so there's a potentially unacceptable additional cost to pay; that said, I don't think this cost is as big as it first seems (and, I believe, could actually be an improvement) - and here's why: the way apps are written, the function stack architecture could do with a cleanup.

An inefficient stack?

This is what I was mentioning earlier when talking about the Scala course - you see, you can allow Scala functions to be curried by iteratively currying the first parameter(s). As an example:
sum(a: Int, b:Int) = ... // sum of numbers between a and b
triangle = sum(0) // triangular(n) = sum(0)(n) = 0 + ... + n
The second function is a currying of the first, where the first parameter is always set to zero. However, what happened if I wanted a function which sums up to a constant (e.g. negtri = sum(?, 0)) - this requires a different syntax and may result in a different underlying implementation. Why? Well, consider the most efficient implementation of the code above. Assuming a normal stack architecture, we could have the following:
triangle:
  push 0; // curry on the new param
  invoke sum; // call sum instead.
Now, say I have a function with 5 parameters, and want to curry in two, not at the start. With this change, the function has to pop off the arguments from the stack, then re-push everything in the new order. With longer call chains, multiple arguments and liberal usage of currying, I'd be intrigued to find out how much time the average program spends rearranging arguments. My guess is, it's non-trivial.

A scoped 'stack'

No, instead what is needed is for functions to know where on the stack their parameters can be read from - so really a random-access list instead of a stack. This has a number of complications, though I feel that most of these can be solved with a few compilation tweaks to carry around more state, now that functions don't exist in isolation, but instead within the scope of an executing program. Which is exactly what I see in the way that much of application code is being written - within contextual scopes. Given the prevalence of server architectures, request scope (or RPC scope) is an obvious example, but even also just simple object scope: if your program is a data storage layer for UserProfile objects (say), then you will likely be spending much of the execution time calling functions scoped within a given UserProfile object, getting/setting its values, updating its listeners, persisting, ... - if we assume that the profile ID (or a reference to it) is needed for all of those functions, then rather than passing that around all over the place, it'd be ideal to fix its location on the stack in a known location, and leave it there for the entire profile scope.

The method above will result in minimal rewriting of argument locations, but also now leads to the ability to curry arbitrary parameters, something that simply could not be done before, at the cost of callers needing to know which scopes to fill out before calling a function. These scopes are then the name of the parameter (e.g. currentUser above) and map to a location in the 'stack' (/array).

There's still some implementation specifics to sort out, but I'm sure they're tractable, and for the switch to named parameters you get a possible improvement in parameter passing, and the ability to curry arbitrary subsets - a big win! Plus: adding a new parameter to the stringmap (by overwriting the old keyed value) is much simpler than the old list (requiring linear insertion), and callers are less likely to make mistakes in specifying parameters when you have to name them (..."I'm passing true,false,false,true...is that the right order"?).

The biggest against / for?

It's probably a good time to mention another downside at this point, which is its wordiness - having to name each parameter you pass, instead of inferring the names from the index, is quite verbose: you've just doubled the amount of text per function call, with a bunch of duplication. There are a few things to mitigate it (better IDE support, assuming the param name is the name of the variable by default, ...) but a verbose language is always going to detract programmers.

This last part leads in to the biggest gain however, in my eyes. You can always afford to make something a bit more verbose if you are using it to better map the digital program to the real-world objects/concepts that its is representing. In this case, sum above does not take two ordered ints - it takes a lowerBound and upperBound, callers can specify what the bounds they are giving are, and currying code can specify the names of the concepts for which it is putting in scope. It'd be good to acknowledge the argument list was a good, albeit flawed, approximation of named parameters, and the sooner we support the ('truer') stringmap version, the sooner we can make cleaner programs.

Hopefully how we can do that will be covered in the next post - on rewriting executing code. As always, I'm very keen to hear your thoughts on ordered vs. named arguments! Starting points: are there more pros/cons? how big of a problem is being verbose? is it feasible to rearchitect the stack frame?

1 comment:

  1. Nice post, one thought I have is that each has its place. The very low cost of the stack is nice at a very low level, but I definitely end up using named arguments in python (though that has the serious gotcha of how default arguments are instantiated...).

    ReplyDelete