Wednesday 24 December 2008

Christmas Profiling Tip

If you're profiling some code and you get something like this from hp2ps ...


c:/ws/fpfdev/depot/QA/EDG/EDG_priv/FPF_Dev.br/src $ hp2ps -c
GenBarriers.hp
c:\ws\fpfdev\depot\QA\EDG\EDG_priv\FPF_Dev.br\ThirdParty\ghc\ghc-6.10.1\
bin\hp2ps.exe: Disaster! (bucket out of range)


...then the chances are it's because you have a partially truncated .hp file (eg if your program didn't exit cleanly). This is easily fixed, by searching for the last BEGIN_SAMPLE line in the file, and deleting everything from there onwards.

Sunday 9 November 2008

Why does Functional Programming matter?

Of course we all know the answer...

...but recently I've been wondering how to explain what I feel is important about FP in a pithy, succinct way. (I've frequently found myself failing to explain it well).

I think in the future I'm going to say this:

"Functional Programming makes programs easier to understand. And that means they're less likely to go wrong."
For my money this is why functional programming is vitally important - because the biggest problem we have right now isn't concurrency ... it's the fact that we can't even write single-threaded programs that work properly. (Of course "go wrong" is used above in the standard rather than the Milner sense).

One other comment I'd make is that I think the "easier to understand" bit becomes more apparent with bigger programs.

I'd be interested in hearing if anyone's got a better way of describing it.


Thursday 30 October 2008

2 Minute intro to Associated Types / Type Families

Type Families are an important recent addition to GHC which have been developed over the past couple of years (and indeed are still being developed). They promise to address some of the same issues addressed by functional dependencies whilst avoiding some of the nastier corner cases and allowing for a more solid theoretical underpinning and implementation.

There is a lot of material available on Type Families, in fact it seemed somewhat bewildering to me initially. There doesn't seem to be an "idiot's guide" overview - so that's what this post attempts to do - to provide a just enough background to help you work out which papers etc you want to read next.

Terminology and Syntax

The thing to note is that there are essentially four different concepts, each of which have a couple of different terms for them:

Associated (Data) Type
class ArrayElem e where
data Array e
index :: Array e -> Int -> e

instance ArrayElem Int where
data Array Int = IntArray UIntArr
index (IntArray a) i = ...
Associated (Type) Synonym
class Collection c where
type Elem c
insert :: Elem c -> c -> c

instance Eq e => Collection [e] where
type Elem [e] = e
...
Data (Type) Family
data family Array e

data instance Array Int = IntArray UIntArr

data instance Array Char = MyCharArray a b
(Type) Synonym Family
type family Elem c

type instance Elem [e] = e

type instance Elem BitSet = Char


Associated or Family?


The first "axis" of categorization is "Associated" vs "Family". The "Associated ..." variants (which were invented first) are those which are declared inside a standard typeclass declaration, the "... Family" variants are stand-alone, top-level, use the "family" keyword and were invented a year or so later. The Family variants (collectively known as Type Families) are strict generalizations of the associated ones, and the associated ones are simple syntactic sugar for the family variants.


Data or Type Synonym?


The other "axis" of categorization is "Data" vs "Type Synonym". This distinction mirrors that of normal Haskell "data" and "type" declarations. The key point is that associated data types and data families let you create a bijection (ie one-to-one mapping) from source type to destination type. Associated type synonyms and synonym families on the other hand allow you to map two different souce types onto the same destination type. The best reference for more details on the difference is the first part of section 7 of the "Associated Type Synonyms" paper.


Equality Constraints
The final piece of syntax allows us to assert type-level equalities using the above:

sumCollection :: (Collection c, Elem c ~ Int) => c -> Int
sumCollection c = sum (toList c)

This (as with the other examples) is slightly modified from one of the papers - in this case the "Associated Type Synonyms" paper where the syntax uses "=" rather than the "~" which was ultimately used.


Compatability
Many of the above features are actually available in the GHC 6.8 branch but they are only really supported under 6.10 onwards. (I understand that the main reason the code is in 6.8 at all is to facilitate merging of other, unrelated fixes between those branches). I've had success with both kinds of associated types under 6.8 but ran into problems using equality constraints.

Sunday 3 August 2008

Join-ing the Blogosphere

I'd like to share a little Haskell snippet (I wanted to say "idiom" but actually I'm not sure if I've seen it used anywhere else...) that I quite like:


join (,) 9


What does this do? On its own, not much - in fact you just get:


(9,9)


...which you'd have been much better off typing in yourself. Where it can come-in handy though is when writing code in a pointfree style if you have a need to duplicate an anonymous value into both parts of a pair:


functionWhichTakesAPair . join (,) . functionWhichProducesASingleValue


Anyway, how / why does this construct work....?

The (,) is just the standard data constructor for a 2-tuple, a pair. (This constructor has type a -> b -> (a,b) so you can always write (,) 9 9 as an alternative notation for (9,9) if you wish).

The join is the standard function :: Monad m => m (m a) -> m a we know from monadic programming (Control.Monad). But this just raises the question of how it can make sense to apply it to (,) which as we've seen is a function of type a -> b -> (a,b).

The answer is of course that this function may be considered to be of type m (m a). This is because Control.Monad.Instances declares any partially applied function (eg a -> ...) to be a monad (the type a can be seen as representing an encapsulated environment which is threaded through by the monadic bind). This then means that a function of two arguments (such as (,)) can be seen as having type m (m a) which is just what we need to be able to apply join. The net result is that the single argument of the resulting function (join (,)) is passed in twice as desired.

This of course generalises to functions other than (,) - whenever we use join f where f is a function of two arguments, we end up with a function of one argument which is passed into f twice.