One Reason Typeclasses Are Useful

valenterry · on Dec 13, 2021

Nice article.

Not a lisp developer, but typeclasses have become a bread&butter feature of programming languages to me: if there is a new language and it doesn't support typeclasses, it's almost guaranteed that I'm not very interested in this language.

sgeisenh · on Dec 13, 2021

Type classes are useful but can be overly restrictive as implemented in languages such as Haskell. In particular, Haskell prevents multiple implementations of a type class for a particular type (a property called canonicity). But there are lots of situations where there are many reasonable implementations of a type class for a particular type.

I hadn't seen a great type-safe way to support ad hoc polymorphism until I read about [modular implicits][1] in OCaml. In situations where there is a reasonable module that satisfies a specified signature for the type of an argument, the module can be inferred.

[1]: https://arxiv.org/pdf/1512.01895.pdf

curtisf · on Dec 13, 2021

Having canonical implementations is a feature, not a bug, of type-classes.

For example, consider the type `Data.MaxHeap a`, which is a persistent max-priority heap.

Max-heaps can be combined in logarithmic time, as long as they are ordered internally by the same ordering. Haskell provides `Data.Heap.union :: Ord a => MaxHeap a -> MaxHeap a -> MaxHeap a`.

This is only possible because the type guarantees that the ordering of the two MaxHeaps is the same (the one single implementation of `Ord a`). It also means that you don't have to store any v-table within the `MaxHeap` value, making lookup just a bit faster and making optimizations like monomorphization easier.

If you didn't have canonical implementations, you would have to give up being able to write safe & efficient data-structures like this.

You would need to be able to dynamically store orderings (wasting space), dynamically compare orderings (wasting time), and have redundant failure paths only for the extremely uncommon and undesirable case where you try to `union` two MaxHeaps with different internal orderings (MaxHeap a -> MaxHeap a -> Maybe (MaxHeap a)).

-----

An alternative might be dependent types, where you can instead insist that the ordering _values_ within the two heaps are compatible, gaining the efficiency (by using "ghost" variables) and safety of canonical implementations while avoiding the , but this is more complicated to use, and Haskell doesn't have proper dependent types (yet?).

----

However, in most situations where you might want to try violating canonical instances, defining a newtype is a perfectly satisfactory way to do it.

jules · on Dec 13, 2021

If you compare it with how you'd implement such a data structure using ML modules, it becomes very clear that the bug here is the definition of Data.Heap.Union, not the non-canonicity of type classes.

With ML modules you'd define a Heap module that is parameterised by an Ord module. Different instantiations of the Heap module for the same type `a` but different instances of `Ord a` are different: heaps with Ord1 are of a different type than heaps of Ord2.

From this point of view, passing the Ord instance into each individual heap function call (such as union) is clearly not the right setup. The canonicity of type classes is merely a bandaid for that problem.

tel · on Dec 13, 2021

Generative functors indeed fixes the issue, but it’s kind of interesting because they’re similar in a way to newtypes. In each case, you generate a new type, distinct from the others, which holds a new implementation of the same interfaces.

It comes down to ergonomics where OCaml’s approach tends to make local reasoning easier and Haskell’s approach makes it a little easier to transform structurally isomorphic types into one another by wrapping/unwrapping.

jules · on Dec 17, 2021

Probably the last word on this problem hasn't been said, because all approaches seem to have some downsides. Could you give an example where Haskell's approach makes it easier?

curtisf · on Dec 13, 2021

Is this not just moving the problem to a different place? (Maybe it's more ergonomic for some uses)

In Haskell the solution is to use different type parameters via newtypes.

In ML, you define a distinct instantiation of the MaxHeap.

So the difference (in Haskell terms) is something like

`HeapInstantiation1` vs `Heap Wrapped1`

Is this practically a big enough difference to call the Haskell approach wrong? Is it not useful to have a single instantiation able to handle all parameters?

jules · on Dec 17, 2021

I think one practical disadvantage of the Haskell approach is that you need to wrap and unwrap your types if you want a different ordering on an existing type.

But the main reason why I didn't like the Haskell approach isn't ergonomic but conceptual. To me, the type of the heap should depend on the Ord instance, because the invariant satisfied by the values of type `Heap a` depends on the Ord instance of `a`, not just on the type `a` itself. This invariant is not encoded in the type system in Haskell, but in a dependently typed language you could do that. But in order to even write down the invariant, you need access to the Ord instance.

So the type signature of union could be:

    union : {a:Type} → {o:Ord a} → Heap o → Heap o → Heap o

In this case the problem of mixing up heaps with different orderings is ruled out by ordinary type checking, rather than via an extra meta-theoretical invariant satisfied by the language. Although ML modules don't let you encode the invariant either, they at least set it up in the same way, because the type Heap(o).t depends on the o and not just on o.t, and the type checker also views it that way and will say Heap(o).t ≠ Heap(o').t even when o.t = o'.t.

In Haskell on the other hand, if we desugar type classes to passing around records, then we can suddenly break the invariant in type safe code.

Maybe I'm imposing values on Haskell that have no practical significance in that context, but to me the way Haskell does it feels like a hack.

volta83 · on Dec 13, 2021

> If you didn't have canonical implementations, you would have to give up being able to write safe & efficient data-structures like this.

No? Just make the type generic on the ordering.

Instead of MaxHeap(T), make it MaxHeap(T, Ord:(T,T)->bool) or something like that.

curtisf · on Dec 13, 2021

I explained why this doesn't work -- what do you do when you combine two heaps with different internal orderings?

You need some way to enforce that the orderings are the _same_. A canonical ordering per type is how Haskell does it; dependent types are a more complicated alternative method.

remexre · on Dec 14, 2021

I think GP is suggesting... well, dependent types, but in a restricted enough way that I don't believe it adds additional complexity over DataKinds. At least, I read

> Instead of MaxHeap(T), make it MaxHeap(T, Ord:(T,T)->bool)

as saying the kind of MaxHeap should be, in Coq syntax, (forall (T : Set), (T -> T -> Bool)). I think this doesn't add any additional complexity vs DataKinds since the function doesn't need to be evaluable at typechecking time, just unified against at construction-time.

volta83 · on Dec 15, 2021

> You need some way to enforce that the orderings are the _same_

Just require that the Ord type is the same for all heaps ?

MaxHeap(MaxHeap(T, MyOrd), MyOrd) uses the same ordering, but MaxHeap(MaxHeap(T, Ord0), Ord1) does not.

volta83 · on Dec 16, 2021

A simple way to do this is to:

MyNestedMaxHeap(T, Ord) = MaxHeap(MaxHeap(T, Ord), Ord)

such that when using MyNestedMaxHeap only one Ord type can be passed.

remexre · on Dec 13, 2021

Does it break anything to allow datatypes to be parametric over instances? eg

    instance absOrd : Ord Int where
        -- ...

    union :: (ord : Ord a) => MaxHeap a ord -> MaxHeap a ord -> MaxHeap a ord

If it's still impossible to create instances at runtime, I don't think this is equivalent to proper dependent types, only DataKinds.

kccqzy · on Dec 13, 2021

What's wrong with newtypes? Even Rust is using them, see for example Wrapping: https://doc.rust-lang.org/std/num/struct.Wrapping.html

> I hadn't seen a great type-safe way to support ad hoc polymorphism

This is an oxymoron. Type classes were invented to create type-safe ad hoc polymorphism (as opposed to parametric polymorphism). You should check out the original paper that introduces type classes back in 1989—it literally had a title of "How to make ad-hoc polymorphism less ad hoc" by Wadler (who else?)

I only skimmed that paper about modular implicits, but it does seem inspired by Scala implicits, yet another earlier work to make ad hoc polymorphism useful.

joppy · on Dec 13, 2021

What's wrong with newtypes is that if you want to define "the monoid of integers under addition" and "the monoid of integers under multiplication" (or similar, the second isn't really a monoid because of zero), then an expression like a*x+b becomes some kind of newtype hell:

toAdditive ((toMultiplicative a) <> (toMultiplicative x)) <> toAdditive b

or something similar. This is of course an insane way to program, and so really what is done is that we define two entirely different operations of addition and multiplication so that they can be used without newtypes.

What would be cool is if operations could ad-hoc be bound to instances of typeclasses, for instance you could accumulate a list of integers inside the (0, +) monoid, or inside the (1, *) monoid. Of course this is basically what the fold functions in Haskell do, but you could imagine being able to formalise this pattern at the type level in a less newtype-hellish way.

greydius · on Dec 13, 2021

> toAdditive ((toMultiplicative a) <> (toMultiplicative x)) <> toAdditive b > This is of course an insane way to program

If you have a set with two binary operations, then a monoid isn't the correct abstraction.

How would you propose to define <> so that a<>x<>b means ax+b and not e.g. a+xb. And how would your proposal be materially different from newtype conversions?

joppy · on Dec 13, 2021

> If you have a set with two binary operations, then a monoid isn't the correct abstraction.

I guess my point is that even if I used "the correct abstraction" here, meaning a ring or something, then the whole newtype setup still makes it annoying for me to split off one operation and feed it into a function expecting a monoid: the way that fold and friends work is much more natural, just taking a starting value 1 and an operation * for example, rather than some kind of "Multiplicative" newtype or whatever.

In general I see a lot of the abstract algebra type towers in Haskell and Purescript as being unhelpful, and making a huge assumption that (for example) there will rarely be two monoid instances on a type. In reality in abstract algebra we use tons of different operations on the same objects all the time, and I really just want to define those operations and then be able to use them.

kccqzy · on Dec 13, 2021

An expression like a*x+b can be written that way directly. It's only when you need to take advantage of some folds do you really start using newtypes. And in Haskell you'd usually just do a coerce on an entire collection of them.

gugagore · on Dec 13, 2021

I don't know why the Wikipedia article says otherwise, but 0 poses no problem for the integers to be a monoid under multiplication. Right? You just need an identity and associativity, not inverses.

joppy · on Dec 13, 2021

Yeah you're totally right, my bad.

YorkshireSeason · on Dec 13, 2021

According to [1] and [2, Section 3.3], type-classes were first introduced in [3] under the name "parametric overloading". Wadler's work [4] came a little later albeit was (probably) arrived at independently.

[1] O. Kieselyov, Increasing Haskell modularity. https://mail.haskell.org/pipermail/haskell-cafe/2014-October...

[2] P. Hudak, J. Hughes, S. Peyton Jones, P. Wadler, A History of Haskell: Being Lazy With Class.

[3] S. Kaes, Parametric overloading in polymorphic programming languages.

[4] P. Wadler, S. Blott, How to make ad-hoc polymorphism less ad hoc.

marcosdumay · on Dec 13, 2021

Another problem with newtypes, different from joppy's, is that they don't solve the problems with generic implementations.

For example, define some `instance Ord a => SomeClass a`, or more interestingly, `instance (SomeClass m, MonadTrans n) => SomeClass (n m)` without much problem, but if you do that, you almost certainly can not define any other generic instance for that class, and nothing you can do with newtypes will save you.

wereHamster · on Dec 13, 2021

In Haskell you typically work around this by using newtype to wrap your type so that you can give the new type specific implementations of type classes.

reikonomusha · on Dec 13, 2021

What do you think about most object-oriented languages only allowing one definition of a method per class?

lmm · on Dec 13, 2021

It's infuriating when it means you can't implement the same interface in two ways.

Joker_vD · on Dec 13, 2021

Well, how do you chose which implementation will get picked? In my experience, explicit wrappers is the cleanest way ― but then again, I also find explicit casts/conversions (which essentially are the same thing) to be the cleanest way too.

lmm · on Dec 14, 2021

If it would be ambiguous then you would have to specify yes. But for the cases where the required type means there's only one implementation that could work, I don't see the value in having to specify.

Joker_vD · on Dec 14, 2021

Uh, how would such an interface look? The one that can be implemented in two ways, and yet "the required type means there's only one implementation that could work"?

lmm · on Dec 15, 2021

I'm mainly thinking of parameterized interfaces, like `Comparable` or some kind of "Keyed<T>". Another example would be datastructures that can be iterated in multiple ways, e.g. a multidimensional array that can implement both `Iterable<Int>` and `Iterable<Row<Int>>` or something like that. You could have cases where you're passing that to a function that expects Iterable<Int> so it's obvious which implementation you want, and cases where you would have to disambiguate.

silon42 · on Dec 13, 2021

Same interface? Or two interfaces with the same method signature?

remexre · on Dec 13, 2021

My 2c is that it makes a lot of sense to have a single unnamed instance of a typeclass, and have that follow the orphan rules, but to allow other _named_ ones that don't follow those rules. Then, add syntax (possibly reuse TypeApplications' @ prefix?) to explicitly specify a typeclass when referencing a name. Possibly also have some lexically-scoped construct to make the instance that instance search will find be a particular named instance too?

4ad · on Dec 13, 2021

See instance arguments in Agda: https://dl.acm.org/doi/abs/10.1145/2034574.2034796

https://agda.readthedocs.io/en/v2.6.2.1/language/instance-ar...

nerdponx · on Dec 13, 2021

Idris 2 solves the problem by letting you define "named implementations" of interfaces: http://docs.idris-lang.org/en/latest/tutorial/interfaces.htm...

whateveracct · on Dec 13, 2021

Canonicity has value tho..to frame it as a design flaw is just wrongheaded.

aidenn0 · on Dec 13, 2021

If the author is here, I'd be interested to hear why multiple-value-setq is used instead of the more idiomatic setf (values ...)

jakear · on Dec 13, 2021

I'd have liked to see an implementation of `compress` in the DSL. As the author mentions, n-fold is pretty easily implemented in pure lisp by reading a "static" identity property of the transformation's prototype. In compress there may be no such prototype available.

reikonomusha · on Dec 13, 2021

    (define (compress l)
      (fold combine identity l))

The inferred type will be

    ((Transformation :t) => (List :t) -> :t)

jakear · on Dec 13, 2021

How does it pick which identity to use? If I compress the empty list is that a translation by 0,0 a dilation by 1, or a rotation by 0? I don't love the ambiguity here.

reikonomusha · on Dec 14, 2021

You can't pass that list. That would be a type error. The list must be (List :t), and :t can't change from element to element. In other words, it's not valid if you can't fill in :t. A rotation has type :t = Rotation; a dilation has type :t = Dilation. It can't be both at the same time. If you wanted to be able to do this, you'd either (1) need a more powerful type system, or (2) express all kinds of transformations into a single sum type.

jakear · on Dec 14, 2021

Well thats... limiting. But reasonable I suppose. If you go the sum type route would you then need to implement identify specifically for the sum type?

artemonster · on Dec 13, 2021

So, in the end the library will inject „identity“ symbol into the execution environment of n-fold by a compiler magic? How is that better than (get-identity tran) ??

lmm · on Dec 13, 2021

You can't use the latter polymorphically (particularly since you don't always have an instance of your transformation to work with - for example, if you make a method that combines a list of transformations, it needs to return the identity when given an empty list). So you end up having to thread it down your call stack. Once you have a non-toy method that's using two or three type parameters it gets pretty cumbersome.

artemonster · on Dec 13, 2021

Then you have to use a method that will get an identity given a container (even if empty). Relying on types here, when we say thats a list of rotation transformers so their identity will he THAT, is exactly s compiler magic to me.

lmm · on Dec 14, 2021

You resolve which method is called based on the type, yes. I don't see that as "magic" since it's fundamental language functionality that you rely on all the time - in Haskell typeclasses are the way you do polymorphism.

If you want to implement polymorphism at all you will have to do something that's more or less equivalent (e.g. an OO virtual function table is effectively the same thing as a typeclass instance - it's just that the language forces you to bundle the data members and the virtual function table together, whereas in Haskell they're decoupled and can be used separately). Or even function overloading requires a similar "the compiler chooses which one is actually called" concept. (Of course there are purist languages that insist on e.g. using different forms of + for different-sized integers, but they're very much a minority)