Recursive ADT (Algebraic Data Types)

Recursive ADT encoding and decoding

Certain shapes of data are difficult to write correct Decoder/Encoder instances for, however much of the complexity can be avoided by leveraging the Defer typeclass from cats.

Recursive Decoder example

It's important to understand the failure modes that are specific to writing a Decoder for a recursive data structure, because Defer only resolves 3 of the 4.

Consider the following ADT:

import io.circe.{Json, Decoder, HCursor}
import io.circe.Decoder.{Result, AccumulatingResult}
import io.circe.syntax._
import cats.syntax.all._

sealed trait Tree[A]
case class Branch[A](left: Tree[A], right: Tree[A]) extends Tree[A]
case class Leaf[A](value: A) extends Tree[A]

First Attempt: StackOverflowError during Decoder creation

And these encoder / decoder instances:

implicit def branchDecoder[A: Decoder]: Decoder[Branch[A]] =
  Decoder.forProduct2[Branch[A], Tree[A], Tree[A]]("l", "r")(Branch.apply)

implicit def leafDecoder[A: Decoder]: Decoder[Leaf[A]] =
  Decoder[A].at("v").map(Leaf(_))

implicit def treeDecoder[A: Decoder]: Decoder[Tree[A]] =
  List[Decoder[Tree[A]]](
    Decoder[Branch[A]].widen,
    Decoder[Leaf[A]].widen
  ).reduceLeft(_ or _)
val decoder = 
  try implicitly[Decoder[Tree[Int]]] 
  catch {
    case _: StackOverflowError => 
      Decoder.failedWithMessage[Tree[Int]]("Lookup caused StackOverflowError")
  }
// decoder: Decoder[Tree[Int]] = io.circe.Decoder$$anon$21@2e86734e

Json.obj("v" := 1).as[Tree[Int]](decoder)
// res0: Result[Tree[Int]] = Left(
//   value = DecodingFailure at : Lookup caused StackOverflowError
// )

This implementation looks quite reasonable, however it will result in a StackOverflowError at runtime. This happens because Tree is generic, and it's Decoder instances must be generic defs, producing and endless loop of calls to treeDecoder and branchDecoder.

Second Attempt: Diverging Implicits error

The StackOverflowError can be converted to a compilation error by adjusting the implicit parameters to be more granular. This diverging implicits error is more clear about the source of the issue.

implicit def branchDecoder[A](implicit TD: Decoder[Tree[A]]): Decoder[Branch[A]] =
  Decoder.forProduct2[Branch[A], Tree[A], Tree[A]]("l", "r")(
    Branch.apply
  )

implicit def leafDecoder[A: Decoder]: Decoder[Leaf[A]] =
  Decoder[A].at("v").map(Leaf(_))

implicit def treeDecoder[A](implicit BD: Decoder[Branch[A]], LD: Decoder[Leaf[A]]): Decoder[Tree[A]] =
  List[Decoder[Tree[A]]](
    Decoder[Branch[A]].widen,
    Decoder[Leaf[A]].widen
  ).reduce(_ or _)
implicitly[Decoder[Tree[Int]]]
// error: diverging implicit expansion for type io.circe.Decoder[repl.MdocSession.MdocApp.Tree[Int]]
// starting with method branchDecoder

Third Attempt: Undesirable creation of Decoder instances

Switching to lazily creating the instances needed is a way to get past the diverging implicits issue, however the cost of this approach is the creation of a new instance of Decoder each time recursion occurs.

var instanceCounter = 0
// instanceCounter: Int = 0

implicit def branchDecoder[A: Decoder]: Decoder[Branch[A]] = {
  instanceCounter += 1
  Decoder.instance { c =>
    (c.downField("l").as[Tree[A]], c.downField("r").as[Tree[A]]).mapN(Branch.apply)
  }
}

implicit def leafDecoder[A: Decoder]: Decoder[Leaf[A]] = {
  instanceCounter += 1
  Decoder[A].at("v").map(Leaf(_))
}

implicit def treeDecoder[A: Decoder]: Decoder[Tree[A]] = {
  instanceCounter += 1
  Decoder.instance { c =>
    c.as[Branch[A]].orElse(c.as[Leaf[A]])
  }
}
Json.obj(
  "l" -> Json.obj("v" := 1),
  "r" -> Json.obj(
    "l" -> Json.obj("v" := 2),
    "r" -> Json.obj("v" := 3)
  )
).as[Tree[Int]]
// res2: Result[Tree[Int]] = Right(
//   value = Branch(
//     left = Leaf(value = 1),
//     right = Branch(left = Leaf(value = 2), right = Leaf(value = 3))
//   )
// )

instanceCounter
// res3: Int = 19

Fourth Attempt: A workable solution

The solution to the previous issues is to generate the instances once, and cache them.

var instanceCounter = 0
// instanceCounter: Int = 0

implicit def branchDecoder[A](implicit DTA: Decoder[Tree[A]]): Decoder[Branch[A]] = {
  instanceCounter += 1
  Decoder.forProduct2[Branch[A], Tree[A], Tree[A]]("l", "r")(Branch.apply)
}

implicit def leafDecoder[A: Decoder]: Decoder[Leaf[A]] = {
  instanceCounter += 1
  Decoder[A].at("v").map(Leaf(_))
}

implicit def treeDecoder[A: Decoder]: Decoder[Tree[A]] = {
  instanceCounter += 1
  new Decoder[Tree[A]] {
    private implicit val self: Decoder[Tree[A]] = this
    private val delegate =
      List[Decoder[Tree[A]]](
        Decoder[Branch[A]].widen,
        Decoder[Leaf[A]].widen
      ).reduce(_ or _)
      
    override def apply(c: HCursor): Result[Tree[A]] = delegate(c)
  }
}
Json.obj(
  "l" -> Json.obj("v" := 1),
  "r" -> Json.obj(
    "l" -> Json.obj("v" := 2),
    "r" -> Json.obj("v" := 3)
  )
).as[Tree[Int]]
// res4: Result[Tree[Int]] = Right(
//   value = Branch(
//     left = Leaf(value = 1),
//     right = Branch(left = Leaf(value = 2), right = Leaf(value = 3))
//   )
// )

instanceCounter
// res5: Int = 3

Note that, because delegate is a fixed instance, it is not necessary to explicitly cache branchDecoder or leafDecoder.

This implementation solves most of the issues with a recursive decoder, at the cost of being annoying to write, and easy to break by omitting private implicit val self: Decoder[Tree[A]] = this, which will cause this decoder to produce a StackOverflowError at runtime, but the compiler won't complain about removing it.

Fifth Attempt: A more elegant solution

The Defer typeclass allows us to write Decoders and Encoders that have access to their eventual instance, enabling us to write an equivalent instances in a more natural and less fragile fashion. This can be done directly, using Defer[Decoder].fix, or the recursive helper provided on the Decoder and Encoder companion objects.

The Decoder instances for Tree using Decoder.recursive would look like this:

var instanceCounter = 0
// instanceCounter: Int = 0

implicit def branchDecoder[A](implicit DTA: Decoder[Tree[A]]): Decoder[Branch[A]] = {
  instanceCounter += 1
  Decoder.forProduct2[Branch[A], Tree[A], Tree[A]]("l", "r")(Branch.apply)
}

implicit def leafDecoder[A: Decoder]: Decoder[Leaf[A]] = {
  instanceCounter += 1
  Decoder[A].at("v").map(Leaf(_))
}

implicit def treeDecoder[A: Decoder]: Decoder[Tree[A]] = {
  instanceCounter += 1
  Decoder.recursive { implicit recurse =>
    List[Decoder[Tree[A]]](
      Decoder[Branch[A]].widen,
      Decoder[Leaf[A]].widen
    ).reduce(_ or _)
  }
}
Json.obj(
  "l" -> Json.obj("v" := 1),
  "r" -> Json.obj(
    "l" -> Json.obj("v" := 2),
    "r" -> Json.obj("v" := 3)
  )
).as[Tree[Int]]
// res6: Result[Tree[Int]] = Right(
//   value = Branch(
//     left = Leaf(value = 1),
//     right = Branch(left = Leaf(value = 2), right = Leaf(value = 3))
//   )
// )

instanceCounter
// res7: Int = 3

Note that, in the same manner that caching delegate implicitly caches the branch and leaf decoders, because treeDecoder provides a fixed instance, it is not necessary to use Decoder.recursive to write branchDecoder or leafDecoder

Limitations

While both the fourth and fifth attempts avoid the issues of the previous attempts, they are not perfect. Because the recursive calls are not tail recursive, they are not fully stack safe. This is generally not an issue for data structures that have a depth of at most log(size), linear structures (like a linked list) could run into trouble.

var instanceCounter = 0
// instanceCounter: Int = 0

implicit def branchDecoder[A](implicit DTA: Decoder[Tree[A]]): Decoder[Branch[A]] = {
  instanceCounter += 1
  Decoder.forProduct2[Branch[A], Tree[A], Tree[A]]("l", "r")(Branch.apply)
}

implicit def leafDecoder[A: Decoder]: Decoder[Leaf[A]] = {
  instanceCounter += 1
  Decoder[A].at("v").map(Leaf(_))
}

implicit def treeDecoder[A: Decoder]: Decoder[Tree[A]] = {
  instanceCounter += 1
  Decoder.recursive { implicit recurse =>
    List[Decoder[Tree[A]]](
      Decoder[Branch[A]].widen,
      Decoder[Leaf[A]].widen
    ).reduce(_ or _)
  }
}
val decoder = Decoder[Tree[Int]]
// decoder: Decoder[Tree[Int]] = io.circe.Decoder$$anon$9@5fbc9073
try 
  List
    .range(0, 10000)
    .map(i => Json.obj("v" := i))
    .reduceLeft { (a, b) =>
      Json.obj("l" -> a, "r" -> b)
    }
    .as[Tree[Int]](decoder)
catch {
  case _: StackOverflowError => Leaf(-1).asRight
}
// res8: Result[Tree[Int]] = Right(value = Leaf(value = -1))

instanceCounter
// res9: Int = 3

Note that despite the successful creation of a Decoder[Tree[Int]] instance, and the number of Decoder instances remaining constant at 3, the size of the unbalanced Tree[Int] represented in the JSON still caused a StackOverflowError.