
criteria4s. The type the JVM never sees

Rafael Fernandez
criteria4s - This article is part of a series.
Part 2: This Article

When I first explained criteria4s to a colleague, his reaction was: “so the type parameter just… disappears at runtime?” Yes. That is exactly what happens. And that was the moment I knew the next post could not stop at the public API.

In the previous post I showed the API surface: write a predicate once, evaluate it against any backend by swapping a type parameter. What I did not show is what is actually happening inside. This post does that. We read the source code together, file by file, and see how something that looks like magic is just careful use of the type system.


An empty trait with a full-time job

A phantom type is a type parameter that the compiler tracks but the runtime never sees. Think of it like a luggage tag at the airport. The tag tells the system where your bag is going. The bag itself does not change. And when you land, the tag is discarded.

In criteria4s, the luggage tag is CriteriaTag:

// core/src/main/scala/.../Criteria.scala

trait Criteria[T <: CriteriaTag] {
  def value: String
}

object Criteria {
  private[criteria4s] def pure[T <: CriteriaTag](v: String): Criteria[T] =
    new Criteria[T] {
      override def value: String = v
    }

  private[core] trait CriteriaTag
}

CriteriaTag is an empty trait. No fields. No methods. No state. It exists only at the type level. Criteria[T] wraps a String. That is it. At runtime, a Criteria[SQL] and a Criteria[MongoDB] are indistinguishable: both are instances of the same class wrapping a string. The T is gone, erased by the compiler before the bytecode is emitted.

The interesting part is pure. It is private[criteria4s], so code outside the library cannot call it. The only way to produce a Criteria[T] is through the predicates and conjunctions criteria4s provides. This means the string inside always comes from the library's own templates. Nobody sneaks in a raw "DROP TABLE users" and labels it Criteria[SQL].
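
To make the luggage tag concrete, here is a minimal sketch of the same idea outside the library. The names Tag, Sql, Mongo, Query and equalTo are hypothetical stand-ins, not the criteria4s API, and a private constructor plays the role that private[criteria4s] plays for pure.

```scala
// Hypothetical stand-ins for illustration; these are not the criteria4s types.
sealed trait Tag
sealed trait Sql   extends Tag
sealed trait Mongo extends Tag

// The constructor is private, so only the companion object can build a Query[T].
final class Query[T <: Tag] private (val value: String)

object Query:
  // The one sanctioned way in: a smart constructor that assembles the string itself.
  def equalTo[T <: Tag](column: String, literal: Int): Query[T] =
    new Query[T](s"$column = $literal")

val sqlQuery: Query[Sql]     = Query.equalTo[Sql]("age", 18)
val mongoQuery: Query[Mongo] = Query.equalTo[Mongo]("age", 18)

// The tag only exists for the compiler: both values share one runtime class.
val sameRuntimeClass: Boolean = sqlQuery.getClass == mongoQuery.getClass

// Neither of these lines compiles if uncommented:
// val wrong: Query[Sql] = mongoQuery            // Query[Mongo] is not a Query[Sql]
// val raw = new Query[Sql]("DROP TABLE users")  // the constructor is private
```

The two commented-out lines are the whole point: the compiler refuses both the relabeling and the raw string, even though nothing at runtime could tell the two queries apart.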

Values that know nothing about formatting

Every operand in a criteria4s expression is a Ref[D, V]:

sealed trait Ref[D <: CriteriaTag, V] {
  def asString(using show: Show[V, D]): String
}

object Ref {
  trait Value[D <: CriteriaTag, V]      extends Ref[D, V]
  trait Col[D <: CriteriaTag]           extends Ref[D, Column]
  trait Collection[D <: CriteriaTag, V] extends Ref[D, Seq[V]]
  trait Range[D <: CriteriaTag, V]      extends Ref[D, (V, V)]
}

case class Column(colName: String) extends AnyVal

Look at asString. It does not produce a string on its own. It requires a Show[V, D] in scope. A Ref holds a value but has no idea how to format it. Formatting is someone else’s job.

That someone is Show.

The one line that changes everything

trait Show[-V, D <: CriteriaTag] {
  def show(v: V): String
}

Each dialect provides its own Show instances. Here is SQL’s column rendering:

given showColumn: Show[Column, SQL] =
  Show.create(col => col.colName)

And here is MongoDB’s:

given showColumn: Show[Column, MongoDB] =
  Show.create(col => s"\"${col.colName}\"")

One line of difference. SQL writes the column name as-is. MongoDB wraps it in JSON double quotes. PostgreSQL uses SQL double quotes. MySQL uses backticks. The entire rendering difference between four dialects lives in four lines of code, one per Show[Column, D] instance.

The obvious question is: why not just use toString? Every value in Scala already converts itself to a string. The answer is that toString carries no dialect information. Column("age").toString gives you something like Column(age). That is not a rendered column name for any backend. More importantly, toString cannot distinguish between SQL and MongoDB. Show[Column, SQL] and Show[Column, MongoDB] are different types. The compiler picks the right one based on the T flowing through the expression. toString would give you the same output regardless of dialect, which is precisely the problem we set out to solve.

The type class is also contravariant in V. Show[-V, D] means a Show[Any, SQL] can render anything as SQL, while a Show[String, SQL] renders strings specifically. The compiler picks the most specific instance available. toString gives you no such precision. It is one-size-fits-all. Show is surgical.

Everything else, the logic of >, =, AND, LIKE, the parenthesization, the nesting, is handled separately. The type class design keeps these responsibilities cleanly separated. Show knows how to print a value. Predicates know how to combine printed values. They do not need to know about each other.
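
The rendering side of that separation can be sketched on its own. This is a simplified model under my own assumptions: Tag, Sql, Mongo and render are hypothetical names, and this Show.create is a helper written for the sketch, not necessarily the signature the library uses.

```scala
// Hypothetical stand-ins for the library's tags and column type.
trait Tag
trait Sql   extends Tag
trait Mongo extends Tag

final case class Column(colName: String)

// Dialect-indexed rendering: contravariant in the value, fixed in the tag.
trait Show[-V, D <: Tag]:
  def show(v: V): String

object Show:
  // Assumed helper for this sketch: lift a plain function into a Show instance.
  def create[V, D <: Tag](f: V => String): Show[V, D] =
    new Show[V, D]:
      def show(v: V): String = f(v)

// One instance per dialect, differing only in how the name is quoted.
given sqlColumn: Show[Column, Sql] =
  Show.create[Column, Sql](c => c.colName)

given mongoColumn: Show[Column, Mongo] =
  Show.create[Column, Mongo](c => "\"" + c.colName + "\"")

// The caller names the dialect; the compiler supplies the matching instance.
def render[D <: Tag](c: Column)(using s: Show[Column, D]): String = s.show(c)

val age         = Column("age")
val viaToString = age.toString        // Column(age): no dialect involved
val asSql       = render[Sql](age)    // age
val asMongo     = render[Mongo](age)  // "age", including the double quotes
```

The same Column value produces three different strings, and the only thing that changes between the last two calls is the type argument.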

How a predicate becomes a string

EQ[T], GT[T], AND[T]: each predicate is a type class that knows how to combine two rendered strings into an expression for its dialect. The actual mechanism is a BuilderBinary:

trait BuilderBinary[H[_ <: CriteriaTag]] {
  def build[T <: CriteriaTag](F: (String, String) => String): H[T]
}

Give a builder a function (String, String) => String and it returns a predicate instance. For SQL’s EQ:

given eqPred: EQ[T] = build[T, EQ](predExpr("="))
// predExpr("=")("age", "18") == "age = 18"

For MongoDB’s EQ:

given eqPred: EQ[T] = build[T, EQ](predExpr("eq"))
// predExpr("eq")("age", "18") == {"age": {$eq: 18}}

Same type class. Same interface. Different string template. This is what makes the library extensible: adding a new dialect is adding new string templates, not touching the predicate logic. Adding a new predicate is defining a new type class and wiring it to the dialects, not touching the existing predicates.

Both axes remain open. This is what we talked about in the post on the Expression Problem.
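
A stripped-down sketch of the template idea, assuming a single predicate and two dialects. buildEq, infix, document and equal are hypothetical helpers written for this example. The real build is generic over the predicate type, and the exact MongoDB output shape here simply follows the comment above.

```scala
trait Tag
trait Sql   extends Tag
trait Mongo extends Tag

// A predicate type class: given two rendered operands, produce the expression.
trait EQ[T <: Tag]:
  def eval(left: String, right: String): String

// Hypothetical helper in the spirit of build: lift a string template into an instance.
def buildEq[T <: Tag](template: (String, String) => String): EQ[T] =
  new EQ[T]:
    def eval(left: String, right: String): String = template(left, right)

// Infix template, as used by the SQL family.
def infix(op: String): (String, String) => String =
  (l, r) => l + " " + op + " " + r

// Document-style template; the output shape is an assumption of this sketch.
def document(op: String): (String, String) => String =
  (l, r) => "{" + l + ": {$" + op + ": " + r + "}}"

// Same type class, one instance per dialect, each wired to its own template.
given sqlEq: EQ[Sql]     = buildEq[Sql](infix("="))
given mongoEq: EQ[Mongo] = buildEq[Mongo](document("eq"))

def equal[T <: Tag](l: String, r: String)(using p: EQ[T]): String = p.eval(l, r)

val sqlText   = equal[Sql]("age", "18")        // age = 18
val mongoText = equal[Mongo]("\"age\"", "18")  // {"age": {$eq: 18}}
```

Adding a third dialect in this sketch means one more given line with a different template. Nothing in EQ or equal changes.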

One override. Four dialects.

The SQL dialect defines all predicates once, in a shared trait:

trait SQL extends CriteriaTag

object SQL {
  trait SQLExpr[T <: SQL] {
    given eqPred: EQ[T]         = build[T, EQ](predExpr("="))
    given gtPred: GT[T]         = build[T, GT](predExpr(">"))
    given andConj: AND[T]       = build[T, AND](conjExpr("AND"))
    given likePred: LIKE[T]     = build[T, LIKE](predExpr("LIKE"))
    given isnullPred: ISNULL[T] = build[T, ISNULL](predExpr1("IS NULL"))
    // ... all predicates, defined once for the whole SQL family
  }
}

package object sql extends SQL.SQLExpr[SQL]

PostgreSQL inherits all of that and overrides exactly one thing:

trait PostgreSQL extends SQL

object PostgreSQL extends SQL.SQLExpr[PostgreSQL] {
  given showColumn: Show[Column, PostgreSQL] =
    Show.create(col => s""""${col.colName}"""")
}

That is the entire PostgreSQL dialect. One override. The rest comes from SQLExpr[PostgreSQL], where the inherited builders produce instances typed PostgreSQL instead of SQL. MySQL is the same, with backticks. DuckDB is the same, with double quotes. Four SQL-family dialects, and the actual dialect-specific code for three of them fits on a single screen.

When I first saw this, I wanted to show it to every developer who has ever copy-pasted a repository implementation across backends.
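
Here is the inheritance pattern reduced to one predicate, as a sketch rather than the library's code. SqlFamily, PlainSql, PostgresDialect and greaterThan are hypothetical names. The shape is what matters: a shared trait typed by the concrete dialect, and one dialect-specific Show per object.

```scala
trait Tag
trait Sql      extends Tag
trait Postgres extends Sql

final case class Column(colName: String)

trait Show[V, D <: Tag]:
  def show(v: V): String

trait GT[T <: Tag]:
  def eval(left: String, right: String): String

// Everything the family shares, written once and typed by the concrete dialect T.
trait SqlFamily[T <: Sql]:
  given gtPred: GT[T] = new GT[T]:
    def eval(left: String, right: String): String = s"$left > $right"

object PlainSql extends SqlFamily[Sql]:
  given showColumn: Show[Column, Sql] = new Show[Column, Sql]:
    def show(c: Column): String = c.colName

object PostgresDialect extends SqlFamily[Postgres]:
  // The single dialect-specific definition: double-quoted identifiers.
  given showColumn: Show[Column, Postgres] = new Show[Column, Postgres]:
    def show(c: Column): String = "\"" + c.colName + "\""

// Written once against any tag; the givens in scope decide the output.
def greaterThan[T <: Tag](c: Column, literal: Int)(using s: Show[Column, T], p: GT[T]): String =
  p.eval(s.show(c), literal.toString)

val plain: String =
  import PlainSql.given
  greaterThan[Sql](Column("age"), 18)       // age > 18

val postgres: String =
  import PostgresDialect.given
  greaterThan[Postgres](Column("age"), 18)  // "age" > 18
```

The inherited gtPred in PostgresDialect is typed GT[Postgres], not GT[Sql], which is why the second call resolves without any extra wiring.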

The compiler as the last line of defense

From the previous post:

val broken = F.col[SQL]("age") :> F.col[MongoDB]("age")
// Does not compile.

Now the reason is obvious. F.col[SQL]("age") is a Col[SQL]. F.col[MongoDB]("age") is a Col[MongoDB]. The :> operator resolves GT[T].eval, which requires both operands to be Ref[T, _] for the same T. SQL and MongoDB are different types. The compiler cannot unify them.

No runtime check. No exception thrown. No error message at startup. The program does not compile. The phantom type T flows through the entire expression tree, and any mismatch surfaces before the bytecode exists.

It is worth pausing on what would happen without this guarantee. Remember that phantom types are erased at runtime. A Criteria[SQL] and a Criteria[MongoDB] are the same thing at the JVM level: a wrapper around a string. There is no ClassCastException waiting to happen. The type information is gone. If the library let you mix dialects, you would get a Criteria containing a malformed query string. You would send it to the database. The database might reject it with a cryptic driver error. It might silently return empty results. It might do something worse. And you would be debugging at runtime, in a log file, possibly after it reached users.

The phantom type converts a semantic bug, the kind that slips past the compiler and surfaces quietly in production, into a structural error that the compiler catches before the code exists as a runnable artifact. That is a qualitative difference, not just a convenience.
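
You can ask the compiler directly whether it would accept a mixed expression. The sketch below uses scala.compiletime.testing.typeChecks from the Scala 3 standard library, which type-checks a code snippet at compile time and returns a Boolean. Col and gt are hypothetical simplifications of the library's references and predicates.

```scala
import scala.compiletime.testing.typeChecks

trait Tag
trait Sql   extends Tag
trait Mongo extends Tag

// A reference that carries its dialect tag; invariant in T on purpose.
final case class Col[T <: Tag](name: String)

// Both operands share a single T, so the two tags have to agree.
def gt[T <: Tag](left: Col[T], right: Col[T]): String =
  s"${left.name} > ${right.name}"

// The compiler evaluates these while compiling this file.
val sameDialect: Boolean  = typeChecks("""gt(Col[Sql]("age"), Col[Sql]("min_age"))""")
val mixedDialect: Boolean = typeChecks("""gt(Col[Sql]("age"), Col[Mongo]("age"))""")
```

In this sketch sameDialect is true and mixedDialect is false. The second snippet never becomes bytecode, so there is nothing left to fail at runtime.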

What zero overhead actually looks like

When my colleague asked “so the type parameter just disappears at runtime?”, the next question was: “what does all this machinery cost?”

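For the type-level machinery, nothing.

Phantom types are erased at compile time, so T adds no field, no allocation, and no check at runtime. given instances are resolved statically: the compiler chooses which instance to pass, and no lookup happens while the program runs. The build helper constructs a closure that concatenates strings. At runtime, criteria4s is string concatenation and a thin wrapper around the result. The dialect abstraction lives entirely in the type system, which is to say, it lives entirely in the compiler. The JVM never sees it.

This is what zero-cost abstraction means here: the abstraction is real, and the dialect tag itself costs nothing at runtime.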

Eight steps, zero surprises

When you write F.col[SQL]("age") geq F.lit(18), here is what actually happens:

  1. F.col[SQL]("age") creates a Col[SQL] wrapping Column("age")
  2. F.lit(18) creates a Value[SQL, Int] wrapping 18
  3. geq resolves GEQ[SQL] from the implicit scope
  4. GEQ[SQL].eval calls asString on both operands
  5. Col[SQL].asString asks Show[Column, SQL] and gets "age"
  6. Value[SQL, Int].asString asks Show[Int, SQL] and gets "18"
  7. GEQ[SQL] applies its string template: "age >= 18"
  8. Criteria.pure[SQL]("age >= 18") wraps the result

Eight steps. Every instance involved is chosen at compile time. What the JVM executes is string concatenation.
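
The steps above can be traced in a self-contained sketch. It is a simplified model under my own assumptions, not the library's source: Ref is a single case class instead of a sealed hierarchy, Criteria is freely constructible, template is a hypothetical member, and lit takes explicit type arguments where the library infers them.

```scala
trait Tag
trait Sql extends Tag

final case class Column(colName: String)

trait Show[V, D <: Tag]:
  def show(v: V): String

// Operands hold a value and defer formatting to a Show instance.
final case class Ref[D <: Tag, V](value: V):
  def asString(using s: Show[V, D]): String = s.show(value)

// Result wrapper; the real library restricts construction, this sketch does not.
final case class Criteria[T <: Tag](value: String)

trait GEQ[T <: Tag]:
  def template(left: String, right: String): String
  // Steps 4 to 8: render both operands, apply the template, wrap the result.
  def eval[L, R](l: Ref[T, L], r: Ref[T, R])(using Show[L, T], Show[R, T]): Criteria[T] =
    Criteria[T](template(l.asString, r.asString))

given showColumnSql: Show[Column, Sql] = new Show[Column, Sql]:
  def show(c: Column): String = c.colName

given showIntSql: Show[Int, Sql] = new Show[Int, Sql]:
  def show(i: Int): String = i.toString

given geqSql: GEQ[Sql] = new GEQ[Sql]:
  def template(left: String, right: String): String = s"$left >= $right"

// Steps 1 and 2: build the operands.
def col[T <: Tag](name: String): Ref[T, Column] = Ref[T, Column](Column(name))
def lit[T <: Tag, V](v: V): Ref[T, V]          = Ref[T, V](v)

// Step 3: the operator asks the compiler for GEQ[T] and the two Show instances.
extension [T <: Tag, L](left: Ref[T, L])
  infix def geq[R](right: Ref[T, R])(using p: GEQ[T], sl: Show[L, T], sr: Show[R, T]): Criteria[T] =
    p.eval(left, right)

val adults: Criteria[Sql] = col[Sql]("age") geq lit[Sql, Int](18)
// adults.value is "age >= 18"
```

Every using parameter in this sketch is filled in by the compiler at the call site, so the last line carries no visible plumbing.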

What comes next

The next post turns all of this into practice: criteria4s 3. Building your own dialect. We stop looking under the hood and use build and Show to teach criteria4s a new language, step by step.

The source is at github.com/eff3ct0/criteria4s. Reading Criteria.scala, then Show.scala, then SQL.scala in that order gives you the whole picture in about twenty minutes.
