
criteria4s. Do you speak my language?

Rafael Fernandez
Author
Mathematics, programming, and life stuff
criteria4s - This article is part of a series.
Part 3: This Article

A friend read the previous post and sent me a message: “My database isn’t on the list. I guess the library can’t help me.” He was working with an internal query engine, the kind every company builds and nobody outside has ever seen.

He was wrong. An hour later he had a working dialect. The function from post one, the one that renders activeAdults for SQL and MongoDB without changing a line, was now rendering his company’s query syntax too. Same function, same predicates, new output.

This post walks through that process. We build a dialect from nothing, then look at the shortcut for databases that already speak SQL with a slightly different accent.


What a dialect is made of

Every criteria4s dialect needs three things:

  • A trait that extends CriteriaTag. This is the phantom type from the previous post. It has no fields and no methods. It exists so the compiler can tell your dialect apart from every other one.
  • Show instances. These teach criteria4s how to render values in your syntax. How do you quote a column name? How do you format a string literal? How do you write a sequence? Each answer is a one-line Show instance.
  • given instances for predicates and conjunctions. These tell criteria4s how to assemble rendered values into expressions. You provide a string template function, call build, and get a full type class instance back.

That is it. No interface to implement. No abstract class to extend. No registration step. You define the givens, bring them into scope, and the compiler does the rest.
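
To make the three ingredients concrete, here is a toy model that fits in one file. Every type in it (Tag, Rendered, Show, EQ, Toy) is a simplified stand-in written for this sketch, not the real criteria4s definition, and the library's actual signatures may well differ.

```scala
object IngredientsSketch {
  // Simplified stand-ins written for this sketch, NOT the real criteria4s types.
  trait Tag                                          // plays the role of CriteriaTag
  final case class Column(colName: String)
  final case class Rendered[T <: Tag](value: String) // T is phantom, like Criteria[T]

  trait Show[V, T <: Tag] { def show(v: V): String } // how to render a value
  trait EQ[T <: Tag] { def eval(left: String, right: String): Rendered[T] }

  // Ingredient 1: a tag with no fields and no methods
  trait Toy extends Tag

  object Toy {
    // Ingredient 2: Show instances
    given showColumn: Show[Column, Toy] = new Show[Column, Toy] {
      def show(c: Column): String = c.colName
    }
    given showInt: Show[Int, Toy] = new Show[Int, Toy] {
      def show(i: Int): String = i.toString
    }
    // Ingredient 3: a predicate instance built from a string template
    given eqPred: EQ[Toy] = new EQ[Toy] {
      def eval(l: String, r: String): Rendered[Toy] = Rendered(s"(eq $l $r)")
    }
  }

  // Dialect-agnostic: it asks only for the givens it needs
  def ageIs[T <: Tag](n: Int)(using e: EQ[T], sc: Show[Column, T], si: Show[Int, T]): Rendered[T] =
    e.eval(sc.show(Column("age")), si.show(n))
}

@main def ingredientsDemo(): Unit =
  println(IngredientsSketch.ageIs[IngredientsSketch.Toy](18).value) // (eq age 18)
```

In this toy the givens live in the companion object of the tag, so the compiler finds them without an import. The real dialects in this post are brought into scope with an explicit import instead.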

Something that looks nothing like SQL
#

We need an example that makes the translation visible. If we built another SQL dialect, the output would look too similar to prove anything. So we are going to build a dialect that produces S-expressions: the syntax of Lisp. Everything in prefix notation, everything in parentheses. age >= 18 becomes (gte age 18). a AND b becomes (and a b).

The trait:

package com.example.dialect

import com.eff3ct.criteria4s.core.*
import com.eff3ct.criteria4s.core.Criteria.CriteriaTag
import com.eff3ct.criteria4s.instances.build

trait SExpr extends CriteriaTag

One line. SExpr is now a dialect tag. The compiler can track it through every expression. Nothing knows how to render it yet, but the type system already has a name for it.

String templates

The rendering logic of any dialect fits in a few private functions. SQL’s equality template is s"$left = $right". MongoDB’s is s"{$left: {$$eq: $right}}". For S-expressions, everything is prefix:

object SExpr {

  private def predExpr(op: String)(left: String, right: String): String =
    s"($op $left $right)"

  private def predExpr1(op: String)(value: String): String =
    s"($op $value)"

  private def conjExpr(op: String)(left: String, right: String): String =
    s"($op $left $right)"

Three functions. predExpr handles binary predicates like (eq age 18). predExpr1 handles unary ones like (null? age). conjExpr handles conjunctions like (and expr1 expr2). In this dialect they happen to share the same shape. In SQL they do not: predicates are infix while conjunctions wrap each operand in parentheses. The entire difference between those two conventions lives in these few lines.
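
The templates are plain curried functions, so they can be tried on their own, outside any dialect. The sketch below copies the prefix templates and sets them next to two infix counterparts. The names sqlPred and sqlConj are made up for this comparison and only follow the SQL shape described above; they are not the library's code.

```scala
object TemplateSketch {
  // Prefix templates, same shape as in the SExpr object
  def predExpr(op: String)(left: String, right: String): String =
    s"($op $left $right)"

  def predExpr1(op: String)(value: String): String =
    s"($op $value)"

  // Hypothetical infix counterparts, following the SQL convention described in the text
  def sqlPred(op: String)(left: String, right: String): String =
    s"$left $op $right"

  def sqlConj(op: String)(left: String, right: String): String =
    s"($left) $op ($right)"
}

@main def templateDemo(): Unit = {
  import TemplateSketch.*

  // Fixing the operator leaves a plain (String, String) => String
  val gte: (String, String) => String = predExpr("gte")

  println(gte("age", "18"))                          // (gte age 18)
  println(predExpr1("null?")("age"))                 // (null? age)
  println(sqlPred(">=")("age", "18"))                // age >= 18
  println(sqlConj("AND")("age >= 18", "active = true")) // (age >= 18) AND (active = true)
}
```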

Rendering values

Four Show instances cover every value type that needs dialect-specific rendering:

  // Columns render as bare names
  given showColumn: Show[Column, SExpr] =
    Show.create(_.colName)

  // Strings render with double quotes
  given showString: Show[String, SExpr] =
    Show.create(s => s""""$s"""")

  // Sequences render as space-separated lists in parens
  given showSeq[V](using show: Show[V, SExpr]): Show[Seq[V], SExpr] =
    Show.create(_.map(show.show).mkString("(", " ", ")"))

  // Ranges render as two values side by side
  given showTuple[V](using show: Show[V, SExpr]): Show[(V, V), SExpr] =
    Show.create { case (l, r) => s"${show.show(l)} ${show.show(r)}" }

SQL strings use single quotes with escaping. MongoDB columns wear JSON double quotes. Here, columns are bare and strings get double quotes. Same type class, completely different convention.

You might notice there is no Show[Int, SExpr]. The core library provides default Show instances for AnyVal types that call toString. Since 18.toString is "18", which is exactly what we want, we inherit it for free. You only write a Show when the default does not match your syntax.
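
To see how the four instances and the toString fallback interact, here is a self-contained model. Show, Show.create, anyValShow, Lisp, and render are re-created for this sketch so it runs without the library. In particular, the AnyVal default is only a guess at how such a fallback could be written, not the library's actual definition.

```scala
object ShowSketch {
  trait Tag
  final case class Column(colName: String)

  trait Show[V, T <: Tag] { def show(v: V): String }

  object Show {
    def create[V, T <: Tag](f: V => String): Show[V, T] =
      new Show[V, T] { def show(v: V): String = f(v) }

    // Hypothetical stand-in for the AnyVal default: fall back to toString
    given anyValShow[V <: AnyVal, T <: Tag]: Show[V, T] = create(_.toString)
  }

  trait Lisp extends Tag

  object Lisp {
    // Columns render as bare names
    given showColumn: Show[Column, Lisp] = Show.create(_.colName)
    // Strings render with double quotes
    given showString: Show[String, Lisp] = Show.create(s => "\"" + s + "\"")
    // Sequences render as space-separated lists in parens
    given showSeq[V](using show: Show[V, Lisp]): Show[Seq[V], Lisp] =
      Show.create(_.map(show.show).mkString("(", " ", ")"))
    // Ranges render as two values side by side
    given showTuple[V](using show: Show[V, Lisp]): Show[(V, V), Lisp] =
      Show.create { case (l, r) => s"${show.show(l)} ${show.show(r)}" }
  }

  // Helper for the demo: render any value that has a Show for Lisp
  def render[V](v: V)(using s: Show[V, Lisp]): String = s.show(v)
}

@main def showDemo(): Unit = {
  import ShowSketch.*

  println(render(Column("age")))   // age
  println(render("Ada"))           // "Ada"
  println(render(18))              // 18, via the AnyVal fallback
  println(render(Seq(18, 21, 65))) // (18 21 65)
  println(render((18, 65)))        // 18 65
}
```

The sequence and tuple instances never mention Int. They ask for a Show of the element type, and the fallback answers.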

One line per operation

This is where the pattern pays off. Fourteen operations, fourteen lines:

  // Predicates
  given eqPred: EQ[SExpr]               = build[SExpr, EQ](predExpr("eq"))
  given neqPred: NEQ[SExpr]             = build[SExpr, NEQ](predExpr("neq"))
  given gtPred: GT[SExpr]               = build[SExpr, GT](predExpr("gt"))
  given geqPred: GEQ[SExpr]             = build[SExpr, GEQ](predExpr("gte"))
  given ltPred: LT[SExpr]               = build[SExpr, LT](predExpr("lt"))
  given leqPred: LEQ[SExpr]             = build[SExpr, LEQ](predExpr("lte"))
  given likePred: LIKE[SExpr]           = build[SExpr, LIKE](predExpr("like"))
  given inPred: IN[SExpr]               = build[SExpr, IN](predExpr("in"))
  given isnullPred: ISNULL[SExpr]       = build[SExpr, ISNULL](predExpr1("null?"))
  given isnotnullPred: ISNOTNULL[SExpr] = build[SExpr, ISNOTNULL](predExpr1("not-null?"))
  given betweenPred: BETWEEN[SExpr]     = build[SExpr, BETWEEN](predExpr("between"))

  // Conjunctions (NOT takes one operand, so it reuses the unary template)
  given andConj: AND[SExpr] = build[SExpr, AND](conjExpr("and"))
  given orConj: OR[SExpr]   = build[SExpr, OR](conjExpr("or"))
  given notConj: NOT[SExpr] = build[SExpr, NOT](predExpr1("not"))
}

Every line follows the same pattern: build[Dialect, Predicate](templateFunction("operator")). The build helper takes your template and returns a complete type class instance. Behind the scenes, BuilderBinary or BuilderUnary calls asString on the operands, triggers Show, feeds the rendered strings into your template, and wraps the result in Criteria.pure.

The operator name is the only new information on each line. Everything else was already in place.
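
For intuition, a helper in the spirit of build can be sketched in a few lines. The version below is an assumption about the general shape, not the library's implementation: Binary, buildBinary, and this Criteria are invented for the sketch, and it handles only the binary case instead of being generic over the predicate type the way build[SExpr, EQ] is.

```scala
object BuildSketch {
  trait Tag
  final case class Column(colName: String)
  final case class Criteria[T <: Tag](value: String)
  trait Show[V, T <: Tag] { def show(v: V): String }

  // One type class for all binary predicates; the library has one per predicate
  trait Binary[T <: Tag] {
    def eval[L, R](l: L, r: R)(using Show[L, T], Show[R, T]): Criteria[T]
  }

  // Hypothetical analogue of build: template in, type class instance out
  def buildBinary[T <: Tag](template: (String, String) => String): Binary[T] =
    new Binary[T] {
      def eval[L, R](l: L, r: R)(using sl: Show[L, T], sr: Show[R, T]): Criteria[T] =
        // Render both operands through Show, then hand the strings to the template
        Criteria(template(sl.show(l), sr.show(r)))
    }

  trait Lisp extends Tag

  object Lisp {
    given showColumn: Show[Column, Lisp] = new Show[Column, Lisp] {
      def show(c: Column): String = c.colName
    }
    given showInt: Show[Int, Lisp] = new Show[Int, Lisp] {
      def show(i: Int): String = i.toString
    }
  }

  val gte: Binary[Lisp] = buildBinary[Lisp]((l, r) => s"(gte $l $r)")
  val result: Criteria[Lisp] = gte.eval(Column("age"), 18)
}

@main def buildDemo(): Unit =
  println(BuildSketch.result.value) // (gte age 18)
```

The template never sees a Column or an Int. By the time it runs, Show has already turned both operands into strings, which is why one template per operator is enough.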

Running it

Bring the givens into scope and call the same activeAdults function from post one:

import com.example.dialect.SExpr.{given, *}
import com.eff3ct.criteria4s.core.*
import com.eff3ct.criteria4s.extensions.*
import com.eff3ct.criteria4s.functions as F

def activeAdults[T <: CriteriaTag: GEQ: EQ: AND](
    using Show[Column, T]
): Criteria[T] =
  (F.col[T]("age") geq F.lit(18)) and (F.col[T]("active") === F.lit(true))

activeAdults[SExpr].value
// (and (gte age 18) (eq active true))

That function was written before SExpr existed. It asked for a GEQ[T], an EQ[T], and an AND[T]. The import put them on the table. The compiler found everything it needed. The function has no idea it is producing Lisp.

Compile-time safety transfers too:

val broken = F.col[SExpr]("age") :> F.col[SQL]("age")
// Does not compile. SExpr is not SQL.

No special safety code. No registration. The phantom type flows through the expression tree exactly as it does for every built-in dialect.
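
The mechanism is easy to reproduce in isolation. In the sketch below, Col, Lisp, and Sql are toy types invented for the demonstration, and scala.compiletime.testing.typeChecks from the Scala 3 standard library reports whether a snippet would compile, so the rejection can be observed without breaking the build.

```scala
import scala.compiletime.testing.typeChecks

object PhantomSketch {
  trait Tag
  trait Lisp extends Tag
  trait Sql extends Tag

  // T never appears in a field. It only labels the value at compile time.
  final case class Col[T <: Tag](name: String) {
    // Both operands must carry the same tag T
    def gt(other: Col[T]): String = s"(gt $name ${other.name})"
  }

  // Same tag on both sides: accepted by the type checker
  val sameDialect: Boolean =
    typeChecks("""Col[Lisp]("age").gt(Col[Lisp]("limit"))""")

  // Different tags: rejected, because Col[Sql] is not a Col[Lisp]
  val mixedDialects: Boolean =
    typeChecks("""Col[Lisp]("age").gt(Col[Sql]("age"))""")
}

@main def phantomDemo(): Unit = {
  println(PhantomSketch.sameDialect)   // true
  println(PhantomSketch.mixedDialects) // false
}
```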

The five-line shortcut

Everything above was the from-scratch path. But if your database speaks SQL and only differs in how it quotes column names, you do not need any of it. You inherit.

A database that wraps column names in angle brackets (<column>):

trait AngleBracketSQL extends SQL

object AngleBracketSQL extends SQL.SQLExpr[AngleBracketSQL] {
  given showColumn: Show[Column, AngleBracketSQL] =
    Show.create(col => s"<${col.colName}>")
}

Five lines. Every predicate, every conjunction, every string-formatting rule is inherited from SQLExpr[AngleBracketSQL]. The only thing we specified is the column quoting convention. This is not a toy example. This is exactly how PostgreSQL, MySQL, DuckDB, ClickHouse, and Spark SQL are implemented in the library. One override per dialect. Nothing else.

activeAdults[AngleBracketSQL].value
// (<age> >= 18) AND (<active> = true)
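
Why does extending one trait hand over every instance? A plausible mechanism, sketched with toy types: the base trait defines its givens in terms of a type parameter, and the companion object that extends it inherits them already specialised to the new tag. SqlExpr, GEQ, and AngleSql below are simplified inventions for this sketch, and the real SQL.SQLExpr may be organised differently.

```scala
object InheritSketch {
  trait Tag
  trait Sql extends Tag
  final case class Column(colName: String)
  trait Show[V, T <: Tag] { def show(v: V): String }
  trait GEQ[T <: Tag] { def render(l: String, r: String): String }

  // Base bundle: everything except column quoting, parameterised by the tag
  trait SqlExpr[T <: Sql] {
    given geq: GEQ[T] = new GEQ[T] {
      def render(l: String, r: String): String = s"$l >= $r"
    }
    given showInt: Show[Int, T] = new Show[Int, T] {
      def show(v: Int): String = v.toString
    }
  }

  trait AngleSql extends Sql

  // The companion inherits geq and showInt, specialised to AngleSql
  object AngleSql extends SqlExpr[AngleSql] {
    given showColumn: Show[Column, AngleSql] = new Show[Column, AngleSql] {
      def show(c: Column): String = s"<${c.colName}>"
    }
  }

  // Dialect-agnostic query, written without any knowledge of AngleSql
  def adults[T <: Tag](using g: GEQ[T], sc: Show[Column, T], si: Show[Int, T]): String =
    g.render(sc.show(Column("age")), si.show(18))

  val rendered: String = adults[AngleSql]
}

@main def inheritDemo(): Unit =
  println(InheritSketch.rendered) // <age> >= 18
```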

What thirty lines buy you

Counting the S-expression dialect generously:

  • 1 trait declaration
  • 3 template functions (6 lines, not counting blank lines)
  • 4 Show instances (8 lines)
  • 14 predicate and conjunction givens (14 lines)

Roughly thirty lines. In return you get full predicate coverage (=, !=, >, >=, <, <=, LIKE, IN, IS NULL, IS NOT NULL, BETWEEN), conjunction support (AND, OR, NOT), compile-time safety, and compatibility with every polymorphic function already written against criteria4s. Both API styles work without extra code.

My friend had his internal dialect running by the end of the afternoon. His team uses it alongside PostgreSQL and MongoDB in the same codebase, with the same predicates, without a translation layer in between. One more language, zero drama.

What comes next

The next post moves from the library to the architecture around it: criteria4s 4. Your domain has no database. We stop asking how to build a dialect and start asking a more uncomfortable question: where in the system should we even know which database we are using?

The full source for all built-in dialects is at github.com/eff3ct0/criteria4s. The examples/ directory includes a WeirdDatastore dialect that follows the same from-scratch pattern we built here.
