Skip to main content
  1. Posts/

criteria4s. The filter you were always going to rewrite

·7 mins
Rafael Fernandez
Author
Rafael Fernandez
Mathematics, programming, and life stuff
criteria4s - This article is part of a series.
Part 1: This Article
Part 1: This Article

I have rewritten the same filter three times in my career. Not because I forgot I had already written it, but because each time it lived in a different backend, and each backend speaks a different language.

The first time it was a SQL string. The second time it was a MongoDB BSON document. The third time it was an Elasticsearch JSON query. The predicate was identical every time: find users over 18 who are active. The meaning never changed. What changed was the syntax, the quoting, the nesting, the operator names. And so I opened a new file, wrote the same logic again, and moved on.

This is the kind of duplication that does not feel like duplication. Each version looks different enough that you tell yourself they are separate things. They are not. They are the same thing wearing different clothes, and they will diverge. One day someone will fix a bug in the SQL version and forget the MongoDB one.

This is the Expression Problem showing up at work, unannounced.

criteria4s-write-your-filters-once-run-them-everywhere-img-3.svg

The same sentence in three languages
#

To make this concrete: I have had to write the same search for active adult users for three different backends.

-- SQL
SELECT * FROM users WHERE age >= 18 AND active = true
// MongoDB
{ "$and": [{ "age": { "$gte": 18 } }, { "active": { "$eq": true } }] }
// Elasticsearch
{
  "bool": {
    "must": [
      { "range": { "age": { "gte": 18 } } },
      { "term": { "active": true } }
    ]
  }
}

They are the same sentence in three different languages. age >= 18 AND active = true. That is all they say. But the translation is not mechanical, you cannot derive one from the other with a regex. Each backend has its own grammar, its own way of composing predicates, its own conventions for strings and numbers and operators.

And this is before you add BETWEEN, LIKE, IS NULL, or anything more interesting.

The usual response to this is pragmatic: write a repository interface, put the database-specific code behind it, and move on. This is correct architecture. The problem is that it does not eliminate the duplication, it just contains it. You still have three implementations of the same predicate. They live in different files now, but they will still diverge.

What if the predicate itself were polymorphic?

What if the predicate itself were polymorphic
#

criteria4s is a library I built to answer that question. The idea is simple: a filter expression parameterized by a phantom type. The phantom type encodes the backend. The compiler resolves the rendering. You write the logic once and let the type system do the translation.

This is what it looks like:

import com.eff3ct.criteria4s.core.*
import com.eff3ct.criteria4s.functions as F

def activeAdults[T <: CriteriaTag: GEQ: EQ: AND](
    using Show[Column, T]
): Criteria[T] =
  (F.col[T]("age") geq F.lit(18)) and (F.col[T]("active") === F.lit(true))

Read the signature carefully. activeAdults is polymorphic in T. The constraint T <: CriteriaTag says T must be a dialect. The context bounds : GEQ: EQ: AND say that whatever dialect T is, it must know how to render those three operations. The Show[Column, T] says it must also know how to render a column name. The body is the same predicate, written once.

Now call it:

import com.eff3ct.criteria4s.dialect.sql.{given, *}
activeAdults[SQL].value
// (age >= 18) AND (active = true)

import com.eff3ct.criteria4s.dialect.mongodb.{given, *}
activeAdults[MongoDB].value
// {$and: [{"age": {$gte: 18}}, {"active": {$eq: true}}]}

import com.eff3ct.criteria4s.dialect.elasticsearch.{given, *}
activeAdults[Elasticsearch].value
// {"bool": {"must": [{"range": {"age": {"gte": 18}}}, {"term": {"active": true}}]}}

Same function. Three different outputs. The only difference between the calls is the import. You swap the dialect, you get a different rendering. The predicate itself never changes.

No intermediate AST. No runtime dispatch. No pattern matching on an enum of backends. The compiler knows which given instances to use for each T and resolves everything at compile time.

Tagless final, applied to databases
#

If you have read the post on tagless final, you already see the pattern. criteria4s is tagless final applied to query rendering instead of effects. Each predicate (EQ, GT, AND, LIKE, …) is a type class. Each dialect (SQL, MongoDB, Elasticsearch, …) provides given instances for those type classes. The Show[V, T] type class handles value rendering: quoting strings, formatting numbers, naming columns.

This is the Expression Problem solved by the same technique:

                 SQL       PostgreSQL    MongoDB     Elasticsearch
           ┌──────────┬──────────────┬───────────┬────────────────┐
EQ         │  =       │  =           │  $eq      │  term          │
GT         │  >       │  >           │  $gt      │  range.gt      │
AND        │  AND     │  AND         │  $and     │  bool.must     │
LIKE       │  LIKE    │  LIKE        │  $regex   │  wildcard      │
           └──────────┴──────────────┴───────────┴────────────────┘

Rows are predicates. Columns are dialects. With a concrete ADT, you pick one axis. With type classes, both axes are open. Adding a new predicate means defining a new type class and providing instances for each existing dialect. Adding a new dialect means providing instances for each existing type class. Neither operation modifies existing code.

There is something satisfying about a design where the abstractions actually hold. No sealed trait to reopen, no // TODO: add MongoDB support.

The mistake that never reaches production
#

Here is something I find genuinely useful in day-to-day use. This does not compile:

// Type error: SQL vs MongoDB
val broken = F.col[SQL]("age") :> F.col[MongoDB]("age")

The phantom type T propagates through every expression. If you start building in SQL, every term must also be SQL. You cannot accidentally compose predicates from different dialects. The compiler tells you before the tests run, before the app starts, before production breaks.

This is the thing I wish I had the first two times I rewrote that filter.

Two styles, same output
#

criteria4s gives you two API styles. They produce the same output, so pick the one that reads better in your context.

Function style, explicit and namespace-qualified:

import com.eff3ct.criteria4s.functions as F

val filter = F.and(
  F.geq(F.col[SQL]("age"), F.lit(18)),
  F.===(F.col[SQL]("status"), F.lit("active"))
)

Extension style, chained and closer to prose:

import com.eff3ct.criteria4s.extensions.*

val filter =
  (F.col[SQL]("age") :>= F.lit(18)) and
  (F.col[SQL]("status") === F.lit("active"))

I tend to use the extension style for complex predicates, because it reads more naturally when you have several conditions. The function style is useful when composing programmatically or when the explicitness helps.

Eight dialects, one import
#

Eight dialects:

Dialect Column quoting Module
Base SQL unquoted criteria4s-sql
PostgreSQL "column" criteria4s-postgresql
MySQL `column` criteria4s-mysql
Spark SQL `column` criteria4s-sparksql
DuckDB "column" criteria4s-duckdb
ClickHouse `column` criteria4s-clickhouse
MongoDB JSON operators criteria4s-mongodb
Elasticsearch Query DSL criteria4s-elasticsearch

The SQL-family dialects are mostly inheritance. PostgreSQL only overrides column quoting; everything else comes from the base SQL dialect. MongoDB and Elasticsearch start from scratch.

If you need to push predicates all the way into the driver, integration modules bridge the gap:

// build.sbt
"com.eff3ct" %% "criteria4s-core"                   % "1.0.0"
"com.eff3ct" %% "criteria4s-postgresql"              % "1.0.0"
"com.eff3ct" %% "criteria4s-sql-jdbc"                % "1.0.0" // .toWhereClause
"com.eff3ct" %% "criteria4s-mongodb-driver"          % "1.0.0" // .toBson
"com.eff3ct" %% "criteria4s-elasticsearch-client"    % "1.0.0" // .toQuery

What comes next
#

This is the first post in a series on criteria4s. The next ones go under the hood:

  • criteria4s 2. Phantom types in practice: how CriteriaTag and Show enforce dialect safety at the type level, and why there is zero runtime cost.
  • criteria4s 3. Building your own dialect: a step-by-step tutorial for adding a new backend in under 50 lines.
  • criteria4s 4. Hexagonal architecture with type classes: using T <: CriteriaTag as a port so your domain layer never knows which database it is talking to.
  • criteria4s 5. From SQL to MongoDB without rewriting your queries: what migration actually looks like in practice.

The full documentation covers every predicate, every dialect, every integration. The source is MIT.

The third time I rewrote that filter, I decided I did not want to write it a fourth time. criteria4s came out of that decision. The next post opens the hood and shows why the trick works without costing anything at runtime.

criteria4s - This article is part of a series.
Part 1: This Article
Part 1: This Article