Todo CLI in Rust 3: JSON persistence, contract vs implementation

18 mins · Rafael Fernandez

Todo CLI in Rust no fluff - This article is part of a series.
Part 3: This Article
We continue with the series.

In the previous chapters, we laid the foundations of the kitchen (hexagonal architecture) and prepared the main ingredients (immutable domain and typed errors). Now it’s time to open the pantry: the place where we store everything between service and service. Because a kitchen without a pantry is a kitchen that starts from scratch every day.

Persistence is where many small projects go awry. Not because of the complexity of the storage itself—saving JSON to a file is no mystery—but because of how it connects with the rest of the system. If the business layer knows about file paths, serialization formats, or I/O errors, it ceases to be business logic and becomes an infrastructure script with delusions of grandeur.

This chapter has a central axis: the difference between contract and implementation. In Rust, that difference is materialized with traits. We define what the application needs (the trait) before deciding how we solve it (the adapters). And that separation isn’t academic ceremony; it’s what allows having two implementations of the same contract without touching a line of business logic.


Reference code:

The contract: a trait that knows nothing of JSON or disks

Before touching the disk, before thinking about JSON, before importing serde, we define the contract. In hexagonal architecture, the output port is the trait that the application layer needs to function, without knowing or caring who implements it.

pub trait TaskRepository {
    fn save(&mut self, task: Task) -> RepoResult<()>;
    fn list(&self, query: TaskQuery) -> RepoResult<Vec<Task>>;
    fn find_by_id(&self, id: Uuid) -> RepoResult<Option<Task>>;
    fn delete(&mut self, id: Uuid) -> RepoResult<bool>;
}

#[derive(Debug, Clone, Copy)]
pub enum TaskQuery {
    All,
    ByStatus(TaskStatus),
}

Four operations. A query enum. That is all the application needs to know about persistence. Notice what does not appear: there is no PathBuf, there is no serde::Serialize, there is no std::fs, there is no HashMap. The trait is agnostic to the storage technology. It is pure business semantics expressed as an interface.

This is the fundamental difference between contract and implementation: the trait says what can be done, not how it is done. And in Rust, that separation gives you more than classic interfaces (Java, Go) do: the compiler verifies at compile time that each implementation fulfills the contract exactly, including not just method signatures but also mutability and ownership.

Why a trait and not call JSON directly from use cases

The natural temptation in a small project is to go direct: serde_json::from_str() inside the use case, fs::write() at the end, and you’re done. It works. But the cost appears when you want to change something:

If the use case knows about JSON:

  • it depends on serde: a change in the serialization format breaks business logic.
  • it depends on filesystem paths: you can’t test without a disk.
  • it depends on I/O errors: std::io::Error mixes with domain errors.
  • and it ceases to be a use case to become an infrastructure script that does everything and cannot be tested in parts.

With the trait, the application layer only knows business operations: “save this task”, “give me the filtered tasks”, “find by ID”, “delete by ID”. The how is resolved by whoever implements the trait.

In practice, this is seen in the use cases. For example, AddTaskService receives a generic R: TaskRepository:

pub struct AddTaskService<R: TaskRepository> {
    repo: R,
}

The service doesn’t know if R is an in-memory HashMap or a JSON file on disk. It doesn’t care. It only knows that it can call .save(), .list(), .find_by_id(), and .delete(). If tomorrow you add a PostgresTaskRepository, the service works without changing a line.

This is the dependency inversion materialized in Rust code: the inner layer (application) defines the contract, the outer layer (infrastructure) implements it. The dependency arrow points inward, not outward.
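The shape of that inversion can be sketched in a single file. This is a simplified stand-in, not the series' exact code: `Task` is reduced to two fields, and the `execute` method on the service is a hypothetical use-case entry point added for illustration.

```rust
use std::collections::HashMap;

// Minimal stand-ins for the real domain types.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: u64,
    pub title: String,
}

#[derive(Debug)]
pub struct RepoError(pub String);
pub type RepoResult<T> = Result<T, RepoError>;

// The output port: pure business semantics, no I/O types.
pub trait TaskRepository {
    fn save(&mut self, task: Task) -> RepoResult<()>;
    fn find_by_id(&self, id: u64) -> RepoResult<Option<Task>>;
}

// The use case is generic over the contract, never over an adapter.
pub struct AddTaskService<R: TaskRepository> {
    repo: R,
}

impl<R: TaskRepository> AddTaskService<R> {
    pub fn new(repo: R) -> Self {
        Self { repo }
    }

    // Hypothetical entry point: build the task, delegate storage.
    pub fn execute(&mut self, id: u64, title: &str) -> RepoResult<Task> {
        let task = Task { id, title: title.to_string() };
        self.repo.save(task.clone())?;
        Ok(task)
    }
}

// Any adapter that fulfills the trait plugs in; here, an in-memory one.
pub struct InMemoryRepo {
    pub cache: HashMap<u64, Task>,
}

impl TaskRepository for InMemoryRepo {
    fn save(&mut self, task: Task) -> RepoResult<()> {
        self.cache.insert(task.id, task);
        Ok(())
    }
    fn find_by_id(&self, id: u64) -> RepoResult<Option<Task>> {
        Ok(self.cache.get(&id).cloned())
    }
}

fn main() {
    let repo = InMemoryRepo { cache: HashMap::new() };
    let mut service = AddTaskService::new(repo);
    let task = service.execute(1, "write post 3").unwrap();
    println!("saved: {}", task.title);
}
```

Swapping `InMemoryRepo` for a file-backed adapter changes only the line that constructs the repository; `AddTaskService` compiles and runs untouched.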

Anatomy of the contract: each signature has a decision

It is no coincidence that the trait’s signatures are the way they are. Each one encodes an intentional design decision:

fn save(&mut self, task: Task) -> RepoResult<()>

It receives Task by value, not by reference. Why? Because the repository takes ownership of the task. Once saved, the caller shouldn’t continue mutating it without going through the repository again. This is consistent with the immutable domain from Post 2: state transitions produce new instances, and the repository saves the new version. If the signature were fn save(&mut self, task: &Task), the caller would retain the reference and could assume it’s “already saved” while continuing to mutate a local copy. With ownership transfer, that confusion is not possible.

The &mut self indicates that saving is an operation that modifies internal state. In the JSON adapter, it modifies the file. In the in-memory one, it modifies the HashMap. The compiler prevents you from calling save from a context that only has &self.

fn list(&self, query: TaskQuery) -> RepoResult<Vec<Task>>

It uses &self, not &mut self. Listing tasks is a pure read operation. This allows the compiler to verify that you are not modifying internal state when querying. In a concurrency context (which we don’t apply here, but would be the next step), this would allow multiple simultaneous readers.

The TaskQuery is a Copy enum with two variants (All, ByStatus(TaskStatus)). It is sufficient for current needs and extensible without breaking signatures: adding ByDateRange(DateTime, DateTime) tomorrow doesn’t change the signature of list, it just expands the enum.

fn find_by_id(&self, id: Uuid) -> RepoResult<Option<Task>>

Here the most important decision is in the return type: Option<Task>, not Result<Task, NotFoundError>. “Not found” is not an infrastructure error. It is an expected result. The disk worked, the read was correct, there simply wasn’t any task with that ID.

The use case decides what to do with None. In MarkTaskDoneService, for example, it converts None into DomainError::TaskNotFound. But in another context it could be perfectly valid for it not to exist (for example, before creating a task, verifying that there are no duplicates).

This distinction between “the operation failed” (Err) and “the operation worked but there was no data” (Ok(None)) is subtle but critical for keeping the error chain clean. If find_by_id returned Err(NotFound), every caller would have to distinguish “did the disk fail or does it simply not exist?”, and that is mixing levels of abstraction.
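A sketch of how a use case upgrades `Ok(None)` into a business error at the layer that owns that rule. The types are simplified stand-ins (a `&str` instead of `Task`, a `String` instead of `RepoError`); the names are illustrative, not the series' exact code.

```rust
#[derive(Debug, PartialEq)]
enum AppError {
    TaskNotFound(u64),
    Repository(String),
}

// Stand-in repository call: Ok(None) means the storage worked but no task
// matched; Err would mean the storage itself failed.
fn find_by_id(id: u64) -> Result<Option<&'static str>, String> {
    if id == 1 { Ok(Some("buy flour")) } else { Ok(None) }
}

// The use case owns the rule that "missing" is an error *here*.
fn mark_done(id: u64) -> Result<&'static str, AppError> {
    // `?` propagates real infrastructure failure; `ok_or` upgrades the
    // expected absence into a business-level error.
    let task = find_by_id(id).map_err(AppError::Repository)?;
    task.ok_or(AppError::TaskNotFound(id))
}

fn main() {
    assert_eq!(mark_done(1), Ok("buy flour"));
    assert_eq!(mark_done(2), Err(AppError::TaskNotFound(2)));
}
```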

fn delete(&mut self, id: Uuid) -> RepoResult<bool>

It returns bool, not Option<Task>. The boolean indicates whether something was deleted (true) or there was nothing to delete (false). Neither is an error. This design greatly simplifies the CLI layer: true is shown as DELETED, false as NOT_FOUND, both with exit code 0. If delete returned Err for “doesn’t exist”, the CLI would have to decide if an idempotent delete is an error or not, and that UX decision would be contaminating the persistence layer.
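The CLI-side mapping reduces to a two-arm branch. This is a sketch of the idea, not the series' exact presenter code; the output strings come from the text above, and both outcomes map to exit code 0.

```rust
// Both outcomes are success: the delete is idempotent.
fn render_delete(deleted: bool) -> &'static str {
    if deleted { "DELETED" } else { "NOT_FOUND" }
}

fn main() {
    assert_eq!(render_delete(true), "DELETED");
    assert_eq!(render_delete(false), "NOT_FOUND");
    // In either case the process would exit with code 0.
}
```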

The error type: generic but sufficient

pub type RepoResult<T> = Result<T, RepoError>;

#[derive(Debug, Error)]
pub enum RepoError {
    #[error("internal error: {error}")]
    InternalError { error: String },
}

RepoError is intentionally generic. It has a single variant (InternalError) with a free-form String field. Is it ideal? No. Is it sufficient for the current scope? Yes.

The alternative would be to have variants like IoError(std::io::Error), SerializationError(serde_json::Error), DirectoryNotFound(PathBuf). But that would leak implementation details (I/O, serde, paths) to the output port, which belongs to the domain/application layer. A trait that lives in ports/outputs/ should not import std::io::Error, that would couple the contract to a concrete technology.

With InternalError { error: String }, each adapter translates its specific errors to a string with context. It’s not strongly typed, but it preserves the abstraction boundary. It’s an explicit technical debt decision that we discuss in detail in Post 3.1.

The RepoResult<T> type alias simplifies signatures, just like CliResult<T> in the CLI layer and ApplicationResult<T> in the application layer. Three layers, three type aliases, three error enums. Consistency that reduces cognitive load.

Where the contract lives in the architecture

src/tasks/
├── domain/           # Entities, domain errors
├── application/      # Use cases, application errors
├── ports/
│   └── outputs/
│       ├── task_repository.rs    # ← THE TRAIT LIVES HERE
│       └── errors.rs             # ← RepoError, RepoResult
└── adapters/
    └── persistence/
        ├── mod.rs                          # pub mod declarations
        ├── in_memory_task_repository.rs    # ← IMPLEMENTATION 1
        └── json_file_task_repository.rs    # ← IMPLEMENTATION 2

Notice the separation: the trait lives in ports/outputs/ (belongs to the application/domain layer), while the implementations live in adapters/persistence/ (belong to the infrastructure layer).

This is dependency inversion visualized in the directory tree. The inner layer (ports/) defines what it needs. The outer layer (adapters/) decides how it provides it. The import arrows go from adapters/ to ports/, never the reverse. If an adapter imports the trait, that’s correct. If the trait imported an adapter, the architecture would be broken.

The implementations: two adapters, one interface

With the contract defined, we have two implementations. It’s no coincidence that there are two: each has a different and complementary purpose.

src/tasks/adapters/persistence/
├── mod.rs                          # pub mod declarations
├── in_memory_task_repository.rs    # In-memory HashMap
└── json_file_task_repository.rs    # JSON to disk

The mod.rs is minimal:

pub mod in_memory_task_repository;
pub mod json_file_task_repository;

Two modules, two lines, two completely different responsibilities.

InMemoryTaskRepository: the stunt double

In the kitchen, before serving a new dish to the customer, it is tested with the team. The InMemoryTaskRepository is that test: fast, disposable, without side effects. It’s the stunt double for real persistence.

#[derive(Debug, PartialEq, Eq, Clone)]
pub struct InMemoryTaskRepository {
    cache: HashMap<Uuid, Task>,
}

impl InMemoryTaskRepository {
    pub fn new() -> Self {
        Self {
            cache: HashMap::default(),
        }
    }
}

A HashMap<Uuid, Task>. Nothing more. No files, no serialization, no filesystem latency. But it implements exactly the same trait TaskRepository:

impl TaskRepository for InMemoryTaskRepository {
    fn save(&mut self, task: Task) -> RepoResult<()> {
        Self::add_task(self, task)
    }

    fn list(&self, query: TaskQuery) -> RepoResult<Vec<Task>> {
        let result = match query {
            TaskQuery::All => self.cache.values().cloned().collect(),
            TaskQuery::ByStatus(task_status) => self.get_task_by_status(task_status),
        };
        Ok(result)
    }

    fn find_by_id(&self, task_id: Uuid) -> RepoResult<Option<Task>> {
        Ok(self.get_task_by_id(task_id).cloned())
    }

    fn delete(&mut self, task_id: Uuid) -> RepoResult<bool> {
        match self.delete_task_by_id(task_id) {
            Some(_) => Ok(true),
            None => Ok(false),
        }
    }
}

Notice: save delegates to add_task, which internally uses HashMap::insert. insert does upsert by nature; if the key already exists, it replaces the value. That means the behavior of “saving a modified task” and “saving a new task” is the same. Exactly what we defined in the contract.
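The upsert-for-free behavior of `HashMap::insert` can be verified in a few lines; `insert` even returns the previous value when it replaces one:

```rust
use std::collections::HashMap;

fn main() {
    let mut cache: HashMap<u32, &str> = HashMap::new();
    // First save: the key is new, so this is a plain insert.
    assert_eq!(cache.insert(42, "pending"), None);
    // Second save with the same key: `insert` replaces the value in place
    // and hands back the old one. Upsert by nature.
    assert_eq!(cache.insert(42, "done"), Some("pending"));
    assert_eq!(cache.get(&42), Some(&"done"));
}
```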

What good is an implementation that loses its data when the process ends? It serves two purposes:

  1. Business logic tests. When you run the tests for AddTaskService or ListTasksService, those tests use InMemoryTaskRepository. They are not testing persistence; they are testing that the use case behaves correctly. No disk, no latency, no setup.

  2. Contract validation. If the in-memory one fulfills the same trait as the JSON one, and the use cases work with the in-memory one, then they will work with any implementation that fulfills the trait. The in-memory one is the living proof that the abstraction works.

Here is the power of separating contract from implementation: you can verify all the business logic against a trivial implementation, and trust that the real implementation (JSON to disk) only needs to pass its own infrastructure tests.

JsonFileTaskRepository: the real pantry

This is the adapter the user touches. It saves the tasks in a JSON file, survives between executions, and handles everything that implies: creating directories, reading files, serializing, deserializing, and failing with context when something goes wrong.

#[derive(Debug, Clone)]
pub struct JsonFileTaskRepository {
    file_path: PathBuf,
}

Just a PathBuf. The repository maintains no state in memory beyond the file path. Every operation reads the file, modifies the data, and writes it back. Is it the most efficient? No. Is it the simplest and most correct for a CLI that runs, does one operation, and terminates? Absolutely.

Compare with the in-memory one: there the state lives in a HashMap on the heap. Here the state lives in a file on disk. But both implement the same contract. The use case has no way to distinguish them, and that is exactly the idea.

Construction: directories and platform paths

The new() constructor resolves where the data file lives:

impl JsonFileTaskRepository {
    pub fn new() -> RepoResult<Self> {
        let project_dirs = ProjectDirs::from("com", "org", "todo-cli")
            .ok_or_else(|| RepoError::InternalError {
                error: "could not resolve project directories".to_string(),
            })?;
        let data_dir = project_dirs.config_dir().join("data");
        fs::create_dir_all(&data_dir).map_err(|e| RepoError::InternalError {
            error: format!(
                "could not create data directory '{}': {e}",
                data_dir.display()
            ),
        })?;
        let file_path = data_dir.join("tasks.json");
        Ok(Self { file_path })
    }
}

There are several decisions here that deserve explanation:

The directories crate. Instead of hardcoding a path like ~/.todo-cli/tasks.json, we use ProjectDirs::from("com", "org", "todo-cli") which resolves the path according to the platform:

  • Linux: ~/.config/todo-cli/data/tasks.json
  • macOS: ~/Library/Application Support/com.org.todo-cli/data/tasks.json
  • Windows: C:\Users\<user>\AppData\Roaming\org\todo-cli\data\tasks.json

Why does it matter? Because a CLI that saves data in a standard platform path is a CLI that respects the conventions of the user’s operating system. Backup tools, dotfile managers, and cleanup scripts know where to look.

Preemptive create_dir_all. The constructor creates the data/ directory if it doesn’t exist. This makes the first execution work without prior setup: cargo run -- add "My first task" creates everything necessary automatically. Without this, the user would have to run mkdir -p ~/.config/todo-cli/data/ before using the tool. Bad first-use experience.

Alternative using() constructor. Besides new(), there is a using(file_path: PathBuf) constructor that accepts an arbitrary path:

pub fn using(file_path: PathBuf) -> Self {
    Self { file_path }
}

This constructor exists exclusively for tests. It allows creating a repository that points to a file inside a tempdir(), completely isolating the tests from the user’s filesystem. It doesn’t create directories, it doesn’t resolve platform paths, it simply uses the path you give it. It’s a clean testing seam: same API, different configuration.
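A minimal sketch of that seam, with `JsonFileTaskRepository` reduced to the part that matters here (the path). The series uses `tempdir()` from the `tempfile` crate in its tests; this sketch substitutes `std::env::temp_dir()` to stay dependency-free.

```rust
use std::path::PathBuf;

#[derive(Debug)]
struct JsonFileTaskRepository {
    file_path: PathBuf,
}

impl JsonFileTaskRepository {
    // Test-only constructor: no directory creation, no platform resolution,
    // just the path you hand it.
    fn using(file_path: PathBuf) -> Self {
        Self { file_path }
    }
}

fn main() {
    // A throwaway location instead of the user's real config dir.
    let path = std::env::temp_dir().join("todo-cli-test").join("tasks.json");
    let repo = JsonFileTaskRepository::using(path.clone());
    assert_eq!(repo.file_path, path);
}
```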

Reading and writing: the data lifecycle

The repository has two internal (private) methods that encapsulate all interaction with the disk:

fn read_task_file(&self) -> RepoResult<TasksFile> {
    if !self.file_path.exists() {
        Ok(TasksFile::default())
    } else {
        let file = fs::read_to_string(&self.file_path)
            .map_err(|e| RepoError::InternalError {
                error: format!("Reading data from file. E: {e:?}"),
            })?;
        serde_json::from_str(file.as_str())
            .map_err(|e| RepoError::InternalError {
                error: format!("Parsing data from file to tasks. E: {e:?}"),
            })
    }
}

fn write_tasks_file(&self, tasks_file: &TasksFile) -> RepoResult<()> {
    if let Some(parent) = self.file_path.parent() {
        fs::create_dir_all(parent).map_err(|e| RepoError::InternalError {
            error: format!(
                "could not create parent directory '{}': {e}",
                parent.display()
            ),
        })?;
    }

    let payload = serde_json::to_string(tasks_file)
        .map_err(|e| RepoError::InternalError {
            error: format!("Serializing data. E: {e:?}"),
        })?;

    fs::write(&self.file_path, payload)
        .map_err(|e| RepoError::InternalError {
            error: format!("Writing data. E: {e:?}"),
        })
}

These two methods are implementation details that do not appear in the trait. The contract says “save a task”; how that translates to reading a file, parsing it, modifying a vector, and writing it back is exclusively the adapter’s business. If tomorrow you changed to SQLite, you would replace these methods with SQL queries without touching the trait.

Missing file = empty state

The most important decision in read_task_file is on the first line: if the file doesn’t exist, it returns TasksFile::default() (a struct with an empty Vec<Task>). It doesn’t fail, it doesn’t create an empty file, it doesn’t print warnings. It simply assumes that “no file” equates to “no tasks”.

This covers the first-run case transparently. The user runs todo list for the first time and gets an empty list, not a “file not found” error. The file is created the first time it is written to (successful save or delete).

Invalid JSON = explicit error
#

If the file exists but contains invalid JSON, serde_json::from_str fails and is converted into a RepoError::InternalError with the parsing error detail. There is no silent autocorrection, there is no “create a new file because the old one was broken”.

Why not autocorrect? Because automatically “fixing” a corrupted JSON usually equates to losing data without warning. Imagine the file has 50 tasks and one byte got corrupted: do we truncate, discard, create a new one? All those options destroy information. We prefer to fail with context and let the user decide on recovery. An honest CLI says “your data file is corrupted, here is the parsing error”, it doesn’t silently delete 50 tasks and pretend everything is fine.

The intermediate TasksFile struct

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct TasksFile {
    tasks: Vec<Task>,
}

impl From<Vec<Task>> for TasksFile {
    fn from(value: Vec<Task>) -> Self {
        Self { tasks: value }
    }
}

Why a wrapper struct instead of serializing Vec<Task> directly? Two reasons:

  1. Extensibility. If tomorrow you need to add a version: u32 or last_modified: DateTime field to the file, you only modify TasksFile. The change does not affect the trait or the use cases.
  2. JSON semantics. Serializing a Vec<Task> produces [{...}, {...}] as the JSON root. Serializing TasksFile produces {"tasks": [{...}, {...}]}. The second form is a JSON object at the root, which is easier to extend (add keys) without breaking existing parsers. It is a subtle but widely recommended convention: configuration and data JSONs should have an object as the root, not an array.

TasksFile is a type that lives inside the adapter, not in the trait. It is another implementation detail invisible to the rest of the system.
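Concretely, the file on disk looks something like this (field names and values are illustrative, not necessarily the exact serde output of Task):

```json
{
  "tasks": [
    { "task_id": "…", "title": "Buy flour", "status": "Pending" }
  ]
}
```

With the bare Vec<Task>, the root would instead be the array itself, and adding a sibling key like "version" later would be impossible without changing the document's shape.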

Implementation of the trait: the four operations

With the internal read/write methods resolved, implementing the trait is straightforward. But each operation has nuances that deserve attention:

save: upsert by ID

impl TaskRepository for JsonFileTaskRepository {
    fn save(&mut self, task: Task) -> RepoResult<()> {
        let mut tasks_file = self.read_task_file()?;

        if let Some(index) = tasks_file
            .tasks
            .iter()
            .position(|stored| stored.task_id() == task.task_id())
        {
            tasks_file.tasks[index] = task;
        } else {
            tasks_file.tasks.push(task);
        }

        self.write_tasks_file(&tasks_file)
    }
}

save implements upsert semantics: if a task with the same task_id() already exists, it replaces it; if not, it appends it to the end. This is consistent with the immutable design of the domain: when you execute mark_done, you produce a new instance of Task with the same ID but a different state. The repository saves the new version without the need for a separate update method.

Compare with the in-memory implementation: there, upsert is free because HashMap::insert does it by nature. Here we need to explicitly search for it with position(). Same semantics, different mechanism. That is what it means to implement a contract.

The position() + direct indexing pattern is more efficient than remove + push because it doesn’t shift elements in the vector. It’s the kind of detail that shows decisions are not accidental.
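The same pattern, isolated with plain tuples standing in for tasks keyed by ID (a sketch, not the series' code):

```rust
// Upsert by position: replace in place if the ID exists, append otherwise.
fn upsert(tasks: &mut Vec<(u32, &'static str)>, id: u32, title: &'static str) {
    if let Some(index) = tasks.iter().position(|(stored, _)| *stored == id) {
        tasks[index] = (id, title); // direct indexed write, no element shifting
    } else {
        tasks.push((id, title)); // new ID: append at the end
    }
}

fn main() {
    let mut tasks = vec![(1, "knead"), (2, "bake")];
    upsert(&mut tasks, 2, "bake at 220C"); // existing ID: replaced in place
    upsert(&mut tasks, 3, "cool down");    // new ID: appended
    assert_eq!(tasks, vec![(1, "knead"), (2, "bake at 220C"), (3, "cool down")]);
}
```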

list: filtering in the adapter

fn list(&self, query: TaskQuery) -> RepoResult<Vec<Task>> {
    let TasksFile { tasks } = self.read_task_file()?;
    match query {
        TaskQuery::All => Ok(tasks),
        TaskQuery::ByStatus(status) => Ok(tasks
            .iter()
            .filter(|&t| t.status() == status)
            .cloned()
            .collect()),
    }
}

Filtering happens in the adapter, not in the use case. Is this correct from a hexagonal point of view? Strictly, you could argue that filtering is business logic and should live in the application layer. But there is a practical argument: if tomorrow you change to an SQL database, the query would do the filtering (WHERE status = ?). Putting it in the adapter allows each implementation to filter in the most efficient way for its technology.

Notice the destructuring let TasksFile { tasks } = .... This pattern extracts the tasks field from the struct directly, consuming it. It’s cleaner than let tasks = tasks_file.tasks and makes it clear that we don’t need the wrapper afterwards.

find_by_id: expected result

fn find_by_id(&self, id: Uuid) -> RepoResult<Option<Task>> {
    let TasksFile { tasks } = self.read_task_file()?;
    Ok(tasks.iter().find(|&t| t.task_id() == id).cloned())
}

Three lines. Read the file, find by ID, return Option. The .cloned() is necessary because .find() returns Option<&Task> (a reference to the element in the vector), but we need to return Option<Task> (an owned value), because the vector is dropped when the function returns.

delete: idempotence with visibility

fn delete(&mut self, id: Uuid) -> RepoResult<bool> {
    let mut tasks_file = self.read_task_file()?;
    let initial_len = tasks_file.tasks.len();
    tasks_file.tasks.retain(|task| task.task_id() != id);

    if tasks_file.tasks.len() == initial_len {
        return Ok(false);
    }

    self.write_tasks_file(&tasks_file)?;
    Ok(true)
}

The implementation uses retain(), which filters in-place keeping only the elements that fulfill the predicate. If the size of the vector didn’t change, the task didn’t exist: it returns false without writing to disk. If it changed, it writes the updated file and returns true.

There is a subtle detail: if the task doesn’t exist, it is not written to disk. This is a micro-optimization but also a correctness decision: writing a file identical to what was already there is an unnecessary side effect that could confuse file monitoring tools or backup triggers.

Compare again with the in-memory one: there delete uses HashMap::remove, which returns Option<Task>. Here we use retain + length comparison. Two completely different mechanisms, same contract semantics (bool). The implementation is interchangeable. The contract is not.

The error chain: from disk to application

It’s worth seeing how persistence errors flow upwards. In Post 2 we defined the full error chain by layers. The persistence part fits like this:

// Output port (ports/outputs/errors.rs)
#[derive(Debug, Error)]
pub enum RepoError {
    #[error("internal error: {error}")]
    InternalError { error: String },
}

// Application layer (application/errors.rs)
#[derive(Debug, Error)]
pub enum ApplicationError {
    #[error(transparent)]
    Domain(#[from] DomainError),
    #[error(transparent)]
    Repository(#[from] RepoError),
}

When JsonFileTaskRepository fails reading a file, it produces a RepoError::InternalError. The use case propagates that error with ? and it is automatically converted into ApplicationError::Repository thanks to the #[from]. Then in main.rs, it is converted into CliError::Application. At no point in the chain is there an unwrap() or a panic!(). The user sees a readable error message and the process terminates with code 1.

Note that the InMemoryTaskRepository never produces errors (all its operations return Ok(...)). But the trait forces returning RepoResult. Is it overhead? No: it’s the cost of having a contract that covers real implementations where the disk can fail. The in-memory one simply never exercises that error path, but respects it.
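What #[from] generates can be written by hand with std only, which makes the `?` conversion visible. This is a sketch: the enums are trimmed to one variant each, and the error message is made up.

```rust
#[derive(Debug)]
enum RepoError {
    InternalError { error: String },
}

#[derive(Debug)]
enum ApplicationError {
    Repository(RepoError),
}

// thiserror's #[from] derives exactly this impl for you.
impl From<RepoError> for ApplicationError {
    fn from(e: RepoError) -> Self {
        ApplicationError::Repository(e)
    }
}

// A use case propagating a repo failure with `?`: no map_err in sight.
fn use_case(fail: bool) -> Result<(), ApplicationError> {
    let repo_call = || -> Result<(), RepoError> {
        if fail {
            Err(RepoError::InternalError { error: "disk unavailable".into() })
        } else {
            Ok(())
        }
    };
    repo_call()?; // RepoError -> ApplicationError via the From impl
    Ok(())
}

fn main() {
    assert!(use_case(false).is_ok());
    assert!(matches!(
        use_case(true),
        Err(ApplicationError::Repository(RepoError::InternalError { .. }))
    ));
}
```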

Complement with repo history

If you want to see the evolutionary order in the repository:

You can clearly see how the full behavior is implemented first and then hardened with tests, a topic we cover in depth in the next chapter.

From the pantry to quality control

In this chapter, we have opened the pantry and equipped it with two storage solutions that share the same contract. The TaskRepository trait defines what the application needs; the InMemoryTaskRepository and JsonFileTaskRepository adapters decide how they provide it. Dependency inversion isn’t a diagram on a whiteboard, it’s a trait in ports/outputs/ and two impl in adapters/persistence/.

But having an equipped pantry is not enough. How do we know the ingredients are in good condition? How do we verify that the preserves haven’t spoiled between uses? In the next chapter we put on our quality control hat: behavior-driven testing strategy, isolation with tempdir, tests that validate the contract without coupling to the implementation, and the technical debt we decided to document instead of hide.

See you in the kitchen!
