We continue with the series.
In the previous chapter we opened the pantry, defined the TaskRepository contract, and set up two implementations: an in-memory HashMap and a JSON file to disk. The kitchen has fresh ingredients and well-labeled preserves. But how do we know the preserves haven’t spoiled between uses? How do we verify that the system behaves as promised?
This chapter is the kitchen’s quality control. We are not going to test each ingredient separately in a sterile lab; we are going to prepare real dishes with each implementation and check that the result is what we expect. It’s behavior-driven testing, not implementation-driven.

Reference code:
- `src/tasks/adapters/persistence/in_memory_task_repository.rs` (`tests` module)
- `src/tasks/adapters/persistence/json_file_task_repository.rs` (`tests` module)
## Testing strategy: test behavior, not implementation
Before looking at the tests, it’s worth explaining what kind of tests we write and why.
In a project with a hexagonal architecture, persistence tests can fall into a common trap: testing implementation details. “Was it inserted into the HashMap?” “Was the JSON written with the correct indentation?” “Was fs::write called exactly once?” Those tests are fragile; they break when you refactor without changing the behavior.
We test the observable behavior of the contract. The questions our tests answer are:
- If I save a task and search for it by ID, do I find it?
- If I save a modified task with the same ID, does it update?
- If I delete an existing task, does it return `true`?
- If I delete one that doesn’t exist, does it return `false` without error?
- If I filter by status, do I get only the correct tasks?
These questions are independent of the implementation. They work equally well for a HashMap, a JSON file, an SQLite database, or a remote REST service. And that makes them resistant to internal refactoring.
The pattern is always the same: Arrange → Act → Assert against the public interface of the TaskRepository trait, never against the adapter’s internal methods.
## InMemoryTaskRepository tests
The in-memory one is the simplest adapter. Its tests validate that the contract works correctly in the base case, with no disk, no latency, and no side effects. Every test starts with an empty repository and exclusively uses methods from the trait.
Full code: in_memory_task_repository.rs#tests
### The `new_task` helper
```rust
fn new_task(title: &str) -> Task {
    Task::new(title.to_string()).expect("task should be created")
}
```

All test suites share a minimal helper: creating a valid task with a given title. The `.expect()` is intentional; if `Task::new` fails here, it’s a bug in the domain that should blow up immediately, not be silenced with an anonymous `unwrap()`. The message "task should be created" makes it clear what broke if the test fails.
### `save_and_find_by_id_returns_task`
```rust
#[test]
fn save_and_find_by_id_returns_task() {
    let mut repo = InMemoryTaskRepository::new();
    let task = new_task("learn rust");
    let id = task.task_id();
    repo.save(task).expect("save should succeed");
    let found = repo.find_by_id(id).expect("find should succeed");
    assert!(found.is_some());
    let found = found.expect("task should exist");
    assert_eq!(found.task_id(), id);
    assert_eq!(found.title(), "learn rust");
}
```

The most fundamental test: save and retrieve. It verifies that the `task_id` is preserved and that the title is neither lost nor mutated along the way. If this test fails, nothing else works; it’s the foundation upon which all other operations rest.
Notice the chain of `expect`s: first on `save`, then on `find_by_id` (the `RepoResult`), then on the `Option`. Each carries a message that diagnoses exactly where the chain failed.
### `save_is_upsert_when_same_id_is_saved_again`
```rust
#[test]
fn save_is_upsert_when_same_id_is_saved_again() {
    let mut repo = InMemoryTaskRepository::new();
    let original = new_task("pay rent");
    let id = original.task_id();
    repo.save(original.clone()).expect("save should succeed");
    let updated = original
        .mark_done()
        .expect("status transition should succeed");
    repo.save(updated).expect("save should succeed");
    let all = repo.list(TaskQuery::All).expect("list should succeed");
    assert_eq!(all.len(), 1);
    let found = repo.find_by_id(id).expect("find should succeed");
    let found = found.expect("task should exist");
    assert_eq!(found.status(), TaskStatus::Done);
}
```

This test validates the upsert semantics of the contract. When you save a task with the same ID as one that already exists, it is not duplicated; it is replaced. This is exactly what the immutable domain needs: `mark_done()` produces a new instance with the same ID but a `Done` status, and the repository substitutes it.
Notice the `.clone()` in `original.clone()`. It’s necessary because `save` takes ownership (the signature is `fn save(&mut self, task: Task)`), but we still need `original` afterwards to call `mark_done()`. The clone is the cost of immutability with ownership transfer, an explicit and traceable cost.
The assertions verify two things: (1) there is only one task in total (all.len() == 1), and (2) its status is Done. If there were duplication, len() would be 2. If the upsert didn’t replace, the status would remain Todo.
### `delete_returns_true_for_existing_task`
```rust
#[test]
fn delete_returns_true_for_existing_task() {
    let mut repo = InMemoryTaskRepository::new();
    let task = new_task("task to delete");
    let id = task.task_id();
    repo.save(task).expect("save should succeed");
    let deleted = repo.delete(id).expect("delete should succeed");
    assert!(deleted);
    let found = repo.find_by_id(id).expect("find should succeed");
    assert!(found.is_none());
}
```

Two complementary assertions: `delete` returns `true` and the task is no longer findable. If you only verified the bool without searching afterwards, a bug where `delete` returns `true` but doesn’t actually delete would go unnoticed.
### `delete_returns_false_for_non_existing_task`
```rust
#[test]
fn delete_returns_false_for_non_existing_task() {
    let mut repo = InMemoryTaskRepository::new();
    let id = new_task("temporary").task_id();
    let deleted = repo.delete(id).expect("delete should succeed");
    assert!(!deleted);
}
```

Verifies idempotency: deleting something that doesn’t exist is not an error; it returns `false`. Notice the trick to obtain a valid UUID that is not in the repository: create a task (which generates a `Uuid::new_v4()` internally) and use only its ID, without ever saving it.
### `list_all_returns_all_tasks`
```rust
#[test]
fn list_all_returns_all_tasks() {
    let mut repo = InMemoryTaskRepository::new();
    repo.save(new_task("task 1")).expect("save should succeed");
    repo.save(new_task("task 2")).expect("save should succeed");
    let all = repo.list(TaskQuery::All).expect("list should succeed");
    assert_eq!(all.len(), 2);
}
```

Straightforward. Saves two, lists all, expects two. It doesn’t verify order because `HashMap` doesn’t guarantee iteration order, and the contract doesn’t promise it either.
### `list_by_status_filters_tasks`
```rust
#[test]
fn list_by_status_filters_tasks() {
    let mut repo = InMemoryTaskRepository::new();
    let todo = new_task("todo task");
    let done = new_task("done task")
        .mark_done()
        .expect("status transition should succeed");
    repo.save(todo).expect("save should succeed");
    repo.save(done).expect("save should succeed");
    let done_tasks = repo
        .list(TaskQuery::ByStatus(TaskStatus::Done))
        .expect("list should succeed");
    assert_eq!(done_tasks.len(), 1);
    assert_eq!(done_tasks[0].status(), TaskStatus::Done);
    let todo_tasks = repo
        .list(TaskQuery::ByStatus(TaskStatus::Todo))
        .expect("list should succeed");
    assert_eq!(todo_tasks.len(), 1);
    assert_eq!(todo_tasks[0].status(), TaskStatus::Todo);
}
```

Verifies both branches of the filter in the same test: one `Todo` task, one `Done` task; filter by each status and check that only the correct one appears. The double assert (len + status) is intentional: `len == 1` alone could match either task; `status() == Done` confirms it’s the right one.
## JsonFileTaskRepository tests
Here things change substantially. We are no longer working in pure memory: there is a real filesystem, JSON serialization, and the risk of a test leaving residue that contaminates the next one.
Full code: json_file_task_repository.rs#tests
### Isolation with `tempdir`
Every test for the JSON adapter follows the same setup pattern:
```rust
let temp = tempdir().expect("temp dir should be created");
let file_path = temp.path().join("tasks.json");
let mut repo = JsonFileTaskRepository::using(file_path.clone());
```

Three lines that solve three problems:
- `tempdir()` (from the `tempfile` crate): creates a unique temporary directory per test. When the `temp` variable is destroyed at the end of the test (drop), the directory is automatically deleted. No manual cleanup, no risk of residue between tests.
- `file_path = temp.path().join("tasks.json")`: builds a path inside the temporary directory. The file doesn’t exist yet; it is created the first time `save` writes to it.
- `JsonFileTaskRepository::using(file_path)`: uses the alternative constructor that accepts an arbitrary path, instead of `new()`, which resolves the platform path with `directories`. This constructor exists exclusively for tests; it’s a clean testing seam that allows injecting the location without modifying the adapter’s logic.
The `.clone()` in `file_path.clone()` is necessary because `using()` takes ownership of the `PathBuf`, but some tests need the path later to verify file existence or to create a second repository pointing to the same file.
### `save_creates_file_and_persists_task`
```rust
#[test]
fn save_creates_file_and_persists_task() {
    let temp = tempdir().expect("temp dir should be created");
    let file_path = temp.path().join("tasks.json");
    let mut repo = JsonFileTaskRepository::using(file_path.clone());
    let task = new_task("learn rust");
    let id = task.task_id();
    repo.save(task).expect("save should succeed");
    assert!(file_path.exists());
    let found = repo.find_by_id(id).expect("find should succeed");
    assert!(found.is_some());
}
```

This test verifies something the in-memory one cannot: that `save` creates the file on disk. `assert!(file_path.exists())` is a pure infrastructure assertion; we are testing the adapter’s side effect, not the business logic. This is exactly what justifies having separate tests per adapter: each implementation has different observable behaviors.
### `list_all_returns_tasks_persisted_on_disk`
```rust
#[test]
fn list_all_returns_tasks_persisted_on_disk() {
    let temp = tempdir().expect("temp dir should be created");
    let file_path = temp.path().join("tasks.json");
    let mut repo = JsonFileTaskRepository::using(file_path);
    repo.save(new_task("task 1")).expect("save should succeed");
    repo.save(new_task("task 2")).expect("save should succeed");
    let all = repo.list(TaskQuery::All).expect("list should succeed");
    assert_eq!(all.len(), 2);
}
```

Functionally identical to the in-memory test, but it implicitly verifies that serialization/deserialization is transparent: the data passes through `serde_json::to_string` when saving and `serde_json::from_str` when listing, and the result is still correct. If there is a bug in `Task`’s `Serialize`/`Deserialize` derives, this test catches it.
### `delete_returns_true_for_existing_task_and_false_otherwise`
```rust
#[test]
fn delete_returns_true_for_existing_task_and_false_otherwise() {
    let temp = tempdir().expect("temp dir should be created");
    let file_path = temp.path().join("tasks.json");
    let mut repo = JsonFileTaskRepository::using(file_path);
    let task = new_task("task to delete");
    let id = task.task_id();
    repo.save(task).expect("save should succeed");
    let deleted = repo.delete(id).expect("delete should succeed");
    assert!(deleted);
    let deleted_again = repo.delete(id).expect("delete should succeed");
    assert!(!deleted_again);
}
```

Notice that this test combines two scenarios in one: delete an existing task (`true`), then delete it again (`false`). In the in-memory version we had two separate tests. Why combine them here? Because each test for the JSON adapter has a higher setup cost (creating a temporary directory, writing a file). Combining related scenarios reduces that overhead without sacrificing readability. It’s a pragmatic trade-off, not a rule.
The second `delete` also implicitly verifies that the file was updated correctly: if `retain` hadn’t rewritten the file without the task, the second `delete` would find it and return `true`.
### `data_persists_between_repository_instances`
```rust
#[test]
fn data_persists_between_repository_instances() {
    let temp = tempdir().expect("temp dir should be created");
    let file_path = temp.path().join("tasks.json");
    let mut writer = JsonFileTaskRepository::using(file_path.clone());
    let task = new_task("persist me");
    let id = task.task_id();
    writer.save(task).expect("save should succeed");
    let reader = JsonFileTaskRepository::using(file_path);
    let found = reader.find_by_id(id).expect("find should succeed");
    assert!(found.is_some());
}
```

This is the most revealing test in the suite. It creates two repository instances pointing at the same file: one writes, the other reads. It verifies that the data survives destroying and rebuilding the repository.
Why does it matter? Because the in-memory version loses data when it is destroyed, but the JSON adapter must survive between process executions. This test simulates exactly that: the first execution saves a task, the second execution (new repository instance) finds it. If the JSON is written in a format that cannot be reread, or if TasksFile doesn’t deserialize correctly, this test fails.
Notice the variable names: writer and reader. They are not repo1 and repo2. The names communicate the intent of the test, not the mechanics.
### `invalid_json_returns_error`
```rust
#[test]
fn invalid_json_returns_error() {
    let temp = tempdir().expect("temp dir should be created");
    let file_path = temp.path().join("tasks.json");
    fs::write(&file_path, "{invalid json").expect("invalid test payload should be written");
    let repo = JsonFileTaskRepository::using(file_path);
    let result = repo.list(TaskQuery::All);
    assert!(result.is_err());
}
```

This test does something no business-logic test would do: it corrupts the data file on purpose. It writes invalid JSON directly with `fs::write`, then tries to read through the repository and verifies that it gets an `Err`, not a panic.
It is an adapter resilience test. It covers the case where the file gets corrupted due to a system crash, incorrect manual editing, or a serialization bug. The correct response is an explicit error, never an unwrap() that kills the process or a silent “there is no data”.
## Why there are no shared tests between adapters
A legitimate question: if both adapters implement the same trait, why isn’t there a generic test suite that runs against any impl TaskRepository?
It could be done. Something like this:
```rust
fn test_save_and_find<R: TaskRepository>(repo: &mut R) {
    let task = new_task("test");
    let id = task.task_id();
    repo.save(task).expect("save should succeed");
    let found = repo.find_by_id(id).expect("find should succeed");
    assert!(found.is_some());
}
```

The reason for not doing it is pragmatic, not technical:
- Each adapter has different observable behaviors. The JSON adapter needs to verify that the file is created (`file_path.exists()`), that data persists between instances, and that corrupt JSON produces an error. The in-memory one has none of these scenarios. Shared tests would cover only the intersection, the common denominator, and would miss precisely the most valuable scenarios.
- The setup is different. The in-memory one is instantiated with `InMemoryTaskRepository::new()`. The JSON one needs `tempdir()` + `using(path)`. A generic factory to abstract the setup adds complexity without gaining significant coverage.
- Duplication is minimal. There are 6 tests per adapter, each only a few lines long. The real duplication (the “create repo, save task, find task, assert” pattern) is so small that the abstraction would be more code than the repetition.
- Readability is maximized. Each test reads from top to bottom without jumping to generic functions. When a test fails, you know exactly which adapter and which scenario broke, with no indirection.
If we had 10 adapters with 30 tests each, the decision would be different. With 2 adapters and 12 tests in total, controlled duplication is more maintainable than premature abstraction.
## Explicit technical debt
We are not going to pretend that this system is finished or that every decision is final. There are three points of technical debt that we consciously chose and documented instead of hiding.
### 1. Generic `RepoError` with `String`
```rust
#[derive(Debug, Error)]
pub enum RepoError {
    #[error("internal error: {error}")]
    InternalError { error: String },
}
```

As we saw in Post 3, `RepoError` has a single variant with a free `String` field. This means that if you want to programmatically distinguish “the file doesn’t exist” from “the JSON is corrupt” from “no write permissions”, you can’t: they are all `InternalError` with different text.
Why is it acceptable today? Because the only consumer of RepoError is the error propagation chain up to main.rs, where the message is printed and it exits with code 1. No one does a match on RepoError variants to make business decisions.
When would it stop being acceptable? When a use case needs to react differently depending on the type of persistence error. For example: “if the file doesn’t exist, create it automatically; if there are insufficient permissions, suggest sudo”. At that point, InternalError { error: String } would be insufficient and variants like IoError(std::io::Error), ParseError(serde_json::Error), PermissionDenied(PathBuf) would need to be introduced.
The cost of resolving it: defining typed variants in RepoError, mapping each specific error in each adapter, and possibly importing std::io and serde_json types in the output port (which strains the abstraction boundary). It is not difficult, but it adds no value for current use.
### 2. No file locking in `JsonFileTaskRepository`
The adapter reads the entire file, modifies it in memory, and writes it back. If two processes execute todo add "X" and todo add "Y" simultaneously:
- Process A reads `tasks.json` (has 3 tasks).
- Process B reads `tasks.json` (has 3 tasks).
- Process A writes `tasks.json` with 4 tasks (its own).
- Process B writes `tasks.json` with 4 tasks (its own), overwriting A’s.
Result: one task is lost. It’s a classic read-modify-write race condition without mutual exclusion.
Why is it acceptable today? Because a personal task CLI is executed by a single user, from a single terminal, one operation at a time. The probability of two simultaneous writes is virtually zero.
When would it stop being acceptable? If the tool were used in a multi-process environment (a daemon syncing tasks, a CI script modifying tasks in parallel, or an editor integration doing autosave). At that point, you would need file locking (flock on Unix, LockFile on Windows) or an atomic write mechanism (write to a temp file + rename).
### 3. No JSON versioning or migration
The `tasks.json` file format is implicit: it is the direct serialization of `TasksFile { tasks: Vec<Task> }`. If tomorrow you add a `priority: Priority` field to `Task`, the existing JSON will not have that field. Depending on how you configure serde, it might fail or it might use a silent default value.
Why is it acceptable today? Because the project is in the exploration phase. The file format can change between versions without us worrying about real users’ legacy data.
When would it stop being acceptable? When there are real users with data files that you want to preserve between binary updates. At that point you would need:
- A `version: u32` field in `TasksFile` (hence the wrapper struct, as we saw in Post 3).
- A migration function `migrate(old: TasksFileV1) -> TasksFileV2`.
- Or, alternatively, strategic use of `#[serde(default)]` for new fields with reasonable default values.
## What tests don’t cover (and that’s okay)
No test suite covers 100% of scenarios. It is more valuable to know what we do not cover and why, than to pretend total coverage:
- Performance tests. We don’t measure how long it takes to read/write a file with 10,000 tasks. For a personal tool with dozens of tasks, it’s not relevant.
- Concurrency tests. We don’t verify the file locking race condition. We document it as explicit technical debt (point 2 above).
- End-to-end integration tests. We don’t execute the compiled binary with `Command::new()` to verify that `todo add "X" && todo list` works as expected from the shell. That is a level of testing we would add if the tool were distributed as a package.
- Constructor `new()` tests. The production constructor uses `ProjectDirs::from()`, which resolves platform paths. We don’t test it because it depends on the execution environment. That’s why `using()` exists, so tests don’t depend on system configuration.
## Complement with repo history
If you want to see how tests were added to the repository:
The pattern is clearly visible: implement first, test later. Not strict TDD, but tests that harden the implementation and protect against future regressions.
## From quality control to plating
In this chapter, we’ve opened every preserve in the pantry, inspected it, and returned it to the shelf with a stamp of approval. Tests don’t prove the code is perfect; they prove that it behaves as we promised. And technical debt isn’t a dirty secret swept under the rug; it’s an explicit list of decisions we can address when the project justifies it.
With the kitchen equipped (architecture), the ingredients fresh (domain), the preserves verified (tested persistence), and the inventory documented (technical debt), it’s time to come up to the surface and present the dish: building the layer the user actually touches. In the next chapter we plate with clap, designing a CLI with typed parsing, subcommands as enums, and dual output for humans and machines.
See you in the kitchen!