Spark

Setting Sail. A clean setup using Docker, Jupyter & RustRover
·8 mins
A practical guide to setting up a clean, reproducible environment with Docker, Jupyter, and RustRover for working with Sail, whether as a user or a contributor. It covers everything from launching services with docker-compose to debugging locally without installing any dependencies on your machine.
Sail. Sailing Through Giants and Sparks
·7 mins
In this article, I share my critical view of the current state of data engineering, which is dominated by heavyweight platforms like Spark and Databricks, and introduce Sail, an open-source engine written in Rust and built on top of Apache Arrow and DataFusion, which offers a new path: lightweight, efficient, and powerful.
From Outside to Core
·5 mins
Defining the serializer and the transformation from YAML to Core
From Core to Spark
·6 mins
Defining the model core from Scala to Spark
Big Data with Zero Code
·3 mins
Exploring how to create Apache Spark ETL workflows with zero code.