Execution Engines
Ilum is a true multi-engine data lakehouse. SQL queries and data workloads can be executed against any of four engines, all sharing the same catalogs, table formats, and object storage:
- अपाचे स्पार्क : distributed processing for large-scale ETL, machine learning, and complex transformations.
- त्रिगुण : massively parallel processing for interactive analytics and federated queries.
- डकडीबी : single-node, embedded execution for small-to-medium data and DuckLake-managed tables.
- Apache Flink: low-latency stream processing and continuous data pipelines.
All four engines are fronted by the Apache Kyuubi SQL gateway, which provides session management, JDBC/ODBC compatibility, and unified authentication. An automatic engine router observes incoming queries and selects the engine best suited to each workload, with explicit user override available at any time.
स्थापत्यशैली
┌──────────────────────────────────┐
│ SQL Editor / JDBC / ODBC / API │
└────────────────┬─────────────────┘
│
┌────────▼────────┐
│ Apache Kyuubi │ Session management, auth,
│ SQL Gateway │ connection pooling
│ + Auto Router │
└─┬───┬─────┬───┬─┘
│ │ │ │
┌──▼┐ ┌▼─┐ ┌─▼┐ ┌▼──────┐
│Spk│ │Tr│ │Dk│ │Flink │
│ a │ │in│ │DB│ │(Beta) │
│ rk│ │ o│ │ │ │ │
└─┬─┘ └─┬┘ └┬─┘ └─┬─────┘
│ │ │ │
┌─────▼─────▼───▼─────▼──────┐
│ Catalog Layer │
│ Hive · Nessie · Unity │
│ · DuckLake │
├────────────────────────────┤
│ Storage Layer │
│ MinIO / S3 / GCS / WASBS │
│ / HDFS │
└────────────────────────────┘
When to use each engine
| Workload | Recommended engine | Why |
|---|---|---|
| Large-scale ETL, batch transforms | अपाचे स्पार्क | Distributed shuffle, dynamic allocation, mature optimizer |
| Machine learning pipelines | अपाचे स्पार्क | MLlib, Spark ML, GPU executor support |
| Interactive analytics on large datasets | त्रिगुण | Pipelined MPP execution, sub-second latency |
| Federated queries across data sources | त्रिगुण | Connectors for relational, NoSQL, search, lake |
| Ad-hoc exploration on small data | डकडीबी | Zero pod startup, in-process execution |
| Local-first analytics over DuckLake | डकडीबी | Native catalog integration, fast local queries |
| Low-latency stream processing | Apache Flink | Event-time semantics, continuous queries |
| Streaming ingestion with batch parity | Apache Spark Structured Streaming | Same code path as batch jobs |
Engines share the same data through the catalog layer and object storage, so workloads can shift from Spark to Trino to DuckDB without rewriting queries or copying data.
Automatic engine router
The router selects the engine for each incoming SQL query based on several signals:
- Estimated scan size: Pulled from catalog statistics. Small scans favour DuckDB; large scans favour Spark or Trino.
- Workload type: Streaming queries route to Flink; interactive read-only queries route to Trino; heavy joins and aggregations route to Spark.
- Locality: Queries against DuckLake-managed tables favour DuckDB; queries against tables already cached or warmed in another engine prefer that engine.
- Engine availability: Only engines that are deployed and currently healthy are considered.
The router is conservative: when signals are ambiguous, it falls back to a configurable default engine.
Manual override
Every query can override the router by selecting an explicit engine:
- In the SQL Editor: Use the Engine Selector dropdown.
- Through the REST API: Set the
enginefield on thePOST /api/v1/sql/executerequest. - Through JDBC / ODBC: Set a session property (for example,
SET ilum.engine = 'trino').
User overrides always take precedence over router decisions.
Apache Kyuubi SQL Gateway
Kyuubi is the unified SQL gateway in front of Spark, Trino, and Flink. It provides:
- Session management: Long-lived SQL sessions per user with isolation.
- JDBC and ODBC compatibility: Standard interfaces for BI tools (Tableau, PowerBI, Superset).
- प्रमाणीकरण : Integrated with Ilum's RBAC, OAuth2, and LDAP stack.
- Connection pooling: Efficient resource usage across many concurrent queries.
- Engine federation: A single entry point for queries that may target any registered engine.
DuckDB is integrated separately as an in-process engine inside इलम कोर for the lowest possible latency on small queries.
संरूपण
Each engine is enabled and configured through Helm values:
# Spark (always available through ilum-core)
इलम कोर :
उत्तेजक गुण :
सक्षम : सच्चा
# Trino
ट्रिनो :
सक्षम : सच्चा
# DuckDB (default-on, with DuckLake catalog)
इलम कोर :
एसक्यूएल :
duckdb:
ducklake:
सक्षम : सच्चा
# Kyuubi SQL gateway
इलम-एसक्यूएल :
सक्षम : सच्चा
इलम कोर :
एसक्यूएल :
सक्षम : सच्चा
Apache Flink is enabled per Enterprise deployment. Refer to the Flink page for details.
Per-engine documentation
Related pages
- SQL संपादक : The in-product workbench for multi-engine SQL.
- डेटा कैटलॉग : Catalogs that engines share.
- इलम टेबल्स : Unified table abstraction across Delta, Iceberg, and Hudi.
- स्थापत्यशैली : Where engines fit in the platform.