Cloudera CDP to Hadoop In-Place Migration Procedure
Classic performs an in-place migration from Cloudera CDP 7.x clusters onto open-source Apache Hadoop. The cluster topology, bare-metal footprint, and operational model are preserved. Only the distribution and the management tooling change. The migration is executed by Bifrost, not by a human checklist, and every step is reversible until an explicit final step.
When to Use Classic
Classic is appropriate when:
- The priority is removing Cloudera licensing cost while keeping the existing Hadoop architecture.
- Regulatory or organizational constraints prevent a move to Kubernetes in the short term.
- A weekend cutover window is required, with a well-defined rollback path.
- The estate is tightly integrated with bare-metal infrastructure: local-disk HDFS, hardware-level rack awareness, custom tuning.
Classic does not modernize workflows, convert HBase, or move storage. For those, use Modernize after Classic, or use Direct in a single engagement.
Typical program duration: 4 to 6 months for a multi-cluster estate.
Phase Pipeline
Classic executes 11 phases sequentially. Phases 0 through 2 are non-destructive and can be run on live production clusters. Phases 3 and beyond require a maintenance window.
Phase 0 — Discover
Connects to the Cloudera Manager API and inventories the source environment: hosts, roles, rack assignments, service configurations, security assets, encryption zones, keytabs, and TLS certificates. Produces a complete inventory tree under inventories/<cluster>/.
Command: bifrost classic discover. See CLI reference.
Phase 1 — Extract and convert
Translates Cloudera Manager configuration into standalone Hadoop XML templates. The translation runs in three stages:
- Raw extraction — full deployment export plus actual rendered XML from Cloudera Manager agent process directories.
- Property translation — maps Cloudera Manager property names to standard Hadoop equivalents, expands safety valve contents, filters Cloudera-internal properties (internal metrics, validation state, Cloudera-managed paths), and flags unmapped properties for manual review.
- Template generation — produces Jinja2 templates for every configuration file and populates them with cluster-specific variables.
After generation, Bifrost renders the templates, canonicalizes with xmlstarlet c14n, and diffs the result against the source-rendered XML. Remaining differences must be explained and approved before the migration proceeds.
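The canonicalize-and-diff step above can be sketched in a few lines. This is an illustrative sketch only (the actual pipeline uses xmlstarlet, per the text); it uses Python's standard-library C14N support to show why canonicalization matters: attribute order and whitespace differences disappear, so only real configuration differences survive the diff.

```python
import difflib
from xml.etree.ElementTree import canonicalize

def config_diff(rendered_xml: str, source_xml: str) -> list[str]:
    """Canonicalize both documents (C14N, whitespace-only text stripped)
    so formatting differences vanish, then diff what remains."""
    a = canonicalize(rendered_xml, strip_text=True)
    b = canonicalize(source_xml, strip_text=True)
    return list(difflib.unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile="rendered", tofile="source", lineterm=""))

# Attribute layout and indentation differ, but the canonical forms match:
rendered = ("<configuration><property><name>dfs.replication</name>"
            "<value>3</value></property></configuration>")
source = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""
print(config_diff(rendered, source))  # [] -- no remaining differences
```

Any non-empty diff at this point represents a genuine semantic difference that must be explained and approved.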
Command: bifrost classic extract.
Phase 2 — Validate-pre
Runs the full pre-flight checklist. Returns PROCEED, WARN, or ABORT.
| Check | Abort condition |
|---|---|
| fsimage parseable by the target Hadoop version | Parse failure |
| Edit logs parseable by the target Hadoop version | Parse failure |
| NameNode layout version | Unexpected value |
| DataNode layout version | Unexpected value |
| HBase HFile validity | Any corrupt HFile |
| Hive Metastore schema validity | Validation failure |
| Keytab accessibility | Any failure |
| TLS certificate validity | Abort: any certificate expired or expiring during the migration window. Warn: any certificate expiring within 6 months. |
| Encryption zones documented with keys backed up | Any undocumented zone |
| Service ports reachable across nodes | Any unreachable |
| Free disk space on data volumes | Below 10 % free (warn at 20 % free) |
| Distribution package cache created on all nodes | Missing |
| LVM snapshot of NameNode metadata volume | Missing |
| Hive Metastore database backup verified | Corrupt |
| Policy database backup verified | Corrupt |
| Source fsimage compatibility with target NameNode | Parse failure |
The fsimage compatibility test is the most important check. Before any production cutover, Bifrost loads the source fsimage against the target Hadoop version on isolated test infrastructure. A failure here blocks the migration; a pass confirms that the upgrade is safe.
Command: bifrost classic validate-pre.
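The PROCEED / WARN / ABORT verdict is a roll-up of the individual checks in the table above. The sketch below is an assumed model of that aggregation (the names `Verdict` and `roll_up` are illustrative, not Bifrost API): the overall verdict is simply the worst individual result.

```python
from enum import Enum

class Verdict(Enum):
    PROCEED = 0
    WARN = 1
    ABORT = 2

def roll_up(results: list[Verdict]) -> Verdict:
    """Worst result wins: any ABORT blocks the migration; any WARN
    without an ABORT demands review before proceeding."""
    return max(results, key=lambda v: v.value, default=Verdict.PROCEED)

checks = [Verdict.PROCEED, Verdict.WARN, Verdict.PROCEED]
print(roll_up(checks).name)  # WARN
```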
Phase 3 — Backup
Creates comprehensive rollback assets for every subsequent phase:
- HDFS namespace checkpoint (`saveNamespace`) and fsimage copy.
- LVM snapshot of the NameNode metadata volume (atomic).
- Hive Metastore database dump.
- Policy database dump.
- Policy export via REST API (JSON).
- Distribution package cache on all nodes (for in-place rollback without internet access).
- Persistent copies of keytabs and TLS certificates.
- Baseline metrics: HDFS fsck, HDFS report, HBase status, Hive table counts.
Phase 4 — Stop services
Graceful shutdown in dependency order. Master services run with serial: 1; worker services run in batches. Every stop includes a reachability check to confirm the port has closed. After shutdown, Bifrost verifies that all Java processes are terminated and stops the cluster manager agent on each node.
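"Dependency order" here means dependents stop before the services they depend on. A minimal sketch, assuming a hypothetical dependency map (the service graph shown is illustrative, not Bifrost's actual inventory): the start order is a topological sort, and the graceful shutdown order is that sort reversed.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what it depends on.
deps = {
    "hdfs":  ["zookeeper"],
    "hbase": ["hdfs", "zookeeper"],
    "hive":  ["hdfs"],
    "yarn":  ["hdfs"],
}

def shutdown_order(deps: dict) -> list[str]:
    """Start order is a topological sort of the dependency graph;
    shutdown is that order reversed, so dependents stop first."""
    start = list(TopologicalSorter(deps).static_order())
    return list(reversed(start))

# HBase/Hive/YARN stop before HDFS; HDFS stops before ZooKeeper.
print(shutdown_order(deps))
```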
Phase 5 — Swap packages
Removes source distribution packages and installs the target distribution from a local mirror. No internet access is required during the swap.
- Data directories are never touched.
- Compatibility symlinks are created so applications referencing the source distribution paths continue to work.
- Each node runs an inline rescue block: if installation fails, Bifrost automatically reinstalls the source distribution and restarts the cluster manager agent on that node before failing the overall play.
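The rescue-block semantics can be sketched as a try/except pattern. This is an assumed model of the behavior described above (function and callback names are hypothetical): the node is restored to its pre-swap state before the error propagates and fails the play.

```python
def swap_packages_on_node(node, install, reinstall_source, restart_agent):
    """Inline rescue: if the target-distribution install fails, restore
    the source distribution and restart the manager agent on this node,
    then re-raise so the overall play fails."""
    try:
        install(node)
    except Exception:
        reinstall_source(node)   # put the source packages back
        restart_agent(node)      # node returns to managed, pre-swap state
        raise

# Simulated failure: the rescue path runs before the error propagates.
def failing_install(node):
    raise RuntimeError("mirror unreachable")

events = []
try:
    swap_packages_on_node(
        "dn01", failing_install,
        reinstall_source=lambda n: events.append("reinstall"),
        restart_agent=lambda n: events.append("agent"))
except RuntimeError:
    events.append("play failed")
print(events)  # ['reinstall', 'agent', 'play failed']
```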
Phase 6 — Deploy configs
Deploys the converted Hadoop XML templates, keytabs from backup, TLS keystores, and sets the hadoop-conf alternatives link. Includes assertion checks that confirm critical properties in the rendered XML match expected values.
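An assertion check of this kind can be sketched against the standard Hadoop `<property><name>/<value>` XML layout. The helper name and the sample values below are illustrative, not Bifrost's actual check set:

```python
import xml.etree.ElementTree as ET

def assert_property(xml_text: str, name: str, expected: str) -> None:
    """Fail deployment if a critical property in the rendered Hadoop
    XML is missing or does not carry the expected value."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            actual = prop.findtext("value")
            assert actual == expected, f"{name}: got {actual!r}, want {expected!r}"
            return
    raise AssertionError(f"{name} missing from rendered configuration")

hdfs_site = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""
assert_property(hdfs_site, "dfs.replication", "3")  # passes silently
```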
Phase 7 — Start services and HDFS upgrade
Starts services in reverse dependency order. The exact sequence depends on the migration strategy.
Stop-and-swap. The cluster was fully shut down. Each NameNode starts with the hdfs namenode -upgrade startup argument, which converts the fsimage to the new layout version. DataNodes start in canary-first batches (1 node, then 5, then 20 % waves).
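The canary-first batching (1 node, then 5, then 20 % waves) can be sketched as follows. This is an illustrative computation under those stated sizes, not Bifrost's actual scheduler:

```python
def canary_batches(nodes: list[str], warmup=(1, 5), wave_pct=20) -> list[list[str]]:
    """Split DataNodes into canary-first batches: fixed warm-up sizes
    first, then waves sized as a percentage of the full fleet."""
    batches, i = [], 0
    for size in warmup:
        if i >= len(nodes):
            break
        batches.append(nodes[i:i + size])
        i += len(batches[-1])
    wave = max(1, len(nodes) * wave_pct // 100)
    while i < len(nodes):
        batches.append(nodes[i:i + wave])
        i += wave
    return batches

nodes = [f"dn{n:02d}" for n in range(1, 31)]        # 30 DataNodes
print([len(b) for b in canary_batches(nodes)])      # [1, 5, 6, 6, 6, 6]
```

A failure in the single-node canary batch stops the rollout before most of the fleet is touched.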
Shrink-and-grow. The cluster stays online. Bifrost runs hdfs dfsadmin -rollingUpgrade prepare on the active NameNode, then restarts the Standby NameNode first with the -rollingUpgrade started startup argument, waits for it to catch up, triggers an HA failover, and finally restarts the previously-active NameNode with the same argument. DataNodes follow in canary-first batches.
Prerequisite for shrink-and-grow. HDFS rolling upgrade is only available when the source and target NameNode layoutVersion values match. If the target distribution has a different layout version, rolling upgrade is not supported and the migration must use stop-and-swap. Bifrost's pre-flight check fails early if a shrink-and-grow run is attempted across incompatible layout versions.
Abort gates enforce time limits: if the NameNode does not exit safe mode before its gate time, rollback triggers automatically.
Abort gates are configured per cluster:
```yaml
abort_gates:
  namenode_up_by: "2026-09-15T04:00:00Z"
  safe_mode_exit_by: "2026-09-15T06:00:00Z"
  all_datanodes_by: "2026-09-15T10:00:00Z"
  validation_complete_by: "2026-09-15T18:00:00Z"
```
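Gate evaluation reduces to a deadline check. The sketch below is an assumed model (the function name and milestone-tracking shape are illustrative): a gate is breached when its deadline passes without the corresponding milestone being reached, and any breach triggers rollback.

```python
from datetime import datetime, timezone

GATES = {  # mirrors an abort_gates block; values are illustrative
    "namenode_up_by": "2026-09-15T04:00:00Z",
    "safe_mode_exit_by": "2026-09-15T06:00:00Z",
}

def breached_gates(now: datetime, reached: set[str]) -> list[str]:
    """Return every gate whose deadline has passed without its
    milestone being reached; any breach triggers automatic rollback."""
    out = []
    for gate, deadline in GATES.items():
        dl = datetime.fromisoformat(deadline.replace("Z", "+00:00"))
        if now >= dl and gate not in reached:
            out.append(gate)
    return out

now = datetime(2026, 9, 15, 5, 0, tzinfo=timezone.utc)
print(breached_gates(now, reached={"namenode_up_by"}))  # []
print(breached_gates(now, reached=set()))               # ['namenode_up_by']
```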
Phase 8 — Validate-post
Runs the full validation suite against the pre-migration baseline. Critical checks must all pass. Performance checks warn on regressions greater than 20 %.
Critical checks include HDFS write and read cycles, HDFS fsck cleanliness, block count parity, HBase meta scan, Hive table count parity, Kafka consumer offset preservation, Kerberos authentication success, policy enforcement, and encryption zone accessibility. Performance checks run TeraSort and TestDFSIO and compare against the baseline. Smoke tests cover HDFS, HBase, Hive, Kafka, Spark, and ZooKeeper.
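The 20 % performance-regression threshold can be sketched as a simple comparison against the Phase 3 baseline. The benchmark names and timings below are illustrative, not actual Bifrost output:

```python
def regressions(baseline: dict, current: dict, threshold=0.20) -> dict:
    """Compare post-migration benchmark timings against the baseline
    and flag anything more than `threshold` slower (warn, not abort)."""
    flagged = {}
    for bench, before in baseline.items():
        delta = (current[bench] - before) / before
        if delta > threshold:
            flagged[bench] = round(delta, 3)
    return flagged

baseline = {"terasort_s": 900, "testdfsio_write_s": 300}
current  = {"terasort_s": 950, "testdfsio_write_s": 390}
print(regressions(baseline, current))  # {'testdfsio_write_s': 0.3}
```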
Command: bifrost classic validate-post.
Phase 9 — Ambari takeover
Registers the migrated cluster in Apache Ambari 3.0 for day-2 management. This phase runs after validation succeeds, not during the critical migration window. The cluster is fully functional without Ambari; Ambari is for ongoing management, not for migration execution.
Phase 10 — Finalize
Irreversible. The 5-day soak that precedes finalize is not a recommendation — it is the rollback window. While the soak is in progress, the full "after services started" rollback procedure remains available (roughly a 4-hour revert). Finalize closes that window.
bifrost classic finalize does two things:
- Runs `hdfs dfsadmin -finalizeUpgrade` on the NameNode, which deletes the `previous/` fsimage directory on the NameNode and every DataNode. After this point, both HDFS-native rollback mechanisms become unavailable: the NameNode `-rollback` startup flag (used by stop-and-swap rollback) and `hdfs namenode -rollingUpgrade rollback` (used by shrink-and-grow rollback). Both depend on the `previous/` directory being present.
- Removes the Bifrost-managed rollback assets: the distribution package cache, LVM snapshots, and baseline captures.
Until finalize runs, HDFS stays in the upgrade-in-progress state, with previous/ preserved on every data volume. bifrost classic rollback relies on that state. For the full stage-by-stage rollback model with timings, see Validation and rollback — Rollback (Classic).
Command: bifrost classic finalize --confirm-irreversible.
Migration Strategies
Classic supports two strategies. The choice depends on cluster size and tolerance for downtime.
Stop-and-swap
Full cluster shutdown. Every node is processed in parallel (fork count 50 is typical). Fastest total migration time, but requires complete downtime.
| Cluster size | Typical duration |
|---|---|
| 17 nodes | 4 to 8 hours |
| 29 nodes | 8 to 16 hours |
Use --strategy stop-and-swap for clusters with fewer than 30 nodes, or when the customer has a well-defined weekend window and a straightforward rollback plan.
Shrink-and-grow
DataNodes are decommissioned in batches from the live source cluster. Each batch follows the same cycle: decommission from HDFS, decommission from YARN, stop services, swap packages, deploy configs, start services, recommission.
HDFS remains available throughout. max_fail_percentage: 0 is the default for production — any single-batch failure aborts the play.
| Cluster size | Typical duration | Typical batch size |
|---|---|---|
| 60 nodes | 1 to 2 days | 10 |
| 120 nodes | 2 to 4 days | 10 |
| 300+ nodes | 4 to 8 days | 15 to 20 |
Use --strategy shrink-and-grow for clusters over 30 nodes or when partial HDFS availability must be preserved.
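The per-batch cycle and the max_fail_percentage: 0 abort semantics can be sketched as follows. This is an assumed model of the play's control flow (step names echo the cycle described above; the function is illustrative, not Bifrost code):

```python
STEPS = ["decommission_hdfs", "decommission_yarn", "stop_services",
         "swap_packages", "deploy_configs", "start_services", "recommission"]

def migrate_batches(batches, run_step, max_fail_percentage=0):
    """Run the per-batch cycle; with max_fail_percentage 0 a single
    failed node in any batch aborts the whole play immediately."""
    migrated = []
    for batch in batches:
        failed = [node for node in batch
                  if not all(run_step(step, node) for step in STEPS)]
        if len(failed) * 100 > max_fail_percentage * len(batch):
            raise RuntimeError(f"aborting play: {failed} failed in batch")
        migrated.extend(batch)
    return migrated

# All steps succeed on every node: both batches complete.
done = migrate_batches([["dn01", "dn02"], ["dn03"]],
                       run_step=lambda step, node: True)
print(done)  # ['dn01', 'dn02', 'dn03']
```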
Timing Guidance
Plan the weekend window from the abort gates. A typical Stop-and-Swap gate configuration for a Saturday 20:00 start is:
- 04:00 Sunday — NameNode must be active.
- 06:00 Sunday — Safe mode must have exited.
- 10:00 Sunday — All DataNodes must report healthy.
- 18:00 Sunday — Full validation must pass.
Gates fire automatic rollback when exceeded. A cluster that is not healthy by the end of the window is automatically reverted to the source distribution, and the weekend exits with the original configuration intact.
What Classic Does Not Do
Classic explicitly does not handle the following. For these, use Modernize or Direct.
- Oozie to Airflow conversion — Classic preserves Oozie; workflow modernization is a separate workstream.
- HBase migration — Classic preserves HBase on HDFS; HBase migration is a Modernize or Direct track.
- Storage migration — Classic leaves HDFS in place; storage migration to object storage is Modernize or Direct.
- Impala migration — Impala is not a target of the Classic path; Trino evaluation is a separate engagement.
- Atlas migration — Atlas continues to run on HBase; OpenMetadata adoption is a separate engagement.
Next Steps
- Review CLI reference — Classic commands.
- Configure validation and rollback before cutover.
- Prepare the production readiness checklist.
- After Classic completes, consider Modernize as the next step in progressive modernization.