Troubleshoot Object Storage

Overview

This page catalogs the symptoms an operator most commonly encounters when the object-storage layer misbehaves, along with the underlying cause and the recovery procedure for each. Each recipe ends in one or two concrete kubectl or helm commands.

502 Bad Gateway from /external/object-storage/ or /external/minio/

Symptom

Loading http://<ingress>/external/object-storage/ or http://<ingress>/external/minio/ returns 502 Bad Gateway from nginx. The Object Storage view in the Ilum UI shows the gateway error inside the iframe.

Likely cause

The ilum-objectstorage Service alias has no endpoints. The selector points at a label that no pod carries.

Diagnosis

Inspect the alias annotation, selector, and endpoints:

kubectl -n ilum get svc ilum-objectstorage \
-o jsonpath='active-provider: {.metadata.annotations.ilum\.cloud/object-storage-active-provider}{"\n"}selector: {.spec.selector}{"\n"}'
kubectl -n ilum get endpoints ilum-objectstorage

If the ENDPOINTS column shows <none>, the selector does not match any pod. Common causes:

  • objectStorage.activeProvider was set to a name that does not match any running provider's app.kubernetes.io/name label.
  • The provider's chart was disabled (<provider>.enabled=false) without flipping activeProvider to a still-running provider.
  • A pre-upgrade override left the alias selector in an inconsistent state.
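The empty-endpoints check above can be scripted so that it returns a usable exit code. The following is a minimal sketch, not part of the chart: the function names has_endpoints and check_alias_endpoints are hypothetical, and the sketch assumes the default namespace ilum used throughout this page.

```shell
# Returns 0 when the given string contains at least one endpoint address.
has_endpoints() {
  [ -n "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Reads the ready addresses behind the alias and reports whether any exist.
# Hypothetical helper; requires kubectl access to the cluster.
check_alias_endpoints() {
  ns="${1:-ilum}"
  ips="$(kubectl -n "$ns" get endpoints ilum-objectstorage \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if has_endpoints "$ips"; then
    echo "ilum-objectstorage has endpoints: $ips"
  else
    echo "ilum-objectstorage has no ready endpoints"
    return 1
  fi
}
```

Calling check_alias_endpoints ilum from a health script gives a non-zero exit status whenever the alias selector matches no ready pod.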

Recovery

Roll back to the last release revision whose values are known to be correct:

helm history ilum -n ilum
helm rollback ilum <revision> -n ilum
kubectl -n ilum rollout restart deploy/ilum-ui

Alternatively, override activeProvider to a still-running provider and re-upgrade:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.activeProvider=auto
kubectl -n ilum rollout restart deploy/ilum-ui

/external/object-storage/ redirects in a loop

Symptom

The browser keeps bouncing between /external/object-storage/ and the provider-specific console path; the page never renders.

Likely cause

The active provider's consoleMode is nginx-rewrite and its consolePath is /external/object-storage/ itself, so the redirect sends the browser back to where it came from.
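The loop condition described above can be checked before an upgrade. This is a small sketch under stated assumptions: would_loop is a hypothetical helper, and the caller is expected to supply the provider's consoleMode and consolePath values (for example, copied from the values file).

```shell
# Returns 0 when the combination described above would redirect the
# browser back onto the generic path, producing a redirect loop.
would_loop() {
  console_mode="$1"
  console_path="$2"
  [ "$console_mode" = "nginx-rewrite" ] &&
    [ "$console_path" = "/external/object-storage/" ]
}

# Example: warn before running helm upgrade.
# would_loop "nginx-rewrite" "/external/object-storage/" && echo "redirect loop expected"
```

The comparison is an exact string match, so a consolePath written without the trailing slash is not detected by this sketch.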

Recovery

Set the provider's consolePath to a provider-specific path so the redirect target is distinct:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.providers.<provider>.consolePath=/external/<provider>/

Object Storage nav button does not load

Symptom

Clicking the Object Storage entry in the Ilum UI loads a blank iframe or shows a "file not found" message.

Likely cause

ILUM_OBJECT_STORAGE_PATH in the ilum-ui ConfigMap resolves to a path that the nginx proxy does not route. Alternatively, no provider is active and the path falls back to the chart-wide default /external/object-storage/, which then returns 404 because no upstream is configured.

Diagnosis

Inspect the runtime path the UI uses:

kubectl -n ilum get configmap ilum-ui \
-o jsonpath='ILUM_OBJECT_STORAGE_PATH={.data.ILUM_OBJECT_STORAGE_PATH}{"\n"}'

Cross-check against the nginx configuration for the matching location block:

kubectl -n ilum exec deploy/ilum-ui -c ilum-ui -- \
grep -A5 'location /external/' /etc/nginx/conf.d/server.conf
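The two diagnosis steps above can be combined into one check that reports whether the configured path has a matching location block. The sketch below is illustrative only: path_is_routed is a hypothetical helper, and it assumes each location directive is written on its own line with the path as a whitespace-separated token, optionally preceded by a modifier such as ^~.

```shell
# Reads an nginx configuration on stdin and returns 0 when a location
# block for the given path exists (with or without a modifier).
path_is_routed() {
  awk -v p="$1" '
    $1 == "location" && ($2 == p || $3 == p) { found = 1 }
    END { exit !found }
  '
}

# Usage (requires cluster access):
# path="$(kubectl -n ilum get configmap ilum-ui \
#   -o jsonpath='{.data.ILUM_OBJECT_STORAGE_PATH}')"
# kubectl -n ilum exec deploy/ilum-ui -c ilum-ui -- \
#   cat /etc/nginx/conf.d/server.conf | path_is_routed "$path" \
#   || echo "no location block for $path"
```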

Recovery

Ensure an in-cluster provider is enabled and either rely on the resolved default or override objectStorage.providers.<provider>.consolePath explicitly. Then restart the Ilum UI to pick up the new ConfigMap:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set <provider>.enabled=true
kubectl -n ilum rollout restart deploy/ilum-ui

helm template fails with "3 providers enabled"

Symptom

A helm install or helm upgrade fails at render time with a message similar to:

Error: ... objectStorage: 3 providers enabled ([minio rustfs seaweedfs]);
set objectStorage.activeProvider= to pick which one user traffic
routes through

Likely cause

More than two providers are enabled simultaneously, and objectStorage.activeProvider is left at auto. The chart refuses to guess.
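The rule described above can be mirrored in a pre-flight check that runs before helm upgrade. This is a sketch of the rule as stated on this page, not the chart's own template logic; check_active_provider is a hypothetical helper that takes the activeProvider value followed by the names of the enabled providers.

```shell
# $1: value of objectStorage.activeProvider
# remaining arguments: names of the providers that are enabled
check_active_provider() {
  active="$1"
  shift
  if [ "$#" -gt 2 ] && [ "$active" = "auto" ]; then
    echo "ambiguous: $# providers enabled ($*); set objectStorage.activeProvider explicitly"
    return 1
  fi
  echo "ok"
}

# Example:
# check_active_provider auto minio rustfs seaweedfs || exit 1
```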

Recovery

Set the active provider explicitly:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.activeProvider=<provider>

Alternatively, disable the providers that are not relevant to user traffic by setting their enabled flags to false.

Alias has no endpoints despite a running provider

Symptom

A provider pod is running and ready, but kubectl get endpoints ilum-objectstorage shows <none>.

Likely cause

The pod's labels do not match the alias Service selector. The selector requires both the app.kubernetes.io/name and app.kubernetes.io/instance labels to match.

Diagnosis

kubectl -n ilum get pod -l app.kubernetes.io/name=<provider> \
-o jsonpath='{.items[*].metadata.labels}'
kubectl -n ilum get svc ilum-objectstorage -o jsonpath='{.spec.selector}'
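Comparing the two outputs by eye is error-prone, so the comparison can be scripted. The following sketch assumes both the selector and the pod labels have been flattened to one key=value pair per line; unmatched_selector is a hypothetical helper, and the go-template expressions in the usage comment are one way to produce that format.

```shell
# $1: selector as newline-separated key=value pairs
# $2: pod labels as newline-separated key=value pairs
# Prints every selector pair that the pod does not carry.
unmatched_selector() {
  printf '%s\n' "$1" | while IFS= read -r pair; do
    [ -z "$pair" ] && continue
    printf '%s\n' "$2" | grep -qxF -- "$pair" || printf '%s\n' "$pair"
  done
}

# Usage (requires cluster access; <pod-name> is a placeholder):
# sel="$(kubectl -n ilum get svc ilum-objectstorage \
#   -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}}{{"\n"}}{{end}}')"
# labels="$(kubectl -n ilum get pod <pod-name> \
#   -o go-template='{{range $k, $v := .metadata.labels}}{{$k}}={{$v}}{{"\n"}}{{end}}')"
# unmatched_selector "$sel" "$labels"
```

Empty output means the pod satisfies the selector; any printed line names a label to add to the pod template.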

Recovery

For pods deployed by a chart, ensure the chart sets both required labels. For hand-rolled Deployments (such as those created by the Add a New Provider procedure), patch the pod template to include the missing labels and re-roll the Deployment.

Stuck pending-upgrade after a failed helm upgrade --wait

Symptom

helm history ilum shows a revision in pending-upgrade state. Every subsequent helm upgrade fails immediately with a message similar to:

Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

Likely cause

A previous helm upgrade --wait was interrupted (network drop, laptop crash, Ctrl-C). The release Secret recording the in-flight upgrade was never finalized.

Recovery

Delete the stuck release Secret and retry:

kubectl -n ilum get secret -l owner=helm,name=ilum
kubectl -n ilum delete secret sh.helm.release.v1.ilum.v<revision>
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values

The revision number is the highest one listed by helm history ilum that is in pending-upgrade state.
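Picking the revision can be automated from the table that helm history prints. The sketch below is an illustration under assumptions: pending_revision and release_secret_name are hypothetical helpers, and the parser assumes the default table output, where the first column is the revision and the status appears as a standalone token on the same row.

```shell
# Reads `helm history` table output on stdin and prints the highest
# revision whose row contains the status pending-upgrade.
# Exits non-zero when no such revision exists.
pending_revision() {
  awk '
    NR > 1 {
      for (i = 2; i <= NF; i++)
        if ($i == "pending-upgrade" && $1 + 0 > max) max = $1 + 0
    }
    END { if (max) print max; else exit 1 }
  '
}

# Builds the release Secret name for a release and revision.
release_secret_name() {
  printf 'sh.helm.release.v1.%s.v%s\n' "$1" "$2"
}

# Usage (requires cluster access):
# rev="$(helm history ilum -n ilum | pending_revision)" &&
#   kubectl -n ilum delete secret "$(release_secret_name ilum "$rev")"
```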

Cutover acknowledged but the alias still targets the old provider

Symptom

objectStorage.cutoverAcknowledged=true is set (or its legacy alias rustfs.migrationAcknowledged=true), but the alias annotation still shows the previous provider.

Likely cause

Either the Ilum UI's ConfigMap was not regenerated (the rollme annotation that forces an ilum-ui rollout did not change), or the operator did not run helm upgrade after flipping the flag.

Recovery

Re-run helm upgrade and force a UI rollout:

helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.cutoverAcknowledged=true
kubectl -n ilum rollout restart deploy/ilum-ui

Verify by inspecting the alias annotation:

kubectl -n ilum get svc ilum-objectstorage \
-o jsonpath='{.metadata.annotations.ilum\.cloud/object-storage-active-provider}{"\n"}'

Bucket-init Job stays Pending or fails

Symptom

After helm install or helm upgrade, the init-rustfs-buckets or init-minio-policies Job does not reach Complete. helm install --wait times out, or the bundled consumers report missing buckets at startup.

Likely cause

One of the following:

  • The ilum-objectstorage-credentials Secret is missing or has empty values for access-key / secret-key.
  • The provider's Service is reachable on cluster DNS but the provider pod is not yet Ready; the init Job's wait-for-<provider> init container is still looping.
  • The provider rejected the credentials (the bundled image baked in a different default than the live Secret).

Diagnosis

kubectl -n ilum logs job/init-rustfs-buckets -c wait-for-rustfs --tail=50
kubectl -n ilum logs job/init-rustfs-buckets --tail=200
kubectl -n ilum get secret ilum-objectstorage-credentials \
-o jsonpath='{.data.access-key}' | base64 -d; echo

Recovery

Populate the credentials Secret with all six aliased keys (access-key, secret-key, root-user, root-password, RUSTFS_ACCESS_KEY, RUSTFS_SECRET_KEY) and re-run the upgrade. The init Job is idempotent; to retry it, delete the Job and let helm upgrade re-create it:

kubectl -n ilum delete job init-rustfs-buckets || true
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values

Credentials lookup error on helm upgrade

Symptom

helm upgrade fails at render time with a message similar to:

Error: ... values don't meet the specifications of the schema(s) ...
... ilum-objectstorage-credentials lookup is missing required keys ...

Likely cause

The chart resolves credentials in this order: live Secret values via lookup (when objectStorage.credentials.preserveExisting=true), then the literal defaults in values.yaml. When the live Secret exists but is missing one of the six aliased keys, the lookup returns an incomplete dictionary and the template fails the schema check.
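Before choosing a recovery option, it helps to know which of the six aliased keys the live Secret lacks. This is a minimal sketch, assuming the key names listed on this page (access-key, secret-key, root-user, root-password, RUSTFS_ACCESS_KEY, RUSTFS_SECRET_KEY); missing_keys is a hypothetical helper that reads the Secret's key names on stdin, one per line.

```shell
REQUIRED_KEYS="access-key secret-key root-user root-password RUSTFS_ACCESS_KEY RUSTFS_SECRET_KEY"

# Reads the Secret's key names on stdin (one per line) and prints
# every required key that is absent.
missing_keys() {
  present="$(cat)"
  for key in $REQUIRED_KEYS; do
    printf '%s\n' "$present" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done
}

# Usage (requires cluster access):
# kubectl -n ilum get secret ilum-objectstorage-credentials \
#   -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' | missing_keys
```

The sketch checks only for the presence of each key; it does not detect keys that exist with an empty value.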

Recovery

Either re-create the Secret with all six aliased keys, or disable the lookup and let the chart re-render the defaults:

# Option A: repopulate the Secret.
kubectl -n ilum delete secret ilum-objectstorage-credentials
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values

# Option B: force deterministic render (loses any rotated credentials).
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set objectStorage.credentials.preserveExisting=false

PVC bound to wrong StorageClass

Symptom

The provider's StatefulSet or Deployment stays Pending. The pod's events log a message similar to:

0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims

Likely cause

The chart-default storageClassName resolves to a class that does not match a CSI driver available on the cluster. This is common when moving the chart between cloud providers without overriding the storage class.
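Whether a given class exists on the target cluster can be confirmed before any PVC is touched. The sketch below assumes the name-only output format of kubectl, in which each line reads storageclass.storage.k8s.io/<name>; class_exists is a hypothetical helper.

```shell
# Reads `kubectl get storageclass -o name` output on stdin and
# returns 0 when the named class is present.
class_exists() {
  grep -qxF -- "storageclass.storage.k8s.io/$1"
}

# Usage (requires cluster access; <cluster-storage-class> is a placeholder):
# kubectl get storageclass -o name | class_exists "<cluster-storage-class>" \
#   || echo "storage class not found on this cluster"
```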

Recovery

Destructive

Deleting an existing PersistentVolumeClaim deletes the underlying volume on most CSI drivers. Use this recipe on net-new installs only.

Set the correct storage class and re-roll the PVCs:

kubectl -n ilum get storageclass
kubectl -n ilum delete pvc -l app.kubernetes.io/name=rustfs
helm upgrade ilum ilum/helm_aio -n ilum --reuse-values \
--set rustfs.persistence.storageClass=<cluster-storage-class>

For pre-existing data, snapshot the source PVC and restore against the correct storage class before deletion. See Back Up and Restore Object Storage.

Post-cutover consumer still writes to the previous provider

Symptom

objectStorage.cutoverAcknowledged=true is set and mc diff confirms data parity, but one or more bundled consumers continue writing into the old provider's bucket.

Likely cause

The consumer cached its S3 endpoint at startup and has not refreshed since the cutover. The ilum-objectstorage Service alias re-targets the new provider instantly, but consumers that resolve the alias once on Pod startup do not pick up the change until they restart.

The Ilum UI rolls automatically when helm upgrade regenerates the ilum-ui ConfigMap. Other consumers do not.

Recovery

Restart every consumer that targets the alias:

kubectl -n ilum rollout restart \
deploy/ilum-core \
deploy/ilum-jupyter \
deploy/ilum-mlflow \
deploy/ilum-kestra \
deploy/ilum-langfuse-web \
statefulset/ilum-hive-metastore
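The restart above returns as soon as the rollouts are triggered. A variant that also waits for each consumer to finish rolling might look like the sketch below. The helper name restart_and_wait and the five-minute timeout are assumptions, and the list should be trimmed to the consumers actually deployed, because the loop stops at the first failure.

```shell
CONSUMERS="deploy/ilum-core deploy/ilum-jupyter deploy/ilum-mlflow deploy/ilum-kestra deploy/ilum-langfuse-web statefulset/ilum-hive-metastore"

# Restarts each consumer in turn and blocks until its rollout completes.
# $1: namespace (defaults to ilum). The timeout is an arbitrary choice.
restart_and_wait() {
  ns="${1:-ilum}"
  for target in $CONSUMERS; do
    kubectl -n "$ns" rollout restart "$target" || return 1
    kubectl -n "$ns" rollout status "$target" --timeout=5m || return 1
  done
}

# Usage (requires cluster access):
# restart_and_wait ilum
```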

Long-running Spark driver Pods are unaffected: each Spark job creates its own S3 client and resolves the alias afresh.

Reference