Troubleshooting
Symptom → likely cause → fix, for full nodes (validator / RPC / seed) and edge nodes (verifying light client). For the canonical chain IDs, ports, and hardware referenced below, see Networks & chain IDs, Ports & firewall, and Hardware specs.
Full nodes (validator / RPC / seed)
| Symptom | Likely cause | Fix |
|---|---|---|
| No peers (CL n_peers: 0, EL not finding peers) | Wrong EXT_IP; firewall not opening 30303/26656; bad EL_BOOTNODES/CL_SEEDS | Confirm EXT_IP is the reachable public IP; verify the SG/nftables opens 30303 tcp+udp and 26656 tcp from 0.0.0.0/0; re-check the bootnode/seed strings from the bundle. See Ports & firewall. |
| EL/CL chain-id mismatch (CL crashes at engine startup) | Wrong genesis bundle | bootstrap.sh aborts if eth-genesis.json chainId ≠ 473374; re-fetch the correct testnet bundle (jq .config.chainId); the CL image must be the -473374 tag. See Networks. |
| Stuck sync (catching_up: true forever) | Few/no peers; slow disk; EL behind | Check peers (above); confirm DATA_DIR is NVMe, not slow EBS; check EL logs for MDBX/IO errors. |
| JWT mismatch (engine auth 401s in EL/CL logs) | EL and CL see different jwt.hex | Both must mount the same JWT_PATH; if regenerated, restart both; ensure 0600 / correct owner. |
| EL container restarts / OOM | EL_MEM_LIMIT too low for role | Raise per the HW matrix; ensure NVMe; on AWS use gp3/io2. |
| Chain halt (no node finalizing) | >1/3 of voting power offline, e.g. an over-weighted validator went down | This is the offline-whale halt. Bring the offline validators back; the chain resumes once ≥2/3 of voting power is online again. See chain-halt note below. |
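
The chain-id row above can be checked before starting the EL. The following is a minimal sketch, assuming only what the table states: that `eth-genesis.json` carries the id at `.config.chainId` and that the expected testnet value is 473374. The function names are hypothetical and are not part of bootstrap.sh.

```python
import json

# Expected testnet chain id, taken from the chain-id mismatch row above.
EXPECTED_CHAIN_ID = 473374


def genesis_chain_id(genesis_text: str) -> int:
    """Return .config.chainId from the text of an eth-genesis.json file."""
    return int(json.loads(genesis_text)["config"]["chainId"])


def genesis_matches(genesis_text: str, expected: int = EXPECTED_CHAIN_ID) -> bool:
    """True when the genesis file targets the expected chain id."""
    return genesis_chain_id(genesis_text) == expected


# Illustrative genesis fragments (hypothetical, heavily trimmed).
good = '{"config": {"chainId": 473374}}'
bad = '{"config": {"chainId": 1}}'
print(genesis_matches(good), genesis_matches(bad))
```

In practice you would pass the contents of the fetched bundle's `eth-genesis.json` instead of the inline strings; a `False` result means the bundle should be re-fetched.
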
Chain halt — recovery and prevention
The chain has no self-heal while halted
The inactivity leak only acts while the chain is making progress, and a halted chain makes none. Recovery is therefore operational: restore enough online voting power (≥2/3).
Prevent it by bounding any one validator's weight and requiring it be live before it carries weight — the W3 validator-parameter policy:
- max-effective-balance is the safety lever: the ceiling on any one validator's voting weight (mainnet target 16,000 BERA). A single validator's effective balance is min(deposit, max-effective-balance), so the cap bounds the whale share against the realistic set size N.
- min-activation-balance is the economic floor (mainnet target 1,000 BERA), a tokenomics/W4 knob, not the safety lever.
- Staged activation ("live before weight") is the primary halt fix.
The live 4-validator halt happened because, under the old params, genesis validators had only 32 BERA of effective balance each, so a single whale could hold more than 1/3 of the voting power. See from-source/VALIDATOR-PARAMS.md for the effective-balance arithmetic and the fix.
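
The effective-balance rule above can be sketched in a few lines. This is an illustration only: the function names and the deposit sets are hypothetical, and the authoritative arithmetic is in from-source/VALIDATOR-PARAMS.md. It applies min(deposit, max-effective-balance) per validator and asks whether the largest validator alone holds more than 1/3 of the total effective balance.

```python
from fractions import Fraction


def effective_balance(deposit: int, max_effective_balance: int) -> int:
    """Effective balance as described above: min(deposit, cap)."""
    return min(deposit, max_effective_balance)


def largest_share(deposits: list[int], max_effective_balance: int) -> Fraction:
    """Share of total effective balance held by the largest validator."""
    effective = [effective_balance(d, max_effective_balance) for d in deposits]
    return Fraction(max(effective), sum(effective))


def can_halt_alone(deposits: list[int], max_effective_balance: int) -> bool:
    """True when one validator going offline removes more than 1/3 of the power."""
    return largest_share(deposits, max_effective_balance) > Fraction(1, 3)


# Hypothetical sets, chosen only to exercise the rule.
# Three validators at 32 BERA plus one large deposit, with a cap too high to bind:
uncapped = can_halt_alone([32, 32, 32, 100], max_effective_balance=10**9)
# One 50,000 BERA deposit capped at 16,000, alongside 32 validators at 1,000:
capped = can_halt_alone([50_000] + [1_000] * 32, max_effective_balance=16_000)
print(uncapped, capped)
```

The second set shows why the cap has to be read against the set size N: the same cap leaves a single validator above 1/3 when there are too few other validators to dilute it.
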
Edge nodes (verifying light client)
The rows below apply to the light node on any host. The first-boot / SD-card / arm64 rows are specific to the Raspberry Pi appliance; off-Pi the same failures surface as your service failing to start (see the non-Pi run notes).
| Symptom | Likely cause | Fix |
|---|---|---|
| Service runs but never bootstraps; no verified head | CL/EL endpoints not reachable from the light node (ephemeral ports / container-internal IPs) | Use a fixed-port, routable endpoint; verify with nc -vz <host> 26657 and nc -vz <host> 8545. |
| First boot aborts: missing CL/EL RPC, chain-id, pubkeys, or checkpoint | Required light-mode keys/files not set (fail-closed) | Set KRYPTON_CL_RPC / KRYPTON_EL_RPC / KRYPTON_CHAIN_ID / KRYPTON_CHECKPOINT_PUBKEYS and drop krypton-checkpoint.json on the boot partition. |
| Bootstrap fails / wrong sign-bytes / chain-id mismatch | KRYPTON_CHAIN_ID set to the numeric 473374 instead of the CometBFT string, or it doesn't match the checkpoint | Set it to the string from /status .result.node_info.network (e.g. krypton-473374), verbatim. |
| Checkpoint rejected at bootstrap | quorum not met / unknown or duplicate keys / wrong height–hash | Ensure ≥ K of the configured PUBKEYS actually signed the same (height, hash); re-merge with publish-checkpoint.sh. |
| Reads fail with a staleness error | head older than KRYPTON_MAX_STALENESS_SECS, or upstream is behind/eclipsed | Confirm the upstream CL is at the live head; add a second source + KRYPTON_MIN_SOURCES=2. |
| First boot aborts on RAM | board has < 4 GB | Use a Pi 4/5 with ≥ 4 GB (8 GB recommended); the RAM floor is enforced on first boot. |
| exec format error / image won't run | wrong-arch artifact | The image and light binary must be arm64; rebuild via CI edge-image / edge-light-arm64. |
| SSD not detected | full-pruned preflight guard fired | This guard does not apply to light mode (which needs no SSD). Confirm NODE_MODE=light was read — check journalctl -u krypton-firstboot -b for mode=light. |
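
The checkpoint-rejection row above names three failure reasons: quorum not met, unknown or duplicate keys, and a wrong (height, hash). The sketch below shows that decision logic under stated assumptions: the `Attestation` type and `checkpoint_problem` function are hypothetical, the real layout of krypton-checkpoint.json is not shown here, and signature verification is assumed to have happened already.

```python
from typing import Iterable, NamedTuple, Optional


class Attestation(NamedTuple):
    """Hypothetical record: one key's signed claim about a checkpoint."""
    pubkey: str
    height: int
    block_hash: str


def checkpoint_problem(
    attestations: Iterable[Attestation],
    configured_pubkeys: Iterable[str],
    k: int,
    height: int,
    block_hash: str,
) -> Optional[str]:
    """Return None if the checkpoint would pass, else a short rejection reason."""
    allowed = set(configured_pubkeys)
    seen = set()
    for att in attestations:
        if att.pubkey not in allowed:
            return "unknown key"
        if att.pubkey in seen:
            return "duplicate key"
        if (att.height, att.block_hash) != (height, block_hash):
            return "wrong height/hash"
        seen.add(att.pubkey)
    # At least K distinct configured keys must agree on the same (height, hash).
    if len(seen) < k:
        return "quorum not met"
    return None


atts = [Attestation("A", 10, "0xabc"), Attestation("B", 10, "0xabc")]
print(checkpoint_problem(atts, ["A", "B", "C"], k=2, height=10, block_hash="0xabc"))
print(checkpoint_problem(atts, ["A", "B", "C"], k=3, height=10, block_hash="0xabc"))
```

This version fails closed on the first bad attestation, in the same spirit as the fail-closed first-boot behavior described in the table; whether the real light client is equally strict is an assumption.
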
Reachability is the edge prerequisite
A live, node-reachable 473374 endpoint fleet (full nodes serving eth_getProof + CometBFT RPC at fixed addresses) must exist before an edge node has anything to verify against. See Edge node (Pi light client) and RPC / full node.
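
A quick way to test this prerequisite from the edge host is a plain TCP connect, equivalent to the `nc -vz` checks in the table above. The sketch below is a minimal example; the helper names are hypothetical, and the default ports 26657 (CometBFT RPC) and 8545 (EL RPC) are the ones used in those checks. A successful connect only shows the port is open, not that the endpoint serves the right chain.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_ports(host: str, ports=(26657, 8545), timeout: float = 3.0) -> list:
    """Return the subset of ports on host that could not be reached."""
    return [p for p in ports if not tcp_reachable(host, p, timeout)]
```

Calling `unreachable_ports("<host>")` with the full node's routable address returns an empty list when both endpoints accept connections; any port it returns is a candidate for the "never bootstraps" symptom.
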
See also
- Networks & chain IDs · Ports & firewall · Hardware specs
- Validator node · Edge node
- Monitoring (Prometheus + Grafana) — the standing dashboards behind these CLI checks.