Troubleshooting

Symptom → likely cause → fix, for full nodes (validator / RPC / seed) and edge (verifying light client) nodes. For the canonical ids, ports, and hardware referenced below, see Networks & chain IDs, Ports & firewall, and Hardware specs.

Full nodes (validator / RPC / seed)

Symptom | Likely cause | Fix
No peers (CL n_peers: 0, EL not finding peers) | Wrong EXT_IP; firewall not opening 30303/26656; bad EL_BOOTNODES/CL_SEEDS | Confirm EXT_IP is the reachable public IP; verify the SG/nftables opens 30303 tcp+udp and 26656 tcp from 0.0.0.0/0; re-check the bootnode/seed strings from the bundle. See Ports & firewall.
EL/CL chain-id mismatch (CL crashes at engine startup) | Wrong genesis bundle | bootstrap.sh aborts if eth-genesis.json chainId ≠ 473374; re-fetch the correct testnet bundle and check it with jq .config.chainId; the CL image must be the -473374 tag. See Networks.
Stuck sync (catching_up: true forever) | Few/no peers; slow disk; EL behind | Check peers (above); confirm DATA_DIR is NVMe, not slow EBS; check EL logs for MDBX/IO errors.
JWT mismatch (engine auth 401s in EL/CL logs) | EL and CL see different jwt.hex | Both must mount the same JWT_PATH; if regenerated, restart both; ensure mode 0600 and the correct owner.
EL container restarts / OOM | EL_MEM_LIMIT too low for role | Raise per the HW matrix; ensure NVMe; on AWS use gp3/io2.
Chain halt (no node finalizing) | >1/3 of voting power offline, e.g. an over-weighted validator went down | This is the offline-whale halt. Bring the offline validators back; finalization resumes once ≥2/3 of voting power is online. See the chain-halt note below.
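Two of the rows above (chain-id mismatch and JWT mismatch) can be checked before starting the containers. The following is a minimal sketch, not part of the shipped tooling: the function names are hypothetical, and it assumes the genesis file is plain JSON with a `config.chainId` field (the same field `jq .config.chainId` reads) and that each JWT file holds the secret as text.

```python
import json
import os
import stat

EXPECTED_CHAIN_ID = 473374  # testnet chain-id stated on this page


def genesis_chain_id(genesis_path):
    """Return .config.chainId from an EL genesis file."""
    with open(genesis_path, encoding="utf-8") as fh:
        return json.load(fh)["config"]["chainId"]


def jwt_problems(el_jwt_path, cl_jwt_path):
    """Return a list of problems found with the two JWT secret files.

    Checks that each file has mode 0600 and that both hold the same
    secret (ignoring surrounding whitespace). An empty list means OK.
    """
    problems = []
    secrets = []
    for path in (el_jwt_path, cl_jwt_path):
        with open(path, encoding="utf-8") as fh:
            secrets.append(fh.read().strip())
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != 0o600:
            problems.append(f"{path}: mode is {mode:o}, expected 600")
    if secrets[0] != secrets[1]:
        problems.append("EL and CL JWT secrets differ")
    return problems
```

If both services mount the same JWT_PATH, pass that path twice; the comparison is mainly useful when the EL and CL read the secret from separate copies.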

Chain halt — recovery and prevention

The chain has no self-heal while halted

The inactivity leak only acts while the chain is progressing, which a halted chain cannot do. Recovery is therefore operational: restore enough online voting power (≥2/3).
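To decide which validators to bring back first, it can help to compute how much voting power is still missing. The sketch below uses the ≥2/3 threshold stated on this page; the function names and the example weights are hypothetical, and the voting-power map is assumed to come from your own monitoring.

```python
def online_power_ok(voting_power, online):
    """True if the online validators hold at least 2/3 of total power.

    voting_power: dict mapping validator name -> integer voting power.
    online: set of validator names currently online.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    total = sum(voting_power.values())
    up = sum(p for name, p in voting_power.items() if name in online)
    return total > 0 and 3 * up >= 2 * total


def missing_power(voting_power, online):
    """Smallest amount of extra online power needed to reach 2/3."""
    total = sum(voting_power.values())
    up = sum(p for name, p in voting_power.items() if name in online)
    needed = -(-2 * total // 3)  # ceil(2 * total / 3)
    return max(0, needed - up)


# Hypothetical weights: three small validators online, one whale offline.
power = {"val-a": 32, "val-b": 32, "val-c": 32, "whale": 1000}
up_now = {"val-a", "val-b", "val-c"}
print(online_power_ok(power, up_now), missing_power(power, up_now))
```

In this made-up case only the whale coming back can close the gap, which is the situation the weight cap described below is meant to avoid.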

Prevent it by bounding any one validator's weight and requiring it be live before it carries weight — the W3 validator-parameter policy:

  • max-effective-balance is the safety lever — the ceiling on any one validator's voting weight (mainnet target 16,000 BERA). A single validator's effective balance is min(deposit, max-effective-balance), so the cap bounds the whale share against the realistic set size N.
  • min-activation-balance is the economic floor (mainnet target 1,000 BERA), a tokenomics/W4 knob — not the safety lever.
  • Staged activation ("live before weight") is the primary halt fix.

The live 4-validator halt happened because, under the old parameters, genesis validators had an effective balance of only 32 BERA each, so a single whale could hold more than 1/3 of the voting power. See from-source/VALIDATOR-PARAMS.md for the effective-balance arithmetic and the fix.
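The cap arithmetic can be sketched in a few lines. This is an illustration only, not the contents of from-source/VALIDATOR-PARAMS.md: the deposits below are hypothetical, the function names are made up for this example, and the >1/3 halt threshold is the one used on this page.

```python
from fractions import Fraction


def effective_balance(deposit, max_effective_balance):
    """Effective balance is min(deposit, max-effective-balance)."""
    return min(deposit, max_effective_balance)


def largest_share(deposits, max_effective_balance):
    """Fraction of total effective balance held by the largest validator."""
    balances = [effective_balance(d, max_effective_balance) for d in deposits]
    return Fraction(max(balances), sum(balances))


def single_validator_can_halt(deposits, max_effective_balance):
    """True if one validator going offline removes more than 1/3 of weight."""
    return largest_share(deposits, max_effective_balance) > Fraction(1, 3)


# Hypothetical set: three 32 BERA validators plus one 1,000 BERA deposit.
deposits = [32, 32, 32, 1000]
print(largest_share(deposits, 16_000))            # cap does not bind here
print(single_validator_can_halt(deposits, 16_000))
print(single_validator_can_halt([20_000] * 4, 16_000))  # all capped equally
```

The example also shows why the cap has to be read against the set size: a cap only limits the whale share when enough other validators carry comparable weight.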

Edge nodes (verifying light client)

The rows below apply to the light node on any host. The first-boot / SD-card / arm64 rows are specific to the Raspberry Pi appliance; off-Pi the same failures surface as your service failing to start (see the non-Pi run notes).

Symptom | Likely cause | Fix
Service runs but never bootstraps; no verified head | CL/EL endpoints not reachable from the light node (ephemeral ports / container-internal IPs) | Use a fixed-port, routable endpoint; verify with nc -vz <host> 26657 and nc -vz <host> 8545.
First boot aborts: missing CL/EL RPC, chain-id, pubkeys, or checkpoint | Required light-mode keys/files not set (fail-closed) | Set KRYPTON_CL_RPC / KRYPTON_EL_RPC / KRYPTON_CHAIN_ID / KRYPTON_CHECKPOINT_PUBKEYS and drop krypton-checkpoint.json on the boot partition.
Bootstrap fails / wrong sign-bytes / chain-id mismatch | KRYPTON_CHAIN_ID set to the numeric 473374 instead of the CometBFT string, or it doesn't match the checkpoint | Set it to the string from /status .result.node_info.network (e.g. krypton-473374), verbatim.
Checkpoint rejected at bootstrap | Quorum not met / unknown or duplicate keys / wrong height–hash | Ensure ≥ K of the configured PUBKEYS actually signed the same (height, hash); re-merge with publish-checkpoint.sh.
Reads fail with a staleness error | Head older than KRYPTON_MAX_STALENESS_SECS, or upstream is behind/eclipsed | Confirm the upstream CL is at the live head; add a second source + KRYPTON_MIN_SOURCES=2.
First boot aborts on RAM | Board has < 4 GB | Use a Pi 4/5 with ≥ 4 GB (8 GB recommended); the RAM floor is enforced on first boot.
exec format error / image won't run | Wrong-arch artifact | The image and light binary must be arm64; rebuild via CI edge-image / edge-light-arm64.
SSD not detected | Full-pruned preflight guard fired | This guard does not apply to light mode (which needs no SSD). Confirm NODE_MODE=light was read: check journalctl -u krypton-firstboot -b for mode=light.
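The missing-key and chain-id rows above lend themselves to a quick check before first boot. The sketch below is an assumption-laden illustration rather than the appliance's own validation: `config_problems` and `network_from_status` are hypothetical names, and the status argument is assumed to be the already-parsed JSON body of the CometBFT /status response.

```python
REQUIRED_KEYS = (
    "KRYPTON_CL_RPC",
    "KRYPTON_EL_RPC",
    "KRYPTON_CHAIN_ID",
    "KRYPTON_CHECKPOINT_PUBKEYS",
)


def network_from_status(status):
    """Return .result.node_info.network from a parsed /status response."""
    return status["result"]["node_info"]["network"]


def config_problems(env, status=None):
    """Return a list of problems with the light-mode settings in env.

    env: dict of setting name -> string value.
    status: optional parsed /status response to compare the chain id with.
    """
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not env.get(key)]
    chain_id = env.get("KRYPTON_CHAIN_ID", "")
    if chain_id.isdigit():
        # The numeric EL chain id is not the CometBFT network string.
        problems.append("KRYPTON_CHAIN_ID is numeric; use the CometBFT string")
    elif status is not None and chain_id:
        network = network_from_status(status)
        if chain_id != network:
            problems.append(
                f"KRYPTON_CHAIN_ID {chain_id!r} != upstream network {network!r}"
            )
    return problems
```

An empty list means the four keys are present and the chain id matches the upstream string; it does not validate the checkpoint file or its signatures.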

Reachability is the edge prerequisite

A live, node-reachable 473374 endpoint fleet (full nodes serving eth_getProof + CometBFT RPC at fixed addresses) must exist before an edge node has anything to verify against. See Edge node (Pi light client) and RPC / full node.
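A scripted equivalent of the `nc -vz` checks can be run from the edge host itself. This is a minimal sketch with hypothetical function names; the default ports 26657 and 8545 are the ones used in the table above, and a successful TCP connect only shows the port is open, not that the endpoint serves eth_getProof or is at the live head.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_endpoints(host, ports=(26657, 8545)):
    """Map each port to whether it accepted a TCP connection."""
    return {port: tcp_reachable(host, port) for port in ports}
```

Run it from the light node's own network position, since an endpoint that is reachable from a workstation may still be unreachable from the edge host.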

Operator docs. Testnet chain-id 473374; mainnet 47337 (gated on external audit). Not financial advice.