Reliability
Reliability determines whether your Engine consistently receives bounties and whether your results arrive in time to participate in rewards. A reliable Engine is predictable, responsive, and safe under load.
Reliability Criteria
Your Engine should:
- Accept webhooks consistently over HTTPS
- Validate signatures on every request
- Return 202 Accepted quickly
- Process bounties asynchronously in workers
- Post assertions back successfully and on time
- Degrade safely (UNKNOWN instead of crashing)
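The criteria above can be sketched as a small, framework-independent handler. This is a minimal illustration, not the platform's actual API: it assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, and the names `handle_webhook`, `signature_is_valid`, and the in-process `jobs` queue are hypothetical. Check your platform's documentation for the real signature scheme and header name.

```python
import hashlib
import hmac
import json
import queue


def signature_is_valid(body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Assumed scheme: hex-encoded HMAC-SHA256 over the raw request body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(body: bytes, signature_hex: str, secret: bytes,
                   jobs: queue.Queue) -> int:
    """Return an HTTP status code; never scan inside the request path."""
    if not signature_is_valid(body, signature_hex, secret):
        return 401  # reject unsigned or wrongly signed requests
    try:
        bounty = json.loads(body)
    except ValueError:
        return 400  # malformed payload must not crash the service
    try:
        jobs.put_nowait(bounty)  # hand off to workers immediately
    except queue.Full:
        return 503  # shed load instead of blocking the web server
    return 202  # accepted; the analysis happens asynchronously
```

In a real deployment the queue would usually be an external broker so that the web process and the workers can scale and fail independently.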
Common Failure Points
- Webhook endpoint is not reachable (DNS, TLS, firewall)
- Signature validation fails (secret mismatch)
- Web server blocks on scanning (no queue, no worker separation)
- Worker crashes on edge cases (empty bounty, unsupported type)
- Artifact downloads fail intermittently (network, timeouts)
- Concurrency is too low for the configured rate limit
- External tools hang (no timeouts)
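For the concurrency item, a rough sizing rule is Little's law: the average number of bounties in flight equals the arrival rate multiplied by the average processing time. The sketch below applies it; the function name `min_workers` and the `headroom` factor of 1.5 are illustrative assumptions, not platform recommendations.

```python
import math


def min_workers(bounties_per_minute: float,
                mean_seconds_per_bounty: float,
                headroom: float = 1.5) -> int:
    """Estimate the worker count needed to keep up with a rate limit.

    in_flight = arrival rate (per second) * mean processing time (seconds).
    The headroom factor (an assumption) leaves room for bursts and slow jobs.
    """
    in_flight = (bounties_per_minute / 60.0) * mean_seconds_per_bounty
    return max(1, math.ceil(in_flight * headroom))
```

For example, a rate limit of 120 bounties per minute with a 4-second average analysis gives 8 bounties in flight on average, so about 12 workers with 50% headroom.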
Minimum Operational Checklist
Availability
- Endpoint is publicly reachable over HTTPS
- Certificates are valid and renewed automatically
- Health check endpoint exists (optional but recommended)
Performance
- Webhook request returns quickly (do not scan in the request thread)
- Worker concurrency supports your configured rate limit
- Timeouts are enforced for:
  - artifact download
  - external tool execution
  - response submission
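As an illustration of the external tool timeout, the sketch below wraps a scanner invocation with Python's `subprocess.run`, which kills the child process when the timeout expires. The helper name `run_tool` and the choice to return `None` on failure are assumptions made for this example.

```python
import subprocess
from typing import Optional, Sequence


def run_tool(argv: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run an external tool; return its exit code, or None if it hung or failed to start."""
    try:
        proc = subprocess.run(
            list(argv),
            capture_output=True,  # keep tool output out of the service's own stdout
            timeout=timeout_s,    # the child is killed if this elapses
            check=False,          # a non-zero exit code is a result, not an exception
        )
    except subprocess.TimeoutExpired:
        return None  # caller should treat this as UNKNOWN
    except OSError:
        return None  # tool missing or not executable
    return proc.returncode
```

The same principle applies to the other two items: for example, `urllib.request.urlopen` accepts a `timeout` argument, so artifact downloads and assertion submissions can be bounded in the same way.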
Correct behavior under stress
- Unsupported artifact types return UNKNOWN with bid 0
- Empty or malformed bounties do not crash the service
- Retries do not create duplicate assertions (idempotency where possible)
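The three behaviors above can be sketched as follows. The field name `artifact_type`, the set `SUPPORTED_TYPES`, and the result shape are hypothetical placeholders; the in-memory set used for deduplication is only an illustration and would not survive a restart or span multiple workers.

```python
from typing import Any, Callable, Dict

SUPPORTED_TYPES = {"file", "url"}  # hypothetical; list what your Engine handles


def analyze_safely(bounty: Any, scan: Callable[[dict], dict]) -> Dict[str, Any]:
    """Return the scan result, or UNKNOWN with bid 0 when the bounty cannot be handled."""
    unknown = {"verdict": "UNKNOWN", "bid": 0}
    if not isinstance(bounty, dict):
        return unknown  # empty or malformed bounty
    if bounty.get("artifact_type") not in SUPPORTED_TYPES:
        return unknown  # unsupported artifact type
    try:
        return scan(bounty)
    except Exception:
        return unknown  # a scanner bug must not take the worker down


class AssertionPoster:
    """Post at most one assertion per bounty id, even if a job is retried."""

    def __init__(self, post: Callable[[str, dict], None]) -> None:
        self._post = post
        self._done = set()  # assumption: replace with a shared store in production

    def submit(self, bounty_id: str, assertion: dict) -> bool:
        if bounty_id in self._done:
            return False  # duplicate caused by a retry; skip it
        self._post(bounty_id, assertion)
        self._done.add(bounty_id)  # recorded only after a successful post
        return True
```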
Logging and Observability
At minimum log:
- Bounty received (id, type)
- Signature validation result
- Job queued and started
- Analysis finished (verdict, duration)
- Assertion post result (success or error)
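One simple way to emit these events is a single key=value line per event, which stays greppable and easy to parse. The logger name `engine`, the helper `log_event`, and the event names are illustrative assumptions.

```python
import logging

log = logging.getLogger("engine")  # hypothetical logger name


def log_event(event: str, **fields: object) -> str:
    """Log one key=value line per event and return the formatted line."""
    details = " ".join(f"{key}={value}" for key, value in sorted(fields.items()))
    line = f"event={event} {details}".strip()
    log.info(line)
    return line


# Example calls covering the minimum set listed above:
# log_event("bounty_received", id="b1", type="file")
# log_event("signature_checked", id="b1", valid=True)
# log_event("analysis_finished", id="b1", verdict="UNKNOWN", duration_s=1.2)
# log_event("assertion_posted", id="b1", ok=False, error="timeout")
```

Avoid logging secrets or raw artifact contents; identifiers, verdicts, and durations are usually enough to diagnose reliability problems.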
Protecting against “late results”
Late results often earn nothing. Reduce “late” outcomes by:
- Enforcing timeouts and returning UNKNOWN when needed
- Keeping artifact downloads fast (bandwidth, caching if appropriate)
- Avoiding slow startup paths and cold starts
- Not overloading workers beyond capacity
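One way to enforce this is to give each bounty a single time budget and have every stage draw from it, keeping a reserve for submitting the assertion. This is a sketch under assumptions: the `Deadline` class, the 2-second reserve, and the `verdict_within` helper are hypothetical, and how much time a bounty actually allows depends on your platform.

```python
import time
from typing import Callable


class Deadline:
    """Track the time left for one bounty, minus a reserve for posting the result."""

    def __init__(self, seconds_allowed: float, reserve_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # monotonic clock is unaffected by wall-clock changes
        self._expires_at = clock() + seconds_allowed - reserve_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() <= 0.0


def verdict_within(deadline: Deadline, scan: Callable[[float], str]) -> str:
    """Run scan with the remaining budget as its timeout, or give up with UNKNOWN."""
    if deadline.expired():
        return "UNKNOWN"  # answering UNKNOWN on time beats a late verdict
    return scan(deadline.remaining())
```

Passing `deadline.remaining()` as the timeout for downloads and tool runs keeps the stages from each consuming the full budget independently.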
If your Engine is marked Failed
If your Engine enters a Failed state:
- Fix the underlying reliability issue first
- Re-test in the Development Community if needed
- Follow your verification recovery process as required by your workflow