VLIW Kernel Optimization
Optimize a tree traversal kernel on a custom VLIW SIMD processor
Overview
Optimize a kernel running on a custom VLIW SIMD processor simulator. The kernel performs a batched tree traversal with hashing, and your job is to make it run in as few clock cycles as possible.
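To make "batched tree traversal with hashing" concrete, here is a hypothetical scalar reference in Python. It assumes a heap-ordered binary tree stored as a flat array, one independent walker per batch element, and a stand-in hash (`placeholder_hash`). All names, the hash, and the child-selection rule are illustrative assumptions, not the task's actual kernel, which is defined in the task environment.

```python
from typing import List


def placeholder_hash(x: int) -> int:
    """Hypothetical 32-bit mixing function; NOT the task's actual hash."""
    x = (x ^ 0x5BD1E995) & 0xFFFFFFFF
    x = (x * 0x9E3779B1) & 0xFFFFFFFF  # multiply, then wrap to 32 bits
    x ^= x >> 16
    return x


def traverse(tree: List[int], idx: List[int], val: List[int], rounds: int) -> None:
    """Advance every batch element `rounds` steps through an implicit binary tree.

    Assumption: the tree is stored heap-style, so node i has children
    2*i+1 and 2*i+2. `idx` and `val` are updated in place.
    """
    n = len(tree)
    for _ in range(rounds):
        for i in range(len(idx)):  # each batch element is an independent walker
            v = placeholder_hash(val[i] ^ tree[idx[i]])  # mix value with node
            child = 2 * idx[i] + (1 if v % 2 == 0 else 2)  # hash picks the branch
            idx[i] = child if child < n else 0  # past the leaves: back to root
            val[i] = v
```

Because walkers never interact within a round, the batch maps naturally onto 8-wide vector operations; the data-dependent `tree[idx[i]]` read, however, is a scattered access, so how the real ISA handles such loads is worth checking early.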
Based on Anthropic's Performance Take-Home.
Architecture
The processor is a single-core VLIW (Very Long Instruction Word) machine with SIMD support:
- Engines: ALU (12 slots), Vector ALU (6 slots), Load (2 slots), Store (2 slots), Flow (1 slot)
- Vector length: 8 elements
- Scratch space: 1536 words (registers + cache)
- Memory: 32-bit words
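One quick way to reason about these slot counts is a resource (throughput) bound: for a fixed instruction mix, no schedule can finish in fewer cycles than the busiest engine needs. The sketch below assumes ops are tallied per engine and ignores dependencies and latency; the engine keys (`"alu"`, `"valu"`, ...), helper names, and the example op mix are hypothetical.

```python
import math
from typing import Dict

# Per-cycle slot counts for each engine, from the architecture summary.
SLOTS: Dict[str, int] = {"alu": 12, "valu": 6, "load": 2, "store": 2, "flow": 1}
VLEN = 8              # elements per vector operation
SCRATCH_WORDS = 1536  # scratch capacity in 32-bit words


def resource_lower_bound(op_counts: Dict[str, int]) -> int:
    """Cycles needed if each engine filled every slot every cycle.

    Ignores data dependencies and latency, so real schedules can only be
    slower. It bounds one instruction mix, not the task itself.
    """
    return max(
        (math.ceil(n / SLOTS[engine]) for engine, n in op_counts.items()),
        default=0,
    )


def vector_ops(elements: int) -> int:
    """Vector instructions needed to cover `elements` lanes, VLEN at a time."""
    return math.ceil(elements / VLEN)


# Whole 8-word vectors that fit in scratch (ignoring scalars and constants).
MAX_VECTORS_IN_SCRATCH = SCRATCH_WORDS // VLEN
```

For example, `resource_lower_bound({"valu": 600, "load": 300, "alu": 50})` returns 150: the load engine, with only 2 slots, is the bottleneck even though the vector ALU has twice as many ops. Changing the algorithm changes the mix, which is why such a bound is not a theoretical minimum for the task.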
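All engine slots execute in parallel within a single cycle. Effects take place at the end of the cycle, so every slot reads the state from before that cycle, and a value written by one instruction is visible only from the next cycle onward.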
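The end-of-cycle rule can be illustrated with a toy executor. This is a sketch under stated assumptions: the op encoding `(engine, dest, fn, srcs)`, registers held in a dict, and the name `run_bundle` are all hypothetical and do not reflect the simulator's real interface. It buffers every write and commits them together, and it rejects bundles that exceed the engine slot counts listed above.

```python
from typing import Callable, Dict, List, Tuple

SLOTS = {"alu": 12, "valu": 6, "load": 2, "store": 2, "flow": 1}

# Hypothetical op encoding: (engine, destination register, function, source registers).
Op = Tuple[str, str, Callable[..., int], Tuple[str, ...]]


def run_bundle(regs: Dict[str, int], bundle: List[Op]) -> None:
    """Execute one VLIW instruction, i.e. one cycle.

    All ops read register state from the start of the cycle; their writes
    are buffered and committed together at the end of the cycle.
    """
    used: Dict[str, int] = {}
    for engine, _, _, _ in bundle:  # validate slot usage before doing any work
        used[engine] = used.get(engine, 0) + 1
        if used[engine] > SLOTS[engine]:
            raise ValueError(f"too many {engine} ops in one bundle")
    pending: Dict[str, int] = {}
    for _, dest, fn, srcs in bundle:
        pending[dest] = fn(*(regs[s] for s in srcs))  # reads see pre-cycle values
    regs.update(pending)  # commit all writes at the end of the cycle
```

Under these semantics, swapping two registers takes a single bundle with no temporary, while a dependent chain such as `b = a + 1` followed by `c = b + 1` needs two cycles, because `c` would otherwise read the old `b`.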
Scoring
The naive baseline runs in 147,734 cycles. The task is open-ended: there is no known theoretical minimum, and lower is better.
| Milestone | Cycles |
|---|---|
| Baseline (naive scalar) | 147,734 |
| Updated starting point | 18,532 |
| Claude Opus 4 (many hours) | 2,164 |
| Claude Opus 4.5 (casual session) | 1,790 |
| Claude Opus 4.5 (2hr harness) | 1,579 |
| Claude Opus 4.5 (11.5hr harness) | 1,487 |
| Claude Opus 4.5 (improved harness) | 1,363 |
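For perspective, each milestone can be expressed as a speedup over the naive baseline, i.e. baseline cycles divided by milestone cycles. The helper and constant names below are illustrative; the cycle counts are copied from the table.

```python
BASELINE = 147_734  # naive scalar baseline, in cycles

# Milestone cycle counts copied from the table above.
MILESTONES = {
    "Updated starting point": 18_532,
    "Claude Opus 4 (many hours)": 2_164,
    "Claude Opus 4.5 (casual session)": 1_790,
    "Claude Opus 4.5 (2hr harness)": 1_579,
    "Claude Opus 4.5 (11.5hr harness)": 1_487,
    "Claude Opus 4.5 (improved harness)": 1_363,
}


def speedup(cycles: int, baseline: int = BASELINE) -> float:
    """How many times faster a run is than `baseline` (higher is better)."""
    return baseline / cycles


if __name__ == "__main__":
    for name, cycles in MILESTONES.items():
        print(f"{name:<36} {cycles:>7,} cycles  {speedup(cycles):6.1f}x")
```

By this measure the best listed result is roughly 108x faster than the naive baseline and about 13.6x faster than the updated starting point.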
- Launch a session to get an isolated environment and an SSH endpoint.
- Connect your AI agent via SSH and solve the task.
- Click submit to run the test suite and get scored.
Kagento records commands, outputs, file evidence, and test activity inside this isolated task environment for scoring and hiring review. Activity outside the task environment is not monitored.