Sample evidence packet

Senior frontend screen, ready for hiring review.

This is the artifact a hiring team inspects after an async CodeArena screen: executable results, AI-era judgment, replay context, integrity signals, candidate-safe shareback, and a human next step.

What reviewers inspect

Evidence stays attached to the decision.

CodeArena does not ask reviewers to trust a single score. Each signal links back to the work, the failure mode, and the reviewer action.

Deterministic results

Executable test evidence

The candidate solved the core task and exposed the failing edge cases that should drive follow-up.

Merge sorted streams
8 / 8 tests passed
Async result cache
4 / 6 tests passed
Failed cases
duplicate key eviction, empty fallback response
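
As a rough illustration, the results above could be held as plain data that a reviewer tool turns into follow-up items. This is a minimal sketch: the `TaskResult` shape and the `followUpCases` helper are assumptions made for this example, not CodeArena's actual schema or API.

```typescript
// Hypothetical shape for one task's executable test evidence.
interface TaskResult {
  task: string;
  passed: number;
  total: number;
  failedCases: string[];
}

// Values copied from the sample packet.
const results: TaskResult[] = [
  { task: "Merge sorted streams", passed: 8, total: 8, failedCases: [] },
  {
    task: "Async result cache",
    passed: 4,
    total: 6,
    failedCases: ["duplicate key eviction", "empty fallback response"],
  },
];

// List every failed case with its task so a reviewer can plan follow-up.
function followUpCases(items: TaskResult[]): string[] {
  const cases: string[] = [];
  for (const item of items) {
    for (const failed of item.failedCases) {
      cases.push(`${item.task}: ${failed}`);
    }
  }
  return cases;
}

// Two entries, both for "Async result cache".
const followUps = followUpCases(results);
```

Keeping the failed case names next to the counts means the follow-up list is derived from the evidence rather than from a single score.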

Planted flaw review

AI Critique signal

The candidate caught three of four planted issues in generated code and explained the risk clearly.

Caught
race condition, stale dependency, missing null guard
Missed
one pagination boundary bug
Reviewer note
Strong critique quality, but the live probe should revisit pagination reasoning

AI-era workflow

Prompting and verification

Prompts asked for constraints, tests, and failure modes instead of asking the model to write code blindly.

Prompt quality
91 / 100
Verification behavior
Added local tests before final answer
Risk
Occasionally accepted model wording before simplifying it

Reviewable events

Integrity context

The report shows session context without turning surveillance into an automatic hiring decision.

Tab visibility
No focus-loss events
Paste events
2 short snippets pasted into comments
Scoring policy
No face, voice, emotion, or personality scoring

Candidate shareback

Useful feedback without exposing internal evidence.

After human review, teams can share a private candidate-safe results link while keeping sensitive rubric evidence inside the hiring workspace.

Private results link

Ready after reviewer approval

Candidate-safe

Avery gets a concise summary of what went well, what to improve, and how to practice the same skill. The hiring team keeps internal review context private.

Shared with candidate

  • Role summary and final status
  • Strengths demonstrated in the screen
  • Skills to improve next
  • Practice path for similar AI-era skills

Kept internal

  • Reviewer notes and panel reasoning
  • Integrity event context
  • Detailed rubric calibration
  • Hiring decision history
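
The split between shared and internal fields can be sketched as an explicit allowlist that is only applied after reviewer approval. Every name below (`EvidencePacket`, `CandidateView`, `toCandidateView`, and the field names) is hypothetical and chosen to mirror the two lists above; it is not CodeArena's real data model.

```typescript
// Hypothetical packet shape; field names are illustrative only.
interface EvidencePacket {
  // Shared with candidate
  roleSummary: string;
  finalStatus: string;
  strengths: string[];
  skillsToImprove: string[];
  practicePath: string[];
  // Kept internal
  reviewerNotes: string[];
  integrityEvents: string[];
  rubricCalibration: string;
  decisionHistory: string[];
}

// Only the candidate-safe fields.
type CandidateView = Pick<
  EvidencePacket,
  "roleSummary" | "finalStatus" | "strengths" | "skillsToImprove" | "practicePath"
>;

// Copy an explicit allowlist of fields into the candidate view.
// Returns null until a human reviewer has approved the shareback.
function toCandidateView(
  packet: EvidencePacket,
  reviewerApproved: boolean
): CandidateView | null {
  if (!reviewerApproved) {
    return null;
  }
  return {
    roleSummary: packet.roleSummary,
    finalStatus: packet.finalStatus,
    strengths: packet.strengths.slice(),
    skillsToImprove: packet.skillsToImprove.slice(),
    practicePath: packet.practicePath.slice(),
  };
}
```

An allowlist is used here instead of deleting internal fields, so a field added to the packet later stays internal unless someone deliberately shares it.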
Replay summary

A reviewer can replay the work path.

00:00

Started frontend screen

Read instructions and opened starter tests.

06:40

First implementation passed base cases

Merged arrays correctly for sorted positive inputs.

18:12

AI Critique task submitted

Flagged race condition, stale dependency, and missing null guard.

31:05

Regression run failed two edge cases

Duplicate key eviction and empty fallback response remained.

38:44

Final answer submitted

Added a time-complexity note and named the unresolved risk.
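
To show how a replay like this could be read programmatically, here is a small sketch that turns the timestamps above into time spent between events. The `ReplayEvent` shape and both helpers are assumptions for illustration; only the timestamps and labels come from the sample packet.

```typescript
// Hypothetical replay event shape.
interface ReplayEvent {
  at: string; // "mm:ss" since the session started
  label: string;
}

// Timestamps and labels copied from the replay summary.
const replay: ReplayEvent[] = [
  { at: "00:00", label: "Started frontend screen" },
  { at: "06:40", label: "First implementation passed base cases" },
  { at: "18:12", label: "AI Critique task submitted" },
  { at: "31:05", label: "Regression run failed two edge cases" },
  { at: "38:44", label: "Final answer submitted" },
];

// Convert "mm:ss" to seconds.
function toSeconds(at: string): number {
  const parts = at.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// Seconds elapsed between each event and the one before it.
function gaps(events: ReplayEvent[]): number[] {
  const out: number[] = [];
  let previous: number | null = null;
  for (const event of events) {
    const current = toSeconds(event.at);
    if (previous !== null) {
      out.push(current - previous);
    }
    previous = current;
  }
  return out;
}

// [400, 692, 773, 459]
const replayGaps = gaps(replay);
```

In this sample the longest stretch, 773 seconds (12 min 53 s), falls between the AI Critique submission and the failed regression run.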

Live validation

The report names the next probe.

Recommended next step

Advance the candidate to a focused live validation session, not a generic retread of the async screen.

  • Ask the candidate to debug the duplicate-key eviction failure without AI assistance.
  • Probe how they decide when model-generated pagination code is trustworthy.
  • Have them write one failing test for the empty fallback response before editing code.

Decision owner

Human reviewer

Score use

Evidence summary

Risk flag

Boundary case

Hiring packet

Want this report on your own role?

Bring one open role to a demo and we will show the async screen, evidence packet, candidate-safe shareback, and live validation path your team would use.