Most organisations pick their first AI use cases the same way: someone senior says "we should automate that," or a vendor demo makes something look easy, or a team lead mentions a pain point and it gets escalated into a project. The selection is driven by visibility, enthusiasm, or politics. Not by a structured evaluation of what's actually ready to be built.

The result is predictable. The organisation commits to a workflow that's technically complex, politically sensitive, or dependent on judgment calls nobody mapped. The build takes longer than expected. The agent underperforms. Trust erodes before AI adoption has properly started.

The Greenlight stage in the AGENTIC Framework exists to prevent this. It takes the validated specifications from Assess, evaluates what current AI can reliably handle at the task level, and designs the human-AI collaboration model. It then scores everything across nine dimensions spanning workflow characteristics, implementation factors, and organisational readiness, and produces a ranked decision. Green means go. The organisation commits resources. This is where analysis becomes action.


Why most AI prioritisation is just gut feel

The problem isn't that organisations don't try to prioritise. They do. They make lists, run workshops, plot things on impact-effort matrices. But the inputs are almost always subjective. "How hard is this to build?" gets answered by someone who hasn't read a specification, because no specification exists yet. "What's the business impact?" gets answered with optimistic estimates untethered from how the workflow actually runs.

The AGENTIC Framework solves this by sequencing. By the time a workflow reaches Greenlight, it has already been through Kickoff (where it was surfaced and roughly scored) and Assess (where it was mapped in detail, hidden knowledge surfaced, and a machine-readable specification produced). The scoring evidence already exists. You're not guessing at implementation difficulty because someone has already documented every step, every edge case, every judgment call. You're not guessing at organisational readiness because you've already had the conversations.

Greenlight doesn't create the data. It scores data that already exists.


How AI use case prioritisation actually works

Greenlight has two jobs. First: evaluate what current AI can reliably handle in this specific workflow and design the collaboration model. Second: score, rank, and decide. Both happen at the task level, not the workflow level.

Break the workflow into tasks

Take the specification from Assess and decompose it into individual tasks and decisions. Most workflows aren't one thing. They're a sequence of steps: pulling data, applying rules, formatting output, routing for approval. Each step gets evaluated independently.

This matters because a workflow that looks "partially automatable" at the surface often has steps that are completely automatable, steps that need human oversight, and steps that should stay entirely human. You can't see that until you break it apart.

For each task, test three things. Repeatability: does it run the same way every time? Ambiguity: how much judgment is involved? Consequence of error: what happens if the agent gets it wrong? A task that's highly repeatable, low ambiguity, and low consequence is a strong candidate for full automation. A task with high consequence of error might still be automated, but with human verification as a checkpoint.
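As a rough sketch of how that per-task test might be captured, here's a minimal Python structure. The field names and thresholds are assumptions for illustration, not framework-defined values:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    repeatability: int  # 1-5: does it run the same way every time?
    ambiguity: int      # 1-5: how much judgment is involved?
    consequence: int    # 1-5: how bad is it if the agent gets it wrong?

def suggest_autonomy(t: Task) -> str:
    """Map the three tests to a starting autonomy level (thresholds assumed)."""
    # Highly repeatable, low ambiguity, low consequence: full automation.
    if t.repeatability >= 4 and t.ambiguity <= 2 and t.consequence <= 2:
        return "AI-run"
    # Still automatable, but consequence or ambiguity warrants a checkpoint.
    if t.repeatability >= 3 and t.ambiguity <= 3:
        return "AI-led, human verified"
    # Judgment-heavy but survivable errors: the human leads, the agent supports.
    if t.ambiguity >= 4 and t.consequence <= 4:
        return "Human-led, AI-assisted"
    return "Human-run"

# e.g. suggest_autonomy(Task("pull data", repeatability=5, ambiguity=1, consequence=1)) -> "AI-run"
```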

Design the collaboration model

This is where Greenlight does something most prioritisation frameworks skip entirely. Instead of labelling a workflow as "automatable" or "not automatable," it designs the collaboration model: a step-by-step map of where agents lead, where humans lead, where they work together, and where agents don't belong.

The AGENT Collaboration Spectrum has four levels. AI-run: the agent completes the task with no human involvement. AI-led, human verified: the agent does the work, a human checks and approves. Human-led, AI-assisted: the human leads, the agent supports. Human-run: no agent involvement.

Automation isn't binary. Every workflow is a mix of steps where agents lead, steps where humans lead, and steps where agents don't belong. Greenlight designs that mix.

Most workflows use all four levels across different steps. In a typical workflow, data pulling is AI-run (highly repeatable, low consequence). Matching standard entries against reference data is AI-led with human verification (mostly repeatable, but edge cases warrant oversight). Rule-based approvals are flagged by governance: even though they're technically simple, the consequence of error on high-value transactions warrants human verification. The final submission stays human-run until trust is established.
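Expressed as data, the example above might look like the following. The Autonomy enum mirrors the spectrum; the step names and verifier names are hypothetical placeholders:

```python
from enum import Enum

class Autonomy(Enum):
    AI_RUN = "AI-run"
    AI_LED = "AI-led, human verified"
    HUMAN_LED = "Human-led, AI-assisted"
    HUMAN_RUN = "Human-run"

# The workflow above as a collaboration model. Verifiers are named
# people per step; these names are invented for illustration.
collaboration_model = [
    {"step": "pull data",               "level": Autonomy.AI_RUN,    "verifier": None},
    {"step": "match against reference", "level": Autonomy.AI_LED,    "verifier": "Priya (ops)"},
    {"step": "rule-based approvals",    "level": Autonomy.AI_LED,    "verifier": "Sam (finance)"},
    {"step": "final submission",        "level": Autonomy.HUMAN_RUN, "verifier": None},
]
```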

The collaboration model is the engineering blueprint. It tells Engineer exactly which steps to build and at what autonomy level. And those levels aren't fixed. As trust is earned through parallel-run testing, tasks can move up the spectrum.

The collaboration model also names people, not just roles. For every AI-led step, who verifies the output. For every escalation trigger, who gets notified. For every external output gate, who owns the gate. At five workflows this might be the same person wearing different hats. At twenty, it shouldn't be. This specificity prevents the ambiguity that causes oversight to fail under load.

Score across the prioritisation dimensions

Every workflow gets scored across nine dimensions in three groups. Workflow characteristics: frequency, time cost, leverage. Implementation factors: difficulty, risk, cost-benefit. Organisational readiness: team readiness, owner buy-in, staff dependency. Each dimension is scored from 1 to 5, giving a total out of 45.
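A minimal sketch of that nine-dimension score as a data structure. The convention that a higher score is always more favourable (so difficulty 5 means "easy to build") is an assumption here:

```python
from dataclasses import dataclass, fields

@dataclass
class GreenlightScore:
    # Workflow characteristics
    frequency: int
    time_cost: int
    leverage: int
    # Implementation factors
    difficulty: int      # higher = easier to build (assumed convention)
    risk: int            # higher = lower risk (assumed convention)
    cost_benefit: int
    # Organisational readiness
    team_readiness: int
    owner_buy_in: int
    staff_dependency: int

    def total(self) -> int:
        """Unweighted sum of the nine 1-5 scores, out of 45."""
        return sum(getattr(self, f.name) for f in fields(self))
```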

The scoring dimensions are weighted to reflect organisational priorities. A conservation non-profit might weight leverage differently to a venture studio. A workflow where the sole person who knows the process is about to retire scores high on staff dependency regardless of everything else.
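Weighting slots naturally on top of the GreenlightScore sketch above. The weight values below are invented for illustration:

```python
# Hypothetical weights for an organisation that prizes leverage and is
# exposed to key-person risk; unlisted dimensions default to 1.0.
WEIGHTS = {"leverage": 1.5, "staff_dependency": 1.25}

def weighted_total(s: GreenlightScore) -> float:
    return sum(getattr(s, f.name) * WEIGHTS.get(f.name, 1.0)
               for f in fields(GreenlightScore))
```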

The traffic-light system makes the decision visible. Green (36 or above): ready to build. Yellow (26 to 35): evaluate further; it needs a deeper Assess pass or a clearer collaboration model. Red (below 26): not yet, either technically premature or low priority. There are also two non-scoring statuses: human-run (this workflow stays with people) and retired (no longer in scope).
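The thresholds translate directly into a status check. A minimal sketch; the two non-scoring statuses, human-run and retired, are assigned directly rather than derived from the score:

```python
def traffic_light(total: int) -> str:
    """Map a total out of 45 to a traffic-light status."""
    if total >= 36:
        return "green"   # ready to build
    if total >= 26:
        return "yellow"  # evaluate further
    return "red"         # not yet: technically premature or low priority
```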

The best workflows to start with are the boring ones. High frequency, low risk, well-documented. They compound fastest and carry the least downside if the agent underperforms early on.

The scoring isn't done from scratch. The Greenlighting Agent reads the specification from Assess and produces a first-pass score with evidence cited from the documentation and confidence flags where data is thin. The human reviews, adjusts, and confirms. The matrix gets populated from the spec, not from gut feel.
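One way the agent's first pass might be represented, so the human review step has something concrete to adjust. This schema is an assumption, not the Greenlighting Agent's actual output format:

```python
from dataclasses import dataclass, field

@dataclass
class FirstPassScore:
    dimension: str                # e.g. "frequency"
    score: int                    # 1-5, proposed from the specification
    evidence: list[str] = field(default_factory=list)  # citations into the spec
    low_confidence: bool = False  # flagged where the documentation is thin
    confirmed: bool = False       # set once the human reviews and accepts
```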


What the Greenlight stage produces

Here's the full flow from specification to commitment:

Specification from Assess → Task decomposition → Capability evaluation → Collaboration model → Prioritisation scoring → Traffic-light status → Leadership commit

A workflow exits Greenlight with a scored entry in the AGENT Prioritisation Matrix, a defined collaboration model showing exactly which steps are AI-run, AI-led, human-led, or human-run, and organisational commitment to build. That's what Engineer builds from. That's what Nurture monitors against. That's what Track resurfaces when the frontier moves.

The matrix itself is a living dashboard. It's designed to be continuously rescored as capability evolves and adoption data flows in. What's red today might be green in six months. What's green today might need its collaboration model adjusted as Nurture reveals edge cases the specification didn't catch. The matrix is the connective tissue of the entire AGENTIC Framework.

A workflow doesn't get greenlighted because the technology is ready. It gets greenlighted because the technology, the people, and the business case are all ready.


What I've learned applying this

Two patterns show up consistently.

First: the collaboration model changes more minds than the score. When people see their workflow broken into steps with clear autonomy levels for each one, the conversation shifts. It's no longer "are we automating my job?" It's "these four steps would be handled by the agent, these two stay with me, and I verify the agent's work on these three." That specificity resolves most of the anxiety that blocks AI adoption.

Second: the workflows that score highest are rarely the ones people expected. The flashy, high-visibility project that leadership wanted usually scores moderate because it has high implementation difficulty or governance complexity. The boring, repetitive workflow that nobody talks about scores high because it's frequent, well-documented, low-risk, and the person who does it is desperate for it to change. Start with boring. Boring compounds.

The hardest part of Greenlight is the commitment step. Scoring is analytical. Commitment is political. Someone has to say "we're building this next" and mean it, with resources, timeline, and accountability. The traffic-light system helps because it makes the recommendation defensible. A green-lit workflow has a score, a collaboration model, and a business case. It's not a hunch. It's a decision backed by evidence from every upstream stage.