CVSankars Designs Limited



What Kirkpatrick Levels 3 and 4 Are Really For


Written by: Candice V. Sankarsingh
Senior Learning Quality, Evaluation & Instructional Technology Advisor


Learning dashboards are comforting. They light up meetings with clean charts, upward trends, and the quiet reassurance that something measurable has happened. Completion rates rise, engagement scores stabilize, and someone eventually concludes that the training worked, not because anyone is fully convinced but because dashboards carry an aura of authority that makes disagreement feel impolite.

The problem is not that dashboards are inaccurate. It’s that they answer the wrong questions. They are excellent at showing activity—who logged in, who finished, how long people stayed—but they quietly sidestep the question leaders actually care about: did anything change because of this learning? That question belongs to Kirkpatrick Levels 3 and 4, and it is precisely where dashboards begin to lose their usefulness.

Level 3 is usually described as “application on the job,” a phrase so vague it allows almost anything to pass as evidence. In practice, behaviour is not intention, confidence, or satisfaction. It is observable action in a real context. A decision made differently. A step followed correctly. An error avoided. A procedure adhered to under pressure. If you cannot point to a concrete change in what people actually do, you are not measuring behaviour—you are measuring belief.

This is where dashboards struggle, because real behaviour rarely happens inside the learning platform. It happens in systems, workflows, decisions, and artifacts that sit well beyond the LMS. When organizations claim Level 3 success based solely on post-course surveys or platform analytics, what they are really measuring is compliance and sentiment, not behaviour. The learning may have been completed, but whether it was used remains unexamined.

Organizations that take Level 3 seriously do not start with analytics. They start with discipline. They decide in advance which behaviour must change for the learning to matter and where that behaviour would be visible if it did. They accept that evidence will be imperfect and that attribution will be partial. Instead of chasing elegant metrics, they look at mundane but telling signals: changes in real work outputs, structured observations by supervisors, fewer recurring errors, or shifts in documented decisions. None of this looks impressive on a dashboard, and that is precisely why it is useful.

Level 4, meanwhile, is where many organizations either stop measuring altogether or begin inventing numbers. The common mistake is treating Level 4 as a return-on-investment exercise, as though learning outcomes could be cleanly isolated and monetized. In reality, Level 4 is not about proving that learning caused a result. It is about making a defensible contribution claim.

A credible Level 4 evaluation does not assert that training increased performance by a precise percentage. It explains, instead, why it is reasonable to believe learning played a supporting role in an outcome, alongside other factors. It considers timing—did the learning precede the change? It explains the mechanism—how would this learning plausibly influence behaviour or decisions? It names boundaries—what the learning did not affect. And it acknowledges context—policies, incentives, tools, supervision—rather than pretending learning operated in isolation.

This kind of honesty makes many organizations uncomfortable. Levels 3 and 4 expose weak learning design, inflated claims, and misaligned objectives. They require cooperation beyond the learning function and introduce uncertainty where leaders often want certainty. Dashboards, by contrast, are safe. They allow organizations to claim impact without ever confronting whether impact occurred.

Seen this way, Kirkpatrick Levels 3 and 4 are not about proving learning’s value. They are about risk management. They protect organizations from scaling ineffective programs, from mistaking activity for value, and from making decisions based on metrics that look rigorous but cannot survive scrutiny. They function as a form of defensive intelligence—quietly preventing bad decisions rather than loudly celebrating questionable successes.

A simple test reveals the difference. If your evaluation data exists only inside the LMS, conflates correlation with causation, avoids observable behaviour, or falls apart under a skeptical follow-up question, it is not Level 3 or Level 4, no matter what the dashboard label says. It is measurement theater.

Dashboards do not fail because they are dishonest. They fail because they are asked to do work they were never designed to do. Behaviour and results do not need better charts. They need clearer thinking, tighter claims, and the courage to tolerate uncertainty.

Learning that cannot be defended should not be scaled. And evaluation that cannot tolerate skepticism is not evaluation at all—it is performance.

