
Lesson 02 · Critically Reviewing AI Output

Beyond the 1Z0-830 exam

The skill that separates productive AI use from dangerous AI use is review. Generated code is confident, idiomatic, and frequently subtly wrong. This lesson catalogs the failure modes, and the lab demonstrates two of them with code that passes an obvious test and then fails an edge-case test.

Objectives

After this lesson you will be able to:

  • Spot hallucinated APIs and wrong idioms.
  • Find subtle correctness bugs that pass the happy path.
  • Apply a concrete review checklist to a generated diff.
  • Use the earlier modules as your reviewing toolkit.

The failure modes

| Failure | What it looks like | How you catch it |
| --- | --- | --- |
| Hallucinated API | list.stream().toImmutableList() (no such method) | compile; check the Javadoc |
| Wrong idiom | mutating a list during a for-each; == on String | Modules 01, 05 |
| Subtle correctness bug | integer division, off-by-one, overflow | edge-case tests |
| Missing edge cases | no empty/null/boundary handling | your oracle (Lesson 03) |
| Concurrency bug | unsynchronized shared state, non-atomic check-then-act | Module 07; stress tests |
| Outdated/insecure | old library version, string-concatenated SQL | Modules 13, 19 |
| Plausible-but-wrong explanation | a confident comment that misdescribes the code | read the code, not the comment |
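
The two wrong idioms named in the table can be sketched as follows. This is an illustrative sketch, not lab code: the class IdiomChecks and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration class; not part of the course lab.
class IdiomChecks {

    // Wrong idiom: == compares object references, not character contents.
    static boolean sameTextNaive(String a, String b) {
        return a == b;
    }

    // Reviewed: Objects.equals compares contents and tolerates null.
    static boolean sameText(String a, String b) {
        return Objects.equals(a, b);
    }

    // Wrong idiom: removing from a list while a for-each loop iterates over it.
    static void removeBlanksNaive(List<String> items) {
        for (String item : items) {
            if (item.isBlank()) {
                items.remove(item); // structural change behind the iterator's back
            }
        }
    }

    // Reviewed: removeIf performs the removal safely in one pass.
    static void removeBlanks(List<String> items) {
        items.removeIf(String::isBlank);
    }

    public static void main(String[] args) {
        String literal = "java";
        String built = new String("java"); // distinct object, same contents
        System.out.println(sameTextNaive(literal, built)); // false
        System.out.println(sameText(literal, built));      // true

        List<String> items = new ArrayList<>(List.of("a", "", "b", ""));
        removeBlanks(items);
        System.out.println(items); // [a, b]
    }
}
```

Calling removeBlanksNaive on the same four-element list throws ConcurrentModificationException. The failure depends on where the removed element sits, which is part of why this idiom can slip past a single happy-path test.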

Two real bugs (from the lab)

Both look correct and pass an obvious test — then fail an edge case:

java
// "Average these numbers" — looks fine, average([2,4]) == 3 ✓
static int averageNaive(int[] numbers) {
    int sum = 0;
    for (int n : numbers) sum += n;
    return sum / numbers.length;        // ① integer division  ② empty → ArithmeticException
}

averageNaive(new int[]{1, 2}) returns 1, not 1.5 (integer division). averageNaive(new int[]{}) throws ArithmeticException (divide by zero). The reviewed version returns OptionalDouble and divides as double.
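
A reviewed version along those lines might look like the sketch below. The lab's actual code is not shown here, so the class name ReviewedAverage, the null handling, and the long accumulator are assumptions made for illustration.

```java
import java.util.OptionalDouble;

// Hypothetical sketch of a reviewed average; the lab's version may differ.
class ReviewedAverage {

    static OptionalDouble average(int[] numbers) {
        // Assumption: null is treated like an empty input.
        if (numbers == null || numbers.length == 0) {
            return OptionalDouble.empty();
        }
        long sum = 0; // long accumulator so the running sum cannot overflow int
        for (int n : numbers) {
            sum += n;
        }
        // Cast before dividing so the division is floating-point.
        return OptionalDouble.of((double) sum / numbers.length);
    }

    public static void main(String[] args) {
        System.out.println(average(new int[]{1, 2})); // OptionalDouble[1.5]
        System.out.println(average(new int[]{}));     // OptionalDouble.empty
    }
}
```

Returning OptionalDouble forces the caller to decide what an empty input means instead of hiding that decision behind an exception. The standard library's Arrays.stream(numbers).average() makes the same choice and also returns OptionalDouble.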

java
// Binary-search midpoint — passes every small test
static int midpointNaive(int low, int high) {
    return (low + high) / 2;            // overflows when low + high > Integer.MAX_VALUE
}

When low and high are both near Integer.MAX_VALUE, the sum wraps around to a negative number: the infamous JDK binary-search bug. The reviewed version, low + (high - low) / 2, can't overflow as long as 0 <= low <= high, which always holds for array indices. The lab's tests assert exactly these edge cases, turning "looks right" into "proven right or wrong."
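
The overflow can be worked through with concrete values. The sketch below is illustrative: the class name Midpoint and the helper midpointShift are hypothetical, and all three methods assume 0 <= low <= high, as with array indices.

```java
// Hypothetical illustration class; assumes 0 <= low <= high throughout.
class Midpoint {

    // Naive: low + high can exceed Integer.MAX_VALUE and wrap to a negative int.
    static int midpointNaive(int low, int high) {
        return (low + high) / 2;
    }

    // Reviewed: high - low never exceeds high, so nothing overflows.
    static int midpoint(int low, int high) {
        return low + (high - low) / 2;
    }

    // Alternative: the unsigned shift reads the wrapped sum as an unsigned value.
    static int midpointShift(int low, int high) {
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        int low = Integer.MAX_VALUE - 1; // 2147483646
        int high = Integer.MAX_VALUE;    // 2147483647

        // low + high wraps to -3, and -3 / 2 truncates toward zero.
        System.out.println(midpointNaive(low, high)); // -1
        System.out.println(midpoint(low, high));      // 2147483646
        System.out.println(midpointShift(low, high)); // 2147483646
    }
}
```

A negative midpoint used as an array index fails with ArrayIndexOutOfBoundsException, but only on inputs large enough that small tests never reach them.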

A review checklist

Run a generated diff through this before trusting it:

  1. Does every API exist? Compile, and verify unfamiliar calls against the docs.
  2. Edge cases — empty, null, zero, negative, max/min, single element, duplicates.
  3. Numeric correctness — integer vs floating division, overflow, rounding (Module 01).
  4. Resource & error handling — closing resources, swallowed exceptions (Modules 04, 08).
  5. Concurrency — shared mutable state, atomicity (Module 07).
  6. Security — input validation, parameterized SQL, no secrets in code (Module 19).
  7. Does the test actually test it? — or does it assert the bug? (Lesson 03)
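
Item 5 is the hardest to see by reading alone, so a small stress-style check can help. The sketch below is an assumption-laden illustration, not lab code: the class CounterReview, its method names, and the thread counts are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; not part of the course lab.
class CounterReview {

    // Naive: counter[0]++ is a read-modify-write, so concurrent updates can be lost.
    static int countNaive(int threads, int perThread) {
        int[] counter = new int[1];
        runAll(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter[0]++;
            }
        });
        return counter[0];
    }

    // Reviewed: incrementAndGet is a single atomic operation.
    static int countAtomic(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        runAll(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet();
            }
        });
        return counter.get();
    }

    // Starts the given number of threads on the same task and waits for all of them.
    private static void runAll(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("atomic: " + countAtomic(8, 10_000)); // atomic: 80000
        System.out.println("naive:  " + countNaive(8, 10_000));  // may be lower; varies per run
    }
}
```

The naive count can come out correct on any given run, so a single passing run proves nothing. That is why the checklist pairs reading for shared mutable state with repeated stress runs.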

Trap — the comment can lie

LLMs generate code together with an explanatory comment, and the comment can confidently describe behavior ("handles the empty case") that the code doesn't have. Review the code, not its description. A comment is a claim to verify, not evidence.
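
A minimal sketch of the trap, using a hypothetical method that is not from the lab: the doc comment on maxNaive promises empty-input handling that the body does not provide.

```java
import java.util.OptionalInt;

// Hypothetical illustration class; not part of the course lab.
class LyingComment {

    /** Returns the largest value. Handles the empty case. */
    static int maxNaive(int[] values) {
        int max = values[0]; // contradicts the comment: throws on an empty array
        for (int v : values) {
            if (v > max) max = v;
        }
        return max;
    }

    /** Returns the largest value, or an empty OptionalInt for an empty array. */
    static OptionalInt max(int[] values) {
        if (values.length == 0) {
            return OptionalInt.empty();
        }
        int max = values[0];
        for (int v : values) {
            if (v > max) max = v;
        }
        return OptionalInt.of(max);
    }

    public static void main(String[] args) {
        System.out.println(max(new int[]{3, 9, 4})); // OptionalInt[9]
        System.out.println(max(new int[]{}));        // OptionalInt.empty
    }
}
```

Calling maxNaive(new int[]{}) throws ArrayIndexOutOfBoundsException, which a one-line test settles faster than any amount of trust in the comment.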

SDET note

Your reviewing power is the sum of the earlier modules. You catch the integer-division bug because of Module 01, the overflow because of JVM/numeric awareness (Modules 00, 12), the ==-on-String idiom because of Module 01, the race because of Module 07. You can't review what you don't understand — which is the case for studying all of it.

Key Takeaways

  • AI output fails in predictable ways: hallucinated APIs, wrong idioms, subtle correctness bugs, missing edge cases, concurrency/security flaws.
  • The dangerous bugs pass the happy path — only edge-case tests expose them (integer division, overflow).
  • Apply a review checklist: APIs exist, edge cases, numeric correctness, resources/errors, concurrency, security, real tests.
  • Read the code, not the comment — the explanation can be a confident lie.
  • Reviewing well requires the Java fluency the rest of the course builds.

Lesson Quiz

  1. A 'hallucinated API' from an LLM is…

    • A. A slow method
    • B. A method/class that doesn't exist but is generated as if it does
    • C. A deprecated method
    • D. An async call
  2. averageNaive([1,2]) returning 1 instead of 1.5 is caused by…

    • A. A typo
    • B. Integer division truncating the result
    • C. Overflow
    • D. A null pointer
  3. Why does (low + high) / 2 fail near Integer.MAX_VALUE?

    • A. Division by zero
    • B. low + high overflows to a negative int before the divide
    • C. It's too slow
    • D. high is null
  4. When reviewing AI output, the explanatory comment should be treated as…

    • A. Proof of correctness
    • B. A claim to verify against the actual code
    • C. More reliable than the code
    • D. Irrelevant and deleted unread
  5. The dangerous AI bugs are typically the ones that…

    • A. Don't compile
    • B. Pass the happy-path test but fail on edge cases (empty, overflow, null)
    • C. Throw immediately on any input
    • D. Are flagged by the IDE

Next: Prompt Patterns & Test Oracles. This module's lab is in labs/src/main/java/com/jse21/m21_ai/.