Lesson 03 · Prompt Patterns & Test Oracles

Beyond the 1Z0-830 exam

Two skills make AI-assisted testing trustworthy: writing a prompt that specifies intent precisely, and supplying the test oracle — the source of truth for what "correct" means. The model can generate a thousand test cases; deciding the right answer is the human part.

Objectives

After this lesson you will be able to:

Write prompts that specify intent, constraints, and edge cases.
Define a test oracle and list common kinds.
Derive oracles independently of the implementation.
Drive the AI to enumerate edge cases you'd otherwise miss.

Anatomy of a good prompt

Vague prompts get plausible-but-unconstrained code. Specify:

Intent — what it must do, in one sentence.
Signature — types in and out (OptionalDouble average(int[])).
Constraints — performance, immutability, no external calls, Java 21.
Edge cases — "handle empty input, overflow, nulls; average of [1,2] is 1.5."
Examples — a couple of input → expected pairs (these double as oracles).

```text
Write a Java 21 method `OptionalDouble average(int[] numbers)`.
- Return empty for an empty array (do not throw).
- Use double division: average of [1, 2] is 1.5, not 1.
- Sum into a long to avoid overflow.
Then write JUnit 5 + AssertJ tests covering empty, single, and the [1,2] case.
```

This prompt would have prevented both bugs in this module's lab: it pins down the empty case and double division up front.
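For reference, here is a minimal sketch of a method that satisfies the prompt's three constraints. The class name `AverageDemo` is hypothetical, and so is the null check: the prompt says nothing about null, so this version simply fails fast.

```java
import java.util.Objects;
import java.util.OptionalDouble;

// Hypothetical class name; the method follows the prompt's three constraints.
final class AverageDemo {

    static OptionalDouble average(int[] numbers) {
        Objects.requireNonNull(numbers, "numbers"); // prompt is silent on null; fail fast
        if (numbers.length == 0) {
            return OptionalDouble.empty();          // constraint 1: empty input -> empty, no throw
        }
        long sum = 0;                               // constraint 3: long accumulator avoids int overflow
        for (int n : numbers) {
            sum += n;
        }
        return OptionalDouble.of((double) sum / numbers.length); // constraint 2: double division
    }

    public static void main(String[] args) {
        System.out.println(average(new int[] {1, 2})); // OptionalDouble[1.5]
        System.out.println(average(new int[] {}));     // OptionalDouble.empty
    }
}
```

`Arrays.stream(numbers).average()` (from `java.util.stream.IntStream`) also returns an `OptionalDouble` that is empty for an empty array. The explicit loop just keeps each prompt constraint visible.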

What a test oracle is

A test oracle is the mechanism that decides whether an output is correct for an input. Asserting assertThat(average(new int[]{1,2})).hasValue(1.5) works only because you know the answer is 1.5. The oracle — not the assertion syntax — is the hard part, and the part you must not outsource blindly to the same AI that wrote the code.
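To make the definition concrete, you can model an oracle as a predicate over an (input, output) pair, kept separate from any assertion library. The following is a minimal sketch, and the `Oracle` interface and all class and method names are hypothetical. A known-value oracle, whose expected answers come from the spec, accepts a correct average and rejects one that uses integer division.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.function.Function;

// Sketch only: Oracle, OracleDemo, and both implementations are hypothetical names.
final class OracleDemo {

    // An oracle judges an (input, output) pair; it knows nothing about how the output was made.
    @FunctionalInterface
    interface Oracle<I, O> {
        boolean accepts(I input, O output);
    }

    // Known-value oracle: expected answers written down from the spec, keyed by input.
    static final Map<List<Integer>, OptionalDouble> SPEC = Map.of(
            List.of(), OptionalDouble.empty(),
            List.of(7), OptionalDouble.of(7.0),
            List.of(1, 2), OptionalDouble.of(1.5));

    static final Oracle<List<Integer>, OptionalDouble> KNOWN_VALUE =
            (input, output) -> SPEC.containsKey(input) && SPEC.get(input).equals(output);

    // Runs an implementation on every spec input and lets the oracle decide.
    static boolean passesAll(Function<List<Integer>, OptionalDouble> impl) {
        return SPEC.keySet().stream().allMatch(in -> KNOWN_VALUE.accepts(in, impl.apply(in)));
    }

    static OptionalDouble correct(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).average();
    }

    // Mimics the integer-division bug: int / int truncates before widening to double.
    static OptionalDouble integerDivision(List<Integer> xs) {
        return xs.isEmpty()
                ? OptionalDouble.empty()
                : OptionalDouble.of(xs.stream().mapToInt(Integer::intValue).sum() / xs.size());
    }

    public static void main(String[] args) {
        System.out.println(passesAll(OracleDemo::correct));         // true
        System.out.println(passesAll(OracleDemo::integerDivision)); // false: [1, 2] gives 1.0
    }
}
```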

Kinds of oracle

| Oracle | Idea | Example |
|---|---|---|
| Known value | You know the exact answer | `average([1,2]) == 1.5` |
| Inverse/round-trip | `decode(encode(x)) == x` | serialize→deserialize (Module 08) |
| Reference implementation | Compare to a trusted version | New fast sort vs `Arrays.sort` |
| Property/invariant | A rule that always holds | `min <= midpoint(min,max) <= max` |
| Metamorphic | Relation between related inputs | `average(xs) == average(reverse(xs))` |
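The sketch below shows two rows of the table in code. A reference-implementation oracle checks a hypothetical `insertionSort` against the trusted `Arrays.sort`. A round-trip oracle uses `java.util.Base64` as a stand-in for the serializer from Module 08. Neither oracle needs a hand-computed answer, so both can run on random inputs. The fixed seed keeps any failure reproducible.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.Random;

// Sketch: insertionSort is a hypothetical "new" sort; Base64 stands in for a serializer.
final class OracleKindsDemo {

    static int[] insertionSort(int[] input) {
        int[] a = input.clone(); // leave the caller's array untouched
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;
    }

    // Reference-implementation oracle: the trusted Arrays.sort defines the right answer.
    static boolean matchesReference(int[] input) {
        int[] expected = input.clone();
        Arrays.sort(expected);
        return Arrays.equals(insertionSort(input), expected);
    }

    // Round-trip oracle: decode(encode(x)) must give back x.
    static boolean roundTrips(byte[] data) {
        byte[] encoded = Base64.getEncoder().encode(data);
        return Arrays.equals(Base64.getDecoder().decode(encoded), data);
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so any failure is reproducible
        for (int trial = 0; trial < 1_000; trial++) {
            int[] ints = random.ints(random.nextInt(20), -100, 100).toArray();
            byte[] bytes = new byte[random.nextInt(20)];
            random.nextBytes(bytes);
            if (!matchesReference(ints) || !roundTrips(bytes)) {
                throw new AssertionError("oracle rejected trial " + trial);
            }
        }
        System.out.println("1000 random trials passed");
    }
}
```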

When you don't have an exact expected value, properties and metamorphic relations still pin behavior — and they're excellent for the edge cases AI misses.
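Metamorphic relations compare outputs for related inputs, so no expected value ever appears. The following is a sketch that assumes the JDK's `IntStream.average()` stands in for the implementation under test. The class and helper names are hypothetical. Reversing the input, or appending a copy of the input to itself, must not change the average. Exact `equals` on the results is safe here for two reasons. Reordering an exact `long` sum changes nothing. Also, for these small values, 2s/2n rounds to the same double as s/n.

```java
import java.util.Arrays;
import java.util.OptionalDouble;
import java.util.Random;
import java.util.stream.IntStream;

// Sketch: metamorphic checks for average(int[]); hypothetical names throughout.
final class MetamorphicDemo {

    // Implementation under test (the JDK's IntStream.average, standing in for yours).
    static OptionalDouble average(int[] xs) {
        return Arrays.stream(xs).average();
    }

    static int[] reversed(int[] xs) {
        return IntStream.range(0, xs.length).map(i -> xs[xs.length - 1 - i]).toArray();
    }

    static int[] doubled(int[] xs) { // xs followed by a copy of itself
        return IntStream.concat(Arrays.stream(xs), Arrays.stream(xs)).toArray();
    }

    // Each relation links two outputs; neither side is "the answer".
    static boolean relationsHold(int[] xs) {
        OptionalDouble base = average(xs);
        return base.equals(average(reversed(xs))) && base.equals(average(doubled(xs)));
    }

    public static void main(String[] args) {
        Random random = new Random(7); // fixed seed for reproducibility
        for (int trial = 0; trial < 1_000; trial++) {
            int[] xs = random.ints(random.nextInt(10), -1_000, 1_000).toArray();
            if (!relationsHold(xs)) {
                throw new AssertionError("relation broken for " + Arrays.toString(xs));
            }
        }
        System.out.println("metamorphic relations held for 1000 random inputs");
    }
}
```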

Property oracles catch the overflow bug

You may not know the exact midpoint of two huge ints offhand, but you do know the invariant: low <= midpoint(low, high) <= high. midpointNaive violates it: low + high overflows, and the result comes back negative. A property oracle catches the bug without your having to compute the "right" number. This is exactly what the lab asserts with isBetween(low, high).
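Here is a plain-Java sketch of that property check. It assumes `midpointNaive` computes `(low + high) / 2`. `midpointSafe` shows one common fix, and `holdsInvariant` is a hypothetical helper that plays the role of AssertJ's `isBetween`. None of this is copied from the lab.

```java
// Sketch: midpointNaive and midpointSafe are assumptions about the lab's code, not copies of it.
final class MidpointProperty {

    static int midpointNaive(int low, int high) {
        return (low + high) / 2; // low + high can overflow int and wrap negative
    }

    static int midpointSafe(int low, int high) {
        return low + (high - low) / 2; // high - low cannot overflow when 0 <= low <= high
    }

    // Property oracle: no expected number needed, only the invariant low <= mid <= high.
    static boolean holdsInvariant(int low, int high, int mid) {
        return low <= mid && mid <= high;
    }

    public static void main(String[] args) {
        int low = Integer.MAX_VALUE - 10;
        int high = Integer.MAX_VALUE;
        System.out.println(midpointNaive(low, high)); // -6
        System.out.println(midpointSafe(low, high));  // 2147483642
        System.out.println(holdsInvariant(low, high, midpointNaive(low, high))); // false
        System.out.println(holdsInvariant(low, high, midpointSafe(low, high)));  // true
    }
}
```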

Driving AI to find edge cases

Use the model's enumeration strength, then keep the oracle yourself:

text

List the edge cases a test for `average(int[])` should cover.

It will typically suggest empty input, a single element, negatives, very large values (overflow), and duplicates. You decide the expected result for each. This division of labor, where the AI enumerates and the human adjudicates, is the heart of AI-assisted testing.
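You can capture the result of that adjudication as a table. The case names are the kind an AI might list, but a human chose every expected value from the spec. This is a sketch with hypothetical names. `IntStream.average()` stands in for a correct implementation, and `intSumAverage` is a deliberately buggy candidate that sums into an `int`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;
import java.util.function.Function;

// Hypothetical names. Case names as an AI might list them; expected values chosen by a human.
final class EdgeCaseTable {

    record EdgeCase(String name, int[] input, OptionalDouble expected) {}

    static final List<EdgeCase> CASES = List.of(
            new EdgeCase("empty", new int[] {}, OptionalDouble.empty()),
            new EdgeCase("single element", new int[] {42}, OptionalDouble.of(42.0)),
            new EdgeCase("negatives", new int[] {-1, -2}, OptionalDouble.of(-1.5)),
            new EdgeCase("overflow", new int[] {Integer.MAX_VALUE, Integer.MAX_VALUE},
                    OptionalDouble.of(Integer.MAX_VALUE)),
            new EdgeCase("duplicates", new int[] {5, 5, 5}, OptionalDouble.of(5.0)));

    // Names of the cases where an implementation disagrees with the human-chosen expectation.
    static List<String> failures(Function<int[], OptionalDouble> impl) {
        return CASES.stream()
                .filter(c -> !impl.apply(c.input()).equals(c.expected()))
                .map(EdgeCase::name)
                .toList();
    }

    // Buggy candidate for contrast: it sums into an int, so the overflow case wraps around.
    static OptionalDouble intSumAverage(int[] xs) {
        if (xs.length == 0) {
            return OptionalDouble.empty();
        }
        int sum = 0;
        for (int x : xs) {
            sum += x;
        }
        return OptionalDouble.of((double) sum / xs.length);
    }

    public static void main(String[] args) {
        System.out.println(failures(xs -> Arrays.stream(xs).average())); // []
        System.out.println(failures(EdgeCaseTable::intSumAverage));       // [overflow]
    }
}
```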

Gotcha — don't let code and oracle share a brain

If the same AI writes the method and its tests in one breath, both can encode the same wrong assumption (e.g. that integer division is fine), so the tests pass and the bug ships. Derive the oracle independently — from the spec, a reference implementation, or an invariant — not from the implementation under test.
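Here is a minimal sketch of that failure mode, with every name hypothetical. The implementation bakes in integer division. A test written "in the same breath" computes its expected value the same way, so it passes. A test whose oracle comes from the spec rejects the same code.

```java
import java.util.OptionalDouble;

// Sketch of the "shared brain" failure; every name here is hypothetical.
final class SharedBrainDemo {

    // Implementation with the integer-division assumption baked in.
    static OptionalDouble average(int[] xs) {
        if (xs.length == 0) {
            return OptionalDouble.empty();
        }
        long sum = 0;
        for (int x : xs) {
            sum += x;
        }
        return OptionalDouble.of(sum / xs.length); // long / int truncates before widening
    }

    // Expected value derived the same way the code computes it: the wrong assumption is shared.
    static boolean sameBrainTestPasses() {
        long expected = (1 + 2) / 2; // 1
        return average(new int[] {1, 2}).getAsDouble() == expected;
    }

    // Independent oracle: the spec says the average of [1, 2] is 1.5.
    static boolean specTestPasses() {
        return average(new int[] {1, 2}).getAsDouble() == 1.5;
    }

    public static void main(String[] args) {
        System.out.println(sameBrainTestPasses()); // true: the bug ships
        System.out.println(specTestPasses());      // false: the bug is caught
    }
}
```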

Key Takeaways

A good prompt specifies intent, signature, constraints, edge cases, and examples — pinning the cases that hide bugs.
A test oracle decides correct vs incorrect; it's the human core of testing AI output.
Use known values, round-trips, reference implementations, properties, and metamorphic relations as oracles.
Property/invariant oracles catch edge-case bugs (overflow) without an exact expected value.
Let AI enumerate edge cases, but derive the oracle independently of the code under test.

Lesson Quiz


A test oracle is…
- A. A database
- B. The mechanism that decides whether an output is correct for a given input
- C. An AI model that writes tests
- D. A type of assertion library

Which prompt is most likely to yield correct code?
- A. 'Write an average method'
- B. 'Write OptionalDouble average(int[]); return empty for [], use double division ([1,2]→1.5), sum into a long'
- C. 'Make it fast'
- D. 'Average, you know what I mean'

A property/invariant oracle for midpoint(low, high) is…
- A. It returns 0
- B. low <= result <= high
- C. It runs in O(1)
- D. It never returns even numbers

Why not let the same AI write both the method and its only tests, unreviewed?
- A. It's slower
- B. Both can encode the same wrong assumption, so the tests pass and the bug ships
- C. Tests must be in a different language
- D. It uses more tokens

The best division of labor in AI-assisted testing is…
- A. AI decides correctness; human writes prompts
- B. AI enumerates edge cases; the human adjudicates the expected results (the oracle)
- C. Human does everything
- D. AI does everything

Next: Guardrails — When Not to Trust AI. This module's lab is in labs/src/main/java/com/jse21/m21_ai/.
