Practice: UX for language user interfaces

Self-check

Seven short questions. Answer each one yourself before reading its answer.

1. Why is streaming the single biggest UX win in an LLM application?

Lesson 2’s latency decomposition (total ≈ TTFT + output_tokens/tokens_per_second) is also a UX problem. A user staring at a blank screen for five seconds experiences a slow app; the same five seconds with tokens streaming from the first second feels fast. The model generates tokens one at a time anyway, and the implementation is small (stream=True or a streaming context manager in most clients). Skipping streaming is a baseline UX failure.
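A few lines of Python make the lesson 2 formula concrete. This is a minimal sketch that compares how long the user sees an empty answer area with and without streaming. The helper names `total_latency` and `time_to_first_visible_text` are hypothetical. The numbers (0.8 s TTFT, 300 output tokens at 60 tokens/second) are illustrative assumptions, not measurements.

```python
def total_latency(ttft_s: float, output_tokens: int, tokens_per_second: float) -> float:
    """Approximate end-to-end latency: TTFT + output_tokens / tokens_per_second."""
    return ttft_s + output_tokens / tokens_per_second


def time_to_first_visible_text(ttft_s: float, streaming: bool, total_s: float) -> float:
    """How long the user stares at an empty answer area."""
    # Streaming shows text as soon as the first token arrives;
    # buffering shows nothing until the whole response is finished.
    return ttft_s if streaming else total_s


# Illustrative assumptions: 0.8 s TTFT, 300 output tokens at 60 tokens/second.
total = total_latency(0.8, 300, 60.0)
print(f"total: {total:.1f} s")
print(f"blank wait, streaming: {time_to_first_visible_text(0.8, True, total):.1f} s")
print(f"blank wait, buffered:  {time_to_first_visible_text(0.8, False, total):.1f} s")
```

The total is the same in both cases. Only the blank wait changes, and that wait is what the user perceives as slowness.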

2. What three things does streaming require in the UI beyond just rendering tokens?

(1) A thinking indicator before TTFT, so the gap before tokens start is not perceived as a hang. (2) Token-by-token rendering, appending each delta to the visible text. (3) A Stop button, so users who have seen enough can halt generation and save tokens, and therefore cost. Streaming the tokens alone is half the work; the full pattern includes the wrapper UI.
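A minimal sketch of that wrapper follows. It assumes a hypothetical token iterator (`fake_token_stream`) in place of a real SDK stream and plain callbacks in place of a UI framework. The Stop button is modeled as a `threading.Event`. With a real client, actually halting generation means closing the stream or cancelling the request, and the exact call depends on the SDK.

```python
import threading
import time
from typing import Callable, Iterator


def fake_token_stream(tokens: list[str], delay_s: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for an SDK stream: yields one text delta at a time."""
    for tok in tokens:
        time.sleep(delay_s)  # simulated generation delay
        yield tok


def render_stream(
    stream: Iterator[str],
    stop_event: threading.Event,
    show_status: Callable[[str], None],
    show_text: Callable[[str], None],
) -> tuple[str, bool]:
    """Drive the three UI pieces: thinking indicator, incremental text, Stop."""
    show_status("Thinking…")  # (1) visible between submit and the first token
    text, got_first, stopped = "", False, False
    try:
        for delta in stream:
            if stop_event.is_set():  # (3) the Stop button sets this event
                stopped = True
                break
            if not got_first:
                got_first = True
                show_status("")  # first token arrived: hide the indicator
            text += delta  # (2) append each delta to the visible text
            show_text(text)
    finally:
        close = getattr(stream, "close", None)
        if close is not None:
            close()  # stop consuming the source; a real SDK needs its own cancel call
    show_status("Stopped" if stopped else "")
    return text, stopped
```

In a real UI, the Stop button's click handler would call `stop_event.set()` on the UI thread while this loop runs on a worker thread.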

3. What three benefits do citations provide?

(1) Let users verify (a sourced claim becomes evidence to check, not just a claim to trust). (2) Make failure debuggable (a wrong answer with a citation often shows the retrieval was wrong, which is easier to triage). (3) Calibrate trust over time (users learn this app shows sources, which sharpens how they read the rare unsourced cases).
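A minimal rendering sketch covers the first two benefits. It assumes the prompt tells the model to cite retrieved chunks with `[n]` markers that match the retrieval order. The `Source` type and `render_with_sources` helper are hypothetical. Cited ids that match no retrieved source are returned so they can be logged, which supports debugging.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    doc_id: int
    title: str
    url: str


# Assumes the prompt tells the model to cite retrieved chunks as [1], [2], ...
CITATION = re.compile(r"\[(\d+)\]")


def render_with_sources(answer: str, sources: list[Source]) -> tuple[str, list[int]]:
    """Return display text with a Sources list, plus cited ids that match no source."""
    by_id = {s.doc_id: s for s in sources}
    cited: list[int] = []
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        if n not in cited:  # keep first-seen order, drop repeats
            cited.append(n)
    unknown = [n for n in cited if n not in by_id]  # log these: likely invented citations
    valid = [n for n in cited if n in by_id]
    lines = [answer, "", "Sources:"]
    for n in valid:
        lines.append(f"[{n}] {by_id[n].title} ({by_id[n].url})")
    if not valid:
        lines.append("None cited. Treat this answer as unverified.")
    return "\n".join(lines), unknown
```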

4. What does regeneration treat as a feature, and what is its branching variant?

The model’s non-determinism: the same prompt with the same parameters can produce different responses, so a Regenerate button treats that variability as useful rather than as a bug. Branching lets the user edit a previous message and re-run from that point, producing a parallel branch of the conversation. Better chat applications use it to handle “I asked the wrong question first” without losing the prior context.
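One way to support both moves is to store the conversation as a tree rather than a flat list. This is a minimal sketch, not a prescribed design. `Message`, `add`, `path`, `regenerate`, and `edit_and_branch` are hypothetical names. It assumes a root system message, so every user turn has a parent. Regeneration adds a sibling reply, and an edit adds a sibling user turn. `path` rebuilds the linear history that would be sent to the model for one branch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity; parent links make the tree cyclic
class Message:
    role: str  # "system", "user", or "assistant"
    text: str
    parent: Optional["Message"] = field(default=None, repr=False)
    children: list["Message"] = field(default_factory=list, repr=False)


def add(parent: Optional[Message], role: str, text: str) -> Message:
    msg = Message(role, text, parent)
    if parent is not None:
        parent.children.append(msg)
    return msg


def path(leaf: Message) -> list[Message]:
    """The linear history sent to the model for the branch ending at `leaf`."""
    out: list[Message] = []
    node: Optional[Message] = leaf
    while node is not None:
        out.append(node)
        node = node.parent
    return out[::-1]


def regenerate(reply: Message, new_text: str) -> Message:
    """Regenerate: a new assistant reply as a sibling under the same user turn."""
    return add(reply.parent, "assistant", new_text)


def edit_and_branch(user_msg: Message, new_text: str) -> Message:
    """Branching: the edited turn becomes a sibling, so the old branch survives."""
    return add(user_msg.parent, "user", new_text)
```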

5. Why is hedging the right discipline, even though it is uncomfortable to design?

A model that asserts wrong information confidently is far more damaging than one that says “I’m not sure, but…” Hedging asks the model to acknowledge uncertainty in language (especially when retrieval is thin or the question is outside scope) and renders that uncertainty visibly in the UI. It is uncomfortable for designers who prefer crisp answers, and it is exactly the discipline that builds long-term trust because the user learns that when the application claims certainty, it actually has reason to.
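A minimal sketch of the "retrieval is thin" trigger follows. It assumes the app has similarity scores for its retrieved chunks. The thresholds (`min_score=0.75`, `min_hits=2`) and the instruction wording are hypothetical and would need tuning against the app's own retrieval evaluations. The same decision drives both halves of the pattern: the extra prompt instruction and a visible badge in the UI.

```python
from dataclasses import dataclass

# Hypothetical wording; adapt to the app's voice.
HEDGE_INSTRUCTION = (
    "The retrieved context may not fully answer the question. "
    "Say what you are unsure about, and do not state unsupported details as fact."
)


@dataclass
class Plan:
    system_prompt: str
    show_uncertainty_badge: bool  # UI renders the uncertainty visibly when True


def plan_answer(base_prompt: str, retrieval_scores: list[float],
                min_score: float = 0.75, min_hits: int = 2) -> Plan:
    """Ask the model to hedge, and flag the answer in the UI, when context is thin."""
    strong_hits = sum(1 for s in retrieval_scores if s >= min_score)
    thin = strong_hits < min_hits
    prompt = f"{base_prompt}\n\n{HEDGE_INSTRUCTION}" if thin else base_prompt
    return Plan(prompt, show_uncertainty_badge=thin)
```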

6. What does “recoverable failure” mean, and what four moves does it imply?

A failure mode that the user can understand and act on, not just a red banner saying “Error 500.” Four moves: (1) legible failure messages (timeout vs no-relevant-context vs out-of-scope-refusal each tell different stories); (2) a recovery action (retry, rephrase, ask differently); (3) preserve the user’s input so they can edit and retry; (4) log everything for debugging (lesson 5/7’s discipline).
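The four moves fit in one small handler. This is a sketch, assuming failures have already been classified into a kind string. The kind names, user-facing copy, and `on_failure` helper are hypothetical. Unknown kinds fall back to a generic message with a Retry action, so the user is never left without a next step.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("assistant")

# Hypothetical failure kinds and copy; adapt to the failures your app actually sees.
MESSAGES = {
    "timeout": ("The model took too long to respond.", "Retry"),
    "no_context": ("Nothing in our docs matched that question.", "Rephrase"),
    "out_of_scope": ("I can only answer questions about this product.", "Ask differently"),
    "rate_limit": ("Too many requests right now. Please wait a moment.", "Retry"),
    "malformed": ("The answer came back garbled.", "Retry"),
}


@dataclass
class FailureView:
    message: str
    action: str
    preserved_input: str


def on_failure(kind: str, user_input: str, request_id: str) -> FailureView:
    # (4) Log what is needed to reproduce the failure (subject to your privacy policy).
    log.warning("request %s failed: kind=%s input=%r", request_id, kind, user_input)
    # (1) Legible message and (2) a recovery action; unknown kinds fall back to Retry.
    message, action = MESSAGES.get(kind, ("Something went wrong.", "Retry"))
    # (3) Keep the user's text so the input box can be repopulated for editing.
    return FailureView(message, action, preserved_input=user_input)
```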

7. What does this lesson deliberately NOT cover, and why?

Content policy (what the model should refuse), moderation, labeling AI-generated content, and similar policy questions. They are real, but they sit at the intersection of UX, product, legal, and platform decisions, and require their own framing in their own forum. This lesson covers the interaction-design patterns that make any LLM application more usable; staying in scope keeps the discipline of each lesson clear.

Try it yourself: critique this UX

About 10 minutes, no code. Apply the patterns to a real-feeling app.

Part A: catalog the UX issues. A team ships an LLM-powered help assistant with this behavior:

A user types a question and clicks Send. The screen shows a static
spinner for ~6 seconds. Then the full answer appears all at once.
Answers do not include sources. If the user does not like the answer,
their only option is to clear the input and ask again. When the API
times out, the screen shows "Error" in red and the user's question
disappears from the input box. There is no conversation history; closing
the tab loses everything.

List at least five UX issues from this lesson and the pattern that fixes each.

What you’ll get

Issues + fixes:

No streaming. Six seconds of static spinner followed by the full answer all at once is the worst waiting experience an LLM app can offer. Fix: stream tokens, so the user sees text in about a second instead of six.
No citations. Even if the app is RAG-based, the user has no way to verify. Fix: ask the model to cite sources; render them in the UI.
No regeneration. A user who got a wrong-but-close answer has to retype. Fix: a Regenerate button; ideally also branching from edited prior messages.
No hedging. Confident wrong answers cost trust. Fix: prompt the model to hedge when context is thin; render uncertainty visibly.
Unrecoverable failure. “Error” + lost input is the worst pattern. Fix: legible failure message + a retry action + preserve the input.
No conversation persistence. Closing the tab loses everything. Fix: store conversations (with auth/storage where appropriate) and let users return to them.
No Stop button. Once the answer starts, the user can’t halt it. Fix: render Stop while streaming.
No “thinking” indicator. A static spinner before any output is less informative than a streaming pulse or status line. Fix: show a “thinking” pattern between submit and first token.
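Of these fixes, persistence is the one that needs a storage decision. This minimal sketch uses Python's built-in `sqlite3`. The table layout and the `open_store`, `save_message`, and `load_conversation` helpers are hypothetical. Filtering by `user_id` stands in for real authentication, which a production app would add on top.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               conversation_id TEXT NOT NULL,
               user_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def save_message(conn: sqlite3.Connection, conversation_id: str, user_id: str,
                 role: str, content: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (conversation_id, user_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (conversation_id, user_id, role, content),
        )


def load_conversation(conn: sqlite3.Connection, conversation_id: str,
                      user_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs in insertion order, scoped to the owner."""
    return conn.execute(
        "SELECT role, content FROM messages "
        "WHERE conversation_id = ? AND user_id = ? ORDER BY id",
        (conversation_id, user_id),
    ).fetchall()
```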

Five is a passing answer; the more you catch, the more developed your eye for LUI design has become.

Part B (reasoning). Why is hedging unusual among UX moves in that it can lower a user’s apparent confidence in the app but raise their long-term trust?

What you should notice

Confidence about wrong answers actively destroys trust over time: users who get a confident wrong answer and discover it become skeptical of every confident answer after that. Hedging trades some perceived authority on individual answers for calibration: the user learns that this app’s confident assertions are usually right and its hedged assertions mark areas to verify. The aggregate signal becomes more useful, which is a better trust trade than maximizing single-answer confidence. This is the same calibration-over-confidence trade you saw in evaluation discipline elsewhere in the fleet.
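One way to check whether an app's hedging is actually calibrated is to label a sample of logged answers as correct or incorrect and compare accuracy by answer style. A calibrated app shows a clear gap, with confident answers more often right than hedged ones. This is a minimal sketch. The `(style, was_correct)` record format and the example records in the usage line are made-up illustrations, not data.

```python
from collections import defaultdict


def accuracy_by_style(records: list[tuple[str, bool]]) -> dict[str, float]:
    """records: (style, was_correct) pairs, where style is e.g. "confident" or "hedged"."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for style, ok in records:
        totals[style] += 1
        correct[style] += int(ok)
    return {style: correct[style] / totals[style] for style in totals}


# Made-up example: 10 confident answers (9 right), 5 hedged answers (3 right).
example = [("confident", True)] * 9 + [("confident", False)] \
    + [("hedged", True)] * 3 + [("hedged", False)] * 2
print(accuracy_by_style(example))
```

If the two numbers are close, the hedges are not telling the user anything.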

Part C (reasoning). Why is “recoverable failure” specifically a LUI pattern rather than a generic UX pattern?

What you should notice

Failures in LUIs are multi-modal in a way conventional failures are not: the API can time out, the retrieval can miss, the model can refuse (scope-honest from lesson 5), the model can produce malformed output, or the rate limit can be hit. Each of those is a different failure with a different recovery, and each is far more common than failures in a button-and-form app, where most failures are network errors. The generic “show an error banner” pattern hides that multi-modal nature; recoverable failure surfaces it (“the model timed out / nothing in our docs matched / I can answer in-scope questions”) so the user knows what to do next.
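Before the right recovery can be shown, raw errors have to be sorted into those kinds. This is a sketch of that classification step. The signals it checks are assumptions about how a particular app is built: a `TimeoutError`, HTTP 429, zero retrieval hits, a refusal flag, and output that fails to parse as JSON (assuming the app requests JSON). Real SDKs raise their own exception types, which would be mapped here instead. `classify_failure` and its kind strings are hypothetical.

```python
import json
from typing import Optional


def classify_failure(*, exc: Optional[BaseException] = None,
                     http_status: Optional[int] = None,
                     hits: Optional[int] = None,
                     refused: bool = False,
                     raw_output: Optional[str] = None) -> str:
    """Map raw signals to a failure kind so the UI can pick a specific recovery."""
    if isinstance(exc, TimeoutError):
        return "timeout"  # recovery: retry
    if http_status == 429:
        return "rate_limit"  # recovery: wait, then retry
    if hits == 0:
        return "retrieval_miss"  # recovery: rephrase
    if refused:
        return "refusal"  # recovery: explain scope, suggest an in-scope question
    if raw_output is not None:
        try:
            json.loads(raw_output)
        except json.JSONDecodeError:
            return "malformed_output"  # recovery: retry (or re-ask for valid JSON)
    return "unknown"
```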

Flashcards

Nine cards.

Q. Why is streaming the biggest UX win?

5 seconds of blank waiting feels slow; the same 5 seconds with tokens streaming from second 1 feels fast. The model generates one token at a time anyway; turning that on in the UI is small code + huge perceived improvement.

Q. What does streaming need beyond rendering tokens?

A thinking indicator before TTFT (so the pre-token gap isn’t a perceived hang), append-as-they-arrive rendering, and a Stop button (user halts; saves cost). All three together = the full streaming pattern.

Q. Three benefits of citations?

Users can verify (claim -> evidence), failures become debuggable (wrong answer often = wrong retrieval, traceable via citation), and trust calibrates over time (the app shows sources, so unsourced answers stand out).

Q. What does regeneration treat as a feature, and what's its branching variant?

Model non-determinism (same prompt + params -> different responses), turned into a Regenerate button. Branching: let the user edit a prior message and re-run from there, producing a parallel conversation branch. Handles “asked the wrong question first” cleanly.

Q. Why is hedging the right discipline, even when uncomfortable?

Confident wrong answers destroy long-term trust; hedged uncertainty preserves it. Hedging trades some single-answer authority for calibration: users learn that confident claims are usually right and hedged ones need checking, which is a more useful aggregate signal.

Q. What is recoverable failure, and the four moves it requires?

A failure mode the user can understand and act on. (1) Legible failure message (timeout vs no-context vs out-of-scope refusal). (2) Recovery action (retry, rephrase). (3) Preserve user input. (4) Log everything for debugging.

Q. Supporting UX details beyond the five core patterns?

Strong input affordances + example questions; markdown rendering; code blocks with copy; conversation persistence; latency-masking status lines (“Searching documents…”); skeleton UI while retrieval runs; staged output.

Q. What does this lesson NOT cover?

Content policy (what should the model refuse?), moderation, labeling AI-generated content, similar policy questions. Real but out of scope here; this lesson is interaction-design patterns for usability, not policy.

Q. Why is recoverable failure specifically a LUI pattern?

LUI failures are multi-modal (timeout vs retrieval miss vs refusal vs malformed output vs rate limit), each needing a different recovery. Generic “show an error banner” hides that; recoverable failure surfaces the specific failure so the user knows what to do.