feat: display inference statistics in chat UI #43

Closed
opened 2026-03-12 22:33:05 +01:00 by shahondin1624 · 2 comments

Summary

Add a UI component to display LLM inference statistics (token counts, context window utilization, throughput) in the chat interface.

Current State

  • The UI has no stats display component.
  • Once the orchestrator forwards InferenceStats in its streaming response (see dependency below), the UI needs to consume and display these metrics.

Tasks

1. Regenerate TypeScript proto types

Regenerate the TypeScript protobuf types to include the new InferenceStats message and the updated ProcessRequestResponse with the optional stats field.

2. Create InferenceStatsPanel.svelte component

Build a new Svelte component that displays:

  • Prompt tokens — number of tokens in the prompt
  • Completion tokens — number of tokens generated
  • Total tokens — sum of prompt + completion tokens
  • Context window size — model's maximum context length
  • Context utilization — visual progress bar showing total_tokens / context_window_size percentage
  • Tokens/sec — generation throughput
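The fields above can be sketched as a TypeScript shape plus the derived utilization value. This is a hypothetical sketch: the real field names come from the regenerated proto types (which may use snake_case or camelCase depending on the generator config).

```typescript
// Hypothetical shape of the stats the panel consumes; field names are
// assumptions based on the proto fields described in this issue.
export interface InferenceStats {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  contextWindowSize: number;
  tokensPerSecond: number;
}

// Derived value for the progress bar: percent of the context window used,
// clamped to 100 and guarded against a missing/zero window size.
export function contextUtilization(stats: InferenceStats): number {
  if (stats.contextWindowSize <= 0) return 0;
  return Math.min(100, (stats.totalTokens / stats.contextWindowSize) * 100);
}
```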

3. Integrate into chat page

  • Add the InferenceStatsPanel as a collapsible panel below each assistant message (or at the bottom of the chat stream).
  • The panel should be collapsed by default to keep the UI clean.
  • Extract the InferenceStats from the final streamed ProcessRequestResponse and pass it to the component.
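The extraction step could look something like the following sketch. The type and field names (`ProcessRequestResponse`, `inferenceStats`, `collectStats`) are illustrative assumptions, not the final API; the stats shape is repeated here so the snippet stands on its own.

```typescript
// Hypothetical stats shape (see the proto fields described in this issue).
interface InferenceStats {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  contextWindowSize: number;
  tokensPerSecond: number;
}

// Hypothetical streamed response chunk with the optional stats field.
interface ProcessRequestResponse {
  text?: string;
  inferenceStats?: InferenceStats | null;
}

// Drain the stream, forwarding text chunks to the chat view and keeping
// the last chunk that carries stats. Stats are expected only on the final
// chunk, but keeping the latest tolerates servers that attach them earlier.
export async function collectStats(
  stream: AsyncIterable<ProcessRequestResponse>,
  onText: (t: string) => void,
): Promise<InferenceStats | null> {
  let stats: InferenceStats | null = null;
  for await (const chunk of stream) {
    if (chunk.text) onText(chunk.text);
    if (chunk.inferenceStats) stats = chunk.inferenceStats;
  }
  return stats;
}
```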

Dependencies

  • Depends on: feat: forward inference stats through orchestrator streaming response
Depends on llm-multiverse/llm-multiverse#236

Implementation Complete

PR #45 implements the inference statistics display:

  • Proto: Added InferenceStats message to common.proto with fields: prompt_tokens, completion_tokens, total_tokens, context_window_size, tokens_per_second. Added optional inference_stats field to ProcessRequestResponse.
  • Component: New InferenceStatsPanel.svelte — collapsible <details> panel (collapsed by default) showing token counts in a grid layout, throughput, and a color-coded context utilization progress bar (blue <70%, amber 70-90%, red >90%).
  • Integration: Panel renders below FinalResult in the chat page after orchestration completes. Stats extracted from the streaming response in useOrchestration.
  • Quality: Full dark mode support, ARIA accessibility on progress bar, 0 typecheck/lint errors.
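The color-coding rule stated above (blue <70%, amber 70-90%, red >90%) reduces to a small threshold function. This is a sketch of the described behavior; the returned strings are illustrative labels, not the component's actual CSS classes.

```typescript
// Map a context-utilization percentage to the color band described above.
// Boundary handling is an assumption: 70% and 90% fall in the amber band.
export function utilizationColor(percent: number): "blue" | "amber" | "red" {
  if (percent < 70) return "blue";
  if (percent <= 90) return "amber";
  return "red";
}
```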

Note: Proto source changes are local only — the upstream submodule will be updated when llm-multiverse/llm-multiverse#236 is merged.


Reference: llm-multiverse/llm-multiverse-ui#43