Architecting an Agentic CAD-to-BIM Platform Using the Claude Agent SDK
Converting 2D CAD drawings into 3D BIM models is one of the most labor-intensive tasks in structural engineering. A single floor plan can take an experienced engineer hours to classify, dimension, and extrude into a standards-compliant IFC model. We built an agentic platform that does this conversationally — the engineer uploads a DXF, talks to an AI agent, and watches a 3D model materialize in an embedded viewer. This post explains how we architected it.
The Problem
The traditional CAD-to-BIM workflow looks roughly like this:
- Open a 2D DXF drawing in a viewer
- Manually identify structural elements — columns, walls, beams, slabs, footings, stairs
- Assign classifications and properties to each element
- Extract or estimate 3D dimensions (heights, thicknesses, soffit levels)
- Model each element in a BIM authoring tool (Revit, ArchiCAD, etc.)
- Export as IFC for interoperability
Steps 2 through 5 are where all the time goes. They require domain expertise, spatial reasoning, and repetitive data entry. More critically, they require judgment — is this polygon a column or a pedestal? Is that void an opening or a stairwell? Should this slab be sunken?
We hypothesized that an LLM agent with the right domain tools could orchestrate this entire pipeline. Not by replacing the engineer's judgment, but by doing the mechanical work and asking the right questions at the right moments. The engineer becomes a reviewer and decision-maker rather than a data entry operator.
The Architectural Journey
The platform went through three distinct architectural phases. Each one taught us something fundamental about building agentic systems.
Phase 1: CLI Spoofing
Our first working prototype shelled out to the claude CLI as a subprocess:
```python
# Phase 1 approach (simplified pseudo-code)
import json
from subprocess import PIPE, Popen

process = Popen(
    ["claude", "--session-id", sid, "--output-format", "json"],
    stdin=PIPE, stdout=PIPE, text=True,
)
process.stdin.write(user_message)
process.stdin.flush()
for line in process.stdout:
    parsed = json.loads(line)  # fragile: any non-JSON line raises
    push_to_browser(parsed)
```
The agent communicated with the frontend through filesystem side effects. When it wanted to show a widget (a structured input form), it wrote a JSON file to a watched directory. A filesystem watcher detected the file, parsed it, and pushed an SSE event to the browser.
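The watcher amounted to a periodic directory scan. A minimal sketch of the idea (names like `scan_widget_dir` and `push_to_browser` are illustrative, not from our codebase):

```python
import json
import os

def scan_widget_dir(watch_dir: str, seen: set, push_to_browser) -> None:
    """One polling pass: forward any new widget JSON files as SSE events.

    The race the text mentions: the agent may still be mid-write when the
    scanner opens the file, so json.load() can fail on a partial file.
    """
    for name in sorted(os.listdir(watch_dir)):
        if not name.endswith(".json") or name in seen:
            continue
        seen.add(name)
        with open(os.path.join(watch_dir, name)) as f:
            payload = json.load(f)  # fragile: file may be half-written
        push_to_browser({"event": "widget", "data": payload})
```

Even with atomic-rename tricks, this side channel coupled the agent and the frontend through disk state, which is exactly what the SDK migration removed.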
What worked: We shipped fast. The CLI gave us Claude Code's full capabilities — file editing, bash execution, MCP tool calling — without building any of it ourselves.
What broke:
- Stdout parsing was fragile — non-JSON lines broke the parser
- Filesystem watchers introduced race conditions
- Context window management was invisible
- The system prompt was 266 lines, mostly preventing CLI misbehavior
Net line count for the orchestration layer alone: over 1,200 lines of glue code.
Phase 2: Claude Agent SDK
Anthropic released the Claude Agent SDK — an official Python package that wraps Claude Code programmatically. This changed everything.
Instead of parsing stdout, we got typed Python messages. Instead of filesystem watchers, we got in-process tool calls. Instead of hoping the CLI wouldn't start a web server, we got hook-based permission control.
```python
# Phase 2 approach (simplified pseudo-code)
client = ClaudeSDKClient(
    system_prompt="...",  # 40 lines, not 266
    mcp_servers={
        "domain": in_process_tools,  # No filesystem watchers
        "bim": external_mcp_server,
    },
    hooks={
        "PreToolUse": [bash_guard],  # Structurally prevent bad commands
    },
)
await client.query(user_message)
async for message in client.receive_response():
    adapt_to_sse(message)
```
The migration deleted five files totaling ~1,200 lines and replaced them with ~450 lines of SDK integration. The system prompt dropped from 266 lines to about 40.
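The `bash_guard` referenced above is, at its core, a pattern check over the command the agent is about to run. A sketch of that predicate, assuming a dict-shaped tool input and an illustrative allow/deny return shape (the real SDK hook contract wraps this in its own envelope):

```python
import re

# Commands that would start a long-running server from inside the agent loop.
SERVER_PATTERNS = [
    r"\bpython3?\s+-m\s+http\.server\b",
    r"\bflask\s+run\b",
    r"\buvicorn\b",
    r"\bnpm\s+(run\s+)?(start|dev)\b",
]

def is_blocked(command: str) -> bool:
    """Core predicate of the PreToolUse bash guard."""
    return any(re.search(p, command) for p in SERVER_PATTERNS)

def bash_guard(tool_input: dict) -> dict:
    """Deny server-starting Bash commands; allow everything else."""
    command = tool_input.get("command", "")
    if is_blocked(command):
        return {"decision": "deny", "reason": f"blocked server command: {command}"}
    return {"decision": "allow"}
```

The point is the structural guarantee: the command never executes, regardless of what the prompt says.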
Phase 3: Proper Agentic Platform
With the SDK handling the agent loop, we could focus on the platform itself. The backend moved from Flask to FastAPI for native async support. The frontend moved from vanilla JavaScript to React + TypeScript + Vite. The UI adopted a glassmorphism design system with dark/light theme support.
The Agentic Loop
Here is the conceptual flow of a single user turn:
```
User types message
        |
        v
+------------------+
| FastAPI Backend  |  (async handler)
+--------+---------+
         |
         v
+------------------+
| Session Manager  |  <-- async locks, widget event gates
+--------+---------+
         |
         v
+------------------+
|    Agent SDK     |  <-- conversation history, context window
+--------+---------+
         |
    +----+----+
    |         |
    v         v
+-------+ +--------+
| Text  | |  Tool  |  <-- Agent decides: respond or invoke
| Block | |  Call  |
+-------+ +----+---+
              |
         +----+----+
         |         |
         v         v
   +----------+ +----------+
   |  Domain  | | Built-in |
   | MCP Tool | |   Tool   |
   +----+-----+ +----------+
        |
        v
+------------------+
|  Stream Adapter  |  <-- typed messages to SSE events
+--------+---------+
         |
         v
+------------------+
|     Browser      |  <-- React renders incrementally
+------------------+
```
The agent decides which tools to use based on conversation context. It does not follow a hardcoded pipeline. The LLM's reasoning drives the orchestration.
Flask to FastAPI Migration
The Widget Response Problem
When the agent needs structured input from the user — a floor height, a classification choice — it calls a tool that presents a form widget. The user fills it in and submits. The agent needs that response within the same turn, without restarting.
In Flask (synchronous), this required threading gymnastics. In FastAPI (asynchronous), it becomes natural — the widget tool suspends its coroutine via asyncio.Event, the event loop continues processing the response POST, and the coroutine resumes in the same agent turn.
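A minimal sketch of that suspend/resume mechanic (the `WidgetGate` name and the `demo` wiring are illustrative):

```python
import asyncio

class WidgetGate:
    """Suspends the widget tool's coroutine until the user's response arrives."""

    def __init__(self):
        self._event = asyncio.Event()
        self._response = None

    async def wait_for_response(self) -> dict:
        # Called from inside the widget tool: yields control to the event
        # loop, which keeps serving the SSE stream and the response POST.
        await self._event.wait()
        return self._response

    def submit(self, response: dict) -> None:
        # Called from the POST handler when the user submits the form.
        self._response = response
        self._event.set()

async def demo():
    gate = WidgetGate()

    async def user_submits():
        await asyncio.sleep(0.01)  # user fills in the form
        gate.submit({"floor_height_mm": 3000})

    asyncio.create_task(user_submits())
    return await gate.wait_for_response()  # same agent turn resumes here
```

The tool's return value is the user's answer, so the agent sees it exactly as if the tool had computed it.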
SSE Streaming with asyncio
Server-Sent Events map naturally to async generators. The backend maintains a per-session async queue. Domain tools, the stream adapter, and system events all push to this queue. The SSE endpoint drains it with a 30-second keepalive timeout.
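A sketch of the drain loop, assuming a `None` sentinel marks session close (names and event shape are illustrative):

```python
import asyncio

async def sse_stream(queue: asyncio.Queue, keepalive_s: float = 30.0):
    """Drain a per-session queue as SSE frames, emitting a keepalive
    comment whenever no event arrives within the timeout."""
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=keepalive_s)
        except asyncio.TimeoutError:
            yield ": keepalive\n\n"  # SSE comment line keeps proxies happy
            continue
        if event is None:  # sentinel: session closed
            return
        yield f"event: {event['type']}\ndata: {event['data']}\n\n"
```

Because every producer writes to the same queue, ordering between tool pushes and text chunks falls out for free.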
MCP Tools for Domain Capabilities
The agent has custom MCP tools that run in-process, covering five capability areas:
- Drawing parsing — Extract line segments, form polygons, compute geometric properties
- 2D visualization — Push segmentation views into the browser's visualization pane
- 3D model viewing — Push IFC models, control wireframe/camera/category visibility
- BIM generation — Generate IFC models with proper structural elements, material layers, property sets
- Structured input collection — Present form widgets, receive responses within the same turn
Each tool has dual output: visual for the user (SSE push) and textual for the agent (return value).
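For example, a segmentation-display tool might push a `viewer_update` event for the browser and return a summary string for the agent (a sketch; `show_segmentation` and the session object are hypothetical):

```python
async def show_segmentation(session, polygons: list) -> str:
    """Domain tool with dual output: a viewer push for the user,
    a text summary for the agent's context window."""
    # Visual channel: the browser's 2D pane re-renders from this event.
    await session.queue.put({
        "type": "viewer_update",
        "data": {"pane": "2d", "polygons": polygons},
    })
    # Textual channel: what the agent actually "sees" and reasons over.
    counts = {}
    for p in polygons:
        counts[p["class"]] = counts.get(p["class"], 0) + 1
    return "Rendered segmentation: " + ", ".join(
        f"{n} {cls}(s)" for cls, n in sorted(counts.items())
    )
```

Omit either channel and something breaks: without the push, the user sees nothing; without the return value, the agent cannot reason about what it just showed.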
Skills as Domain Expertise
The platform has three skill domains that the agent loads on demand:
- CAD understanding — Layer conventions, geometric heuristics, element classification
- BIM generation — IFC entity mappings, default dimensions, slab void cutting logic
- Structural enrichment — 3D dimension collection strategies, point cloud extraction
Progressive Disclosure
Skills load only when the conversation reaches a point where they're needed. If the user uploads a DXF, the agent loads CAD knowledge. If the conversation progresses to BIM generation, the generation rules load. This keeps the context window focused.
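A sketch of on-demand loading, assuming skills live as markdown files on disk (the `SkillLoader` class is illustrative):

```python
from pathlib import Path

class SkillLoader:
    """Loads a skill's guidance into context only on first use."""

    def __init__(self, skills_dir: str):
        self.skills_dir = Path(skills_dir)
        self._loaded = {}

    def load(self, skill: str) -> str:
        # Idempotent: repeated calls don't re-inject the same knowledge.
        if skill not in self._loaded:
            self._loaded[skill] = (self.skills_dir / f"{skill}.md").read_text()
        return self._loaded[skill]

    @property
    def context_cost(self) -> int:
        """Rough proxy for context spent on skills (characters loaded)."""
        return sum(len(t) for t in self._loaded.values())
```

A conversation that never reaches BIM generation never pays for the generation rules.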
The Streaming Architecture
Text Dripping
The SDK yields complete message blocks. We simulate smooth streaming by breaking text into ~6-character chunks with ~30ms delays between them.
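The chunking itself is trivial (a sketch; `drip` and `drip_chunks` are illustrative names):

```python
import asyncio

def drip_chunks(text: str, size: int = 6):
    """Split a sealed text block into fixed-size chunks for smooth streaming."""
    return [text[i:i + size] for i in range(0, len(text), size)]

async def drip(text: str, emit, size: int = 6, delay_s: float = 0.03):
    """Emit chunks with a short delay so the UI renders a typing effect."""
    for chunk in drip_chunks(text, size):
        emit(chunk)
        await asyncio.sleep(delay_s)
```

The delay lives in the backend, not the frontend, so every SSE consumer gets the same pacing.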
SSE Event Types
| Event | Purpose |
|---|---|
| `message_chunk` | Incremental text content |
| `tool_call_start` | Agent invoked a tool |
| `tool_call_end` | Tool finished |
| `widget` | Structured input form |
| `viewer_update` | Push to 2D/3D viewer |
| `viewer_command` | Control 3D viewer |
| `thinking_start` | Agent turn beginning |
| `done` | Agent turn complete |
Sealed vs. Active Block Rendering
The frontend splits accumulated text into sealed blocks (completed, rendered once) and an active block (in-progress, re-rendered on each chunk). This eliminates flicker. Code fences are buffered until closed, then syntax-highlighted in one pass.
Embedded Visualization
The chat UI has a split-pane layout with two viewers:
- 2D Segmentation — Interactive SVG canvas with color-coded polygons, zoom/pan, hover tooltips
- 3D IFC Viewer — Three.js-based with orbit controls, category visibility, wireframe toggle, element selection
Both viewers are controlled by the agent through tool calls that push SSE events. The agent says "showing the model" and the model appears. No user action required.
The UI Evolution
Vanilla JS → React + TypeScript + Vite
- Vite with hot module replacement for development
- React 19 with hooks throughout
- TypeScript for type safety across the SSE contract
- Shiki for VS Code-quality syntax highlighting
- Framer Motion for micro-animations
Glassmorphism Theme
Dark-first design with translucent surfaces, subtle borders, and restrained color. Floating bubble layout — agent messages left, user messages right. Karla + Source Code Pro typography.
Dark/Light Theme with View Transition API
The toggle uses the View Transition API for a smooth circular expansion animation. Progressive enhancement — browsers without support get instant switching.
Landing Page → Chat Transition
Two states: a centered landing page (logo, greeting, input, quick-start chips) and a split-pane chat view. The transition happens on first message — session created, SSE connected, layout morphs.
Lessons Learned
- Structural constraints beat prompt constraints — A 20-line hook that blocks server-starting bash commands is more reliable than 200 lines of "NEVER do this" in the system prompt.
- Tools should have dual output — Visual for users (SSE push), textual for agents (return value). Both are necessary.
- Widget responses within the same turn are transformative — The agent's full reasoning context is preserved. This is the difference between a chatbot and an agent.
- Progressive skill loading is worth the complexity — Loading all domain knowledge at once consumed 30-40% of the context window. On-demand loading keeps it focused.
- Sealed/active block splitting is essential — Eliminates flicker, enables deferred syntax highlighting.
- The agent should control the viewer — When the agent drives the visualization, the experience is seamless.
- Async is not optional — Agent turns last 30+ seconds with concurrent SSE streaming, widget responses, and file uploads. Async handles this naturally.
- Filter agent noise aggressively — The user should see conclusions, not research.
Built by the Ouvra team. The platform converts 2D structural engineering drawings into 3D BIM models through conversational AI.