Architecting an Agentic CAD-to-BIM Platform Using the Claude Agent SDK
Converting 2D CAD drawings into 3D BIM models is one of the most labor-intensive tasks in structural engineering. A single floor plan can take an experienced engineer hours to classify, dimension, and extrude into a standards-compliant IFC model. We built an agentic platform that does this conversationally — the engineer uploads a DXF, talks to an AI agent, and watches a 3D model materialize in an embedded viewer. This post explains how we architected it.
The Problem
The traditional CAD-to-BIM workflow looks roughly like this:
- Open a 2D DXF drawing in a viewer
- Manually identify structural elements — columns, walls, beams, slabs, footings, stairs
- Assign classifications and properties to each element
- Extract or estimate 3D dimensions (heights, thicknesses, soffit levels)
- Model each element in a BIM authoring tool (Revit, ArchiCAD, etc.)
- Export as IFC for interoperability
Steps 2 through 5 are where all the time goes. They require domain expertise, spatial reasoning, and repetitive data entry. More critically, they require judgment — is this polygon a column or a pedestal? Is that void an opening or a stairwell? Should this slab be sunken?
We hypothesized that an LLM agent with the right domain tools could orchestrate this entire pipeline. Not by replacing the engineer's judgment, but by doing the mechanical work and asking the right questions at the right moments. The engineer becomes a reviewer and decision-maker rather than a data entry operator.
The Architectural Journey
The platform went through three distinct architectural phases. Each one taught us something fundamental about building agentic systems.
Phase 1: CLI Spoofing
Our first working prototype shelled out to the claude CLI as a subprocess:
```python
# Phase 1 approach (simplified pseudo-code)
import json
from subprocess import PIPE, Popen

process = Popen(
    ["claude", "--session-id", sid, "--output-format", "json"],
    stdin=PIPE, stdout=PIPE, text=True,
)
process.stdin.write(user_message)
process.stdin.flush()
for line in process.stdout:
    parsed = json.loads(line)  # fragile: any non-JSON line raises
    push_to_browser(parsed)
```
The agent communicated with the frontend through filesystem side effects. When it wanted to show a widget (a structured input form), it wrote a JSON file to a watched directory. A filesystem watcher detected the file, parsed it, and pushed an SSE event to the browser.
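The watcher amounted to a periodic directory scan. A minimal sketch of the idea (names like `scan_widget_dir` and `push_to_browser` are illustrative, not from our codebase):

```python
import json
import os

def scan_widget_dir(watch_dir: str, seen: set, push_to_browser) -> None:
    """One polling pass: forward any new widget JSON files as SSE events.

    The race the text mentions: the agent may still be mid-write when the
    scanner opens the file, so json.load() can fail on a partial file.
    """
    for name in sorted(os.listdir(watch_dir)):
        if not name.endswith(".json") or name in seen:
            continue
        seen.add(name)
        with open(os.path.join(watch_dir, name)) as f:
            payload = json.load(f)  # fragile: file may be half-written
        push_to_browser({"event": "widget", "data": payload})
```

Even with atomic-rename tricks, this side channel coupled the agent and the frontend through disk state, which is exactly what the SDK migration removed.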
What worked: We shipped fast. The CLI gave us Claude Code's full capabilities — file editing, bash execution, MCP tool calling — without building any of it ourselves.
What broke:
- Stdout parsing was fragile — non-JSON lines broke the parser
- Filesystem watchers introduced race conditions
- Context window management was invisible
- The system prompt was 266 lines, mostly preventing CLI misbehavior
Net line count for the orchestration layer alone: over 1,200 lines of glue code.
Phase 2: Claude Agent SDK
Anthropic released the Claude Agent SDK — an official Python package that wraps Claude Code programmatically. This changed everything.
Instead of parsing stdout, we got typed Python messages. Instead of filesystem watchers, we got in-process tool calls. Instead of hoping the CLI wouldn't start a web server, we got hook-based permission control.
```python
# Phase 2 approach (simplified pseudo-code)
client = ClaudeSDKClient(
    system_prompt="...",  # 40 lines, not 266
    mcp_servers={
        "domain": in_process_tools,  # No filesystem watchers
        "bim": external_mcp_server,
    },
    hooks={
        "PreToolUse": [bash_guard],  # Structurally prevent bad commands
    },
)
await client.query(user_message)
async for message in client.receive_response():
    adapt_to_sse(message)
```
The migration deleted five files totaling ~1,200 lines and replaced them with ~450 lines of SDK integration. The system prompt dropped from 266 lines to about 40.
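The `bash_guard` referenced above is, at its core, a pattern check over the command the agent is about to run. A sketch of that predicate, assuming a dict-shaped tool input and an illustrative allow/deny return shape (the real SDK hook contract wraps this in its own envelope):

```python
import re

# Commands that would start a long-running server from inside the agent loop.
SERVER_PATTERNS = [
    r"\bpython3?\s+-m\s+http\.server\b",
    r"\bflask\s+run\b",
    r"\buvicorn\b",
    r"\bnpm\s+(run\s+)?(start|dev)\b",
]

def is_blocked(command: str) -> bool:
    """Core predicate of the PreToolUse bash guard."""
    return any(re.search(p, command) for p in SERVER_PATTERNS)

def bash_guard(tool_input: dict) -> dict:
    """Deny server-starting Bash commands; allow everything else."""
    command = tool_input.get("command", "")
    if is_blocked(command):
        return {"decision": "deny", "reason": f"blocked server command: {command}"}
    return {"decision": "allow"}
```

The point is the structural guarantee: the command never executes, regardless of what the prompt says.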
Phase 3: Proper Agentic Platform
With the SDK handling the agent loop, we could focus on the platform itself. The backend moved from Flask to FastAPI for native async support. The frontend moved from vanilla JavaScript to React + TypeScript + Vite. The UI adopted a glassmorphism design system with dark/light theme support.
The Agentic Loop
Here is the conceptual flow of a single user turn:
```
User types message
        |
        v
+------------------+
| FastAPI Backend  |  (async handler)
+--------+---------+
         |
         v
+------------------+
| Session Manager  |  <-- async locks, widget event gates
+--------+---------+
         |
         v
+------------------+
|    Agent SDK     |  <-- conversation history, context window
+--------+---------+
         |
    +----+----+
    |         |
    v         v
+-------+ +--------+
| Text  | |  Tool  |  <-- Agent decides: respond or invoke
| Block | |  Call  |
+-------+ +----+---+
              |
         +----+----+
         |         |
         v         v
   +----------+ +----------+
   |  Domain  | | Built-in |
   | MCP Tool | |   Tool   |
   +----+-----+ +----------+
        |
        v
+------------------+
|  Stream Adapter  |  <-- typed messages to SSE events
+--------+---------+
         |
         v
+------------------+
|     Browser      |  <-- React renders incrementally
+------------------+
```
The agent decides which tools to use based on conversation context. It does not follow a hardcoded pipeline. The LLM's reasoning drives the orchestration.
Flask to FastAPI Migration
The Widget Response Problem
When the agent needs structured input from the user — a floor height, a classification choice — it calls a tool that presents a form widget. The user fills it in and submits. The agent needs that response within the same turn, without restarting.
In Flask (synchronous), this required threading gymnastics. In FastAPI (asynchronous), it becomes natural — the widget tool suspends its coroutine via asyncio.Event, the event loop continues processing the response POST, and the coroutine resumes in the same agent turn.
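A minimal sketch of that suspend/resume mechanic (the `WidgetGate` name and the `demo` wiring are illustrative):

```python
import asyncio

class WidgetGate:
    """Suspends the widget tool's coroutine until the user's response arrives."""

    def __init__(self):
        self._event = asyncio.Event()
        self._response = None

    async def wait_for_response(self) -> dict:
        # Called from inside the widget tool: yields control to the event
        # loop, which keeps serving the SSE stream and the response POST.
        await self._event.wait()
        return self._response

    def submit(self, response: dict) -> None:
        # Called from the POST handler when the user submits the form.
        self._response = response
        self._event.set()

async def demo():
    gate = WidgetGate()

    async def user_submits():
        await asyncio.sleep(0.01)  # user fills in the form
        gate.submit({"floor_height_mm": 3000})

    asyncio.create_task(user_submits())
    return await gate.wait_for_response()  # same agent turn resumes here
```

The tool's return value is the user's answer, so the agent sees it exactly as if the tool had computed it.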
SSE Streaming with asyncio
Server-Sent Events map naturally to async generators. The backend maintains a per-session async queue. Domain tools, the stream adapter, and system events all push to this queue. The SSE endpoint drains it with a 30-second keepalive timeout.
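A sketch of the drain loop, assuming a `None` sentinel marks session close (names and event shape are illustrative):

```python
import asyncio

async def sse_stream(queue: asyncio.Queue, keepalive_s: float = 30.0):
    """Drain a per-session queue as SSE frames, emitting a keepalive
    comment whenever no event arrives within the timeout."""
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=keepalive_s)
        except asyncio.TimeoutError:
            yield ": keepalive\n\n"  # SSE comment line keeps proxies happy
            continue
        if event is None:  # sentinel: session closed
            return
        yield f"event: {event['type']}\ndata: {event['data']}\n\n"
```

Because every producer writes to the same queue, ordering between tool pushes and text chunks falls out for free.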
MCP Tools for Domain Capabilities
The agent has custom MCP tools that run in-process, covering five capability areas:
- Drawing parsing — Extract line segments, form polygons, compute geometric properties
- 2D visualization — Push segmentation views into the browser's visualization pane
- 3D model viewing — Push IFC models, control wireframe/camera/category visibility
- BIM generation — Generate IFC models with proper structural elements, material layers, property sets
- Structured input collection — Present form widgets, receive responses within the same turn
Each tool has dual output: visual for the user (SSE push) and textual for the agent (return value).
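For example, a segmentation-display tool might push a `viewer_update` event for the browser and return a summary string for the agent (a sketch; `show_segmentation` and the session object are hypothetical):

```python
async def show_segmentation(session, polygons: list) -> str:
    """Domain tool with dual output: a viewer push for the user,
    a text summary for the agent's context window."""
    # Visual channel: the browser's 2D pane re-renders from this event.
    await session.queue.put({
        "type": "viewer_update",
        "data": {"pane": "2d", "polygons": polygons},
    })
    # Textual channel: what the agent actually "sees" and reasons over.
    counts = {}
    for p in polygons:
        counts[p["class"]] = counts.get(p["class"], 0) + 1
    return "Rendered segmentation: " + ", ".join(
        f"{n} {cls}(s)" for cls, n in sorted(counts.items())
    )
```

Omit either channel and something breaks: without the push, the user sees nothing; without the return value, the agent cannot reason about what it just showed.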
Skills as Domain Expertise
The platform has three skill domains that the agent loads on demand:
- CAD understanding — Layer conventions, geometric heuristics, element classification
- BIM generation — IFC entity mappings, default dimensions, slab void cutting logic
- Structural enrichment — 3D dimension collection strategies, point cloud extraction
Progressive Disclosure
Skills load only when the conversation reaches a point where they're needed. If the user uploads a DXF, the agent loads CAD knowledge. If the conversation progresses to BIM generation, the generation rules load. This keeps the context window focused.
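A sketch of on-demand loading, assuming skills live as markdown files on disk (the `SkillLoader` class is illustrative):

```python
from pathlib import Path

class SkillLoader:
    """Loads a skill's guidance into context only on first use."""

    def __init__(self, skills_dir: str):
        self.skills_dir = Path(skills_dir)
        self._loaded = {}

    def load(self, skill: str) -> str:
        # Idempotent: repeated calls don't re-inject the same knowledge.
        if skill not in self._loaded:
            self._loaded[skill] = (self.skills_dir / f"{skill}.md").read_text()
        return self._loaded[skill]

    @property
    def context_cost(self) -> int:
        """Rough proxy for context spent on skills (characters loaded)."""
        return sum(len(t) for t in self._loaded.values())
```

A conversation that never reaches BIM generation never pays for the generation rules.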
The Streaming Architecture
Text Dripping
The SDK yields complete message blocks. We simulate smooth streaming by breaking text into ~6-character chunks with ~30ms delays between them.
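The chunking itself is trivial (a sketch; `drip` and `drip_chunks` are illustrative names):

```python
import asyncio

def drip_chunks(text: str, size: int = 6):
    """Split a sealed text block into fixed-size chunks for smooth streaming."""
    return [text[i:i + size] for i in range(0, len(text), size)]

async def drip(text: str, emit, size: int = 6, delay_s: float = 0.03):
    """Emit chunks with a short delay so the UI renders a typing effect."""
    for chunk in drip_chunks(text, size):
        emit(chunk)
        await asyncio.sleep(delay_s)
```

The delay lives in the backend, not the frontend, so every SSE consumer gets the same pacing.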
SSE Event Types
| Event | Purpose |
|---|---|
| `message_chunk` | Incremental text content |
| `tool_call_start` | Agent invoked a tool |
| `tool_call_end` | Tool finished |
| `widget` | Structured input form |
| `viewer_update` | Push to 2D/3D viewer |
| `viewer_command` | Control 3D viewer |
| `thinking_start` | Agent turn beginning |
| `done` | Agent turn complete |
Sealed vs. Active Block Rendering
The frontend splits accumulated text into sealed blocks (completed, rendered once) and an active block (in-progress, re-rendered on each chunk). This eliminates flicker. Code fences are buffered until closed, then syntax-highlighted in one pass.
Embedded Visualization
The chat UI has a split-pane layout with two viewers:
- 2D Segmentation — Interactive SVG canvas with color-coded polygons, zoom/pan, hover tooltips
- 3D IFC Viewer — Three.js-based with orbit controls, category visibility, wireframe toggle, element selection
Both viewers are controlled by the agent through tool calls that push SSE events. The agent says "showing the model" and the model appears. No user action required.
The UI Evolution
Vanilla JS → React + TypeScript + Vite
- Vite with hot module replacement for development
- React 19 with hooks throughout
- TypeScript for type safety across the SSE contract
- Shiki for VS Code-quality syntax highlighting
- Framer Motion for micro-animations
Glassmorphism Theme
Dark-first design with translucent surfaces, subtle borders, and restrained color. Floating bubble layout — agent messages left, user messages right. Karla + Source Code Pro typography.
Dark/Light Theme with View Transition API
The toggle uses the View Transition API for a smooth circular expansion animation. Progressive enhancement — browsers without support get instant switching.
Landing Page → Chat Transition
Two states: a centered landing page (logo, greeting, input, quick-start chips) and a split-pane chat view. The transition happens on first message — session created, SSE connected, layout morphs.
Lessons Learned
- Structural constraints beat prompt constraints — A 20-line hook that blocks server-starting bash commands is more reliable than 200 lines of "NEVER do this" in the system prompt.
- Tools should have dual output — Visual for users (SSE push), textual for agents (return value). Both are necessary.
- Widget responses within the same turn are transformative — The agent's full reasoning context is preserved. This is the difference between a chatbot and an agent.
- Progressive skill loading is worth the complexity — Loading all domain knowledge at once consumed 30-40% of the context window. On-demand loading keeps it focused.
- Sealed/active block splitting is essential — Eliminates flicker, enables deferred syntax highlighting.
- The agent should control the viewer — When the agent drives the visualization, the experience is seamless.
- Async is not optional — Agent turns last 30+ seconds with concurrent SSE streaming, widget responses, and file uploads. Async handles this naturally.
- Filter agent noise aggressively — The user should see conclusions, not research.
Built by the Ouvra team. The platform converts 2D structural engineering drawings into 3D BIM models through conversational AI.