Glass Order Extraction via Hybrid VLM + SAT Matching

C. Dana, Monce AI — June 2026

We present a production pipeline for extracting structured glass manufacturing specifications from heterogeneous PDF documents. The system combines state-of-the-art vision-language models (VLMs) for row extraction with SAT-based article matching (Snake). It achieves a P95 end-to-end latency under 30 s on documents ranging from simple tabular orders to complex multi-page email threads with embedded attachments.

1. Problem Statement

Glass manufacturers receive orders in wildly heterogeneous formats: structured purchase orders with clean tables, email threads containing specifications in prose, multi-page documents mixing drawings and dimension sheets, scanned faxes, and Excel exports saved as PDF. Each document must produce a uniform array of glass measurement rows (dimensions, glass composition, spacer, gas fill, edge work) matched to the factory's article catalog.

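Previous approaches (regex + OCR, template matching, single-prompt LLM) fail on the long tail. This includes email threads where specifications span multiple messages, and section headers in tabular documents whose values must propagate to the rows beneath them. It also includes ambiguous glass notation: 442 stands for 44.2 laminated glass, i.e. two 4 mm panes bonded by two PVB interlayers, and FE is a coating code whose meaning depends on the factory.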

2. Architecture

The pipeline decomposes the problem into specialized stages:

Parse → Classify → Route → Extract → Normalize → Match → Return
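The stage chain can be read as a simple function pipeline. Below is a minimal orchestration skeleton. It assumes parsing has already produced one text string per page. Every stage body is a hypothetical placeholder, including the keyword-based classify that stands in for the Haiku call. Names such as Document and run_pipeline are illustrative, not taken from the production code.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Document:
    pages: list[str]                 # page text; stands in for rendered images + table hints
    doc_type: str = "unknown"
    rows: list[dict] = field(default_factory=list)

Stage = Callable[[Document], Document]

def classify(doc: Document) -> Document:
    # Placeholder heuristic standing in for the Haiku classifier.
    is_email = any(m in page for page in doc.pages for m in ("From:", "De:"))
    doc.doc_type = "email" if is_email else "order"
    return doc

def extract_email(doc: Document) -> Document:
    # Placeholder for split -> context extraction -> informed extraction (Section 3).
    doc.rows = [{"path": "email", "page": i} for i in range(len(doc.pages))]
    return doc

def extract_direct(doc: Document) -> Document:
    # Placeholder for Sonnet extraction over page images with table hints.
    doc.rows = [{"path": "direct", "page": i} for i in range(len(doc.pages))]
    return doc

def normalize(doc: Document) -> Document:
    return doc  # deterministic, factory-scoped rules would run here

def match(doc: Document) -> Document:
    return doc  # Snake / fuzzy / LLM cascade would run here

def route(doc: Document) -> Stage:
    # Emails take the divide-and-conquer path; direct orders skip straight to extraction.
    return extract_email if doc.doc_type == "email" else extract_direct

def run_pipeline(doc: Document) -> list[dict]:
    doc = classify(doc)
    for stage in (route(doc), normalize, match):
        doc = stage(doc)
    return doc.rows
```

Keeping Route as a function that returns the next stage means new document types only add a branch and an extractor. The rest of the chain does not change.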

Parse: Hybrid PyMuPDF (fast text extraction + page rendering) + pdfplumber (table structure detection). Pages are rendered at 200 DPI for VLM consumption. Tables are extracted structurally and passed to the extractor as OCR hints.
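One way to turn the structural tables into prompt hints is sketched below. It assumes the per-page table shape returned by pdfplumber's Page.extract_tables(): a list of tables, each a list of rows of string-or-None cells. The page images themselves would come from PyMuPDF, e.g. Page.get_pixmap(dpi=200). The pipe-delimited layout and the [page N table M] tags are hypothetical choices, not the pipeline's actual hint format.

```python
from typing import Optional

# Per-table shape as returned by pdfplumber's Page.extract_tables().
Table = list[list[Optional[str]]]

def format_table_hints(tables: list[Table], page_no: int) -> str:
    """Render structurally extracted tables as pipe-delimited text for the VLM prompt."""
    blocks = []
    for t_idx, table in enumerate(tables, start=1):
        lines = [f"[page {page_no} table {t_idx}]"]
        for row in table:
            # pdfplumber yields None for empty or merged cells; also collapse in-cell newlines.
            cells = [(c or "").replace("\n", " ").strip() for c in row]
            if any(cells):  # skip fully empty rows
                lines.append(" | ".join(cells))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```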

Classify: Haiku VLM (fast, cheap) determines document type, enabling routing to specialized extraction paths.

Route: Email documents enter the divide-and-conquer path. Direct orders skip to extraction.

Extract: Sonnet VLM reads page images with table hints. The prompt encodes 7 extraction rules covering completeness, glass notation, and section-header propagation.
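Section-header propagation is enforced through the Sonnet prompt, not through code. The hypothetical sketch below is a deterministic illustration of what the rule asks for. It treats any line without width × height dimensions as a section header and attaches the most recent header to every later row. Real documents need the VLM precisely because headers are rarely this easy to spot.

```python
import re

# Hypothetical heuristic: a dimension looks like "1200 x 800" or "900×700".
DIMS = re.compile(r"\b\d{2,4}\s*[x×]\s*\d{2,4}\b")

def propagate_headers(lines: list[str]) -> list[dict]:
    rows, current = [], None
    for line in lines:
        text = line.strip()
        if not text:
            continue
        m = DIMS.search(text)
        if m is None:
            current = text  # no dimensions: treat as a section header for following rows
            continue
        rows.append({"dims": m.group(0).replace(" ", ""), "composition": current})
    return rows
```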

Normalize: Deterministic rules engine — factory-scoped, no LLM, sub-millisecond. Handles OCR corrections (442→44.2), coating translation (FE→rTherm for VIT, FE→LowE for Monce), and cross-field inference (LowE implies Argon).
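Below is a minimal sketch of such a rules engine. It assumes a row is a dict with a free-text glass field and an optional gas field. Only three rules come from the text: the FE mappings, the 442→44.2 correction, and the LowE-implies-Argon inference. The regex that extends the dot repair to any three nonzero digits is hypothetical, and so is the dict layout.

```python
import re

# Factory-scoped coating tables; only the FE entries come from the text.
COATINGS = {
    "VIT": {"FE": "rTherm"},
    "Monce": {"FE": "LowE"},
}

# Hypothetical generalization of the OCR fix: "442" -> "44.2", "332" -> "33.2".
LAMINATE = re.compile(r"\b([1-9])([1-9])([1-9])\b")

def normalize_row(row: dict, factory: str) -> dict:
    out = dict(row)
    glass = LAMINATE.sub(r"\1\2.\3", out.get("glass", ""))
    coating_map = COATINGS.get(factory, {})
    tokens = [coating_map.get(tok, tok) for tok in glass.split()]
    out["glass"] = " ".join(tokens)
    # Cross-field inference: a LowE coating implies an Argon fill when no gas is given.
    if "LowE" in tokens and not out.get("gas"):
        out["gas"] = "Argon"
    return out
```

The same raw row normalizes differently per factory: "442 FE" becomes "44.2 LowE" with Argon for Monce, but "44.2 rTherm" for VIT.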

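Match: Snake API (SAT-based classifier) provides article matching with 3ms median latency. The Dana Theorem guarantees that any indicator function over the article catalog can be encoded as a CNF instance, so the SAT tier represents the catalog exactly rather than approximately. Denominations it cannot match with sufficient confidence fall back to the fuzzy and LLM tiers (Section 4).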

3. Email Divide & Conquer

Email documents are the hardest case. A single PDF may contain a thread of 5 messages, with the glass specifications mentioned in message #2 ("use 31mm = 6 Stopray - 16 Argon - 44.2 Silence") and the actual dimension table in an attached page.
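The stated total in this example is consistent with the layer notation. Read 44.2 as two 4 mm panes with two PVB films, and assume the common 0.38 mm film thickness; acoustic interlayers such as the one implied by "Silence" may differ. The layers then sum to 6 + 16 + 8.76 = 30.76 mm, which rounds to the nominal 31 mm. The helper below is a hypothetical sketch of that arithmetic. It could serve as one possible cross-check between an email-stated thickness and a parsed composition.

```python
import re

PVB_MM = 0.38  # assumption: thickness of one standard PVB interlayer film

def layer_thickness(token: str) -> float:
    """Nominal thickness in mm of one layer token such as '6', '16' or '44.2'."""
    lam = re.fullmatch(r"(\d)(\d)\.(\d)", token)
    if lam:
        a, b, films = (int(g) for g in lam.groups())
        return a + b + films * PVB_MM  # 44.2 -> 4 + 4 + 2 * 0.38 = 8.76
    return float(token)

def composition_thickness(spec: str) -> float:
    # "6 Stopray - 16 Argon - 44.2 Silence": the first token of each layer is its size.
    return sum(layer_thickness(layer.split()[0]) for layer in spec.split(" - "))
```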

Our approach:

1. Split — detect email boundaries via header markers (De:, From:, Envoyé:, Date:). Separate email body segments from attachment pages (those containing tables or no email markers).

2. Context extraction — Haiku reads email body, extracts glass-relevant context: compositions mentioned, special instructions, whether this is a confirmation or modification.

3. Informed extraction — context is injected into the Sonnet extraction prompt for attachment pages, ensuring email-specified compositions propagate to the extracted rows.
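The three steps can be sketched as follows. The header markers are the ones listed above; everything else is assumed. Pages arrive as (text, has_table) pairs, with has_table taken from pdfplumber. A run of consecutive header lines is treated as the start of one message. The Haiku context-extraction call is left out; informed_prompt only shows where its output would be injected.

```python
import re

# Header markers named in the text; a run of consecutive header lines opens one message.
HEADER = re.compile(r"^\s*(De|From|Envoyé|Date)\s*:")

def is_header(line: str) -> bool:
    return HEADER.match(line) is not None

def split_messages(text: str) -> list[str]:
    """Split one page of a thread into messages at each new block of header lines."""
    messages, current, prev_header = [], [], False
    for line in text.splitlines():
        header = is_header(line)
        if header and not prev_header and any(ln.strip() for ln in current):
            messages.append("\n".join(current).strip())
            current = []
        current.append(line)
        prev_header = header
    if any(ln.strip() for ln in current):
        messages.append("\n".join(current).strip())
    return messages

def split_document(pages: list[tuple[str, bool]]) -> tuple[list[str], list[str]]:
    """Step 1: separate email bodies from attachment pages."""
    bodies, attachments = [], []
    for text, has_table in pages:
        if has_table or not any(is_header(ln) for ln in text.splitlines()):
            attachments.append(text)  # tables or no email markers -> attachment page
        else:
            bodies.extend(split_messages(text))
    return bodies, attachments

def informed_prompt(base_prompt: str, context: str) -> str:
    # Step 3: append the email-level context (produced by Haiku in the pipeline).
    return f"{base_prompt}\n\nEmail context, applies to every row unless overridden:\n{context}"
```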

4. Snake SAT Matching

Article matching uses Snake (Algorithme.ai, v5.5.1), a SAT-based classifier trained on the factory's article catalog. Each glass denomination ("44.2 LowE", "Argon 90%") is matched against the factory-scoped article database via a 3-tier cascade:

Tier 1: Snake SAT — exact synonym match via CNF satisfiability (3ms, ≥51% confidence)
Tier 2: Fuzzy — Levenshtein + n-gram similarity (1ms, ≥50% similarity)
Tier 3: LLM — Claude Haiku semantic matching (400ms, fallback)
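Below is a sketch of the cascade. Snake and the LLM are passed in as callables because their APIs are not described here. The 0.51 and 0.50 thresholds come from the tier list above. The fuzzy score takes the mean of normalized Levenshtein similarity and character-trigram Jaccard overlap. That is one hypothetical way to combine the two named measures, not the production formula.

```python
from typing import Callable, Optional

SNAKE_MIN_CONF = 0.51  # Tier 1 threshold from the text
FUZZY_MIN_SIM = 0.50   # Tier 2 threshold from the text

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def trigrams(s: str) -> set[str]:
    s = f"  {s.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a: str, b: str) -> float:
    # Hypothetical blend: mean of normalized edit similarity and trigram Jaccard.
    a_l, b_l = a.lower(), b.lower()
    lev = 1 - levenshtein(a_l, b_l) / max(len(a_l), len(b_l), 1)
    ga, gb = trigrams(a), trigrams(b)
    jac = len(ga & gb) / len(ga | gb) if ga | gb else 0.0
    return (lev + jac) / 2

def match_article(
    denomination: str,
    catalog: list[str],
    snake: Callable[[str], tuple[Optional[str], float]],
    llm: Callable[[str], str],
) -> tuple[str, str]:
    article, conf = snake(denomination)                             # Tier 1: Snake SAT
    if article is not None and conf >= SNAKE_MIN_CONF:
        return article, "snake"
    best = max(catalog, key=lambda c: similarity(denomination, c))  # Tier 2: fuzzy
    if similarity(denomination, best) >= FUZZY_MIN_SIM:
        return best, "fuzzy"
    return llm(denomination), "llm"                                 # Tier 3: LLM fallback
```

The tiers are ordered by cost. The 400 ms LLM call is only paid for denominations that neither the SAT classifier nor the string measures can resolve.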

The SAT tier leverages the Dana Theorem: any indicator function over a finite discrete domain can be encoded as a SAT instance in polynomial time. Training is O(L·n·b·m) where L=layers, n=features, b=bucket size, m=samples.
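The existence claim can be made concrete with the textbook blocking-clause construction. For every point where the indicator is false, add one clause that is violated by exactly that assignment. The sketch below is that generic construction, not Snake's training procedure, which the text describes only by its O(L·n·b·m) cost. It is polynomial in the number of domain points, but it enumerates them all, so its size grows as 2^n in the bit width n.

```python
from itertools import product

def encode_indicator(f, n_bits: int) -> list[list[int]]:
    """CNF (DIMACS-style literals) whose models are exactly the points where f is true.

    Variable i+1 encodes bit i: literal +(i+1) means bit i is 1, -(i+1) means it is 0.
    """
    clauses = []
    for bits in product((0, 1), repeat=n_bits):
        if not f(bits):
            # Blocking clause: every literal is false at this exact assignment, and
            # any other assignment flips at least one bit, making its literal true.
            clauses.append([-(i + 1) if b else (i + 1) for i, b in enumerate(bits)])
    return clauses

def satisfies(cnf: list[list[int]], bits) -> bool:
    return all(any((lit > 0) == bool(bits[abs(lit) - 1]) for lit in clause) for clause in cnf)
```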

5. Results

On a corpus of 25 production PDFs spanning simple orders to complex email threads:

End-to-end latency: median 12s, P95 28s
Row extraction accuracy: benchmarked against human annotations
Snake match rate: 85-95% of glass components matched without LLM fallback

6. Deployment

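Single EC2 t3.medium instance running nginx (TLS certificates via certbot) in front of gunicorn with uvicorn async workers. Bedrock serves the VLMs with multi-region fallback, and the Snake API handles matching. The service is stateless, so it scales horizontally by adding instances behind an Application Load Balancer (ALB).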
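Below is a hypothetical gunicorn.conf.py for this layout, assuming nginx proxies to a local port. The (2 × cores) + 1 worker count is gunicorn's documented starting point, which gives 5 on a t3.medium's 2 vCPUs. The timeout leaves generous headroom over the reported 28 s P95; gunicorn's default is 30 s. nginx's proxy_read_timeout (60 s by default) should likewise stay above the slowest expected request. Recent uvicorn releases deprecate uvicorn.workers in favour of the separate uvicorn-worker package, so the worker_class string may need updating.

```python
# gunicorn.conf.py: gunicorn reads module-level settings from this file (values are assumptions).
import multiprocessing

bind = "127.0.0.1:8000"                         # nginx terminates TLS and proxies here
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI (async) workers managed by gunicorn
workers = multiprocessing.cpu_count() * 2 + 1   # gunicorn's suggested starting point
timeout = 120                                   # headroom over the 28 s P95 (default is 30 s)
```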

Dana, C. (2026). Glass Order Extraction via Hybrid VLM + SAT Matching. Monce AI Technical Report.