Glass Pipeline

Factory-driven glass extraction. SOTA PDF parsing. Snake SAT matching.

5 Factories
3ms Snake Match
<30s End-to-end
v1.0 Pipeline

Factory-Driven

Every extraction is scoped to a factory_id. Normalization rules, Snake matching, and business logic are all factory-specific. Supported factories: VIT, Monce, VIP, Eurovitrage, and TGVI.
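One way to picture factory scoping is a registry keyed by factory_id. The sketch below is illustrative only: the lowercase ids, the `FactoryRules` type, and the placeholder normalizer are assumptions, not the pipeline's actual rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Factory names come from the list above; the lowercase id spelling is assumed.
FACTORY_IDS = ("vit", "monce", "vip", "eurovitrage", "tgvi")


@dataclass(frozen=True)
class FactoryRules:
    """Hypothetical container for per-factory behaviour."""
    factory_id: str
    normalize: Callable[[str], str]


def _default_normalize(text: str) -> str:
    # Placeholder rule: collapse whitespace and uppercase.
    return " ".join(text.split()).upper()


RULES: Dict[str, FactoryRules] = {
    fid: FactoryRules(fid, _default_normalize) for fid in FACTORY_IDS
}


def rules_for(factory_id: str) -> FactoryRules:
    """Look up a factory's rules, rejecting unknown ids instead of guessing."""
    try:
        return RULES[factory_id.lower()]
    except KeyError:
        raise ValueError(f"unknown factory_id: {factory_id!r}") from None
```

Failing loudly on an unknown factory_id keeps one factory's rules from being silently applied to another factory's documents.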

Divide & Conquer Emails

Complex email threads are split into the email body and its attachments. Context extracted from the body informs the attachment extraction, so no information is lost.
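The body/attachment split can be sketched with Python's standard `email` package. The function name `split_email` and the sample message are made up for illustration; how the pipeline actually parses threads is not specified here.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List, Tuple


def split_email(raw: bytes) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Return (body_text, [(filename, payload_bytes), ...]) for one raw message."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer the plain-text body, fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [
        (part.get_filename() or "unnamed", part.get_payload(decode=True) or b"")
        for part in msg.iter_attachments()
    ]
    return body, attachments


# Build a small sample message to exercise the split.
sample = EmailMessage()
sample["Subject"] = "Order"
sample.set_content("Please quote 4mm float.")
sample.add_attachment(
    b"%PDF-1.4", maintype="application", subtype="pdf", filename="order.pdf"
)
body, attachments = split_email(sample.as_bytes())
```

The body text can then be passed along as context when each attachment is extracted.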

SOTA PDF Parsing

PyMuPDF handles text extraction and image rendering. pdfplumber detects table structure. A vision LLM reads scanned pages. This hybrid approach covers both digital-native and scanned PDFs.
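A minimal sketch of how pages might be routed between the three parsers. The character threshold, the strategy labels, and both function names are assumptions. The PyMuPDF and pdfplumber imports are kept inside the helper so the routing logic can be read and tested without either library installed.

```python
from typing import Tuple

MIN_TEXT_CHARS = 40  # assumed cutoff below which a page is treated as scanned


def choose_strategy(text: str, table_count: int) -> str:
    """Pick a parsing path for one page from its text layer and table count."""
    if len(text.strip()) < MIN_TEXT_CHARS:
        return "vision_llm"  # little or no text layer: likely a scan
    if table_count > 0:
        return "tables"      # structured rows detected by pdfplumber
    return "text"            # plain text layer from PyMuPDF


def page_inputs(pdf_path: str, page_index: int) -> Tuple[str, int]:
    """Collect the text layer and table count for one page."""
    import fitz        # PyMuPDF
    import pdfplumber

    with fitz.open(pdf_path) as doc:
        text = doc[page_index].get_text()
    with pdfplumber.open(pdf_path) as pdf:
        tables = pdf.pages[page_index].extract_tables()
    return text, len(tables)
```

For a given page, `choose_strategy(*page_inputs(path, i))` would yield the label of the parser to run.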

Snake SAT Matching

Integrates with snake.aws.monce.ai. 3-tier cascade: Snake exact (3ms) → Fuzzy (1ms) → LLM fallback. Article matching is scoped to the factory.
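The three-tier cascade can be sketched as follows. This does not call the Snake service: a dictionary lookup stands in for the exact tier, `difflib` stands in for the fuzzy tier, and the LLM tier is an injected callable. The function name, catalog shape, and similarity cutoff are all assumptions.

```python
import difflib
from typing import Callable, Dict, Optional, Tuple


def match_article(
    query: str,
    catalog: Dict[str, str],
    llm_fallback: Callable[[str], Optional[str]],
    cutoff: float = 0.85,
) -> Tuple[Optional[str], str]:
    """Return (article_id, tier), trying the cheapest tier first.

    `catalog` maps normalized labels to article ids for ONE factory.
    """
    key = query.strip().lower()

    # Tier 1: exact lookup (stand-in for the Snake exact match).
    if key in catalog:
        return catalog[key], "exact"

    # Tier 2: fuzzy string similarity against the same factory's labels.
    close = difflib.get_close_matches(key, list(catalog), n=1, cutoff=cutoff)
    if close:
        return catalog[close[0]], "fuzzy"

    # Tier 3: hand the original query to the LLM fallback.
    return llm_fallback(query), "llm"


catalog = {"float 4mm": "ART-001", "laminated 44.2": "ART-002"}
exact = match_article("Float 4mm", catalog, lambda q: None)
fuzzy = match_article("float 4 mm", catalog, lambda q: None)
```

Returning the tier alongside the match makes it easy to track how often the expensive LLM path is taken.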

API Endpoints

POST  /extract        Submit PDF for extraction
GET   /extract/{id}   Poll extraction result
GET   /health         Status + factory list
GET   /architecture   System architecture
GET   /paper          Technical paper
GET   /economics      Cost analysis
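The submit-then-poll flow implied by POST /extract and GET /extract/{id} might look like the sketch below. The response fields (`id`, `status`), the `"pending"` value, and the `send` transport callable are assumptions, since the request and response schemas are not documented here. The transport is injected so the upload details and base URL stay out of the polling logic.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: (method, path) -> parsed JSON response.
Transport = Callable[[str, str], Dict[str, Any]]


def extract_and_wait(
    send: Transport,
    poll_interval: float = 1.0,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit an extraction, then poll until it leaves the pending state."""
    job = send("POST", "/extract")  # `send` is expected to attach the PDF
    job_id = job["id"]              # assumed response field
    deadline = time.monotonic() + timeout
    while True:
        result = send("GET", f"/extract/{job_id}")
        if result.get("status") != "pending":  # assumed status value
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"extraction {job_id} still pending after {timeout}s")
        sleep(poll_interval)
```

The 30-second default mirrors the stated end-to-end figure; a real client would tune both the interval and the timeout.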