Glass Pipeline

Factory-driven glass extraction. SOTA PDF parsing. Snake SAT matching.

Factories

Snake Match: 3ms

End-to-end: <30s

v1.0

Pipeline

Factory-Driven

Every extraction is scoped to a factory_id. Normalization rules, Snake matching, and business logic are all factory-specific. Supported factories: VIT, Monce, VIP, Eurovitrage, TGVI.
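A minimal sketch of what factory scoping could look like: a registry keyed by factory_id, where each entry carries its own normalization rule. The `FactoryConfig` class, the `get_factory` helper, the shared default normalizer, and the exact ID strings are assumptions made for illustration, not the pipeline's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FactoryConfig:
    """Hypothetical per-factory settings; real configs likely hold more."""
    factory_id: str
    normalize: Callable[[str], str]


def _default_normalize(text: str) -> str:
    # Assumed rule: uppercase and collapse runs of whitespace.
    return " ".join(text.upper().split())


# Factory names come from the text above; using them verbatim as IDs is an assumption.
FACTORIES: Dict[str, FactoryConfig] = {
    fid: FactoryConfig(fid, _default_normalize)
    for fid in ("VIT", "Monce", "VIP", "Eurovitrage", "TGVI")
}


def get_factory(factory_id: str) -> FactoryConfig:
    """Look up a factory, failing loudly on an unknown factory_id."""
    try:
        return FACTORIES[factory_id]
    except KeyError:
        raise ValueError(f"unknown factory_id: {factory_id!r}") from None
```

Rejecting unknown IDs up front keeps one factory's rules from being silently applied to another factory's documents.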

Divide & Conquer Emails

Complex email threads are split into the email body and its attachments. Context extracted from the body informs the attachment extraction, so no information is lost.
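The split described above can be sketched with Python's standard `email` package. The `split_email` function and the shape of its return value are assumptions for illustration; the actual pipeline may extract richer context.

```python
from email import message_from_bytes, policy
from typing import Any, Dict, List, Optional, Tuple


def split_email(raw: bytes) -> Tuple[Dict[str, str], List[Tuple[Optional[str], Any]]]:
    """Split a raw RFC 5322 message into body context and attachments."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body, falling back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    context = {"subject": str(msg["subject"] or ""), "body": body}

    # Each attachment is returned as (filename, decoded content).
    attachments = [
        (part.get_filename(), part.get_content())
        for part in msg.iter_attachments()
    ]
    return context, attachments
```

The returned context can then be passed alongside each attachment to the extraction step, so that details stated only in the email body stay available.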

SOTA PDF Parsing

PyMuPDF handles text extraction and page image rendering, pdfplumber detects table structure, and a vision LLM reads scanned pages. Together these cover native-text, tabular, and scanned PDFs.
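One way the hybrid routing could work is a per-page decision based on how much embedded text a page has and whether a table was detected. The `PageStats` type, the `choose_parser` function, and the 50-character threshold are assumptions for illustration; in practice the character count might come from PyMuPDF's `page.get_text()` and the table flag from pdfplumber.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page features gathered before routing."""
    text_chars: int   # length of the embedded text layer
    has_table: bool   # whether a table structure was detected


def choose_parser(stats: PageStats, min_text_chars: int = 50) -> str:
    """Pick a parsing strategy for one page.

    The threshold is an assumed heuristic: pages with almost no
    embedded text are treated as scans.
    """
    if stats.text_chars < min_text_chars:
        return "vision_llm"   # scanned page: no usable text layer
    if stats.has_table:
        return "pdfplumber"   # text layer present, table structure matters
    return "pymupdf"          # plain text page
```

Routing per page rather than per document lets a single PDF mix native-text pages with scanned ones.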

Snake SAT Matching

Integrates with the Snake service at snake.aws.monce.ai. Article matching is factory-scoped and runs as a 3-tier cascade: Snake exact match (3ms) → fuzzy match (1ms) → LLM fallback.
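The cascade can be sketched as three ordered attempts, each tried only if the previous one found nothing. Here the Snake lookup and the LLM call are injected as plain callables, and the fuzzy tier uses `difflib` from the standard library as a stand-in; the function name, the 0.85 cutoff, and the tier labels are all assumptions, not the service's real interface.

```python
import difflib
from typing import Callable, Optional, Sequence, Tuple

Lookup = Callable[[str], Optional[str]]


def match_article(
    query: str,
    catalog: Sequence[str],
    exact_lookup: Lookup,
    llm_fallback: Lookup,
    cutoff: float = 0.85,
) -> Tuple[Optional[str], str]:
    """Return (matched article or None, name of the tier that decided)."""
    # Tier 1: exact match (stand-in for the Snake call).
    hit = exact_lookup(query)
    if hit is not None:
        return hit, "exact"

    # Tier 2: fuzzy string match against the factory's catalog.
    close = difflib.get_close_matches(query, catalog, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"

    # Tier 3: LLM fallback, the slowest and costliest tier.
    hit = llm_fallback(query)
    return hit, ("llm" if hit is not None else "unmatched")
```

Returning the tier name alongside the match makes it easy to track how often the expensive fallback is actually reached.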

API Endpoints

POST /extract - Submit PDF for extraction

GET /extract/{id} - Poll extraction result

GET /health - Status + factory list

GET /architecture - System architecture

GET /paper - Technical paper

GET /economics - Cost analysis
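The submit-then-poll flow implied by the first two endpoints might be driven by a small polling loop like the one below. The HTTP call is injected as a `fetch` callable so the sketch stays transport-agnostic; the `status` field, its `done` and `failed` values, and the default timings are assumptions, since the response schema is not documented here.

```python
import time
from typing import Any, Callable, Dict


def poll_extraction(
    fetch: Callable[[str], Dict[str, Any]],
    extraction_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll GET /extract/{id} until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        # `fetch` is expected to perform the GET and return parsed JSON.
        result = fetch(f"/extract/{extraction_id}")
        # Assumed terminal states; adjust to the real response schema.
        if result.get("status") in ("done", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"extraction {extraction_id} did not finish in time")
        sleep(interval_s)
```

The 30-second default mirrors the stated end-to-end figure; a real client would likely allow some headroom above it.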