Location: United States, Canada, Romania, Ukraine, Pakistan, Brazil, Argentina, Colombia
Type: Full-time | Remote
Our client is an early-stage startup building an AI intelligence layer for the commercial real estate (CRE) industry, embedded directly inside Excel. Think of it as an AI analyst that lives inside the analyst's existing workbook, purpose-built for institutional CRE acquisitions. It understands domain-specific financial models and handles everything from parsing offering documents to building financial models and running market research, all with full provenance and human-in-the-loop control.
We're looking for a Founding AI/ML Engineer to own the intelligence core of the platform. This is not about fine-tuning generic models — it's about building the reasoning, extraction, orchestration, and evaluation systems that make an AI analyst trustworthy enough to use in a $100M+ transaction.
You'll be the first dedicated AI/ML hire, working directly with the founders, and owning the full AI layer from day one.
What you'll build:
- Multi-agent orchestration: coordinator + sub-agent architecture with a planning loop, routing tasks to the right model (OpenAI, Anthropic, Gemini, Perplexity, Mistral) based on cost, quality, and latency.
- Document intelligence pipelines: stateless extraction for financial documents (offering memoranda (OMs), P&Ls, rent rolls) with per-field confidence scoring and bounding-box provenance.
- RAG and retrieval infrastructure: vector-backed retrieval with hybrid search, embedding pipeline management, and context assembly for grounded model responses.
- Evaluation and quality infrastructure: parser quality harnesses, extraction accuracy benchmarks, LLM output scoring, and feedback loops from analyst corrections.
- Prompt architecture and context management: system prompt design, tool schema engineering, context window optimization, and few-shot construction from live deal data.
- Provenance and hallucination controls: every output traces to a source document, page, and bounding box. If it can't be cited, it's flagged as an assumption.
- Model strategy: track frontier model releases and make build-vs-buy calls on fine-tuning, custom classifiers, and retrieval augmentation.
Tech stack:
- Orchestrator: FastAPI (async Python), SSE streaming, multi-agent architecture
- AI / LLM: GPT-4.1, Claude Sonnet / Opus, Gemini 2.0 Flash, Perplexity Sonar Pro, Mistral
- Retrieval: pgvector, PostgreSQL, hybrid RAG, embedding pipelines
- Parsers: Azure Doc Intelligence, Mistral OCR, stateless extraction
- Evals: Custom harnesses, labeling pipelines, correction feedback loops
- Infra: Azure Container Apps, Service Bus, Blob Storage, Docker Compose
- Frontend: Next.js, React 19, Office.js (you'll interface with this layer, not own it)
Requirements:
- 5+ years building AI/ML systems in production — real systems, real users, real failure modes.
- LLM orchestration experience: tool-calling, multi-step reasoning chains, agent architectures, streaming.
- Expert-level Python: async FastAPI, with type-annotated, well-structured code.
- RAG systems: embedding pipeline design, hybrid retrieval, context assembly, chunk strategy.
- Evals mindset: you measure model quality systematically through benchmarks, harnesses, and accuracy scoring.
- Startup operating mode: you scope your own work, make judgment calls, and ship without waiting for consensus.
Nice to have:
- Document extraction / OCR (Azure Doc Intelligence, Textract, or equivalent)
- Fine-tuning experience (LoRA, RLHF, DPO, or classifier fine-tuning)
- Vector database depth (pgvector, Pinecone, Weaviate)
- Financial document literacy (P&Ls, rent rolls, structured financial data)
- Multi-modal models: document layout understanding, table extraction, bounding-box grounding
- Prompt security: adversarial inputs, injection hardening, output validation
- Azure AI Services: OpenAI on Azure, Doc Intelligence, Blob-backed pipelines
Your first four weeks:
- Week 1: Stand up the full stack locally. Run parser pipelines end-to-end on real documents. Understand the overall architecture.
- Week 2: Deep-dive the extraction pipelines. Trace a document through parse, extract, normalize, map, and write. Identify the weakest quality link.
- Week 3: Ship a measurable eval — a harness that scores extraction accuracy on a labeled document set and establishes a baseline.
- Week 4: Own an improvement — better field normalization, improved context assembly, or a new document type. Ship it with a passing eval.
Why join:
- Full ownership of the AI layer of a real institutional product.
- Hard, unsolved problems in document extraction, multi-agent reliability, and provenance in an agentic write path.
- High-stakes domain: a $20T+ market where same-day turnaround wins deals. Your work directly affects whether a deal closes.
- Frontier model access: working with the latest models from OpenAI, Anthropic, Google, and Mistral in production.
- Operator founder: deep domain expertise from someone who has sat in the analyst, investment committee (IC), and deal lead seats.