We are hiring a Senior Python Engineer to help drive the evolution of an enterprise knowledge base platform.
This platform is transitioning from a Confluence-focused RAG chatbot to a comprehensive agentic knowledge system. You will play a key role in expanding the platform’s capabilities, including hybrid retrieval, multi-source ingestion, evaluation pipelines, agent infrastructure, and shared chat platform primitives.
Responsibilities
- Architect and develop backend services using Python 3.11, FastAPI, Pydantic, SQLAlchemy async, and asyncpg
- Design retrieval and orchestration strategies that balance quality, latency, cost, safety, and operational simplicity
- Build robust agent runtime features such as memory boundaries, tool sandboxing, permissions, and budget controls
- Enhance answer grounding, failure analysis, and citation enforcement to improve reliability and accuracy
- Establish observability and operational feedback loops with OpenTelemetry, Prometheus, Grafana, Docker, Helm, and GitHub Actions
- Collaborate with product and engineering teams to support multiple conversational interfaces through a unified knowledge platform
- Develop ingestion infrastructure for current and future content sources
- Implement observability across application, pipeline, database, and model-serving layers
- Manage cost, latency, throughput, and failure modes for AI-intensive workloads
- Create release workflows that validate AI behavior changes beyond code compilation
Requirements
- Minimum 3 years of relevant professional experience in software engineering
- Extensive hands-on experience with Python in platform, automation, or infrastructure-focused environments
- Experience building command-line tools using Python, Golang, or Rust
- Practical knowledge of LangGraph, LangChain, pgvector, and modern retrieval pipelines
- Experience designing evaluation frameworks for LLM-powered systems, including regression detection and quality assessment
- Strong background with Docker, Helm, GitHub Actions, and Kubernetes-based workflows
- Understanding of embedding pipelines, vector search, and operational aspects of LLM-driven systems
- Advanced observability skills, including metrics, tracing, dashboards, alerting, and log analysis
- Experience with ingestion, ETL, or large-scale content-processing pipelines
- Ability to approach system design with considerations for reliability, cost, latency, throughput, and recovery
- Excellent oral and written communication skills in English at B2+ level or higher
Nice to have
- Experience with FastAPI for building web APIs
- Familiarity with Grafana and Splunk for monitoring and log analysis
- Knowledge of Qdrant, Neo4j, or other vector and graph database infrastructure
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn