We are seeking a Lead Data Software Engineer to join our team.
This position centers on developing and maintaining the data infrastructure that supports AI-powered products and intelligent agent systems. You will work with state-of-the-art technologies and help shape scalable, dependable platforms in a collaborative environment.
Responsibilities
- Plan, build, and support the data ingestion and processing pipelines that supply RAG systems, including handling of unstructured data, images, videos, metadata, and permissions
- Operate and tune vector database infrastructure, including Amazon Kendra and an ongoing migration to OpenSearch
- Build evaluation datasets and performance measurement frameworks tailored to agents
- Establish monitoring and observability pipelines for AI workloads, including dashboards for latency, quality, and cost
- Roll out data governance, privacy guardrails, and quality controls for AI training and inference data
- Support the A/B testing and experimentation infrastructure used to evaluate agent iterations
- Collaborate with Backend AI engineers on data schemas and embedding approaches
Requirements
- At least 5 years of data engineering experience, including direct work with AI/ML data infrastructure
- At least one year of experience leading and managing development teams
- Solid Python expertise for building data pipelines, ETL workflows, and backend automation scripts
- Practical production experience with vector databases, covering schema design and index management for Amazon Kendra or OpenSearch
- Thorough grasp of search and retrieval concepts, including embedding models, chunking techniques, and retrieval optimization
- Working familiarity with AWS services such as S3, Glue, Athena, and Kinesis (or equivalents), as well as Docker and distributed data environments
- Experience treating data quality practices such as monitoring, validation, and lineage tracking as operational standards
- Background in defining AI/ML evaluation metrics and setting up systematic tracking using evaluation frameworks
- Written and spoken English at B2+ level or higher
Nice to have
- Exposure to LangSmith, RAGAS, or custom-built evaluation frameworks
- Experience with multi-modal data processing involving unstructured text, images, and videos, together with related governance
- Hands-on experience preparing data for LLM fine-tuning
- Familiarity with observability tools tightly integrated with AI calls, such as Langfuse or Arize
- Background in building streaming data pipelines with technologies like Kafka or Kinesis
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn