We are looking for a Senior Data Software Engineer to join our team.
This role focuses on building and supporting data infrastructure that powers AI-driven products and intelligent agent systems. You'll have the opportunity to work with cutting-edge technologies and contribute to scalable, reliable platforms in a collaborative environment.
Responsibilities
- Design, build, and maintain data ingestion and processing pipelines that feed RAG systems, including handling unstructured data, images, videos, metadata, and permissions
- Administer and optimize vector database infrastructure, including Amazon Kendra with an ongoing migration to OpenSearch
- Create evaluation datasets and performance measurement frameworks for agents
- Develop monitoring and observability pipelines for AI workloads, covering latency, quality, and cost dashboards
- Implement data governance, privacy safeguards, and quality controls for AI training and inference data
- Support A/B testing and experimentation infrastructure for assessing agent iterations
- Collaborate with Backend AI engineers on data schemas and embedding strategies
Requirements
- A minimum of 3 years of data engineering experience, including direct exposure to AI/ML data infrastructure
- Strong Python skills for building data pipelines, ETL processes, and backend automation scripting
- Hands-on production experience with vector databases, including schema design and index management for Amazon Kendra or OpenSearch
- Deep understanding of search and retrieval concepts, including embedding models, chunking strategies, and retrieval optimization
- Practical knowledge of AWS services such as S3, Glue, Athena, and Kinesis (or equivalents), along with Docker and distributed data environments
- Experience embedding data quality practices such as monitoring, validation, and lineage tracking as operational defaults
- Background in designing AI/ML evaluation metrics and establishing systematic tracking through evaluation frameworks
- English language proficiency (written and spoken) at B2+ level or higher
Nice to have
- Experience with LangSmith, RAGAS, or custom evaluation framework solutions
- Background in multi-modal data processing covering unstructured text, images, and videos, along with associated governance
- Hands-on involvement with LLM fine-tuning data preparation
- Familiarity with observability tooling deeply integrated with AI calls, such as Langfuse or Arize
- Experience building streaming data pipelines using technologies such as Kafka or Kinesis
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn