- A background in software or data engineering
- Polished communication skills with a proven record of leading work across disciplines
- Strong proficiency in Python programming
- Extensive experience with Apache Spark for large-scale data processing
- Expertise in containerization (particularly Docker) and CI/CD technologies
- Experience designing and implementing RESTful APIs
- Comprehensive knowledge of AWS services, including ECS Fargate for container orchestration, EMR (Elastic MapReduce) for big data processing, and AWS Glue for ETL workflows
- Proven track record of building and maintaining complex ETL pipelines
- Experience with workflow management tools, specifically Apache Airflow
- Proficiency in using dbt (data build tool) for data transformation and modelling
- Strong understanding of DevOps principles and CI/CD practices
- Excellent problem-solving skills and attention to detail
- Ability to work effectively in a fast-paced, collaborative environment
It would be really great (but not a deal-breaker) if you also had:
- Demonstrated experience in building ML platforms or MLOps infrastructure
- Experience with Polars, a high-performance DataFrame library written in Rust with a Python API
- Familiarity with caching tools and strategies for optimizing data access and processing
- Knowledge of vector databases and their applications in machine learning pipelines
- Experience with search engines like Elasticsearch for efficient data indexing and retrieval
- Understanding of ML model serving frameworks and A/B testing methodologies
- Contributions to open-source MLOps tools or frameworks
- Familiarity with ML model versioning tools (e.g., MLflow, DVC)