Job Description
Purpose of the Role
The Data Engineer is the architect and builder of Dis-Chem Life’s data foundation, creating the infrastructure that turns raw information into a strategic asset. This role goes far beyond moving data from A to B: it is about designing high-performance, future-proof systems that make data accurate, accessible, and truly powerful.
By developing best-in-class data pipelines, warehouse systems, architecture, and governance frameworks, the Data Engineer enables the entire organisation, from the actuarial, data science, and analytics teams to general operations, to work with clean, structured, and reliable datasets at scale, while protecting our customers’ data privacy as stipulated in the POPI Act.
The role involves solving hard engineering problems: building resilient ingestion frameworks, handling messy and complex source systems, optimising cloud architecture for cost and performance, and ensuring that every downstream user can focus on insight and innovation rather than data wrangling.
The ultimate purpose is to build and continuously evolve a scalable, intelligent data platform that grows with Dis-Chem Life’s ambitions, fuels advanced analytics and modelling, unlocks automation, and sets a new benchmark for how data drives customer intelligence and operational excellence in the South African insurance industry.
Summary of the Role
The Data Engineer is responsible for designing, implementing, and maintaining the core technical solutions that keep Dis-Chem Life’s data platform running at peak performance. This includes building scalable and resilient data ingestion frameworks, integrating complex source systems, and optimising cloud architecture for both performance and cost efficiency. The role requires deep hands-on experience with modern data engineering tools, ETL/ELT processes, workflow orchestration, and cloud platforms. Strong problem-solving skills, precision, and the ability to collaborate seamlessly with analytics, AI, and automation teams are essential. The Data Engineer continuously drives improvements in data processes and platform efficiency, ensuring the organisation can rely on high-quality, reliable data to make faster, smarter, and more impactful decisions.
Benefits
- Competitive salary
- Direct and significant influence over building the company’s data backbone, as the business is still in its early development stages
- Exposure to advanced analytics and AI projects with real-world business impact
- Access to modern cloud, orchestration, and automation technologies
- Hybrid working model with flexibility and autonomy
- Work with rich and varied datasets spanning health data, customer behaviour, payments, and retail spend
Key Responsibilities
Build & Maintain Data Pipelines, Architecture, and Software
- Design, develop, optimise, and monitor scalable ETL/ELT pipelines and warehouse systems.
- Implement, monitor, and maintain reporting and analytics software.
- Architect robust, future-proof data infrastructure to support advanced analytics, AI, and automation.
- Ensure performance, reliability, and security across all data systems.
Ensure Data Quality, Reliability & Accessibility
- Implement rigorous data quality validation, monitoring, and governance to guarantee data integrity.
- Deliver clean, well-structured datasets that downstream teams can confidently use.
- Minimise time spent on data cleaning and wrangling for actuaries, data scientists, and operational BI analysts.
Enable AI, Analytics & Automation
- Prepare AI-ready datasets with consistency, scalability, and timeliness in mind.
- Collaborate with data scientists to build feature pipelines for predictive modelling.
- Support advanced automation workflows and real-time data requirements.
Scale Data Architecture
- Design and optimise best-in-class data architecture capable of handling increasing data volumes and complexity.
- Leverage cloud-native solutions to enable rapid scaling, flexibility, and cost efficiency.
- Continuously enhance data infrastructure performance and reduce operational costs.
Handle Complex Engineering Challenges
- Own the technical work of data ingestion, transformation, and orchestration.
- Solve challenging engineering problems to allow teams to focus on insights, models, and decisions.
- Act as the go-to expert for ensuring data is accessible, accurate, and usable.
Collaboration & Knowledge Sharing
- Work closely with analysts, actuaries, and data scientists to understand evolving data needs.
- Document data flows, definitions, and system processes to ensure transparency and reusability.
- Mentor colleagues and promote best-practice data engineering across the organisation.
Soft Skills
- Obsessed with clean, high-quality data and how it drives better models/decisions
- Collaborative mindset, thriving at the intersection of engineering and analytics
- Strong communicator, able to explain complex engineering choices to non-technical users
- Detail-driven but pragmatic, balancing precision with speed in delivery
- Curious, innovative, and always seeking ways to improve
Technical Skills
- Data Architecture - designing and implementing scalable, maintainable data systems, defining data flows, and establishing architectural patterns for enterprise-scale solutions
- Advanced SQL - extraction, transformation, and optimisation
- Python Programming - strong skills (pandas, PySpark) for data pipelines and data science workflows
- Big Data Frameworks - hands-on experience with at least one major framework (Hadoop, Spark, Kafka, Elastic Stack, or Databricks)
- Database Expertise - proficiency across industry-standard types, including relational (PostgreSQL, MySQL) and NoSQL (MongoDB, Cassandra); understanding of less common types such as time-series (InfluxDB, TimescaleDB) and graph databases (Neo4j)
- Data Modelling - dimensional modelling, normalisation, star/snowflake schemas, and designing efficient data structures for analytical workloads
- Data Lifecycle Management - end-to-end data management including ingestion, storage, processing, archival, retention policies, and data quality monitoring throughout the pipeline
- Data Science Integration - familiarity with feature stores, model-serving pipelines
- ETL/ELT Tools - hands-on experience with tools like dbt, Windmill, Airflow, Fivetran
- Cloud Platforms - experience with AWS, Azure, or GCP and modern warehouses (Snowflake, BigQuery, Redshift)
- Streaming Data - knowledge of real-time data processing (Kafka, Spark, Flink)
- Infrastructure Management - experience with Docker, Kubernetes, container orchestration, and managing scalable data infrastructure deployments is advantageous
- APIs & Integrations - understanding of APIs, integrations, and data interoperability
- Version Control - Git and CI/CD practices for production data pipelines
- Data Governance - familiarity with governance and compliance (POPIA, FAIS)
Experience
- 3–5 years in a Data Engineering or related technical role
- Proven ability to deliver clean, scalable pipelines supporting analytics and AI
- Hands-on work with cloud-native and warehouse systems
- Experience collaborating with Data Science teams to deliver AI/ML-ready datasets
- Exposure to regulated industries (insurance/finance) advantageous
Qualifications
- Bachelor’s degree in Data Engineering, Computer Science, Information Systems, or related field
- Cloud certifications (AWS, Azure, GCP) or Data Engineering credentials preferred
- Advanced SQL and Python certifications are advantageous