Full-time
Data Engineer
Responsibilities:
- Design, build, and own scalable, reliable data pipelines using Python, Spark (PySpark), and Delta Lake
- Implement and maintain CDC pipelines, SCD Type 2 logic, and data quality / validation frameworks
- Develop analytics-ready data models using PostgreSQL and dbt
- Own and improve CI/CD pipelines with GitHub Actions
- Build and optimize geospatial data pipelines, including:
  - PostGIS
  - H3 indexing
  - Lat/Lon → H3 conversion
  - Point-in-polygon & spatial joins (Spark-based)
- Collaborate with product and backend teams to define data requirements
- Review code, set best practices, and help raise the bar for data engineering quality
Tech Stack:
- Data Engineering: Python, Spark (PySpark), Delta Lake, PostgreSQL, dbt, Apache Airflow
- CI/CD & DevOps: Git, GitHub Actions, Docker, pytest
- GIS: PostGIS, H3, spatial joins, geospatial enrichment pipelines
Preferred Qualifications:
- 5+ years of experience as a Data Engineer (or equivalent)
- Strong fundamentals in Python and SQL
- Proven experience building and operating Spark-based data pipelines in production
- Solid experience with PostgreSQL and analytical data modeling
- Comfortable owning pipelines end-to-end (design → deploy → monitor)
- Familiar with CI/CD workflows and production best practices
- Experience with — or strong interest in — geospatial data
- Linux-friendly, reliability-focused engineer
What We Offer:
- Real engineering challenges — not toy problems
- Clean, pragmatic, and scalable architecture
- High ownership and technical impact
- Engineer-driven, tech-first environment