Fulltime

Data Engineer


Responsibilities:

  • Design, build, and own scalable, reliable data pipelines using Python, Spark (PySpark), and Delta Lake
  • Implement and maintain CDC pipelines, SCD Type 2 logic, and data quality / validation frameworks
  • Develop analytics-ready data models using PostgreSQL and dbt
  • Own and improve CI/CD pipelines with GitHub Actions
  • Build and optimize geospatial data pipelines, including:
    • PostGIS
    • H3 indexing
    • Lat/Lon → H3 conversion
    • Point-in-polygon & spatial joins (Spark-based)
  • Collaborate with product and backend teams to define data requirements
  • Review code, set best practices, and help raise the bar for data engineering quality

Tech Stack:

  • Data Engineering: Python, Spark (PySpark), Delta Lake, PostgreSQL, dbt, Apache Airflow
  • CI/CD & DevOps: Git, GitHub Actions, Docker, pytest
  • GIS: PostGIS, H3, spatial joins, geospatial enrichment pipelines

Preferred Qualifications:

  • 5+ years of experience as a Data Engineer (or equivalent)
  • Strong fundamentals in Python and SQL
  • Proven experience building and operating Spark-based data pipelines in production
  • Solid experience with PostgreSQL and analytical data modeling
  • Comfortable owning pipelines end-to-end (design → deploy → monitor)
  • Familiar with CI/CD workflows and production best practices
  • Experience with — or strong interest in — geospatial data
  • Linux-friendly, reliability-focused engineer

Nice to have:

  • Real engineering challenges — not toy problems
  • Clean, pragmatic, and scalable architecture
  • High ownership and technical impact
  • Engineer-driven, tech-first environment