apply fundamental principles to solve technological problems efficiently
design and build robust, scalable components of CRED's open-source-first data platform — focusing on ingestion, reconciliation, observability, and cost-aware compute using Spark on EKS and Delta/Iceberg lakes
engineer high-throughput, low-latency data ingestion systems across RDS, Kafka, DynamoDB, and more — powering 100+ TB of daily data flows across regulated domains (PCI, PII)
abstract complexity for data producers and consumers by building internal SDKs, self-serve orchestration, and smart developer tooling
lead the design and rollout of AI copilots for data engineers and analysts — including pipeline assistants, quality bots, and lineage-aware query editors
instrument observability, lineage, and access control as first-class features in the platform — ensuring auditable and secure multi-tenant usage across subsidiaries
collaborate with product, infra, compliance, and data science teams to deliver platform capabilities that accelerate internal builds while enforcing governance
drive open-source adoption by replacing legacy SaaS or monoliths with modular, community-aligned infrastructure
continuously profile and optimize infrastructure for petabyte-scale data volumes and cost efficiency
stay informed about evolving technologies and best practices in data engineering, and contribute improvements to our data processes
communicate effectively with internal and external stakeholders, providing updates on the progress of key initiatives
represent the organization in the tech community, building a strong brand presence
you should apply if you:
have 5+ years of experience building large-scale distributed data systems with a strong grasp of Spark, Iceberg/Delta Lake, and orchestration frameworks (Airflow, Dagster, or similar)
prefer building platforms over pipelines — and enjoy solving challenges in multi-tenant architecture, developer experience, lineage, and cost attribution
have hands-on experience deploying open-source data frameworks on cloud-native environments like EKS or EMR, and automating deployments via IaC or CI/CD
have worked on systems where data quality, auditability, and compliance are critical — and have engineered observability into your platforms
are comfortable writing production-grade Python, Scala, or Java, and can design SDKs or frameworks to simplify engineering workflows
have a product mindset: you think of internal developers as customers and optimize for their velocity, safety, and trust
are excited by the potential of AI-augmented platforms — copilots for query suggestions, smart catalogs, lineage tools, and anomaly detection bots
thrive in ambiguous environments and want to define the architecture and tooling that powers a large base of engineers, analysts, operations, and product folks
believe in open source, modularity, and solving infra problems for 10x scale
are familiar with SQL, basic database management systems, and data warehousing concepts
communicate effectively, both in writing and verbally, and can convey technical concepts to non-technical stakeholders
can work independently and as part of a collaborative team in a dynamic environment
are eager to learn and adapt, with an interest in data streaming technologies