Palantir
New York • Full-time
Posted on: 6/4/2025
Job Description:
A World-Changing Company
Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.
The Role
Join the Substrate Edge Team at Palantir, where we are responsible for mission-critical production infrastructure: hundreds of Kubernetes clusters across on-premise deployments, from large datacenters to small-footprint edge devices. We are now seeking a Senior Infrastructure Engineer with specialized experience in Ceph to boost the scale and reliability of our ruggedized Kubernetes offerings under novel operating constraints.
If you are passionate about infrastructure at scale, have a deep understanding of Ceph, and are eager to engage with the broader open-source community, we encourage you to apply for this exciting opportunity.
As a Senior Infrastructure Engineer, you will leverage your expertise in Ceph to manage and optimize storage solutions at scale, ensuring seamless integration with our Kubernetes infrastructure. You will play a key role in automating hundreds of Ceph cluster deployments on heterogeneous hardware and in making core contributions to the broader Ceph and CNCF communities.
Core Responsibilities
- Manage Ceph at Scale: Design, deploy, and maintain Ceph storage solutions across diverse hardware environments, ensuring high availability and performance under challenging constraints.
- Automate Deployments: Develop and implement automation strategies for managing multiple Ceph deployments, reducing manual intervention and enhancing operational efficiency using world-class tooling.
- Innovate and Contribute: Drive the adoption of novel features and tools within the Ceph and CNCF ecosystems, contributing upstream as necessary to improve the broader community.
- Engage with Communities: Actively participate in the Ceph developer community and the CNCF, sharing insights and collaborating on open-source projects.
- Infrastructure Excellence: Collaborate with the team to design and build the next generation of Palantir’s infrastructure, focusing on systems that are scalable, stable, and secure.
What We Value
- Ceph & Rook Expertise: Practical, hands-on experience managing Ceph storage solutions, ideally using Rook, with a deep understanding of Ceph's architecture and operational nuances.
- Automation Proficiency: Strong skills in infrastructure automation tools such as Terraform and Kubernetes Operators, with coding proficiency in Go, Java, or equivalent.
- Systems Programming: Experience in systems programming with proficiency in Go, Rust, C/C++, or equivalent languages.
- Hardware and OS Knowledge: Deep familiarity with hardware configurations, operating systems, and diagnostic tools.
- Networking Fundamentals: Solid understanding of networking principles, with experience in CNIs or cloud networking infrastructure preferred.
- On-Premise Datacenter Experience: Experience working with on-premise hardware, or as a sysadmin/SRE in data centers.
What We Require
- Professional Experience: 4+ years of software development experience focused on core infrastructure with an emphasis on operational excellence.
- System Design Expertise: 2+ years of experience in system design or architecture, including reliability and scaling of new and existing systems.
- SRE / Operational Experience: 1+ year of operational responsibility for production-grade Ceph clusters.
- Educational Background: Bachelor’s degree in Computer Science or equivalent practical experience.