Job#: 2007675
Job Description:
Summary - Principal SRE - Hybrid 3 days/week - Charlotte, NC - Phoenix, AZ - Dallas, TX - W2 Only, no C2C
*Candidate must be able to work on client's W2 without sponsorship now or in the future*
*Candidate must be able to commute to one of the following locations 3 days a week - Charlotte, NC - Phoenix, AZ - Dallas, TX*
The Principal Site Reliability Engineer (SRE) on our dynamic SRE team is a subject matter expert and SRE professional, with key focuses in analyzing complex data and distributed systems, anticipating problems, and finding ways to mitigate risks to the environment. Incorporating knowledge of business drivers, the Principal SRE will effect change, lead and drive the SRE charter with innovative improvements, and facilitate best practices in using software engineering to enable automation and efficiency in all aspects of platform change management and operations. The main responsibilities include optimizing day-to-day activities to reliably support product rollout and operation through automation, and mentoring other lead, senior, and staff SREs toward adopting and implementing a DevSecOps culture. The role includes both oversight of production operations and launch execution for major initiatives, and development/engineering of solutions to optimize system reliability.
You will identify opportunities to design, build and implement innovative solutions to solve unique platform and infrastructure problems in order to optimize product delivery and operations workflow and enhance platform production stability for the products. You will collaborate with other senior and lead team members within and outside of ITSO to evangelize the SRE mindset and system design toward optimizing the performance and availability of our environment.
Responsibilities
- Lead the design, build, and implementation of orchestration and tooling solutions that optimize workflows so that tasks are completed efficiently and free of defects.
- Establish operational best practices for structuring, automating, building, deploying, and monitoring complex distributed software products and environments.
- Collaborate with other engineering teams to ensure the reliability and traceability of software releases and of software and infrastructure changes.
- Create and maintain platform operational engineering design specifications to aid the maintenance and smooth operation of software environments.
- Collaborate with other engineering teams to triage alerts & diagnose/resolve critical issues, and manage implementation of changes.
- Collaborate with other engineering teams in the coordination, documentation, and tracking of critical incidents, ensuring rapid and complete issue resolution and appropriate closed-loop communication to customers and other key stakeholders.
- Lead, grow, and mentor other SRE team members.
- Evangelize the SRE mindset and mentor others on reliability and SRE best practices.
- Maintain a strong understanding of IaaS, PaaS, and SaaS offerings for building and maintaining a state-of-the-art, cloud-based environment for massive-scale data processing.
- Ensure that implementations and solutions are fully documented and that solutions are deployed with fully operationalized processes to support the solution lifecycle.
- Other tasks as assigned
Minimum Requirements
- Ability to read and write code in Ansible
- 10+ years of experience in System engineering or Software engineering
- Advanced knowledge in at least 3 of the following key areas: Cloud native and IaaS Architecture (Azure preferred, will accept GCP/AWS) (performance testing, monitoring, operations), Design (compliance, security), Cloud Engineering (planning, provisioning), Container Orchestration, Microservice architecture and engineering.
- Strong understanding of business technology drivers and their impact on architecture and engineering design, performance, and monitoring.
- Subject matter expert in Site Reliability Engineering (SRE) and DevOps philosophies, technologies, platforms and tools, SLA management, incident resolution, and automation.
- API First design, implementation and testing experience.
- Demonstrated ability to conceptualize, launch, and deliver multiple engineering projects on time and within budget.
- Demonstrated ability to understand and troubleshoot complex problems under pressure
- Strong understanding of cloud native architecture and microservices design and deployment patterns
- Strong data analytical skills
- Banking industry experience a plus
Skills/Training Required
- Expert-level Linux/Unix skills and shell scripting.
- Minimum of 7 years of experience automating tasks, building cloud native software in a microservice architecture style, and writing tools in Python, Go, Ruby, or C#.
- Excellent knowledge of at least 3 leading SQL and NoSQL database technologies (Postgres, MongoDB, graph databases, Cassandra, etc.).
- Hands-on advanced experience with implementing and supporting streaming and messaging technologies such as RabbitMQ, Kafka, Apache Pulsar, Azure or GCP
- Minimum of 7 years of experience working with a container orchestration platform; Kubernetes preferred, but other platforms such as Apache Mesos or Docker Swarm will also be considered.
- Expert knowledge of distributed tracing with hands-on experience implementing and operating any one of these: Jaeger, OpenTelemetry, OpenTracing, Zipkin
- Expert knowledge of service mesh technologies; Istio preferred, but others including Linkerd and Consul will be considered
EEO Employer
Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at [email protected] or 844-463-6178.
Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing® in Talent Satisfaction in the United States and Great Place to Work® in the United Kingdom and Mexico.