Senior Site Reliability/DevOps Engineer

Remote

Full Time

Experienced

About AutoRABIT:
AutoRABIT is a hyper-growth SaaS software company and the leading provider of a Salesforce DevSecOps platform for regulated industries such as financial institutions, insurance, and healthcare. AutoRABIT solutions enable developers to automate their daily tasks, be more productive, and increase release velocity for their development teams while meeting stringent security, compliance, and privacy regulations.

About the role:

AutoRABIT is looking for a Senior Site Reliability/DevSecOps Engineer to help develop, scale, and operate our cloud services.

In this role, you will be an experienced business professional able to implement and execute best-practice operations and improvements across teams by providing visibility and recommendations for improved reliability and automation. You will be responsible for the security, availability, performance, efficiency, change management, monitoring, emergency response, capacity planning, backup, and disaster recovery of our technical ecosystem, and you will drive automation while building a robust and agile DevSecOps framework.

Accountability, agility, and strong analytical skills, paired with an obsession for learning, gathering data, and executing on that data, are key to being successful in this role.

Responsibilities:

Broadly, you are a Site Reliability or DevSecOps engineer with a passion for automation, reliability, scalability, monitoring, and capacity planning, and you have the breadth of knowledge necessary to support a wide variety of software and systems.

Contribute to the development and maintenance of frameworks for monitoring, automation and code to increase the scalability and reliability of the service

Assist both internal and customer-facing teams with deploying new software releases and with interfacing to VPN and other related security infrastructure.

Assist with resolution of AutoRABIT service or customer issues as required

Participate in and practice sustainable incident response and blameless postmortems

Contribute to the automation of manual tasks, such as the provisioning of users in production and test environments.

Help develop peers’ capabilities through knowledge sharing, mentoring, and collaboration

Work within a small agile team to develop and improve SRE software, support your peers, plan and self-improve
Participate in a regular on-call or rotational schedule as needed to support AutoRABIT servers, including weekends and holidays
Adhere to established internal controls

Required Skills and Experience:

Design, implement, and maintain scalable, resilient, and secure infrastructure using AWS.

Develop and manage infrastructure as code using Terraform.

Implement and manage CI/CD pipelines to automate deployments and ensure smooth delivery of applications.

Monitor system performance, identify bottlenecks, and implement solutions to improve reliability and performance.

Troubleshoot, resolve, and perform RCAs for incidents, while ensuring minimal disruption to services.

Collaborate with development teams to ensure applications are designed for reliability and performance.

Working experience with shell scripting (Bash), Python, or equivalent is required.

Good knowledge of programming languages such as Python, Go, or Java.

Working experience with configuration management tools such as Ansible or Chef.

Implement and maintain monitoring, logging, and alerting systems to ensure the health and performance of our infrastructure.

Ensure security best practices are followed and compliance requirements are met.

Can-do attitude: challenging the status quo, leading, and contributing to key improvements and innovations, while maintaining accountability
Excellent written and verbal US English communication skills for working across a global team environment

Education and Background:

Bachelor's degree in Computer Science, Engineering, or an equivalent degree or experience

5+ years of experience in site reliability engineering, DevOps, or a related field.

AWS, GCP and/or Azure Certified

3+ years of Kubernetes experience

3+ years of experience managing Linux-based systems in a public cloud such as AWS, GCP, or Azure

3+ years of experience with systems monitoring and logging; knowledge of ELK is preferred
Solid understanding of standard TCP/IP networking, common protocols such as DNS and HTTP, and load balancers.
Must be a US citizen or permanent resident, live in and work from the US, and be capable of obtaining a government security clearance if required. Green card holders qualify, but H1B or other work visa holders do not qualify for this role.

Salary range for this role is $175,000 - $200,000 depending on experience.

THIS IS A 100% REMOTE JOB, but it requires 10% travel and an in-person component to the interview process.
