Responsibilities and Duties:
-Manage, mentor, train and schedule the remote team that works in various shifts across different time zones, including hiring, performance management, coaching and career development
-Establish and manage client coverage and schedules for global staff
-Technical design and ownership of the monitoring and alerting stack
-Choose and integrate the tools for your department to manage work
-Perform monitoring/alerting readiness assessments and ensure appropriate work backlogs are generated for changes required to set customers and their applications up for success
-Ongoing development and improvement of our offering(s) focused on customer site availability and platform incident response, with reports as needed
-Participate in new client kickoff meetings
-Provide daily client update reports
Requirements:
-Professional-level English fluency
-A minimum of two years as an IT Engineer
-Experience managing Support Engineers
-Two or more years of responsibility for managing the monitoring and alerting of production workloads, on-call schedules, escalation trees, and after-hours teams
-Team communication through phone, video, chat, email, and tickets (Jira)
-Expertise with at least one alert management tool (PagerDuty, OpsGenie, AlertOps, VictorOps)
-Expertise with monitoring tools such as CloudWatch, Nagios, MRTG, Zabbix, SolarWinds, WhatsUp
-Intermediate or higher experience with Linux systems administration fundamentals
-Intermediate experience managing cloud infrastructure in AWS
-Intermediate experience in the fundamentals of containers and Docker
-Intermediate or higher knowledge of remote systems management using OpenSSH
-Strong understanding of Linux logging subsystems
-Troubleshooting and managing Linux services (Intermediate)
-Understanding of DNS fundamentals
-Experience working with wiki-based or markdown-based documentation tools (Intermediate)
Skills desired:
-Intermediate to advanced systems administrator experience with Linux/Unix systems
-AWS Cloud or AWS Solutions Architect Associate Certification
-Linux/Unix shell scripting
-Kubernetes fundamentals
-Prometheus and/or Grafana
-Knowledge of managing AWS managed services including, but not limited to: ECS, EKS, S3, Elasticsearch, CloudWatch, EC2 Auto Scaling groups, Route 53, CloudWatch Logs
-Hands-on experience with one or more CI/CD or build/release tools
-IP Networking fundamentals
-Git for version control
-Terraform or any other infrastructure-as-code tool
Benefits:
-100% remote position with the option to lease a shared workspace
-Paid in USD
-Generous holidays and unlimited paid time off
-Bonus payout for each new certification
-Paid exams and certifications
-Work with cutting-edge technology - NO legacy systems!
-Paid online continuing training
-Internal training
-Sophisticated internal library
-Individual professional development plan
-Free English courses
-Fun corporate events
-DevOps and cloud community-centric corporate culture, including meetups, conferences, etc.
Caylent provides custom DevOps solutions for companies at every stage, giving our clients the freedom to focus on revenue-generating features, not infrastructure.