Position Summary:
We are seeking a proactive and technically skilled Production Support Manager to oversee the stability, performance, and efficiency of our production environment. This role will lead a small team of support specialists, manage incident response, monitor system health, and drive continuous improvements to keep production operations running smoothly. Acting as a liaison between engineering and support teams, the Production Support Manager plays a critical role in delivering high service levels and maintaining customer satisfaction.
Key Responsibilities:
- Team Leadership: Manage and mentor a team of production support specialists, assign tasks, and foster skill development to ensure team effectiveness.
- Production Monitoring: Oversee real-time monitoring of production systems and applications to detect issues, anomalies, and performance bottlenecks.
- Incident Management: Lead timely response to and resolution of critical production incidents, coordinating troubleshooting efforts across technical teams.
- Problem Analysis: Conduct root cause analyses of recurring issues, implement corrective actions, and reduce future disruptions.
- Performance Optimization: Identify opportunities to improve system performance, stability, and operational efficiency through automation, tuning, and capacity planning.
- Change Management: Coordinate and manage the deployment of software releases and infrastructure changes, ensuring minimal disruption to production systems.
- Reporting and Analytics: Generate regular reports on system performance, incidents, and KPIs to support informed decision-making and continuous improvement.
- Cross-Functional Collaboration: Act as a bridge between engineering, DevOps, and support teams to ensure smooth production workflows and clear communication.
- Other duties as assigned.
Required Qualifications:
- 3+ years of experience managing production support or technical operations teams.
- Strong knowledge of cloud environments (e.g., AWS), production applications, and SQL-based databases.
- 3+ years of experience with scripting or programming languages such as Python or Java.
- Proven expertise in incident management, problem diagnosis, and root cause analysis.
- Excellent analytical and troubleshooting skills in high-pressure environments.
- Strong communication skills, with the ability to collaborate effectively with both technical and non-technical stakeholders.
- Ability to manage multiple priorities and lead teams during critical situations.
Preferred Qualifications:
- Familiarity with ITIL or similar production support frameworks.
- Experience with monitoring tools (e.g., CloudWatch, Datadog, Splunk) and incident management platforms (e.g., PagerDuty, ServiceNow).
- Understanding of DevOps practices and CI/CD pipelines.
Physical Demands:
- Regularly required to sit, stand, and walk.
- Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions of the position.