Job Description
Workload Automation (IBM Tivoli-TWS/IWS) Lead/Engineer
Location: Warsaw, IN
Duration: 6 months

Role Overview
We are seeking a Lead Workload Automation Engineer / Architect to define and drive the enterprise architecture, strategy, and operational model for IBM Tivoli/IBM Workload Scheduler (TWS/IWS) across distributed environments (on-prem and cloud). This role sets platform standards and reference designs, leads modernization and major upgrades/migrations, governs reliability and security practices, and serves as the senior technical partner for application, database, and infrastructure organizations to deliver resilient, scalable scheduling services for mission-critical workloads. The role also assists with and supervises two job scheduling teams.

Key Responsibilities
Own the end-to-end architecture for the TWS/IWS platform (components, topology, environments, integrations), including standards, patterns, and reference implementations.
Provide technical oversight for additional (3rd-party) job scheduling platforms where used; establish operating standards, integration patterns, and support processes to ensure consistent controls and reliability.
Lead enterprise-scale installations, upgrades, and migrations; define cutover/rollback strategies, coordinate change windows, and ensure readiness across dependent teams.
Lead assessments of legacy scheduler instances and batch frameworks to identify candidates for retirement, consolidation, or migration; produce target-state recommendations, sequencing/roadmaps, and risk-based migration plans.
Define reliability engineering practices for workload automation: availability targets, capacity planning, performance tuning, monitoring/alerting, and continuous improvement.
Design and validate high-availability and disaster recovery solutions (including DB2 HADR where applicable); plan and execute regular DR tests and remediate gaps.
Establish governance for workload onboarding and job design: scheduling standards, dependency modeling, naming conventions, calendars, critical path optimization, and SLA/SLO management.
Architect and productionize automation for platform operations and self-service (e.g., provisioning, reporting, batch controls) using shell/Python/Perl and enterprise tooling.
Own security and compliance posture: access model (LDAP/SSO), least-privilege controls, audit evidence, vulnerability remediation, and secure configuration baselines.
Manage and develop two teams (e.g., platform engineering and operations): set priorities and operating rhythms, oversee delivery and support outcomes, coach/mentor team members, and drive performance management in partnership with leadership.
Be available for major outages and critical events related to job scheduling, including QEND activities up to four (4) times per fiscal year, providing incident leadership, stakeholder communications, and post-incident follow-up.
Participate in an on-call rotation and provide after-hours/weekend support as needed to maintain scheduling availability and meet business SLAs.
Support a global operating model by working flexibly across EMEA and US business hours to provide required coverage and stakeholder overlap.
Serve as the escalation point for complex incidents; lead root-cause analysis and drive problem management to prevent recurrence.
Mentor and guide engineers; lead technical design reviews, documentation/runbook standards, and knowledge sharing across the organization.
Work closely with the other job scheduling teams in IT Operations, such as Automate, AS400, and Robot, and assist in supervising them.

Required Qualifications
High School Diploma or equivalent.
10+ years of experience in enterprise workload automation, including 7+ years of hands-on IBM TWS/IWS/IWA administration in distributed environments.
Bachelor’s degree or 10+ years of equivalent IT industry service experience.
For senior/lead equivalent roles, 8+ years of relevant ITSM/major incident operations experience may be required.
IT Technology Certification is a plus.
Proven experience in a lead/architect capacity: defining platform standards/reference designs, guiding cross-team implementations, and making architecture decisions for reliability, scalability, and security.
Strong Linux/UNIX engineering and production troubleshooting experience, including performance and availability triage.
Advanced automation/scripting skills (shell plus Python and/or Perl) with experience building supported, maintainable operational tooling.
Demonstrated ability to lead complex incident response and root-cause analysis, and to drive preventative action through problem management.
Strong change leadership in regulated production environments (planning, risk management, implementation, validation,