Job Description
Must Have Technical/Functional Skills

We are seeking a Site Reliability Engineer (SRE) with strong expertise in Talend and Big Data platforms to support and operate large-scale data processing environments. The role requires close collaboration with customers, application teams, and offshore delivery teams to ensure platform reliability, incident management, and operational excellence. Experience with Databricks is a strong plus.

Key Responsibilities

- Act as an SRE for Big Data and ETL platforms, ensuring high availability, performance, and reliability of data pipelines and applications.
- Provide operational support and incident management (MIM), including triage, root cause analysis, and resolution of production issues.
- Serve as a primary point of contact for customers, providing timely updates, issue resolution, and operational insights.
- Collaborate closely with application teams to support ETL jobs, data processing workflows, and platform enhancements.
- Coordinate with offshore teams for day-to-day operations, incident resolution, and continuous improvement initiatives.
- Monitor, troubleshoot, and optimize Talend, Hadoop, Spark, and Big Data ecosystems.
- Implement and support monitoring, alerting, runbooks, and automation to improve platform stability and reduce manual effort.
- Participate in problem management, change management, and post-incident reviews to drive preventive measures.
- Support capacity planning, performance tuning, and reliability improvements across the data landscape.

Required Skills & Qualifications

- Strong hands-on experience with Talend (development, support, and troubleshooting).
- Solid understanding of Big Data technologies, including:
  o Hadoop ecosystem
  o Apache Spark
- Proven experience handling Major Incident Management (MIM) and production support in a 24x7 or on-call environment.
- Experience working directly with customers, business stakeholders, and cross-functional teams.
- Strong coordination skills to manage and guide offshore teams.
- Knowledge of ITIL processes, especially Incident, Problem, and Change Management.
- Excellent communication, documentation, and stakeholder management skills.

TCS Employee Benefits Summary

- Discretionary Annual Incentive.
- Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
- Family Support: Maternal & Parental Leaves.
- Insurance Options: Auto & Home Insurance, Identity Theft Protection.
- Convenience & Professional Growth: Commuter Benefits, Certification & Training Reimbursement.
- Time Off: Vacation, Time Off, Sick Leave & Holidays.
- Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.

Salary Range: $120,000-$160,000 a year
Location: Deerfield, IL
Job Function: TECHNOLOGY
Role: Lead
Job Id: 412624
Desired Skills: Azure | Big Data | DevOps | Spark | Talend