Denis Ćutić

Principal Engineer @ Infobip

Summary

Software engineer with 10+ years of experience in building, scaling and operating mission-critical services and in platform engineering. I am a software quality advocate focused on driving reliability and operational excellence. My work has centered on developing and leading the strategies essential to achieving the organization's reliability vision. I have deep expertise in Java, Python, system design, observability, automation and incident management for high-volume, globally distributed systems.

Skills and technologies

Languages: Java (Spring), Python, Groovy, Kotlin
Data: MSSQL, PostgreSQL, Elasticsearch, Kafka, RabbitMq, Redis
CI/CD: Docker, Kubernetes, Ansible, Rundeck
Observability: Prometheus, Grafana, Graylog, NewRelic, OpsGenie
ML/AI: Prophet, Azure OpenAI, Copilot, LM Studio, Spacy
OS: Linux, Windows

Certificates

  • Coursera Deep Learning Specialization
  • CKAD: Certified Kubernetes Application Developer
  • CKA: Certified Kubernetes Administrator

Work experience

Infobip

Since 2022. - Principal Engineer

Product observability, proactive approach towards reliability, incident response automation, performance testing, prototyping, defining reliability and quality strategy

Troubleshooting automation

Developed an AI agent that automates troubleshooting (root cause analysis) by connecting to observability tools and using anomaly detection to identify issues. The agent is driven by knowledge of the platform and uses OpenAI models for reasoning. (Azure Data Explorer, Kusto Query Language, Python, Slack API, OpenAI LLMs, ReAct prompting)

Proactive reliability

Defined and implemented the strategies and roadmaps needed to reach higher reliability objectives. Currently defining and implementing chaos engineering practices. Coordinating the implementation of a performance testing tool that is easy and safe to use.

Software quality course

Developed a course covering the entire software development life cycle from a software quality perspective. (Learning outcomes, Bloom's taxonomy)

Product observability

Implemented end-to-end observability at the product level. Developed and maintained a custom tracing solution and instrumented the SMS product flow. Created product-specific and client-specific dashboard templates for support teams. Defined policies for a structured approach to product monitoring based on synthetic, real-user and front-end monitoring.

Management

Ensured timely delivery of initiatives through careful planning and breakdown. Defined technical learning paths for the SRE team. Led cross-functional teams in short-term and long-term projects. Mentored two SRE colleagues from senior to staff level. Promoted SRE practices across the organization, from developers to C-level management.

2020. - 2022. - Mid to Senior Site Reliability Engineer

Incident management, Observability, Reliability, Automation

Incident management

Redefined and improved the entire process at the company level. Implemented metrics and data collection to drive reliability improvements. Responsibilities: platform monitoring, incident response and review, impact assessment, coordination with management, product and support teams, incident and platform reliability reporting, product and service review from a reliability perspective. (Jira, Slack, Confluence)

Platform observability

Created dashboards based on various data sources for efficient monitoring and troubleshooting, and set up actionable alerts and notification policies. Used various sources and tools for troubleshooting and root cause analysis. Defined and implemented Service Level Indicators and Objectives for core products. (Prometheus, Alertmanager, Grafana, OpsGenie, Graylog, Kibana, NewRelic)

Operations

Coordinated and participated in company-wide high-risk infrastructure maintenance tasks.

Company culture

Coordinated the initiative with engineering directors; defined survey questions in close collaboration with the human resources and employer branding departments; analyzed survey results.

2014. - 2020. - Junior to Senior Backend Developer

Java, Spring, Data pipelines, API Gateways

Identity management services

Developed, refactored and maintained highly available, mission-critical services related to identity management and authentication that handled all authentication and authorization requests for the platform. (Java, Spring framework, MS SQL, Hibernate, Redis)

Elasticsearch

Set up and maintained several Elasticsearch clusters (up to 40 nodes, ~100 TB of data). Developed and maintained related services for data ingestion and manipulation. (Java, Kafka)

HTTP API gateways

Designed, developed and maintained REST API backends for handling SMS traffic, and HTTP API gateways serving as a platform for the other engineering teams. (Java, Groovy, RabbitMq, Spring framework, Tomcat, WebFlux, RxJava, Reactor)

Chat bot

Designed and implemented an in-house chat bot solution using NLP, with a focus on developing the intent engine and named entity recognition. (Java, Python, spaCy)

Code Escape

Created and coordinated an escape room for developers.

SRE Community of Practice

Researched and introduced the concept of Communities of Practice to the Engineering department as a strategy for improving knowledge sharing across the organization. Organized and led, with the SRM, a community of practice inside the company for revising, promoting and improving SRE practices among developers.

Kompare.hr

2013. - 2014. - Junior Full Stack Developer

PHP, jQuery, Angular 1.x, MySQL

Optimized SQL queries, introduced a debugger tool to speed up troubleshooting, improved code quality

Degrees

2014. Master's degree in Computer Science @ Faculty of Electrical Engineering and Computing, University of Zagreb

Programmatic realization of the particle swarm optimization algorithm

2013. Master's degree in Computer Software Engineering @ Politecnico di Milano

Ontology-assisted approach for learning causal Bayesian network structure

Other competencies

Languages: Croatian, English, Italian, French
Public speeches: JavaCro (2016, 2018, 2019, 2021, 2022); Infobip DevDays (2016); Joker conf (2018); Meetups (Java Zg, ElasticSearch Zg); Faculty of Humanities and Social Sciences - University of Zagreb (2023)
Soft skills: Communication and organizational skills, Team work, Team culture building, Adaptability, Analytical thinking