Incident Response Playbook for Developers

Incident response is the structured process of handling security breaches and cyber attacks. Every development team needs a plan, because it is not a matter of if an incident will happen, but when. This article presents a practical incident response playbook based on the NIST SP 800-61 framework.

The NIST Incident Response Framework

NIST SP 800-61 Revision 2 defines four phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. This playbook follows those four phases and treats triage as a step within Detection and Analysis.

Phase 1: Preparation

Preparation is the most important phase. Without preparation, every incident becomes a chaotic scramble.

Build a response team : Identify who handles security incidents. The team should include an incident commander, a security analyst, a system owner, a communications lead, and a legal representative.

Create runbooks : Document step-by-step procedures for common incident types: phishing, malware outbreak, data breach, ransomware, denial of service, and insider threat.

Set up tooling : Ensure the team has access to:

Centralized logging (SIEM like Splunk, ELK, or Sentinel)
Endpoint detection and response (EDR like CrowdStrike or Defender)
Network monitoring and packet capture
Secure communication channels (Slack, Teams, or Signal)
Evidence collection and analysis tools (FTK Imager, Volatility, tcpdump)

Practice regularly : Run tabletop exercises every quarter. Simulate a ransomware attack, a data exposure, or a compromised credential. Practice builds muscle memory.

Phase 2: Detection and Analysis

Detection relies on monitoring and alerting. Treat every alert as a potential incident until triage shows otherwise.

Alert sources :

SIEM correlation rules detecting anomalous patterns
EDR alerts for malware execution or suspicious process behavior
Cloud provider alerts (GuardDuty, Security Command Center, Defender)
Application logs showing unusual error rates or access patterns
User reports of suspicious activity
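
As a rough illustration of the "unusual access patterns" source above, the sketch below counts failed logins per source IP in an application log. The log format, the `FAILED_LOGIN` marker, and the function name `failed_login_sources` are assumptions made for this example, not something a particular product emits.

```shell
#!/usr/bin/env bash
# Hypothetical log format (an assumption for this sketch):
#   2026-04-15T10:00:01Z FAILED_LOGIN user=alice ip=203.0.113.50

# Print "ip count" for every source IP with at least THRESHOLD failed logins.
failed_login_sources() {
  local log_file="$1" threshold="$2"
  awk -v min="$threshold" '
    $2 == "FAILED_LOGIN" {
      # Find the ip=... field wherever it appears on the line.
      for (i = 3; i <= NF; i++) {
        if ($i ~ /^ip=/) { count[substr($i, 4)]++ }
      }
    }
    END {
      for (ip in count) {
        if (count[ip] >= min) { print ip, count[ip] }
      }
    }
  ' "$log_file" | sort   # awk iteration order is unspecified, so sort the result
}
```

Running `failed_login_sources app.log 5` would list only the addresses with five or more failed attempts. A SIEM correlation rule would normally do this continuously; the function is only meant to show the shape of the check.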

Triage questions :

What happened? What systems are affected?
When did it start? Is it ongoing?
What is the impact? Data loss? Service disruption?
Is this a true positive or a false alarm?
What severity level applies?

Severity classification :

SEV-1: Critical. Active data exfiltration, ransomware, or service-wide compromise. Immediate response required.
SEV-2: High. Confirmed intrusion but contained. Credential compromise affecting multiple users.
SEV-3: Medium. Potential compromise under investigation. Phishing campaign targeting employees.
SEV-4: Low. Minor policy violations. Automated scans with no evidence of exploitation.
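
The classification above can be encoded so that alert handlers and runbooks agree on the same levels. This is a minimal sketch: the signal names (`ransomware`, `phishing_campaign`, and so on) are hypothetical labels chosen for the example, and a real team would map its own alert types.

```shell
#!/usr/bin/env bash
# Map a triage signal to a severity level from the table above.
# The signal names are hypothetical; adjust them to your own alert taxonomy.
classify_severity() {
  case "$1" in
    active_exfiltration|ransomware|service_wide_compromise)
      echo "SEV-1" ;;
    contained_intrusion|multi_user_credential_compromise)
      echo "SEV-2" ;;
    suspected_compromise|phishing_campaign)
      echo "SEV-3" ;;
    policy_violation|automated_scan)
      echo "SEV-4" ;;
    *)
      # Unknown signals are not guessed at; a human should classify them.
      echo "UNCLASSIFIED"
      return 1 ;;
  esac
}
```

Returning a non-zero status for unknown signals is a deliberate choice in this sketch, so that a calling script cannot silently treat an unrecognized alert as low severity.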

Phase 3: Containment, Eradication, and Recovery

Containment stops the attack from spreading. Eradication removes the attacker's presence. Recovery returns systems to normal operation.

Short-term containment :

Disconnect affected systems from the network.
Disable compromised user accounts.
Block attacker IP addresses at the firewall.
Rotate credentials for affected services.

Example: Block an IP at the firewall

iptables -A INPUT -s 203.0.113.50 -j DROP
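
When several addresses need blocking at once, it can help to validate the list and review the commands before running them. The sketch below only prints the commands (a dry run). The function names are made up for this example, and it uses `-I INPUT 1` rather than `-A` on the assumption that the chain may already contain ACCEPT rules that would otherwise match first.

```shell
#!/usr/bin/env bash
# Return 0 if the argument looks like a dotted-quad IPv4 address.
is_ipv4() {
  local ip="$1" octet
  [[ "$ip" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
  local IFS=.
  for octet in $ip; do
    # 10# forces base 10 so that octets with leading zeros are not read as octal.
    (( 10#$octet <= 255 )) || return 1
  done
}

# Read one address per line from stdin and print (not run) a block rule
# for each valid one. Invalid lines are reported on stderr and skipped.
print_block_rules() {
  local ip
  while IFS= read -r ip; do
    [ -z "$ip" ] && continue
    if is_ipv4 "$ip"; then
      printf 'iptables -I INPUT 1 -s %s -j DROP\n' "$ip"
    else
      printf 'skipping invalid address: %s\n' "$ip" >&2
    fi
  done
}
```

A responder could run `print_block_rules < attacker_ips.txt`, review the output, and only then execute it as root. The file name `attacker_ips.txt` is a placeholder.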

Example: Deactivate a compromised AWS IAM user's access key

# If the key belongs to a different IAM user than the caller, also pass --user-name.
aws iam update-access-key \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --status Inactive

Long-term containment :

Apply security patches.
Implement additional monitoring for affected systems.
Deploy WAF rules to block attack patterns.

Eradication :

Remove malware using EDR tools.
Rebuild compromised servers from known-good images.
Revoke all session tokens and API keys.
Reset root passwords and privileged credentials.

Recovery :

Restore systems from clean backups.
Verify system integrity before returning to production.
Gradually reintroduce traffic while monitoring for recurrence.
Communicate recovery status to stakeholders.
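
One way to approach the integrity check in the recovery steps above is to compare file hashes against a manifest taken from a known-good image. This is a sketch under stated assumptions: it relies on GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`), the function names are invented for the example, and the manifest path must be absolute and stored outside the directory being checked.

```shell
#!/usr/bin/env bash
# Write a SHA-256 manifest covering every regular file under DIR.
# MANIFEST must be an absolute path outside DIR.
create_manifest() {
  local dir="$1" manifest="$2"
  ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Return 0 only if every file listed in MANIFEST exists and matches its hash.
verify_manifest() {
  local dir="$1" manifest="$2"
  ( cd "$dir" && sha256sum --check --quiet "$manifest" )
}
```

This detects files that were changed or removed since the manifest was created. It does not detect files that were added, so it complements rather than replaces a rebuild from a known-good image.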

Phase 4: Post-Incident Activity

The post-mortem is where the team learns from the incident and improves processes.

Post-mortem meeting : Within one week of containment, gather everyone involved. A blameless culture is essential: the goal is to improve systems, not to assign blame.

Post-mortem document :

Timeline of the incident
Root cause analysis
What went well and what went wrong
Detection gaps and containment delays
Remediation items with owners and deadlines
Changes to runbooks, tooling, or architecture

Post-Mortem: Service Credential Leak

Date : 2026-04-15

Severity : SEV-2

Timeline

Root Cause

A GitHub Actions workflow accidentally logged AWS_SECRET_ACCESS_KEY to debug output, and the logs were publicly accessible.

Action Items

Forensic Evidence Collection

Proper evidence collection preserves data for legal action and root cause analysis.

Capture memory dumps with a tool like LiME before powering off systems, then analyze them with Volatility.
Collect disk images using dd or FTK Imager rather than copying files live.
Record command output with timestamps using the script command.
Maintain chain of custody documentation for all evidence.

Capture memory dump with LiME

insmod lime.ko "path=/evidence/memory.dump format=lime"

Capture disk image

dd if=/dev/sda of=/evidence/disk.img bs=4M conv=noerror,sync
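
Chain of custody is easier to defend when each evidence file is hashed as soon as it is collected. The sketch below appends one tab-separated entry per file to a custody log. The log layout and the function name `record_evidence` are assumptions for illustration, and it relies on GNU `sha256sum`.

```shell
#!/usr/bin/env bash
# Append one custody entry: UTC timestamp, collector, SHA-256, and file path.
# Returns non-zero without writing anything if the evidence file is unreadable.
record_evidence() {
  local evidence_file="$1" collector="$2" custody_log="$3"
  local digest timestamp
  [ -r "$evidence_file" ] || return 1
  digest="$(sha256sum "$evidence_file" | awk '{print $1}')"
  timestamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '%s\t%s\t%s\t%s\n' \
    "$timestamp" "$collector" "$digest" "$evidence_file" >> "$custody_log"
}
```

For example, `record_evidence /evidence/disk.img analyst1 /evidence/custody.tsv` would log the disk image captured above; `analyst1` and the log path are placeholders. Re-hashing the file later and comparing against the logged digest shows whether the evidence has changed since collection.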

Conclusion

A well-practiced incident response process turns a potential disaster into a manageable event. Preparation separates professional teams from those that panic. Detection without response is just noise. And every incident, no matter how small, is an opportunity to improve.