Module 8a · Incident Response
Security Champions · Module 8a
Incident Response
A structured approach to handling security incidents. Part 1 of 2 — the PICERL cycle.
Your journey · Program map
Your journey so far
Role → Process → Risks → Find → Supply → Secrets → Config → Respond (M8) → Mobile
9 modules. One toolkit. You are at Module 8.
Your toolkit · So far
What you already have
Modules 1–7
M1–2 · Champion role, sprint workflow, business language · How you operate day-to-day as a Champion
M3 · Risk vocabulary: OWASP Top 10, five lenses · What to look for in every product you touch
M4 · Threat finding: STRIDE → DREAD → tickets · How you hunt and file threats
M5–7 · Where threats live: supply chain, secrets, config · Controls for code, credentials, and configuration
Module 8 · When prevention isn't enough
When prevention isn't enough
PICERL
Modules 2–7 built your prevention toolkit. But prevention has limits. At some point — not if, when — something will get through.
How do you respond when the pager goes off at 2 AM? How do you contain without destroying evidence? How do you communicate during chaos?
You'll learn three things:
1. PICERL cycle: Prepare → Identify → Contain → Eradicate → Recover → Learn
2. Your first runbook: a concrete, testable procedure, not a policy document
3. Your IR role as Champion: the product's representative, providing architecture context the IR team can't get elsewhere
Act 1 · Definitions
The foundation
An event is an observable change in a system. The word that matters is observable — if your monitoring doesn't capture it, if nobody notices it, the event doesn't exist for you.
Somewhere right now, a server is being compromised. Without observation, it's invisible.
Your ability to respond to incidents depends entirely on your ability to observe events. Monitoring coverage defines the boundary of what you can protect.
From event to incident
An incident is an event that has caused — or could cause — negative impact to your business.
This definition is broader than "the system is down." Unauthorized data access is an incident even when the system works perfectly. Undetected data modification is an incident. A published vulnerability affecting your stack can be treated as an incident.
Some organizations classify published vulnerabilities as incidents because doing so triggers the response process, creates a ticket, and ensures someone addresses the issue. Whether you include precursors (signals of something that could happen) in your incident definition is a design choice, but some process must capture them.
Time to think · The observation gap
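[Figure: the observation gap. Three quantities are compared: actual incidents, observable events, and response capability. The gap between actual and observable is labeled "Blind spots"; the gap between observable and response capability is labeled "Capacity gap".]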

Better monitoring closes the first gap. Better processes close the second.

Already happened vs might happen
Indicators — evidence of something that already occurred. Unauthorized access in logs. Customer data on a dark web marketplace.
Precursors — signals of something that could happen. A critical vulnerability published. An unrotated service account with admin access.
A new critical Kubernetes vulnerability is published. Is it an incident for your team? If yes, it enters the response pipeline and someone patches it. If no, it must be captured by vulnerability management. The worst outcome is when nobody owns it.
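One way to guarantee that every signal has an owner is to write the routing rule down as code. The sketch below is illustrative only: the `Signal` type, the `treat_precursors_as_incidents` flag, and the queue names are assumptions made for this example, not part of any real tool.

```python
from dataclasses import dataclass
from enum import Enum


class SignalKind(Enum):
    INDICATOR = "indicator"  # evidence that something already occurred
    PRECURSOR = "precursor"  # signal that something could happen


@dataclass(frozen=True)
class Signal:
    description: str
    kind: SignalKind


def route(signal: Signal, treat_precursors_as_incidents: bool) -> str:
    """Return the (hypothetical) queue that owns the signal.

    Every signal gets exactly one owner, so "nobody owns it" cannot happen.
    """
    if signal.kind is SignalKind.INDICATOR:
        return "incident-response"
    # Precursors: the design choice from the text, made explicit as a flag.
    if treat_precursors_as_incidents:
        return "incident-response"
    return "vulnerability-management"


k8s_cve = Signal("Critical Kubernetes vulnerability published", SignalKind.PRECURSOR)
print(route(k8s_cve, treat_precursors_as_incidents=False))
```

Either answer to the design question is acceptable; what the function rules out is a signal that falls through with no queue at all.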
Time to think · The PICERL cycle
[Figure: the PICERL cycle]
P · Preparation · Before the alert
I · Identification · Detect & classify
C · Containment · Stop the bleeding
E · Eradication
R · Recovery · Restore operations
L · Lessons · Close the loop
Containment before investigation.

Six phases. The cycle improves with every incident your team processes.

Act 2 · Preparation
Before the alert fires
Preparation is everything you set up before an incident occurs — what counts as an incident, who responds, what tools they use, what criteria they follow.
Five severity levels is a common convention: critical, high, medium, low, informational. One project used a 10-point scale, and people couldn't distinguish a 6 from a 7. Simpler scales mean faster triage.
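A five-level scale is small enough to encode directly. The following is a minimal sketch, assuming severity labels arrive as free-text strings; the names `Severity`, `parse_severity`, and `worst` are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Five levels, ordered so that a higher number means more severe."""
    INFORMATIONAL = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


def parse_severity(label: str) -> Severity:
    """Accept only the five agreed labels; anything else is rejected."""
    try:
        return Severity[label.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown severity: {label!r}") from None


def worst(labels: list[str]) -> Severity:
    """Return the most severe level among several reports of the same event."""
    return max(parse_severity(label) for label in labels)


print(worst(["low", "critical", "medium"]).name)
```

Rejecting unknown labels keeps ad-hoc values such as "6" or "sev-2.5" from creeping back into triage.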
In distributed architectures, predicting impact is hard. If the payment service degrades, how many downstream services break? You often don't know until it happens.
Your role as a Champion during an incident: you provide product context — architecture, trust boundaries, data flows, recent changes. The IR team has the security expertise; you bridge the gap to the product. After the incident, you write or update the runbook for this incident type. You are not the incident commander — you are the product's representative in the response process.
You can't prepare for everything
Some incident types aren't in your playbook because nobody imagined they could happen. A cloud provider taking down entire regions during an infrastructure update. A third-party SDK pushing a compromised version through a legitimate update channel.
Trying to cover every possible scenario from the start is impossible and counterproductive. It produces a 200-page document that nobody reads.
Start with your top 5 most likely incident types. Define procedures for those. After each real incident, add the new type. In one to three years, your playbook covers 95% of what actually happens — because it's built from reality.
Best Practice
Build from real incidents, not theory
Building your IR process from real incidents rather than theoretical scenarios
Start with five severity levels and five incident types. Define response procedures for those. Everything else gets the generic playbook until you have data.

After each real incident, update the playbook: add the new incident type, adjust criteria, tune thresholds. Review your plans annually — stored configurations, golden images, and backup procedures decay over time. The OS image you saved two years ago may not be available in your cloud provider anymore.

The process is never finished. It's iterated.
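The "specific playbook if one exists, generic playbook otherwise" rule can be sketched as a small registry that grows after each incident. The class name, incident type strings, and playbook identifiers below are hypothetical examples.

```python
GENERIC_PLAYBOOK = "generic-incident"  # fallback used until real data exists


class PlaybookRegistry:
    """Maps incident types to playbook names, with a generic fallback."""

    def __init__(self, initial: dict[str, str]) -> None:
        self._playbooks = dict(initial)

    def lookup(self, incident_type: str) -> str:
        # Unknown types are still handled: they get the generic playbook.
        return self._playbooks.get(incident_type, GENERIC_PLAYBOOK)

    def record_lesson(self, incident_type: str, playbook: str) -> None:
        # Called after a real incident: add or update the specific playbook.
        self._playbooks[incident_type] = playbook


registry = PlaybookRegistry({"account-compromise": "pb-account-compromise"})
print(registry.lookup("compromised-sdk"))  # not covered yet
registry.record_lesson("compromised-sdk", "pb-compromised-sdk")
print(registry.lookup("compromised-sdk"))  # covered after the first real case
```

The registry starts small and only gains entries through `record_lesson`, which mirrors building the playbook from real incidents rather than from theory.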
Act 3 · Identification
Where signals come from
Incident signals arrive from three categories of sources:
Technical — monitoring alerts, SIEM, IDS/IPS, log analysis. The automated layer.
Human — a Slack message, a support call, a developer noticing something odd. Often the first signal before automated alerts fire.
External — a news article, a dark web monitoring alert, a security researcher's disclosure, a partner reporting anomalous traffic.
People are often the first responders — before any monitoring system fires. Your process needs a clear path for human-reported signals, not just automated ones.
Same event, different causes
A customer calls: "I can't log in." This is an event with at least four possible explanations:
Forgot the password, wrong keyboard layout, caps lock. No incident — just a user having a bad morning.
Auth database full, login service crashed. An operational incident, but not a security one.
Account compromised, password changed by attacker. Security incident — containment required.
Someone with access reset the password inappropriately. A different kind of security incident with different containment.
One event. Four possible causes. The first responder needs written criteria to distinguish between them — not intuition.
Time to think · Initial triage
[Figure: initial triage decision tree]
Event reported → Known false positive pattern? Yes: close. No: continue.
Can it impact business? No: log & watch. Yes: continue.
Is it actively ongoing? Yes: incident, assign team. No: precursor, create ticket.

The first responder follows the tree. Written criteria, not improvisation.
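The triage tree is simple enough to express as a function, which also makes it testable. This is a sketch under the assumption that the first responder can answer each question with yes or no; the `Outcome` names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    CLOSE = "close"
    LOG_AND_WATCH = "log and watch"
    INCIDENT = "incident: assign team"
    PRECURSOR = "precursor: create ticket"


def triage(known_false_positive: bool,
           can_impact_business: bool,
           actively_ongoing: bool) -> Outcome:
    """Walk the three questions of the triage tree in order."""
    if known_false_positive:
        return Outcome.CLOSE
    if not can_impact_business:
        return Outcome.LOG_AND_WATCH
    if actively_ongoing:
        return Outcome.INCIDENT
    return Outcome.PRECURSOR


# An ongoing event with business impact that matches no false-positive pattern.
print(triage(False, True, True).value)
```

Because the questions are evaluated in a fixed order, two responders given the same answers always reach the same outcome.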

Knowledge check · Triage
Your monitoring shows 847 failed authentication attempts for an admin account in 30 minutes, from 12 different IP addresses. The account is not locked. What type of incident is this?
Answer: D. The brute force is happening now (indicator). If the account isn't locked and the password is weak, compromise is imminent (precursor). Immediate action: lock the account or enforce rate limiting.
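A detection rule for this pattern might look like the sketch below. The thresholds (`max_failures`, `min_distinct_ips`) and the `FailedLogin` record are assumptions chosen for illustration; real values depend on your traffic and your monitoring stack. The sample data reproduces the scenario: 847 failures from 12 addresses inside 30 minutes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FailedLogin:
    account: str
    source_ip: str
    at: datetime


def looks_like_distributed_brute_force(
    events: list[FailedLogin],
    account: str,
    now: datetime,
    window: timedelta = timedelta(minutes=30),
    max_failures: int = 100,      # hypothetical threshold
    min_distinct_ips: int = 5,    # hypothetical threshold
) -> bool:
    """True when one account sees many failures from many sources in the window."""
    recent = [e for e in events
              if e.account == account and now - window <= e.at <= now]
    distinct_ips = {e.source_ip for e in recent}
    return len(recent) >= max_failures and len(distinct_ips) >= min_distinct_ips


now = datetime(2024, 1, 1, 2, 0)
# 847 failures, one every 2 seconds, cycling through 12 documentation-range IPs.
events = [
    FailedLogin("admin", f"203.0.113.{i % 12}", now - timedelta(seconds=2 * i))
    for i in range(847)
]
print(looks_like_distributed_brute_force(events, "admin", now))
```

A rule like this only raises the alert; locking the account or rate limiting remains a separate, predefined containment action.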
Act 4 · Containment
The most important phase
Containment is the most counterintuitive phase. The instinct is to investigate — who did this? How did they get in? But investigation takes time, and the incident is ongoing during every minute of analysis.
The principle: containment takes priority over investigation. Predefined actions, executed immediately based on incident type, without waiting for a complete understanding of what happened.
If your containment procedure requires investigation before action, it's not a containment procedure — it's an investigation procedure with a containment label.
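"Predefined actions by incident type" can be pictured as a lookup table that runs without any analysis step. In this sketch the incident types and action functions are hypothetical, and each action only records what it would do; real actions would call your identity provider, firewall, or orchestration tooling.

```python
from typing import Callable

Action = Callable[[list[str]], None]


def block_account(log: list[str]) -> None:
    log.append("block account")


def restrict_to_trusted_ips(log: list[str]) -> None:
    log.append("restrict access to trusted IPs")


def isolate_host(log: list[str]) -> None:
    log.append("isolate host")


def page_ir_team(log: list[str]) -> None:
    log.append("page IR team")


# Decided before any incident happens; nothing here requires investigation.
CONTAINMENT_ACTIONS: dict[str, list[Action]] = {
    "account-compromise": [block_account, restrict_to_trusted_ips],
    "host-compromise": [isolate_host],
}


def contain(incident_type: str) -> list[str]:
    """Run the predefined actions for the incident type and return what was done."""
    log: list[str] = []
    for action in CONTAINMENT_ACTIONS.get(incident_type, [page_ir_team]):
        action(log)
    return log


print(contain("account-compromise"))
```

Note that `contain` takes only the incident type as input. If a procedure needs more facts than that before it can act, it is drifting toward investigation.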
Short-term, backup, long-term
Short-term — immediate actions to stop ongoing damage. Block the account, restrict access to trusted IPs, isolate the host. This buys time, not a permanent fix.
Backup — before changing anything, snapshot the current state. Disk images, database backups, config snapshots. Evidence you don't preserve now is evidence you can never analyze.
Long-term — fixes that allow systems to operate securely while investigation continues. Patching, account removal, network segmentation.
A RAM dump is often the richest source for malware analysis, but shutting down a server destroys the contents of RAM. For most organizations, restoring business takes priority over forensic completeness. If evidence preservation is required, build memory dumps into the containment plan: before shutdown, not instead of it.
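The ordering constraint (memory capture before shutdown, when evidence is required) can be checked mechanically when a containment plan is written. This is a sketch only; the step names `capture_memory` and `shutdown_host` are hypothetical labels, not commands from any tool.

```python
def validate_plan(steps: list[str], evidence_required: bool) -> list[str]:
    """Return a list of problems with the step order; empty means the plan passes."""
    problems: list[str] = []
    if evidence_required and "shutdown_host" in steps:
        if "capture_memory" not in steps:
            problems.append("memory capture is missing before shutdown")
        elif steps.index("capture_memory") > steps.index("shutdown_host"):
            problems.append("memory capture is scheduled after shutdown")
    return problems


good_plan = ["isolate_host", "capture_memory", "snapshot_disk", "shutdown_host"]
bad_plan = ["isolate_host", "shutdown_host", "capture_memory"]

print(validate_plan(good_plan, evidence_required=True))
print(validate_plan(bad_plan, evidence_required=True))
```

Running a check like this during plan review is cheaper than discovering at 2 AM that the runbook powers the host off first.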
Act 4 · Real case
Block everyone. Use the paper.
A private bank with fewer than 1,000 high-value clients had a radical containment approach for critical Active Directory (AD) compromises.
🚫 Slack: compromised
📱 Signal: out-of-band
📞 Phone tree: pre-registered
Pattern
When the network is compromised, out-of-band communication saves you. Pre-register a backup channel (phone tree, Signal group) before you need it.
Time to think · Predefined containment
[Figure: predefined containment timeline, from 0 min to about 2 min]
🚨 Alert triggers: AD compromise detected
🔒 Script fires: all admin accounts blocked
📄 Paper password: restoration begins

No investigation needed. No decision-making under pressure. The response was defined before the incident happened.

Champion's takeaway
If attackers are reading your Slack, your incident response is their intelligence feed. Assume breach in your communication planning.
When attackers own the conversation
A large holding company's Active Directory was fully compromised — every user, every mailbox, every internal conversation was controlled by the attackers.
Time to think · The correct sequence
[Figure: two sequences compared]
Correct: Contain → Snapshot → Investigate → Eradicate (time scale: minutes, then hours to days)
Incorrect: Investigate → Understand → then Contain (attacker still active)

Investigation is important — but it comes after the bleeding stops, not before.

Act 5 · Eradication
Remove, restore, verify
Eradication removes the threat and restores affected systems to a clean state. Two challenges are common:
You can't always restore to the same state — VM images get deprecated, OS versions reach end of life, cloud providers retire services. Your two-year-old backup might reference infrastructure that no longer exists.
And you need to check whether the issue exists in similar systems. An attacker who compromised one service probably probed others.
Document everything during eradication. What was removed, what was restored, what was rebuilt. If you don't document it now, critical details will be forgotten within days.
Act 5 · Recovery
Four numbers that define recovery
RPO — Recovery Point Objective. How much data loss is acceptable? Drives your backup frequency.
RTO — Recovery Time Objective. How long until systems are operational? Drives your infrastructure decisions.
SDO — Service Delivery Objective. The minimum service level that keeps the business alive.
MTO — Maximum Tolerable Outage. The hard ceiling. Beyond this, consequences become catastrophic.
A sports betting platform goes down. Full recovery takes days. But the SDO might be: a static web page saying "call this number" plus phone operators accepting bets. The business continues at reduced capacity while full recovery proceeds. Define your SDO before you need it.
Time to think · Recovery metrics
[Figure: recovery timeline after an incident, bounded by MTO (maximum tolerable outage)]
SDO · Minimum service · Hours
RTO · Systems operational · Hours to days
RPO · Full data restored · Backup age = data loss

SDO gets the business running. RTO gets systems back. RPO limits how much data you lose. MTO is the ceiling you must beat.
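The time-based objectives can be compared against what your backups and restore procedures actually deliver. The sketch below assumes all three are expressed as durations; the example numbers are made up, and SDO is left out because it describes a service level rather than a duration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # acceptable data loss, as time since the last good backup
    rto: timedelta  # target time until systems are operational
    mto: timedelta  # hard ceiling on total outage


def assess(objectives: RecoveryObjectives,
           backup_age: timedelta,
           estimated_restore: timedelta) -> list[str]:
    """Return the objectives that the current situation would miss."""
    findings: list[str] = []
    if objectives.rto > objectives.mto:
        findings.append("RTO is longer than MTO")
    if backup_age > objectives.rpo:
        findings.append("backup is older than RPO allows")
    if estimated_restore > objectives.rto:
        findings.append("restore estimate misses RTO")
    if estimated_restore > objectives.mto:
        findings.append("restore estimate exceeds MTO")
    return findings


# Hypothetical targets: lose at most 4h of data, restore in 6h, never exceed 8h.
objectives = RecoveryObjectives(
    rpo=timedelta(hours=4), rto=timedelta(hours=6), mto=timedelta(hours=8)
)
print(assess(objectives,
             backup_age=timedelta(hours=6),
             estimated_restore=timedelta(hours=5)))
```

In this example the restore fits inside both RTO and MTO, but a 6-hour-old backup against a 4-hour RPO means backup frequency is the thing to fix.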

Knowledge check · Recovery
Your e-commerce platform suffers a ransomware attack. Database backups are 6 hours old. Your MTO is 8 hours. What's the recovery approach?
Answer: D. First, reach the SDO: a static page with a phone number keeps the business alive within minutes. In parallel, restore from the 6-hour backup. You'll lose 6 hours of data, but business continues. Never negotiate with attackers without law enforcement guidance.
Act 6 · Lessons learned
The phase that closes the loop
The lessons learned meeting should happen within two weeks — while details are fresh. It answers: what was the scope, how effective was containment, what worked well, what was slow, and what changes would prevent recurrence?
The most important outcome isn't the report — it's the process changes that result. An updated playbook. A tuned monitoring rule. A new containment procedure. If lessons learned produces only a document, it failed.
Not every incident type can be prevented. When it can't, the goal shifts to reducing response time — detect earlier, contain faster, recover more efficiently. Improving from 4-hour response to 45-minute response is a meaningful security improvement even if the incident itself can't be prevented.
Best Practice
One channel, one ticket system
Routing all incident communication through a single channel and ticketing system
During an incident, fragmented communication is one of the most common process failures: updates end up scattered across Slack channels, email threads, and war rooms that not everyone knows about.

Route all incident reports — from employees, customers, and monitoring systems — into a single ticketing system. This creates faster identification (all signals in one place), a searchable knowledge base over time, and a foundation for automation.

Over months and years, the documented procedures that accumulate in this system handle the vast majority of incidents — turning ad-hoc responses into repeatable processes.
Channel template: #inc-YYYY-MM-description. Pin: incident summary, severity, owner, current status. Update every 30 minutes. Tag every decision with timestamp and author. This format is directly usable — create it as a Slack template today.
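The naming template and the 30-minute update cadence are easy to automate. A minimal sketch follows, assuming the channel name is derived from the month the incident opened and a short free-text description; the function names are hypothetical.

```python
import re
from datetime import date, datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=30)  # cadence from the channel template


def incident_channel(opened: date, description: str) -> str:
    """Build a name in the #inc-YYYY-MM-description format."""
    # Lowercase, then collapse anything that is not a letter or digit into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"#inc-{opened:%Y-%m}-{slug}"


def next_update_due(last_update: datetime) -> datetime:
    """When the next status update should be posted."""
    return last_update + UPDATE_INTERVAL


print(incident_channel(date(2024, 3, 14), "Admin brute force!"))
print(next_update_due(datetime(2024, 3, 14, 2, 0)))
```

Generating the name rather than typing it keeps channels sortable and searchable by month, which helps when the ticket history becomes a knowledge base.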
Summary · Part 1
What you covered
An event is an observable change. An incident is an event with negative business impact. Your monitoring coverage defines what you can protect.
The PICERL cycle provides structure: Prepare → Identify → Contain → Eradicate → Recover → Learn. Each real incident improves the next response.
Containment takes priority over investigation. Predefined actions execute immediately based on incident type.
Recovery has four metrics: RPO, RTO, SDO, and MTO. The SDO keeps the business alive while full recovery continues.
Lessons learned must produce process changes, not just documentation.
If your team interacts with external parties during incidents (regulators, partners, CERTs), two references matter. NIST SP 800-61 is a widely used incident handling guide that many regulators and auditors recognize. FIRST TLP (Traffic Light Protocol) classifies information by sharing scope: RED (named recipients only), AMBER (limited sharing), GREEN (community), CLEAR (public). Knowing these exist is enough for now; your security team will guide the specifics.
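As a rough mental model, each TLP label sets the widest audience that may receive the information. The sketch below is deliberately simplified to the four one-line summaries given here; the `Audience` categories are assumptions for illustration, and the FIRST definitions remain authoritative.

```python
from enum import IntEnum


class Audience(IntEnum):
    """Hypothetical audience tiers, ordered from narrowest to widest."""
    NAMED_RECIPIENTS = 1
    LIMITED = 2
    COMMUNITY = 3
    PUBLIC = 4


# Widest audience each label allows, following the summaries in the text.
TLP_SCOPE = {
    "RED": Audience.NAMED_RECIPIENTS,
    "AMBER": Audience.LIMITED,
    "GREEN": Audience.COMMUNITY,
    "CLEAR": Audience.PUBLIC,
}


def may_share(label: str, audience: Audience) -> bool:
    """True when the audience is no wider than the label's scope."""
    return audience <= TLP_SCOPE[label.upper()]


print(may_share("amber", Audience.COMMUNITY))
```

When in doubt about a label, ask the sender or your security team before forwarding; the code only illustrates the ordering idea.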
Next · Part 2
Operationalizing Incident Response
Part 2 covers alert management, false positive reduction, runbooks and playbooks, triage processes, and automation.