If you’ve ever stared at a red dashboard tile at 02:00 thinking “Okay… but why?”, you’ve already met the difference between monitoring and observability. Add security into the mix and suddenly people start throwing around SIEM, SOC, and incident—often interchangeably.
This post clears up those terms from a practical Linux/system administration perspective, with examples you can actually run on a server. The goal is simple: build a mental model that makes it obvious what each thing is for and where it fits in a real environment.
Why these words matter (and why people mix them up)
In operations, you want reliable answers to two questions:
- Is the system healthy? (availability, performance, capacity)
- Is the system safe? (abuse, intrusion, policy violations)
Monitoring and observability are mostly about health. SIEM and SOC are mostly about security. An incident is what you declare when “something weird” becomes “something real.”
Monitoring: “Is it OK?”
Monitoring is the practice of tracking known signals and alerting when a condition crosses a defined threshold. It’s the “dashboard lights” model.
Typical monitoring signals
- CPU usage above 90% for 5 minutes
- Disk space below 10%
- Service down (e.g., nginx, sshd, database)
- HTTP error rate above X
- Latency above Y
Monitoring usually produces alerts like:
- “nginx is down”
- “Disk /var is 95% full”
- “API latency exceeds SLO”
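That "known signal, known threshold" model is easy to sketch in shell. A minimal illustration (the 90% limit and the `df -P` parsing are just examples, not a replacement for Zabbix or Prometheus):

```shell
#!/bin/sh
# Threshold check: alert when a filesystem's usage crosses a fixed limit.
# Reads `df -P` output on stdin so the rule itself is easy to test.
check_disk() {
    limit="$1"
    awk -v limit="$limit" 'NR > 1 {
        gsub(/%/, "", $5)                # strip the % sign from the usage column
        if ($5 + 0 > limit)
            printf "ALERT: %s is %s%% full (mount: %s)\n", $1, $5, $6
    }'
}

# Typical use, e.g. from cron:
df -P | check_disk 90
```

Note what this can and cannot do: it fires reliably on the one condition it knows about, and is blind to everything else.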
Monitoring is great at
- Fast detection of known failure modes
- Availability tracking and SLAs/SLOs
- Capacity signals (disk growth, memory pressure)
Monitoring is weak at
- Explaining why something happened
- Catching novel failure modes you didn’t anticipate
Bottom line: monitoring answers “Is it broken?” and “How bad is it?”
Observability: “What’s happening and why?”
Observability is the ability to understand what’s happening inside a system by looking at the data it emits—especially when the problem wasn’t pre-defined as a threshold.
It’s often described via three pillars:
1) Metrics
Numbers over time (timeseries).
- Load average, CPU, memory, IO
- Request rate, error rate
- Latency percentiles (p95/p99)
2) Logs
Discrete events (text or JSON) that explain “what happened.”
- sshd authentication attempts
- sudo commands
- nginx access/error logs
- application exceptions
3) Traces
Request-level visibility across services (web → API → DB), useful in microservices and distributed systems.
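Even without a tracing stack, the core idea is simple: one request ID propagated across services, so one request's path can be stitched back together afterwards. A toy sketch (the `request_id=` log convention here is an assumption for illustration, not a standard):

```shell
#!/bin/sh
# The core idea behind traces: one request ID shared across services,
# so a single request's path (web -> API -> DB) can be reassembled from logs.
trace_request() {
    id="$1"; shift
    grep -h "request_id=$id" "$@"   # -h: omit file names, keep per-file event order
}

# Typical use (log file names are hypothetical):
# trace_request 7f3a web.log api.log db.log
```

Real tracing systems (OpenTelemetry, Jaeger) automate the ID propagation and add timing, but the mental model is the same.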
Monitoring vs Observability in one example
- Monitoring: “The API is slow.”
- Observability: “The API is slow because DB query SELECT ... regressed and saturates storage IO, causing request timeouts.”
Bottom line: observability answers “What’s happening?” and “Where is the bottleneck?”—even when you didn’t predict the failure ahead of time.
SIEM: Security Information and Event Management
SIEM is a platform (and a discipline) that collects security-relevant data, normalizes it, correlates events, and helps you detect and investigate threats. Think: “logs and signals, but security-aware.”
What a SIEM typically does
- Ingest: collect logs/events from servers, endpoints, network devices, identity systems
- Normalize: parse data into searchable fields (user, IP, action, host, process, etc.)
- Correlate: connect related events across systems and time
- Detect: rules, signatures, anomaly logic, threat intel enrichment
- Investigate: fast search, timelines, pivoting, context gathering
- Retain: keep evidence for compliance and forensics
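Of these steps, "normalize" is the least obvious, so here is a toy version: mapping a raw sshd log line into searchable fields, the way a SIEM parser turns text into columns. (A real SIEM uses proper parsers and schemas; the field names below are arbitrary, and lines like "invalid user admin" would need extra handling.)

```shell
#!/bin/sh
# Toy "normalize" step: extract user, source IP, and action from a raw
# sshd log line, producing searchable key=value fields.
normalize_sshd() {
    awk '/Failed password/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "for")  user = $(i + 1)
            if ($i == "from") ip = $(i + 1)
        }
        printf "action=failed_login user=%s src_ip=%s\n", user, ip
    }'
}

# Typical use:
# sudo journalctl -u ssh | normalize_sshd
```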
SIEM vs “a log server”
- Log server (e.g., rsyslog storing files): “Here are the logs.”
- SIEM: “Here are the logs, correlated into suspicious behavior with context, prioritization, and investigation workflows.”
Bottom line: SIEM answers “What looks like a security issue in our data?”
SOC: Security Operations Center
SOC is not a product. It’s a function: a team + processes + tools that monitor security, triage alerts, and respond to incidents (sometimes 24/7, sometimes business-hours, depending on maturity).
What a SOC does in practice
- Continuously reviews and triages security alerts
- Validates what is real vs noise (false positives)
- Escalates and coordinates response
- Maintains playbooks and runbooks
- Supports threat hunting and proactive improvements
Simple mental model
- SIEM = the data brain and alert engine
- SOC = the people and process that decide and act
Bottom line: SOC answers “Who handles it, how, and with what process?”
Event vs Alert vs Incident (don’t mix these up)
This is where a lot of confusion happens, so let’s be strict with definitions:
Event
An event is anything that happened. Most events are normal.
- User logged in
- Package updated
- SSH key added
- HTTP 404 served
Alert
An alert is an event (or pattern) that crossed a rule/threshold and deserves attention.
- “10 failed SSH logins in 2 minutes”
- “New admin user created”
- “Process executed from /tmp”
Incident
An incident is when you have a confirmed or high-likelihood security problem that impacts (or threatens) confidentiality, integrity, or availability—or violates policy.
- Compromised account
- Data exfiltration
- Ransomware activity
- Unauthorized persistence (cron backdoor, SSH key drop, webshell)
Bottom line: events are common, alerts are filtered, incidents are real (or treated as real until proven otherwise).
A Linux-centric scenario: SSH brute force → security alert → incident
Let’s connect the terms in a real workflow.
1) The system emits events (logs)
On a typical Linux server, SSH authentication events are recorded in:
- /var/log/auth.log (Debian/Ubuntu)
- /var/log/secure (RHEL/Rocky/Alma)
- or in the systemd journal
2) Monitoring may only see symptoms
Monitoring might detect:
- Higher CPU load
- Spike in network traffic
- Service latency
That’s helpful, but it’s not a security conclusion. It’s “something changed.”
3) SIEM detects a pattern with context
A SIEM (or a SIEM-like stack) can flag:
- Many failed logins from one IP
- Attempts against many usernames (“dictionary” behavior)
- Successful login after repeated failures
- New SSH key added right after login
Now you have a security alert that’s richer than “CPU is high.”
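The "many failed logins from one IP" pattern can be sketched as a small pipeline. This is a toy detection with no time window; real SIEM rules also window, deduplicate, and enrich:

```shell
#!/bin/sh
# Toy detection rule: flag any source IP with more than N failed SSH logins.
# (No time window here; a real rule would also bound the interval.)
failed_login_alert() {
    limit="$1"
    grep 'Failed password' \
        | awk '{ for (i = 1; i <= NF; i++) if ($i == "from") print $(i + 1) }' \
        | sort | uniq -c \
        | awk -v limit="$limit" '$1 > limit {
              printf "ALERT: %s failed logins from %s\n", $1, $2
          }'
}

# Typical use:
# sudo journalctl -u ssh --since "2 minutes ago" | failed_login_alert 10
```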
4) SOC triages and responds
A SOC analyst would typically:
- Confirm the activity (false positive vs real)
- Assess scope (which host, which user, which source IPs)
- Decide actions (block, reset, isolate, collect evidence)
- Escalate if needed
If there’s a confirmed compromise or strong evidence, this becomes an incident.
Hands-on: quick Linux commands to feel the difference
“Monitoring-ish” checks
uptime
free -h
df -h
systemctl status ssh    # the unit is "sshd" on RHEL-family systems
ss -tulpn | head
“Observability-ish” checks (dig into the story)
# Debian/Ubuntu (auth)
sudo tail -n 100 /var/log/auth.log
# RHEL/Rocky/Alma (secure)
sudo tail -n 100 /var/log/secure
# systemd journal (the unit is "ssh" on Debian/Ubuntu, "sshd" on RHEL-family)
sudo journalctl -u ssh --since "1 hour ago"
# quick filter for failed SSH attempts
sudo journalctl -u ssh --since "2 hours ago" | grep -i "failed"
Basic triage pivots (when you suspect compromise)
# Who logged in recently?
last -a | head -n 30
# SSH keys changed?
sudo find /home -maxdepth 3 -type f -name "authorized_keys" -ls 2>/dev/null
# Suspicious cron changes?
sudo ls -la /etc/cron.* /var/spool/cron 2>/dev/null
# Unexpected listening services?
sudo ss -tulpn
Note: These commands are not a full incident response guide—they’re a “first 10 minutes” set to help you build intuition.
How to choose the right tool for the job
Here’s a simple decision map:
- If you need uptime, thresholds, and capacity alerts → Monitoring (Zabbix/Prometheus)
- If you need to debug complex behavior quickly → Observability (metrics + logs + traces)
- If you need security detection, correlation, investigation → SIEM (Wazuh/Elastic/OpenSearch-based stacks)
- If you need consistent handling of security alerts with playbooks → SOC (people + process + tooling)
In real environments, you almost always use all of them—because availability and security are deeply connected.
Key takeaways
- Monitoring tells you something is wrong.
- Observability helps you understand what and why.
- SIEM turns raw events into security detection and investigation.
- SOC is the operational function that handles alerts and incidents.
- Incident is when a security issue becomes real enough to respond formally.
Next step (if you’re building a small lab)
A great learning path is to build a small telemetry pipeline:
- Metrics: node exporter → Prometheus → Grafana
- Logs: rsyslog → central storage + parsing
- Security: Wazuh (or SIEM stack) → correlation → dashboards + alerts
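For the log leg, a classic starting point is rsyslog forwarding to a central collector. A minimal sketch (the file name and collector host are examples; in rsyslog's legacy selector syntax, @@ means TCP and a single @ means UDP):

```
# /etc/rsyslog.d/50-forward.conf  (file name and host are examples)
# Forward everything to a central collector over TCP
*.* @@logs.example.internal:514
```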
Once you understand what data is produced and how it flows, “SIEM vs monitoring” stops being a debate and becomes an architecture decision.