Oracle DB → (someone needs to read it) → collect data
→ (store it somewhere) → visualize it
That "someone" is your monitoring stack. Let me
introduce the cast of characters.
👥 Meet the Characters
(Components)
1. 🤖 TOM Agent — "The
Field Reporter"
- Installed
on every DB host (the actual server where Oracle runs)
- It's
the boots-on-the-ground agent
- It
contains two things inside it: SQL Exporter + River File
- Think
of it as a reporter stationed at each office branch
2. 🔍 SQL Exporter — "The
Interviewer"
- Lives
inside TOM Agent
- Its
job: run SQL queries against Oracle and convert the results into a format
called metrics
- Example:
It runs SELECT count(*) FROM v$session → converts that number into a
metric called oracle_sessions_active
- It
speaks to Oracle, Oracle doesn't know it's being monitored
3. 📄 River File — "The
Traffic Controller"
- Also
lives inside TOM Agent
- It's a
configuration file (written in a language called River/Alloy syntax)
- It
tells the agent: "Hey, go collect these metrics, then send them to
THIS address"
- Think
of it as the routing instructions — what to collect, how often, where to
send
4. 📊 Metrics — "The
Data Itself"
- Not a
tool — it's the output format
- Metrics
are simple number+label combinations
- Example:
oracle_db_sessions{host="prod-db1", status="active"}
42
- This
means: "On prod-db1, there are 42 active sessions right now"
5. 🗄️ Central Server — "The
Post Office"
- All
agents from all DB hosts push their metrics here
- It's a
central collection point
- Prometheus
sits here waiting to receive or scrape data
6. 📈 Prometheus — "The
Database for Numbers"
- Stores
all the metrics with timestamps
- It's a
time-series database — meaning it remembers that at 2PM there were 42
sessions, at 3PM there were 67, etc.
- Grafana
talks to Prometheus to get historical data
- Also
handles alerting rules
7. 📉 Grafana — "The
Dashboard TV Screen"
- The
thing your team actually looks at
- Beautiful
charts, graphs, tables
- Queries
Prometheus for data and renders it visually
- Where
you set up alert notifications (Slack, email, PagerDuty, etc.)
🌊 The Complete Data Flow
+----------------------------------------------------------+
| DB HOST 1
|
|
|
|
+-----------------------------------------------+ |
| | TOM AGENT | |
| |
| |
| | +----------------+ +----------------+ |
|
| | | SQL
Exporter | |
River File | |
|
| | | | |
(config) | |
|
| | | Runs
SQL on | |
| | |
| | |
Oracle DB | | -
what SQL | |
|
| | | | | -
how often | |
|
| | |
Outputs | | -
where to | |
|
| | |
metrics | |
send | |
|
| | +-------+--------+ +----------------+ |
|
| | | | |
|
+----------+-----------------------------------+ |
| | |
| Oracle DB <--- SQL Exporter queries it |
+-------------+--------------------------------------------+
|
| PUSH metrics (HTTP)
v
+---------------------------+
| CENTRAL SERVER |
|
|
|
+---------------------+ |
| | Prometheus |
|
| | | |
| | Stores metrics | |
| | with timestamps | |
| | | |
| | Evaluates alert | |
| | rules |
|
|
+---------+-----------+ |
| | |
+------------+--------------+
|
| Query (PromQL)
v
+------------------------+
| GRAFANA |
| |
| [*] Dashboards |
| [!] Alert UI |
| [~] Visual monitoring |
+------------------------+
📄 Deep Dive: The River File
This is the brain of the TOM Agent. Let me explain it line by line
conceptually.
What River File Does:
|
1.
Define WHERE your Oracle DB is (connection string) 2.
Define WHAT to collect (point to SQL exporter config) 3.
Define HOW OFTEN to collect 4.
Define WHERE TO SEND the data (your central server) |
River File Structure (conceptual):
Hcl
|
//
================================================ // BLOCK
1: Discovery / Connection // Tell
the agent about your Oracle DB //
================================================ prometheus.exporter.oracle
"my_oracle_db" { connection_string =
"oracle://username:password@localhost:1521/ORCL" // This is like a phone number for your
Oracle DB } //
================================================ // BLOCK
2: Scrape Config // Tell
the agent HOW to collect metrics //
================================================ prometheus.scrape
"collect_oracle" { targets
= prometheus.exporter.oracle.my_oracle_db.targets forward_to =
[prometheus.remote_write.central.receiver] scrape_interval = "60s" // collect every 60 seconds scrape_timeout = "30s" // give up if Oracle doesn't respond in
30s } //
================================================ // BLOCK
3: Remote Write // Tell
the agent WHERE to send the data //
================================================ prometheus.remote_write
"central" { endpoint { url =
"http://central-server:9090/api/v1/write" // This is your central Prometheus server
address } } |
Mentor Tip: Think of River file blocks like LEGO pieces —
each block does one job, and you connect them together using the names you give
them (like my_oracle_db and central)
📋 Deep
Dive: SQL Exporter & Metrics YAML
The SQL Exporter needs a YAML file that tells it which SQL queries
to run and what to name the metrics.
YAML File Structure:
Yaml
Yaml
|
#
================================================ #
GLOBAL SETTINGS #
================================================ global: scrape_timeout: 30s # How long to wait for Oracle to
respond scrape_timeout_offset: 500ms min_interval: 0s # Minimum time between scrapes cache_size: 1000 #
================================================ #
JOBS — Each "job" = one group of related queries #
================================================ jobs: # JOB 1: Session Monitoring - job_name: oracle_sessions interval: '1m' # Run these queries every 1
minute connections: -
"oracle://user:pass@host:1521/SID" queries: # QUERY 1: Count active sessions - query: | SELECT status, COUNT(*) as session_count FROM v$session WHERE type = 'USER' GROUP BY status metrics: # Each column from your SQL becomes
a metric or a label - metric_name:
oracle_sessions_total type: gauge # gauge = current value (can
go up or down) help: "Total Oracle sessions
by status" values: [session_count] # This SQL column becomes the NUMBER labels: status: status # This SQL column becomes a LABEL
(category) # QUERY 2: Tablespace usage - query: | SELECT tablespace_name, used_space, tablespace_size, used_percent FROM dba_tablespace_usage_metrics metrics: - metric_name:
oracle_tablespace_used_percent type: gauge help: "Tablespace usage
percentage" values: [used_percent] labels: tablespace: tablespace_name # JOB 2: Performance Monitoring - job_name: oracle_performance interval: '5m' # Less frequent — every 5
minutes queries: - query: | SELECT metric_name, value FROM v$sysmetric WHERE group_id = 2 metrics: - metric_name: oracle_sysmetric type: gauge help: "Oracle system
metrics" values: [value] labels: metric_name: metric_name |
🔑 Key YAML Concepts:
🔔 Alert Mechanism — How
Alerts Work
Alerts flow through two stages:
|
Stage 1:
Prometheus detects the problem Stage 2:
Grafana/Alertmanager notifies your team |
Stage 1 — Prometheus Alert Rules:
Yaml
|
#
alert_rules.yml — lives on your Prometheus server groups: - name: oracle_alerts rules: # Alert 1: Too many active sessions - alert: OracleHighSessions expr:
oracle_sessions_total{status="ACTIVE"} > 200 for: 5m # Must be true for 5 CONTINUOUS
minutes (avoid false alarms) labels: severity: warning annotations: summary: "High session count
on {{ $labels.instance }}" description: "Sessions = {{
$value }} (threshold: 200)" # Alert 2: Tablespace almost full - alert: OracleTablespaceFull expr: oracle_tablespace_used_percent
> 85 for: 10m labels: severity: critical annotations: summary: "Tablespace {{
$labels.tablespace }} is {{ $value }}% full" description: "Free up space
immediately on {{ $labels.instance }}" # Alert 3: DB host not responding
(agent stopped pushing) - alert: OracleDBDown expr: up{job="oracle"} == 0 for: 2m labels: severity: critical annotations: summary: "Oracle DB
unreachable: {{ $labels.instance }}" |
Stage 2 — Where Alerts Go (Alertmanager):
Yaml
|
#
Alertmanager routes alerts to your team route: receiver: 'team-slack' routes: - match: severity: critical receiver: 'oncall-pagerduty' # Critical = wake someone up! - match: severity: warning receiver: 'team-slack' # Warning = Slack message is fine receivers: - name: 'team-slack' slack_configs: - channel: '#db-alerts' text: '{{ .Annotations.description
}}' - name: 'oncall-pagerduty' pagerduty_configs: - service_key: 'your-key-here' |
🗺️ The Complete Mental
Model
Here's how to think about all of this together:
EACH DB SERVER:
┌──────────────────────────────────────────────────┐
│ TOM Agent │
│
│
│ River File
answers: │
│ "Where is
Oracle?" ──► SQL Exporter connects
│
│ "What to
collect?" ──► Points to YAML
file │
│ "How
often?" ──►
scrape_interval │
│ "Where to
send?" ──► Central server
URL │
│
│
│ YAML File answers: │
│ "Which SQL
queries to run?" │
│ "What to name
each metric?" │
│ "Which columns
are values vs labels?" │
└──────────────────────────────────────────────────┘
↓ pushes
metrics every N seconds
CENTRAL SERVER:
Prometheus stores →
oracle_sessions_total{host="prod1"} 42 @timestamp
↓ Grafana
queries Prometheus
GRAFANA:
Shows charts + fires
alerts when rules are breached
River file vs dbconnection, which one really make connection to
oracle
Who Makes the Connection to Oracle?
SQL Exporter makes the actual connection. But it needs to be told
how — and that's where the two configs play different roles.
The Simple Truth
River File = ORCHESTRATOR (tells the agent what to do and where to
send)
DB Connection = CREDENTIALS (tells SQL Exporter how to reach
Oracle)
SQL Exporter = THE ONE WHO ACTUALLY CONNECTS to Oracle
Think of it Like This
Imagine you're a delivery driver (SQL Exporter)
DB Connection string = the ADDRESS of the house (where to go, door
code)
River File = your MANAGER's instructions (go collect from this
address, every 60 seconds, bring it back to HQ)
You (SQL Exporter) = the one who actually drives there and knocks
on the door
Where Each One Lives and What it Does
|
Config |
Lives In |
Purpose |
Makes DB Connection? |
|
DATA_SOURCE_NAME |
.env file or environment variable |
Oracle credentials + host + port + service |
❌ No, just stores info |
|
River File |
TOM Agent config |
Orchestrates collection and routing |
❌ No, just instructs |
|
SQL Exporter |
TOM Agent |
Reads the connection string and actually connects |
✅ YES |
The Correct Flow
.env file / Environment Variable
DATA_SOURCE_NAME="user/pass@//host:1521/SERVICE"
|
| SQL Exporter
READS this
v
SQL Exporter
|
| Uses it to
CONNECT to Oracle
v
Oracle DB <--- actual connection happens HERE
|
| Results come
back as metrics
v
River File picks up
those metrics
|
| and ships
them to Central Server
v
Prometheus