First — The Analogy (Understand This, You Understand Everything)
Think of the monitoring stack like a Hospital Reporting System:
Component 1 — Oracle DB Exporter
What is it in layman terms?
It is a small program that sits next to your Oracle DB, logs in, runs some SQL queries, and then stands on a door waiting for someone to ask "what did you find?"
That door = http://localhost:9161/metrics
What does it actually return?
Plain text. Like this:
oracledb_sessions_total{status="ACTIVE"} 42
oracledb_tablespace_used_percent{name="USERS"} 78.5
oracledb_up 1
oracledb_up 1 means exporter connected to Oracle successfully.
oracledb_up 0 means connection failed — first thing you check when something breaks.
SRE Level — What to verify:
# Is exporter running?
ps -ef | grep oracledb_exporter
# Can you hit its door?
curl http://localhost:9161/metrics | head -50
# Is Oracle reachable from exporter user?
sqlplus monitoring_user/password@//host:1521/SERVICE
Component 2 — DB Connection File
Layman explanation
This is the key to Oracle's front door. Without this the exporter cannot get in.
DATA_SOURCE_NAME="username/password@//hostname:1521/SERVICENAME"
Breaking it apart like a URL:
username → who is logging in
password → their password
hostname → which server Oracle is on
1521 → which port Oracle's listener is on
SERVICENAME → which database (like a room number inside the building)
Common mistakes beginners make:
| Mistake | Result |
|---|---|
| Using SID instead of Service Name | Connection fails silently |
| Wrong port | Timeout error |
| User missing V$ grants | Exporter connects but metrics show zero |
| CRLF in file (Windows line endings) | Variable corrupted |
SRE Level — Grants the monitoring user needs:
-- Minimum required grants
GRANT CREATE SESSION TO monitoring_user;
GRANT SELECT ON V_$SESSION TO monitoring_user;
GRANT SELECT ON V_$DATABASE TO monitoring_user;
GRANT SELECT ON V_$TABLESPACE TO monitoring_user;
GRANT SELECT ON V_$DATAFILE TO monitoring_user;
GRANT SELECT ON DBA_TABLESPACE_USAGE_METRICS TO monitoring_user;
GRANT SELECT ON V_$SYSSTAT TO monitoring_user;
GRANT SELECT ON V_$WAITSTAT TO monitoring_user;
Component 3 — metrics.yaml
Layman explanation
This is the SQL query book. It tells the exporter:
- "Every time someone asks for metrics, run these SQL queries"
- "Take the results and name them like this"
- "These columns become labels, this column becomes the number"
Component 4 — River File (Grafana Agent Brain)
Layman explanation
Think of it as the agent's job description written in a file. It tells the agent:
- "Go to this address every 60 seconds and collect metrics"
- "After collecting, send them to this central server"
- "Add these labels so we know which server this came from"
Full river file explained line by line:
| Setting | What it means | Recommended |
|---|---|---|
scrape_interval | How often metrics are collected | 60s for DB, 15s for critical |
scrape_timeout | How long to wait before giving up | Always less than interval |
extra_labels | Tags added to every metric | env, db_name, host, region |
remote_write url | Where to push data | Your central Prometheus |
| What | URL |
|---|---|
| Grafana Alloy (new agent) | https://grafana.com/docs/alloy/latest/ |
| River/Alloy config reference | https://grafana.com/docs/alloy/latest/reference/components/ |
| Oracle DB Exporter | https://github.com/iamseth/oracledb_exporter |
| Prometheus remote write | https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write |
| PromQL queries | https://prometheus.io/docs/prometheus/latest/querying/basics/ |
No comments:
Post a Comment