The DevOps Revolution
DevOps has transformed how we build, deploy, and maintain software. By breaking down silos between development and operations teams and automating the path to production, organizations can deliver software faster, more reliably, and at higher quality.
"The goal is not to become agile or do Agile, but to become agile." - Ahmed Sidky
CI/CD Pipeline Fundamentals
Continuous Integration
- Automated testing on every commit
- Code quality checks and linting
- Build artifact generation
- Fast feedback to developers
Continuous Delivery
- Automated deployment to staging
- Environment consistency
- Release readiness validation
- Manual production deployment
Continuous Deployment
- Fully automated production deployment
- Zero-downtime deployments
- Automated rollback capabilities
- Real-time monitoring and alerts
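The three practices differ mainly in where the final promotion decision sits: CI stops at a tested artifact, continuous delivery pauses at a manual production gate, and continuous deployment automates that last step too. As a rough sketch (the model names, stage results, and return strings are illustrative, not from any specific tool):

```javascript
// Hypothetical promotion rule contrasting the three delivery models.
// "delivery" stops at a manual gate before production; "deployment" does not.
function nextStep(model, stageResults) {
  // All automated checks must pass before anything is promoted.
  const allGreen = Object.values(stageResults).every((r) => r === 'passed');
  if (!allGreen) return 'stop: fix the failing stage';
  if (model === 'integration') return 'done: artifact built and tested';
  if (model === 'delivery') return 'await manual approval for production';
  if (model === 'deployment') return 'deploy to production automatically';
  throw new Error(`unknown model: ${model}`);
}
```

The point is that the pipeline stages are identical across all three; only the final gate changes.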
GitHub Actions CI/CD
Complete Node.js Application Pipeline
GitHub Actions Workflow
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline
on:
push:
branches: [ main, develop ]
pull_request:
branches: [ main ]
env:
NODE_VERSION: '18.x'
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:14
env:
POSTGRES_PASSWORD: postgres
POSTGRES_DB: testdb
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 5432:5432
redis:
image: redis:6
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 6379:6379
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Run linter
run: npm run lint
- name: Run type check
run: npm run type-check
- name: Run unit tests
run: npm run test:unit
env:
NODE_ENV: test
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
REDIS_URL: redis://localhost:6379
- name: Run integration tests
run: npm run test:integration
env:
NODE_ENV: test
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
REDIS_URL: redis://localhost:6379
- name: Run E2E tests
run: npm run test:e2e
env:
NODE_ENV: test
DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
- name: Generate test coverage
run: npm run test:coverage
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./coverage/lcov.info
- name: Build application
run: npm run build
- name: Upload build artifacts
uses: actions/upload-artifact@v4
with:
name: build-files
path: dist/
security:
runs-on: ubuntu-latest
needs: test
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Run security audit
run: npm audit --audit-level=moderate
- name: Run dependency check
uses: dependency-check/Dependency-Check_Action@main
with:
project: 'my-app'
path: '.'
format: 'ALL'
- name: Run SAST scan
uses: github/codeql-action/init@v3
with:
languages: javascript
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3
build-and-push:
runs-on: ubuntu-latest
needs: [test, security]
if: github.ref == 'refs/heads/main'
permissions:
contents: read
packages: write
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Download build artifacts
uses: actions/download-artifact@v4
with:
name: build-files
path: dist/
- name: Setup Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=sha,prefix={{branch}}-
type=raw,value=latest,enable={{is_default_branch}}
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
deploy-staging:
runs-on: ubuntu-latest
needs: build-and-push
if: github.ref == 'refs/heads/main'
environment: staging
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Deploy to ECS
run: |
# Update ECS service with new image
aws ecs update-service \
--cluster staging-cluster \
--service my-app-service \
--force-new-deployment
- name: Wait for deployment
run: |
aws ecs wait services-stable \
--cluster staging-cluster \
--services my-app-service
- name: Run health check
run: |
# Wait for service to be healthy
for i in {1..30}; do
if curl -f https://staging.myapp.com/health; then
echo "Health check passed"
exit 0
fi
echo "Waiting for service to be healthy..."
sleep 10
done
echo "Health check failed"
exit 1
- name: Run smoke tests
run: |
npm run test:smoke -- --url=https://staging.myapp.com
deploy-production:
runs-on: ubuntu-latest
needs: deploy-staging
if: github.ref == 'refs/heads/main'
environment: production
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Blue-Green Deployment
run: |
# Get the service currently serving traffic (desired count > 0);
# both services report status ACTIVE, so status alone cannot distinguish them
CURRENT_SERVICE=$(aws ecs describe-services \
--cluster production-cluster \
--services my-app-blue my-app-green \
--query 'services[?desiredCount > `0`].serviceName' \
--output text)
# Determine target service
if [ "$CURRENT_SERVICE" = "my-app-blue" ]; then
TARGET_SERVICE="my-app-green"
else
TARGET_SERVICE="my-app-blue"
fi
echo "Deploying to $TARGET_SERVICE"
# Update target service
aws ecs update-service \
--cluster production-cluster \
--service $TARGET_SERVICE \
--force-new-deployment
# Wait for deployment
aws ecs wait services-stable \
--cluster production-cluster \
--services $TARGET_SERVICE
# Shift listener traffic to the new service's target group
# (LISTENER_ARN and the per-color target-group ARNs are assumed secrets)
if [ "$TARGET_SERVICE" = "my-app-blue" ]; then
TARGET_GROUP_ARN=${{ secrets.BLUE_TARGET_GROUP_ARN }}
else
TARGET_GROUP_ARN=${{ secrets.GREEN_TARGET_GROUP_ARN }}
fi
aws elbv2 modify-listener \
--listener-arn ${{ secrets.LISTENER_ARN }} \
--default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN
# Wait for health checks
sleep 30
# Stop old service
aws ecs update-service \
--cluster production-cluster \
--service $CURRENT_SERVICE \
--desired-count 0
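The blue-green step above alternates traffic between two ECS services, so the color-selection logic is worth testing on its own. A minimal sketch using the same service names as the workflow:

```javascript
// Given the service currently receiving traffic, pick the idle one to deploy to.
// Mirrors the shell logic in the deploy-production job above.
function pickTarget(currentService) {
  const pair = { 'my-app-blue': 'my-app-green', 'my-app-green': 'my-app-blue' };
  const target = pair[currentService];
  if (!target) throw new Error(`unexpected service: ${currentService}`);
  return target;
}
```

Failing loudly on an unexpected service name is deliberate: a typo in a cluster or service name should stop the deployment, not silently flip traffic.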
Jenkins Pipeline
Declarative Pipeline for Java Application
Jenkins Declarative Pipeline
// Jenkinsfile
pipeline {
agent {
kubernetes {
yaml """
apiVersion: v1
kind: Pod
spec:
containers:
- name: maven
image: maven:3.8.6-openjdk-17
command:
- cat
tty: true
volumeMounts:
- name: docker-sock
mountPath: /var/run/docker.sock
- name: docker
image: docker:20.10.21-dind
securityContext:
privileged: true
volumeMounts:
- name: docker-sock
mountPath: /var/run/docker.sock
- name: kubectl
image: bitnami/kubectl:1.25
command:
- cat
tty: true
volumes:
- name: docker-sock
hostPath:
path: /var/run/docker.sock
"""
}
}
environment {
DOCKER_REGISTRY = 'your-registry.com'
IMAGE_NAME = 'my-java-app'
KUBECONFIG = credentials('kubeconfig')
SONAR_TOKEN = credentials('sonar-token')
SLACK_WEBHOOK = credentials('slack-webhook')
}
parameters {
choice(
name: 'DEPLOY_ENV',
choices: ['dev', 'staging', 'production'],
description: 'Environment to deploy to'
)
booleanParam(
name: 'SKIP_TESTS',
defaultValue: false,
description: 'Skip running tests'
)
string(
name: 'IMAGE_TAG',
defaultValue: '',
description: 'Custom image tag (optional)'
)
}
stages {
stage('Checkout') {
steps {
checkout scm
script {
env.GIT_COMMIT_SHORT = sh(
script: 'git rev-parse --short HEAD',
returnStdout: true
).trim()
env.BUILD_TAG = params.IMAGE_TAG ?: "${env.BUILD_NUMBER}-${env.GIT_COMMIT_SHORT}"
}
}
}
stage('Build') {
steps {
container('maven') {
sh '''
mvn clean compile -DskipTests=true
mvn versions:set -DnewVersion=${BUILD_TAG}
'''
}
}
}
stage('Test') {
when {
expression { !params.SKIP_TESTS }
}
parallel {
stage('Unit Tests') {
steps {
container('maven') {
sh 'mvn test'
}
}
post {
always {
junit(
testResults: 'target/surefire-reports/*.xml',
allowEmptyResults: false
)
}
}
}
stage('Integration Tests') {
steps {
container('maven') {
sh 'mvn verify -Dtest.profile=integration'
}
}
post {
always {
junit(
testResults: 'target/failsafe-reports/*.xml',
allowEmptyResults: false
)
}
}
}
}
}
stage('Code Quality') {
parallel {
stage('SonarQube Analysis') {
steps {
container('maven') {
withSonarQubeEnv('SonarQube') {
sh '''
mvn sonar:sonar \
-Dsonar.token=${SONAR_TOKEN} \
-Dsonar.projectKey=my-java-app \
-Dsonar.projectName="My Java App"
'''
}
}
}
}
stage('Security Scan') {
steps {
container('maven') {
sh 'mvn org.owasp:dependency-check-maven:check'
}
}
post {
always {
publishHTML([
allowMissing: false,
alwaysLinkToLastBuild: true,
keepAll: true,
reportDir: 'target',
reportFiles: 'dependency-check-report.html',
reportName: 'OWASP Dependency Check'
])
}
}
}
}
}
stage('Quality Gate') {
steps {
timeout(time: 5, unit: 'MINUTES') {
waitForQualityGate abortPipeline: true
}
}
}
stage('Package') {
steps {
container('maven') {
sh 'mvn package -DskipTests=true'
}
container('docker') {
script {
def image = docker.build("${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG}")
docker.withRegistry("https://${DOCKER_REGISTRY}", 'docker-registry-credentials') {
image.push()
image.push('latest')
}
}
}
}
}
stage('Deploy') {
when {
anyOf {
branch 'main'
branch 'develop'
expression { params.DEPLOY_ENV != null }
}
}
steps {
container('kubectl') {
script {
def deployEnv = params.DEPLOY_ENV ?: (env.BRANCH_NAME == 'main' ? 'production' : 'staging')
sh """
# Update deployment image
kubectl set image deployment/my-java-app \
my-java-app=${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG} \
-n ${deployEnv}
# Wait for rollout
kubectl rollout status deployment/my-java-app -n ${deployEnv} --timeout=300s
# Verify deployment
kubectl get pods -n ${deployEnv} -l app=my-java-app
"""
}
}
}
}
stage('Smoke Tests') {
when {
anyOf {
branch 'main'
branch 'develop'
}
}
steps {
container('maven') {
script {
def deployEnv = env.BRANCH_NAME == 'main' ? 'production' : 'staging'
def baseUrl = deployEnv == 'production' ?
'https://api.myapp.com' :
'https://staging-api.myapp.com'
sh """
mvn test -Dtest.profile=smoke \
-Dapi.base.url=${baseUrl} \
-Dtest.timeout=60
"""
}
}
}
}
}
post {
always {
// Archive artifacts
archiveArtifacts(
artifacts: 'target/*.jar,target/dependency-check-report.html',
allowEmptyArchive: true
)
// Clean workspace
cleanWs()
}
success {
script {
def deployEnv = params.DEPLOY_ENV ?: (env.BRANCH_NAME == 'main' ? 'production' : 'staging')
slackSend(
channel: '#deployments',
color: 'good',
message: """
✅ Deployment Successful!
• Project: ${env.JOB_NAME}
• Build: ${env.BUILD_NUMBER}
• Environment: ${deployEnv}
• Image: ${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG}
• Duration: ${currentBuild.durationString}
""".trim(),
tokenCredentialId: 'slack-webhook'
)
}
}
failure {
slackSend(
channel: '#deployments',
color: 'danger',
message: """
❌ Deployment Failed!
• Project: ${env.JOB_NAME}
• Build: ${env.BUILD_NUMBER}
• Stage: ${env.STAGE_NAME}
• Duration: ${currentBuild.durationString}
• Logs: ${env.BUILD_URL}console
""".trim(),
tokenCredentialId: 'slack-webhook'
)
}
unstable {
slackSend(
channel: '#deployments',
color: 'warning',
message: """
⚠️ Deployment Unstable!
• Project: ${env.JOB_NAME}
• Build: ${env.BUILD_NUMBER}
• Issues: Check test results
• Logs: ${env.BUILD_URL}console
""".trim(),
tokenCredentialId: 'slack-webhook'
)
}
}
}
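The Checkout stage computes `BUILD_TAG` from either the `IMAGE_TAG` parameter or the build number plus short commit hash. The same fallback rule, sketched outside Groovy for clarity:

```javascript
// Fallback logic for the image tag, as in the Checkout stage:
// an explicit IMAGE_TAG parameter wins; otherwise <buildNumber>-<shortSha>.
function buildTag(imageTagParam, buildNumber, gitCommit) {
  const shortSha = gitCommit.slice(0, 7); // `git rev-parse --short HEAD` default length
  return imageTagParam && imageTagParam.trim() !== ''
    ? imageTagParam
    : `${buildNumber}-${shortSha}`;
}
```

Combining the build number with the commit hash gives tags that are both sortable and traceable back to a specific revision.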
Infrastructure as Code
Terraform for AWS Infrastructure
Terraform Configuration
# terraform/main.tf
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "my-terraform-state"
key = "infrastructure/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-state-lock"
encrypt = true
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = var.environment
Project = var.project_name
ManagedBy = "terraform"
}
}
}
# Variables
variable "aws_region" {
description = "AWS region"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name"
type = string
validation {
condition = contains(["dev", "staging", "production"], var.environment)
error_message = "Environment must be dev, staging, or production."
}
}
variable "project_name" {
description = "Project name"
type = string
default = "my-app"
}
# Data sources
data "aws_availability_zones" "available" {
state = "available"
}
data "aws_caller_identity" "current" {}
# VPC Configuration
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.project_name}-${var.environment}-vpc"
}
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.project_name}-${var.environment}-igw"
}
}
resource "aws_subnet" "public" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${count.index + 1}.0/24"
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.project_name}-${var.environment}-public-${count.index + 1}"
Type = "public"
}
}
resource "aws_subnet" "private" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${count.index + 10}.0/24"
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "${var.project_name}-${var.environment}-private-${count.index + 1}"
Type = "private"
}
}
# NAT Gateway
resource "aws_eip" "nat" {
count = 2
domain = "vpc"
tags = {
Name = "${var.project_name}-${var.environment}-nat-eip-${count.index + 1}"
}
}
resource "aws_nat_gateway" "main" {
count = 2
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = {
Name = "${var.project_name}-${var.environment}-nat-${count.index + 1}"
}
depends_on = [aws_internet_gateway.main]
}
# Route Tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.project_name}-${var.environment}-public-rt"
}
}
resource "aws_route_table" "private" {
count = 2
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
tags = {
Name = "${var.project_name}-${var.environment}-private-rt-${count.index + 1}"
}
}
resource "aws_route_table_association" "public" {
count = 2
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = 2
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
# ECS Cluster
resource "aws_ecs_cluster" "main" {
name = "${var.project_name}-${var.environment}"
configuration {
execute_command_configuration {
logging = "OVERRIDE"
log_configuration {
cloud_watch_encryption_enabled = true
cloud_watch_log_group_name = aws_cloudwatch_log_group.ecs.name
}
}
}
setting {
name = "containerInsights"
value = "enabled"
}
tags = {
Name = "${var.project_name}-${var.environment}-cluster"
}
}
# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "ecs" {
name = "/ecs/${var.project_name}-${var.environment}"
retention_in_days = 30
tags = {
Name = "${var.project_name}-${var.environment}-logs"
}
}
# Application Load Balancer
resource "aws_security_group" "alb" {
name_prefix = "${var.project_name}-${var.environment}-alb-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-${var.environment}-alb-sg"
}
}
resource "aws_lb" "main" {
name = "${var.project_name}-${var.environment}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = aws_subnet.public[*].id
enable_deletion_protection = var.environment == "production"
tags = {
Name = "${var.project_name}-${var.environment}-alb"
}
}
# Auto Scaling Group for ECS
resource "aws_launch_template" "ecs" {
name_prefix = "${var.project_name}-${var.environment}-"
image_id = data.aws_ami.ecs.id
instance_type = var.environment == "production" ? "t3.medium" : "t3.small"
vpc_security_group_ids = [aws_security_group.ecs_instance.id]
iam_instance_profile {
name = aws_iam_instance_profile.ecs_instance.name
}
user_data = base64encode(templatefile("${path.module}/user-data.sh", {
cluster_name = aws_ecs_cluster.main.name
}))
tag_specifications {
resource_type = "instance"
tags = {
Name = "${var.project_name}-${var.environment}-ecs-instance"
}
}
}
# Outputs
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.main.id
}
output "cluster_name" {
description = "ECS Cluster name"
value = aws_ecs_cluster.main.name
}
output "load_balancer_dns" {
description = "Load balancer DNS name"
value = aws_lb.main.dns_name
}
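The subnet resources carve /24 blocks out of the 10.0.0.0/16 VPC by offsetting the third octet: public subnets use offsets 1 and 2, private subnets 10 and 11. The same arithmetic, sketched for an arbitrary subnet count:

```javascript
// Reproduces the cidr_block expressions in the aws_subnet resources:
// public subnets use offsets 1..n, private subnets 10..(9+n).
function subnetCidrs(count) {
  const cidrs = { public: [], private: [] };
  for (let i = 0; i < count; i++) {
    cidrs.public.push(`10.0.${i + 1}.0/24`);
    cidrs.private.push(`10.0.${i + 10}.0/24`);
  }
  return cidrs;
}
```

The gap between the public and private ranges leaves room to add public subnets later without renumbering.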
Docker and Containerization
Multi-stage Docker Build
Optimized Dockerfile
# Multi-stage Dockerfile for Node.js application
FROM node:18-alpine AS base
# Install security updates
RUN apk update && apk upgrade && apk add --no-cache dumb-init
# Create app directory
WORKDIR /usr/src/app
# Copy package files
COPY package*.json ./
# Development stage
FROM base AS development
ENV NODE_ENV=development
RUN npm ci --include=dev
COPY . .
EXPOSE 3000
CMD ["dumb-init", "npm", "run", "dev"]
# Build stage
FROM base AS build
# Install all dependencies -- the build step itself needs devDependencies
RUN npm ci --include=dev
# Copy source code
COPY . .
# Build the application
ENV NODE_ENV=production
RUN npm run build
# Remove devDependencies so the production stage copies a lean node_modules
RUN npm prune --omit=dev && npm cache clean --force
# Production stage
FROM node:18-alpine AS production
# Install security updates and create non-root user
RUN apk update && apk upgrade && apk add --no-cache dumb-init && \
addgroup -g 1001 -S nodejs && \
adduser -S nextjs -u 1001
# Set working directory
WORKDIR /usr/src/app
# Copy built application and dependencies
COPY --from=build --chown=nextjs:nodejs /usr/src/app/dist ./dist
COPY --from=build --chown=nextjs:nodejs /usr/src/app/node_modules ./node_modules
COPY --from=build --chown=nextjs:nodejs /usr/src/app/package*.json ./
# health-check.js is needed by the HEALTHCHECK instruction below
COPY --from=build --chown=nextjs:nodejs /usr/src/app/health-check.js ./
# Create logs directory
RUN mkdir -p /usr/src/app/logs && chown nextjs:nodejs /usr/src/app/logs
# Switch to non-root user
USER nextjs
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD node health-check.js
# Start application
CMD ["dumb-init", "node", "dist/index.js"]
# Multi-service Docker Compose
# docker-compose.yml
version: '3.8'
services:
app:
build:
context: .
target: production
ports:
- "3000:3000"
environment:
- NODE_ENV=production
- DATABASE_URL=postgresql://postgres:password@db:5432/myapp
- REDIS_URL=redis://redis:6379
- JWT_SECRET=${JWT_SECRET}
depends_on:
db:
condition: service_healthy
redis:
condition: service_healthy
volumes:
- app-logs:/usr/src/app/logs
restart: unless-stopped
networks:
- app-network
deploy:
resources:
limits:
cpus: '0.5'
memory: 512M
reservations:
cpus: '0.25'
memory: 256M
db:
image: postgres:14-alpine
environment:
- POSTGRES_DB=myapp
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=password
volumes:
- postgres-data:/var/lib/postgresql/data
- ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
ports:
- "5432:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- app-network
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis-data:/data
command: redis-server --appendonly yes
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 30s
timeout: 10s
retries: 3
restart: unless-stopped
networks:
- app-network
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
- ./nginx/ssl:/etc/nginx/ssl
- app-logs:/var/log/app
depends_on:
- app
restart: unless-stopped
networks:
- app-network
prometheus:
image: prom/prometheus:latest
ports:
- "9090:9090"
volumes:
- ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus-data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--web.console.templates=/etc/prometheus/consoles'
- '--storage.tsdb.retention.time=200h'
- '--web.enable-lifecycle'
restart: unless-stopped
networks:
- app-network
grafana:
image: grafana/grafana:latest
ports:
- "3001:3000"
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
- grafana-data:/var/lib/grafana
- ./monitoring/grafana/provisioning:/etc/grafana/provisioning
restart: unless-stopped
networks:
- app-network
volumes:
postgres-data:
redis-data:
app-logs:
prometheus-data:
grafana-data:
networks:
app-network:
driver: bridge
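The prometheus service mounts a `./monitoring/prometheus.yml` that is not shown. A minimal scrape configuration consistent with the compose service names might look like this (the job names are assumptions):

```yaml
# monitoring/prometheus.yml (assumed contents; not shown in the compose file)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'my-app'
    metrics_path: /metrics
    static_configs:
      - targets: ['app:3000']   # compose service name resolves on app-network
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```

Because all services share the `app-network` bridge, Prometheus can scrape the app by its compose service name rather than an IP address.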
Monitoring and Observability
Metrics
- Application performance metrics
- Infrastructure monitoring
- Business metrics tracking
- Custom dashboard creation
Logging
- Centralized log aggregation
- Structured logging formats
- Log correlation and tracing
- Alerting on log patterns
Tracing
- Distributed request tracing
- Performance bottleneck identification
- Service dependency mapping
- Error rate analysis
Application Monitoring Setup
// Application monitoring with Prometheus and OpenTelemetry
const express = require('express');
const promClient = require('prom-client');
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
// Initialize OpenTelemetry
const sdk = new NodeSDK({
resource: new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
[SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION || '1.0.0',
}),
});
sdk.start();
// Prometheus metrics
const register = new promClient.Registry();
// Default metrics
promClient.collectDefaultMetrics({
register,
prefix: 'myapp_',
});
// Custom metrics
const httpRequestDuration = new promClient.Histogram({
name: 'myapp_http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status_code'],
buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5],
registers: [register],
});
const httpRequestTotal = new promClient.Counter({
name: 'myapp_http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status_code'],
registers: [register],
});
const activeConnections = new promClient.Gauge({
name: 'myapp_active_connections',
help: 'Number of active connections',
registers: [register],
});
// Business metrics
const userRegistrations = new promClient.Counter({
name: 'myapp_user_registrations_total',
help: 'Total number of user registrations',
labelNames: ['source'],
registers: [register],
});
const orderValue = new promClient.Histogram({
name: 'myapp_order_value_dollars',
help: 'Order value in dollars',
buckets: [10, 50, 100, 500, 1000, 5000],
registers: [register],
});
// Express app
const app = express();
// Middleware for metrics
app.use((req, res, next) => {
const start = Date.now();
activeConnections.inc();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
const route = req.route ? req.route.path : req.path;
httpRequestDuration
.labels(req.method, route, res.statusCode)
.observe(duration);
httpRequestTotal
.labels(req.method, route, res.statusCode)
.inc();
activeConnections.dec();
});
next();
});
// Health check endpoint
app.get('/health', (req, res) => {
res.json({
status: 'healthy',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
version: process.env.APP_VERSION || '1.0.0',
});
});
// Metrics endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', register.contentType);
res.end(await register.metrics());
});
// Business logic with metrics
app.post('/api/register', async (req, res) => {
try {
const { email, source = 'web' } = req.body;
// Registration logic here
await registerUser(email);
// Track metric
userRegistrations.labels(source).inc();
res.json({ success: true });
} catch (error) {
res.status(500).json({ error: error.message });
}
});
app.post('/api/orders', async (req, res) => {
try {
const { amount, items } = req.body;
// Order processing logic
const order = await createOrder({ amount, items });
// Track business metric
orderValue.observe(amount);
res.json(order);
} catch (error) {
res.status(500).json({ error: error.message });
}
});
// Error handling middleware
app.use((error, req, res, next) => {
console.error('Unhandled error:', error);
// The metrics middleware already counts this response on 'finish',
// so incrementing the counter again here would double-count the request
res.status(500).json({ error: 'Internal server error' });
});
const port = process.env.PORT || 3000;
app.listen(port, () => {
console.log(`Server running on port ${port}`);
});
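The request-duration histogram records each observation into cumulative buckets at the configured boundaries, plus an implicit +Inf bucket. A plain-JavaScript sketch of that bucketing, using the same bucket list as above:

```javascript
// How a Prometheus histogram counts an observation: every bucket whose
// upper bound is >= the value is incremented (buckets are cumulative).
const BUCKETS = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5]; // seconds, as above

function observe(counts, value) {
  BUCKETS.forEach((le, i) => {
    if (value <= le) counts[i] += 1;
  });
  counts[BUCKETS.length] += 1; // the implicit +Inf bucket counts everything
  return counts;
}

// Three sample request durations: 3ms, 200ms, and a 7s outlier.
const counts = new Array(BUCKETS.length + 1).fill(0);
[0.003, 0.2, 7].forEach((v) => observe(counts, v));
```

This is why bucket boundaries matter: the 7-second outlier lands only in +Inf, so any latency beyond the last boundary is invisible to percentile queries.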
Best Practices for DevOps
Security
- Secret management and rotation
- Container image scanning
- Infrastructure security scanning
- Compliance automation
Automation
- Everything as Code
- Automated testing strategies
- Self-healing systems
- Automated incident response
Culture
- Shared responsibility model
- Continuous learning
- Blameless post-mortems
- Cross-functional collaboration
Conclusion
DevOps and automation are not just about tools and technology—they're about creating a culture of collaboration, continuous improvement, and shared responsibility. The goal is to deliver value to users faster and more reliably.
Start small, automate incrementally, and always prioritize reliability and security. Remember that the best automation is the one that makes your team more effective and your systems more reliable.