Automation & DevOps - CI/CD Pipelines & Workflow Automation

Master automation and DevOps with CI/CD pipelines, infrastructure as code, workflow automation, and best practices for streamlined development workflows.

The DevOps Revolution

DevOps has transformed how we build, deploy, and maintain software. By breaking down silos between development and operations teams, we can deliver software faster, more reliably, and with better quality through automation.

"The goal is not to become agile or do Agile, but to become agile." - Ahmed Sidky

CI/CD Pipeline Fundamentals

Continuous Integration

  • Automated testing on every commit
  • Code quality checks and linting
  • Build artifact generation
  • Fast feedback to developers

Continuous Delivery

  • Automated deployment to staging
  • Environment consistency
  • Release readiness validation
  • Production deployment gated by manual approval

Continuous Deployment

  • Fully automated production deployment
  • Zero-downtime deployments
  • Automated rollback capabilities
  • Real-time monitoring and alerts
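
The practical difference between continuous delivery and continuous deployment is the gate in front of production. As a minimal sketch of that distinction (the `decideProductionRelease` function and its field names are hypothetical and do not belong to any CI tool), the decision could be modeled like this:

```javascript
// Hypothetical model of the production gate; all names are illustrative only.
function decideProductionRelease({ mode, checksPassed, approved = false }) {
  // Neither mode releases a build whose automated checks failed.
  if (!checksPassed) return 'blocked';
  // Continuous deployment: passing checks are sufficient to ship.
  if (mode === 'deployment') return 'deploy';
  // Continuous delivery: the build is release-ready but waits for a human.
  if (mode === 'delivery') return approved ? 'deploy' : 'awaiting-approval';
  throw new Error(`Unknown mode: ${mode}`);
}

console.log(decideProductionRelease({ mode: 'delivery', checksPassed: true }));
console.log(decideProductionRelease({ mode: 'deployment', checksPassed: true }));
```

In both modes the automated checks are identical; only the final approval step differs.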

GitHub Actions CI/CD

Complete Node.js Application Pipeline

GitHub Actions Workflow

# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

env:
  NODE_VERSION: '18.x'
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    
    services:
      postgres:
        image: postgres:14
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: testdb
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      
      redis:
        image: redis:6
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'
        
    - name: Install dependencies
      run: npm ci
      
    - name: Run linter
      run: npm run lint
      
    - name: Run type check
      run: npm run type-check
      
    - name: Run unit tests
      run: npm run test:unit
      env:
        NODE_ENV: test
        DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
        REDIS_URL: redis://localhost:6379
        
    - name: Run integration tests
      run: npm run test:integration
      env:
        NODE_ENV: test
        DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
        REDIS_URL: redis://localhost:6379
        
    - name: Run E2E tests
      run: npm run test:e2e
      env:
        NODE_ENV: test
        DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
        
    - name: Generate test coverage
      run: npm run test:coverage
      
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        token: ${{ secrets.CODECOV_TOKEN }}
        file: ./coverage/lcov.info
        
    - name: Build application
      run: npm run build
      
    - name: Upload build artifacts
      uses: actions/upload-artifact@v4
      with:
        name: build-files
        path: dist/

  security:
    runs-on: ubuntu-latest
    needs: test
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'
        
    - name: Install dependencies
      run: npm ci
      
    - name: Run security audit
      run: npm audit --audit-level=moderate
      
    - name: Run dependency check
      uses: dependency-check/Dependency-Check_Action@main
      with:
        project: 'my-app'
        path: '.'
        format: 'ALL'
        
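    - name: Run SAST scan
      # Uploading CodeQL results requires the `security-events: write`
      # permission on this job or on the workflow.
      uses: github/codeql-action/init@v3
      with:
        languages: javascript
        
    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v3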

  build-and-push:
    runs-on: ubuntu-latest
    needs: [test, security]
    if: github.ref == 'refs/heads/main'
    
    permissions:
      contents: read
      packages: write
      
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Download build artifacts
      uses: actions/download-artifact@v4
      with:
        name: build-files
        path: dist/
        
    - name: Setup Docker Buildx
      uses: docker/setup-buildx-action@v3
      
    - name: Log in to Container Registry
      uses: docker/login-action@v3
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
        
    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
        tags: |
          type=ref,event=branch
          type=ref,event=pr
          type=sha,prefix={{branch}}-
          type=raw,value=latest,enable={{is_default_branch}}
          
    - name: Build and push Docker image
      uses: docker/build-push-action@v5
      with:
        context: .
        platforms: linux/amd64,linux/arm64
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-and-push
    if: github.ref == 'refs/heads/main'
    environment: staging
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1
        
    - name: Deploy to ECS
      run: |
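        # Force a new deployment so tasks re-pull the image. This picks up the new
        # build only if the task definition references a mutable tag such as `latest`.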
        aws ecs update-service \
          --cluster staging-cluster \
          --service my-app-service \
          --force-new-deployment
          
    - name: Wait for deployment
      run: |
        aws ecs wait services-stable \
          --cluster staging-cluster \
          --services my-app-service
          
    - name: Run health check
      run: |
        # Wait for service to be healthy
        for i in {1..30}; do
          if curl -f https://staging.myapp.com/health; then
            echo "Health check passed"
            exit 0
          fi
          echo "Waiting for service to be healthy..."
          sleep 10
        done
        echo "Health check failed"
        exit 1
        
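    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'

    - name: Install dependencies
      run: npm ci

    - name: Run smoke tests
      run: |
        npm run test:smoke -- --url=https://staging.myapp.com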

  deploy-production:
    runs-on: ubuntu-latest
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    environment: production
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1
        
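    - name: Blue-Green Deployment
      # Assumes LISTENER_ARN, BLUE_TARGET_GROUP_ARN, and GREEN_TARGET_GROUP_ARN
      # secrets exist, and that each service is registered with its own target group.
      run: |
        # Find the service currently running tasks (both services report status ACTIVE)
        CURRENT_SERVICE=$(aws ecs describe-services \
          --cluster production-cluster \
          --services my-app-blue my-app-green \
          --query 'services[?desiredCount > `0`].serviceName | [0]' \
          --output text)
          
        # Determine target service and its target group
        if [ "$CURRENT_SERVICE" = "my-app-blue" ]; then
          TARGET_SERVICE="my-app-green"
          TARGET_GROUP_ARN="${{ secrets.GREEN_TARGET_GROUP_ARN }}"
        else
          TARGET_SERVICE="my-app-blue"
          TARGET_GROUP_ARN="${{ secrets.BLUE_TARGET_GROUP_ARN }}"
        fi
        
        echo "Deploying to $TARGET_SERVICE"
        
        # Scale up the target service with the new image (a desired count of 2 is an example)
        aws ecs update-service \
          --cluster production-cluster \
          --service "$TARGET_SERVICE" \
          --desired-count 2 \
          --force-new-deployment
          
        # Wait for deployment
        aws ecs wait services-stable \
          --cluster production-cluster \
          --services "$TARGET_SERVICE"
          
        # Point the load balancer listener at the new service's target group
        aws elbv2 modify-listener \
          --listener-arn "${{ secrets.LISTENER_ARN }}" \
          --default-actions Type=forward,TargetGroupArn="$TARGET_GROUP_ARN"
          
        # Give in-flight requests time to drain
        sleep 30
        
        # Stop old service, if one was running
        if [ "$CURRENT_SERVICE" = "my-app-blue" ] || [ "$CURRENT_SERVICE" = "my-app-green" ]; then
          aws ecs update-service \
            --cluster production-cluster \
            --service "$CURRENT_SERVICE" \
            --desired-count 0
        fi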
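
The staging health check and the blue-green switch in the workflow above both rest on small pieces of logic that are easier to test outside of shell. The following is a rough JavaScript sketch of that logic, not a replacement for the workflow steps: `waitForHealthy` and `pickTargetService` are hypothetical helper names, and the `sleep` option exists only so the loop can be exercised without real delays.

```javascript
// Sketch: poll a health check until it passes or the attempts run out.
// `check` is any async function that resolves to true when the service is healthy.
async function waitForHealthy(check, { attempts = 30, delayMs = 10000, sleep } = {}) {
  const pause = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await check()) return attempt; // number of tries it took
    } catch (err) {
      // A thrown error (for example, connection refused) counts as "not healthy yet".
    }
    if (attempt < attempts) await pause(delayMs);
  }
  throw new Error(`Health check failed after ${attempts} attempts`);
}

// Sketch: choose the idle color for a blue-green switch.
function pickTargetService(currentService) {
  return currentService === 'my-app-blue' ? 'my-app-green' : 'my-app-blue';
}
```

The defaults mirror the shell loop (30 attempts, 10 seconds apart). A caller would pass a `check` that requests the `/health` endpoint and returns whether the response was successful.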

Jenkins Pipeline

Declarative Pipeline for Java Application

Jenkins Declarative Pipeline

// Jenkinsfile
pipeline {
    agent {
        kubernetes {
            yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:3.8.6-openjdk-17
    command:
    - cat
    tty: true
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock
  # Docker CLI only; it talks to the host daemon through the mounted socket
  - name: docker
    image: docker:20.10.21
    command:
    - cat
    tty: true
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock
  - name: kubectl
    image: bitnami/kubectl:1.25
    command:
    - cat
    tty: true
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
"""
        }
    }
    
    environment {
        DOCKER_REGISTRY = 'your-registry.com'
        IMAGE_NAME = 'my-java-app'
        KUBECONFIG = credentials('kubeconfig')
        SONAR_TOKEN = credentials('sonar-token')
        SLACK_WEBHOOK = credentials('slack-webhook')
    }
    
    parameters {
        choice(
            name: 'DEPLOY_ENV',
            choices: ['dev', 'staging', 'production'],
            description: 'Environment to deploy to'
        )
        booleanParam(
            name: 'SKIP_TESTS',
            defaultValue: false,
            description: 'Skip running tests'
        )
        string(
            name: 'IMAGE_TAG',
            defaultValue: '',
            description: 'Custom image tag (optional)'
        )
    }
    
    stages {
        stage('Checkout') {
            steps {
                checkout scm
                script {
                    env.GIT_COMMIT_SHORT = sh(
                        script: 'git rev-parse --short HEAD',
                        returnStdout: true
                    ).trim()
                    
                    env.BUILD_TAG = params.IMAGE_TAG ?: "${env.BUILD_NUMBER}-${env.GIT_COMMIT_SHORT}"
                }
            }
        }
        
        stage('Build') {
            steps {
                container('maven') {
                    sh '''
                        mvn clean compile -DskipTests=true
                        mvn versions:set -DnewVersion=${BUILD_TAG}
                    '''
                }
            }
        }
        
        stage('Test') {
            when {
                expression { !params.SKIP_TESTS }
            }
            parallel {
                stage('Unit Tests') {
                    steps {
                        container('maven') {
                            sh 'mvn test'
                        }
                    }
                    post {
                        always {
                            junit(
                                testResults: 'target/surefire-reports/*.xml',
                                allowEmptyResults: false
                            )
                        }
                    }
                }
                
                stage('Integration Tests') {
                    steps {
                        container('maven') {
                            sh 'mvn verify -Dtest.profile=integration'
                        }
                    }
                    post {
                        always {
                            junit(
                                testResults: 'target/failsafe-reports/*.xml',
                                allowEmptyResults: false
                            )
                        }
                    }
                }
            }
        }
        
        stage('Code Quality') {
            parallel {
                stage('SonarQube Analysis') {
                    steps {
                        container('maven') {
                            withSonarQubeEnv('SonarQube') {
                                sh '''
                                    mvn sonar:sonar \
                                        -Dsonar.token=${SONAR_TOKEN} \
                                        -Dsonar.projectKey=my-java-app \
                                        -Dsonar.projectName="My Java App"
                                '''
                            }
                        }
                    }
                }
                
                stage('Security Scan') {
                    steps {
                        container('maven') {
                            sh 'mvn org.owasp:dependency-check-maven:check'
                        }
                    }
                    post {
                        always {
                            publishHTML([
                                allowMissing: false,
                                alwaysLinkToLastBuild: true,
                                keepAll: true,
                                reportDir: 'target',
                                reportFiles: 'dependency-check-report.html',
                                reportName: 'OWASP Dependency Check'
                            ])
                        }
                    }
                }
            }
        }
        
        stage('Quality Gate') {
            steps {
                timeout(time: 5, unit: 'MINUTES') {
                    waitForQualityGate abortPipeline: true
                }
            }
        }
        
        stage('Package') {
            steps {
                container('maven') {
                    sh 'mvn package -DskipTests=true'
                }
                
                container('docker') {
                    script {
                        def image = docker.build("${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG}")
                        
                        docker.withRegistry("https://${DOCKER_REGISTRY}", 'docker-registry-credentials') {
                            image.push()
                            image.push('latest')
                        }
                    }
                }
            }
        }
        
        stage('Deploy') {
            when {
                anyOf {
                    branch 'main'
                    branch 'develop'
                    expression { params.DEPLOY_ENV != null }
                }
            }
            steps {
                container('kubectl') {
                    script {
                        def deployEnv = params.DEPLOY_ENV ?: (env.BRANCH_NAME == 'main' ? 'production' : 'staging')
                        
                        sh """
                            # Update deployment image
                            kubectl set image deployment/my-java-app \
                                my-java-app=${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG} \
                                -n ${deployEnv}
                            
                            # Wait for rollout
                            kubectl rollout status deployment/my-java-app -n ${deployEnv} --timeout=300s
                            
                            # Verify deployment
                            kubectl get pods -n ${deployEnv} -l app=my-java-app
                        """
                    }
                }
            }
        }
        
        stage('Smoke Tests') {
            when {
                anyOf {
                    branch 'main'
                    branch 'develop'
                }
            }
            steps {
                container('maven') {
                    script {
                        def deployEnv = env.BRANCH_NAME == 'main' ? 'production' : 'staging'
                        def baseUrl = deployEnv == 'production' ? 
                            'https://api.myapp.com' : 
                            'https://staging-api.myapp.com'
                        
                        sh """
                            mvn test -Dtest.profile=smoke \
                                -Dapi.base.url=${baseUrl} \
                                -Dtest.timeout=60
                        """
                    }
                }
            }
        }
    }
    
    post {
        always {
            // Archive artifacts
            archiveArtifacts(
                artifacts: 'target/*.jar,target/dependency-check-report.html',
                allowEmptyArchive: true
            )
            
            // Clean workspace
            cleanWs()
        }
        
        success {
            script {
                def deployEnv = params.DEPLOY_ENV ?: (env.BRANCH_NAME == 'main' ? 'production' : 'staging')
                slackSend(
                    channel: '#deployments',
                    color: 'good',
                    message: """
✅ Deployment Successful!
• Project: ${env.JOB_NAME}
• Build: ${env.BUILD_NUMBER}
• Environment: ${deployEnv}
• Image: ${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG}
• Duration: ${currentBuild.durationString}
                    """.trim(),
                    webhookUrl: env.SLACK_WEBHOOK
                )
            }
        }
        
        failure {
            slackSend(
                channel: '#deployments',
                color: 'danger',
                message: """
❌ Deployment Failed!
• Project: ${env.JOB_NAME}
• Build: ${env.BUILD_NUMBER}
• Stage: ${env.STAGE_NAME}
• Duration: ${currentBuild.durationString}
• Logs: ${env.BUILD_URL}console
                """.trim(),
                webhookUrl: env.SLACK_WEBHOOK
            )
        }
        
        unstable {
            slackSend(
                channel: '#deployments',
                color: 'warning',
                message: """
⚠️ Deployment Unstable!
• Project: ${env.JOB_NAME}
• Build: ${env.BUILD_NUMBER}
• Issues: Check test results
• Logs: ${env.BUILD_URL}console
                """.trim(),
                webhookUrl: env.SLACK_WEBHOOK
            )
        }
    }
}
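
Two expressions in the Jenkinsfile above decide what gets built and where it goes: the image tag fallback and the deploy environment fallback. As an illustrative sketch, here is the same logic in JavaScript (`resolveImageTag` and `resolveDeployEnv` are hypothetical names; the pipeline itself uses Groovy's `?:` operator):

```javascript
// Mirrors the Groovy expression: params.IMAGE_TAG ?: "${BUILD_NUMBER}-${GIT_COMMIT_SHORT}"
function resolveImageTag({ imageTag, buildNumber, commitShort }) {
  // Groovy's ?: treats an empty string as false, as does JavaScript's ||.
  return imageTag || `${buildNumber}-${commitShort}`;
}

// Mirrors: params.DEPLOY_ENV ?: (BRANCH_NAME == 'main' ? 'production' : 'staging')
function resolveDeployEnv({ deployEnv, branchName }) {
  return deployEnv || (branchName === 'main' ? 'production' : 'staging');
}

console.log(resolveImageTag({ imageTag: '', buildNumber: 128, commitShort: 'a1b2c3d' }));
console.log(resolveDeployEnv({ deployEnv: '', branchName: 'main' }));
```

One thing the sketch makes visible: a Jenkins `choice` parameter defaults to its first entry (`dev` here), so the branch-based fallback only applies when the parameter value is actually empty.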

Infrastructure as Code

Terraform for AWS Infrastructure

Terraform Configuration

# terraform/main.tf
terraform {
  required_version = ">= 1.0"
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "infrastructure/terraform.tfstate"
    region = "us-east-1"
    
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

provider "aws" {
  region = var.aws_region
  
  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}

# Variables
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "my-app"
}

# Data sources
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_caller_identity" "current" {}

# VPC Configuration
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  
  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  
  tags = {
    Name = "${var.project_name}-${var.environment}-igw"
  }
}

resource "aws_subnet" "public" {
  count = 2
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
  
  map_public_ip_on_launch = true
  
  tags = {
    Name = "${var.project_name}-${var.environment}-public-${count.index + 1}"
    Type = "public"
  }
}

resource "aws_subnet" "private" {
  count = 2
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 10}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
  
  tags = {
    Name = "${var.project_name}-${var.environment}-private-${count.index + 1}"
    Type = "private"
  }
}

# NAT Gateway
resource "aws_eip" "nat" {
  count = 2
  
  domain = "vpc"
  
  tags = {
    Name = "${var.project_name}-${var.environment}-nat-eip-${count.index + 1}"
  }
}

resource "aws_nat_gateway" "main" {
  count = 2
  
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  
  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index + 1}"
  }
  
  depends_on = [aws_internet_gateway.main]
}

# Route Tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
  
  tags = {
    Name = "${var.project_name}-${var.environment}-public-rt"
  }
}

resource "aws_route_table" "private" {
  count = 2
  
  vpc_id = aws_vpc.main.id
  
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }
  
  tags = {
    Name = "${var.project_name}-${var.environment}-private-rt-${count.index + 1}"
  }
}

resource "aws_route_table_association" "public" {
  count = 2
  
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count = 2
  
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "${var.project_name}-${var.environment}"
  
  configuration {
    execute_command_configuration {
      logging = "OVERRIDE"
      
      log_configuration {
        cloud_watch_encryption_enabled = true
        cloud_watch_log_group_name     = aws_cloudwatch_log_group.ecs.name
      }
    }
  }
  
  setting {
    name  = "containerInsights"
    value = "enabled"
  }
  
  tags = {
    Name = "${var.project_name}-${var.environment}-cluster"
  }
}

# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "ecs" {
  name              = "/ecs/${var.project_name}-${var.environment}"
  retention_in_days = 30
  
  tags = {
    Name = "${var.project_name}-${var.environment}-logs"
  }
}

# Application Load Balancer
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-${var.environment}-alb-"
  vpc_id      = aws_vpc.main.id
  
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = {
    Name = "${var.project_name}-${var.environment}-alb-sg"
  }
}

resource "aws_lb" "main" {
  name               = "${var.project_name}-${var.environment}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id
  
  enable_deletion_protection = var.environment == "production"
  
  tags = {
    Name = "${var.project_name}-${var.environment}-alb"
  }
}

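# Launch template for ECS container instances
# NOTE: data.aws_ami.ecs, aws_security_group.ecs_instance, and
# aws_iam_instance_profile.ecs_instance are referenced below but are not
# defined in this excerpt; declare them before this configuration will validate.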
resource "aws_launch_template" "ecs" {
  name_prefix   = "${var.project_name}-${var.environment}-"
  image_id      = data.aws_ami.ecs.id
  instance_type = var.environment == "production" ? "t3.medium" : "t3.small"
  
  vpc_security_group_ids = [aws_security_group.ecs_instance.id]
  
  iam_instance_profile {
    name = aws_iam_instance_profile.ecs_instance.name
  }
  
  user_data = base64encode(templatefile("${path.module}/user-data.sh", {
    cluster_name = aws_ecs_cluster.main.name
  }))
  
  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "${var.project_name}-${var.environment}-ecs-instance"
    }
  }
}

# Outputs
output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}

output "cluster_name" {
  description = "ECS Cluster name"
  value       = aws_ecs_cluster.main.name
}

output "load_balancer_dns" {
  description = "Load balancer DNS name"
  value       = aws_lb.main.dns_name
}
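
The subnet resources above derive their CIDR blocks from `count.index`, which is easy to misread. As a quick sanity check (a sketch only; `subnetCidrs` and `hasOverlap` are hypothetical helpers, not Terraform functions), the same interpolation can be reproduced in JavaScript to list the resulting ranges and confirm they do not collide:

```javascript
// Reproduces the interpolations "10.0.${count.index + 1}.0/24" (public)
// and "10.0.${count.index + 10}.0/24" (private) from the configuration.
function subnetCidrs(count, offset) {
  return Array.from({ length: count }, (_, index) => `10.0.${index + offset}.0/24`);
}

// Two /24 blocks inside 10.0.0.0/16 overlap only when they are the same block.
function hasOverlap(a, b) {
  const seen = new Set(a);
  return b.some((cidr) => seen.has(cidr));
}

const publicCidrs = subnetCidrs(2, 1);
const privateCidrs = subnetCidrs(2, 10);

console.log(publicCidrs);
console.log(privateCidrs);
console.log(hasOverlap(publicCidrs, privateCidrs));
```

With offsets of 1 and 10, the public and private ranges stay apart for small counts, but raising the public subnet count to 10 would produce `10.0.10.0/24` and collide with the first private subnet.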

Docker and Containerization

Multi-stage Docker Build

Optimized Dockerfile

# Multi-stage Dockerfile for Node.js application
FROM node:18-alpine AS base

# Install security updates
RUN apk update && apk upgrade && apk add --no-cache dumb-init

# Create app directory
WORKDIR /usr/src/app

# Copy package files
COPY package*.json ./

# Development stage
FROM base AS development
ENV NODE_ENV=development
RUN npm ci --include=dev
COPY . .
EXPOSE 3000
CMD ["dumb-init", "npm", "run", "dev"]

# Build stage
FROM base AS build

# Install all dependencies; build tools usually live in devDependencies
RUN npm ci --include=dev

# Copy source code
COPY . .

# Build the application, then drop devDependencies from node_modules
ENV NODE_ENV=production
RUN npm run build && npm prune --omit=dev && npm cache clean --force

# Production stage
FROM node:18-alpine AS production

# Install security updates and create non-root user
RUN apk update && apk upgrade && apk add --no-cache dumb-init && \
    addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001 -G nodejs

# Set working directory
WORKDIR /usr/src/app

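# Copy built application and dependencies
COPY --from=build --chown=nextjs:nodejs /usr/src/app/dist ./dist
COPY --from=build --chown=nextjs:nodejs /usr/src/app/node_modules ./node_modules
COPY --from=build --chown=nextjs:nodejs /usr/src/app/package*.json ./
# Assumes health-check.js sits at the project root; the HEALTHCHECK below needs it in the image
COPY --from=build --chown=nextjs:nodejs /usr/src/app/health-check.js ./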

# Create logs directory
RUN mkdir -p /usr/src/app/logs && chown nextjs:nodejs /usr/src/app/logs

# Switch to non-root user
USER nextjs

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node health-check.js

# Start application
CMD ["dumb-init", "node", "dist/index.js"]
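
The `HEALTHCHECK` instruction above runs `health-check.js`, which the article does not show. One plausible shape for it, offered strictly as a sketch: it assumes the application serves `GET /health` on port 3000 and that the image runs Node.js 18 or later (for the global `fetch`). The `checkHealth` name and its parameters are hypothetical.

```javascript
// health-check.js (hypothetical): report whether the app answers its health endpoint.
// `fetchImpl` is injectable so the function can be tested without a running server.
async function checkHealth(url = 'http://localhost:3000/health', fetchImpl = fetch, timeoutMs = 2000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(url, { signal: controller.signal });
    return response.ok; // true for any 2xx status
  } catch (err) {
    return false; // connection refused, timeout, DNS failure, and similar
  } finally {
    clearTimeout(timer);
  }
}
```

To use it as the container health check, the file would end with a line such as `checkHealth().then((ok) => process.exit(ok ? 0 : 1));`, since Docker treats exit status 0 as healthy and 1 as unhealthy. The 2-second timeout is chosen to stay under the 3-second `--timeout` in the Dockerfile.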

Multi-service Docker Compose

# docker-compose.yml
# The top-level `version` key is obsolete under the Compose Specification and is omitted.

services:
  app:
    build:
      context: .
      target: production
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://postgres:password@db:5432/myapp
      - REDIS_URL=redis://redis:6379
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - app-logs:/usr/src/app/logs
    restart: unless-stopped
    networks:
      - app-network
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M

  db:
    image: postgres:14-alpine
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    networks:
      - app-network

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    networks:
      - app-network

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/ssl:/etc/nginx/ssl
      - app-logs:/var/log/app
    depends_on:
      - app
    restart: unless-stopped
    networks:
      - app-network

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    networks:
      - app-network

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning
    restart: unless-stopped
    networks:
      - app-network

volumes:
  postgres-data:
  redis-data:
  app-logs:
  prometheus-data:
  grafana-data:

networks:
  app-network:
    driver: bridge

Monitoring and Observability

Metrics

  • Application performance metrics
  • Infrastructure monitoring
  • Business metrics tracking
  • Custom dashboard creation

Logging

  • Centralized log aggregation
  • Structured logging formats
  • Log correlation and tracing
  • Alerting on log patterns

Tracing

  • Distributed request tracing
  • Performance bottleneck identification
  • Service dependency mapping
  • Error rate analysis
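
Structured logging and log correlation are the easiest of these to start with. A minimal sketch, assuming JSON lines are acceptable to the log aggregator and that a per-request trace ID is generated when none is supplied (`formatLog` and `requestLogger` are hypothetical helpers, not part of any logging library):

```javascript
const { randomUUID } = require('node:crypto');

// Build one JSON log line; `traceId` ties together every line for a request.
function formatLog(level, message, { traceId, ...fields } = {}, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    traceId: traceId || randomUUID(),
    ...fields,
  });
}

// Create a logger bound to one request so all of its lines share a trace ID.
function requestLogger(traceId = randomUUID()) {
  return {
    traceId,
    info: (message, fields) => formatLog('info', message, { ...fields, traceId }),
    error: (message, fields) => formatLog('error', message, { ...fields, traceId }),
  };
}

const logger = requestLogger();
console.log(logger.info('order created', { orderId: 42 }));
console.log(logger.error('payment failed', { orderId: 42 }));
```

Because every line carries the same `traceId`, a log aggregator can group the lines of one request even when they are interleaved with other traffic.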

Application Monitoring Setup

// Application monitoring with Prometheus and OpenTelemetry
const express = require('express');
const promClient = require('prom-client');
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

// Initialize OpenTelemetry
const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION || '1.0.0',
  }),
});

sdk.start();

// Prometheus metrics
const register = new promClient.Registry();

// Default metrics
promClient.collectDefaultMetrics({
  register,
  prefix: 'myapp_',
});

// Custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'myapp_http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5],
  registers: [register],
});

const httpRequestTotal = new promClient.Counter({
  name: 'myapp_http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});

const activeConnections = new promClient.Gauge({
  name: 'myapp_active_connections',
  help: 'Number of active connections',
  registers: [register],
});

// Business metrics
const userRegistrations = new promClient.Counter({
  name: 'myapp_user_registrations_total',
  help: 'Total number of user registrations',
  labelNames: ['source'],
  registers: [register],
});

const orderValue = new promClient.Histogram({
  name: 'myapp_order_value_dollars',
  help: 'Order value in dollars',
  buckets: [10, 50, 100, 500, 1000, 5000],
  registers: [register],
});

// Express app
const app = express();

// Parse JSON request bodies so the POST handlers below can read req.body
app.use(express.json());

// Middleware for metrics
app.use((req, res, next) => {
  const start = Date.now();
  
  activeConnections.inc();
  
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const route = req.route ? req.route.path : req.path;
    
    httpRequestDuration
      .labels(req.method, route, res.statusCode)
      .observe(duration);
      
    httpRequestTotal
      .labels(req.method, route, res.statusCode)
      .inc();
      
    activeConnections.dec();
  });
  
  next();
});

// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    version: process.env.APP_VERSION || '1.0.0',
  });
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Business logic with metrics
app.post('/api/register', async (req, res) => {
  try {
    const { email, source = 'web' } = req.body;
    
    // Registration logic here (registerUser is application-specific and not defined in this example)
    await registerUser(email);
    
    // Track metric
    userRegistrations.labels(source).inc();
    
    res.json({ success: true });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.post('/api/orders', async (req, res) => {
  try {
    const { amount, items } = req.body;
    
    // Order processing logic (createOrder is application-specific and not defined in this example)
    const order = await createOrder({ amount, items });
    
    // Track business metric
    orderValue.observe(amount);
    
    res.json(order);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

// Error handling middleware
app.use((error, req, res, next) => {
  console.error('Unhandled error:', error);
  
  // The metrics middleware's 'finish' handler already counts this response
  // as a 500, so no counter is incremented here (doing so would double count).
  res.status(500).json({ error: 'Internal server error' });
});

const port = process.env.PORT || 3000;
app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});
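
The `buckets` array passed to the histogram above determines how precisely latency can be read back later, so it helps to see how observations land in it. The following sketch illustrates Prometheus histogram semantics in plain JavaScript. It is not how prom-client is implemented internally, and `bucketize` is a hypothetical helper.

```javascript
// Each bucket counts observations less than or equal to its upper bound `le`,
// so counts are cumulative and the implicit +Inf bucket equals the total count.
function bucketize(observations, upperBounds) {
  const buckets = upperBounds.map((le) => ({
    le,
    count: observations.filter((value) => value <= le).length,
  }));
  buckets.push({ le: Infinity, count: observations.length });
  const sum = observations.reduce((total, value) => total + value, 0);
  return { buckets, sum, count: observations.length };
}

// Five example request durations in seconds, using the bucket bounds from the app.
const durations = [0.004, 0.03, 0.2, 0.7, 7];
const result = bucketize(durations, [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5]);
console.log(result.buckets.map((b) => `${b.le}: ${b.count}`).join(', '));
```

In this example the 7-second request appears only in the `+Inf` bucket. If slow requests like that are common, the largest finite bound (5 seconds) may be too low to say anything useful about them.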

Best Practices for DevOps

Security

  • Secret management and rotation
  • Container image scanning
  • Infrastructure security scanning
  • Compliance automation

Automation

  • Everything as Code
  • Automated testing strategies
  • Self-healing systems
  • Automated incident response

Culture

  • Shared responsibility model
  • Continuous learning
  • Blameless post-mortems
  • Cross-functional collaboration
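
On the security side, a small habit that supports secret management is to read secrets from the environment and fail fast when one is missing, rather than falling back to a hardcoded default. A minimal sketch (`loadSecrets` is a hypothetical helper, and the variable names are examples):

```javascript
// Fail at startup if a required secret is absent, instead of running with a default.
function loadSecrets(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never log secret values.
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Example with an explicit environment object instead of process.env.
const secrets = loadSecrets(['JWT_SECRET'], { JWT_SECRET: 'example-value' });
console.log(Object.keys(secrets));
```

Calling this once at startup turns a misconfigured deployment into an immediate, clearly worded failure instead of a service that runs with a weak or empty secret.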

Conclusion

DevOps and automation are not just about tools and technology—they're about creating a culture of collaboration, continuous improvement, and shared responsibility. The goal is to deliver value to users faster and more reliably.

Start small, automate incrementally, and always prioritize reliability and security. Remember that the best automation is the one that makes your team more effective and your systems more reliable.