The DevOps Revolution
DevOps has transformed how we build, deploy, and maintain software. By breaking down silos between development and operations teams, we can deliver software faster, more reliably, and with better quality through automation.
"The goal is not to become agile or do Agile, but to become agile." - Ahmed Sidky
CI/CD Pipeline Fundamentals
Continuous Integration
- Automated testing on every commit
- Code quality checks and linting
- Build artifact generation
- Fast feedback to developers
Continuous Delivery
- Automated deployment to staging
- Environment consistency
- Release readiness validation
- Manually approved production deployment
Continuous Deployment
- Fully automated production deployment
- Zero-downtime deployments
- Automated rollback capabilities
- Real-time monitoring and alerts
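The automated rollback idea in the list above can be sketched as a small control loop: deploy, poll a health check a few times, and roll back if it never passes. This is a minimal sketch, not a production implementation; `deploy`, `checkHealth`, and `rollback` are hypothetical callbacks you would supply, and the attempt count and delay are example values.

```javascript
// Minimal sketch of deploy-then-verify with automated rollback.
// deploy, checkHealth, and rollback are hypothetical callbacks supplied by the caller.
async function deployWithRollback({ deploy, checkHealth, rollback, attempts = 5, delayMs = 1000 }) {
  await deploy();

  for (let i = 1; i <= attempts; i++) {
    // A truthy result means the new version is serving traffic correctly.
    if (await checkHealth()) {
      return { status: 'deployed', checks: i };
    }
    // Wait before the next check, but not after the final one.
    if (i < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }

  // Every health check failed: restore the previous version.
  await rollback();
  return { status: 'rolled-back', checks: attempts };
}
```

Keeping the three steps as injected callbacks means the same loop can wrap an ECS update, a `kubectl rollout`, or anything else that has a way to undo itself.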
GitHub Actions CI/CD
Complete Node.js Application Pipeline
GitHub Actions Workflow
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

env:
  NODE_VERSION: '18.x'
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:14
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: testdb
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      redis:
        image: redis:6
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Run type check
        run: npm run type-check
      - name: Run unit tests
        run: npm run test:unit
        env:
          NODE_ENV: test
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
      - name: Run integration tests
        run: npm run test:integration
        env:
          NODE_ENV: test
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
      - name: Run E2E tests
        run: npm run test:e2e
        env:
          NODE_ENV: test
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
      - name: Generate test coverage
        run: npm run test:coverage
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          file: ./coverage/lcov.info
      - name: Build application
        run: npm run build
      # upload-artifact v3 is deprecated on github.com; v4 takes the same inputs here
      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: build-files
          path: dist/

  security:
    runs-on: ubuntu-latest
    needs: test
    # CodeQL needs permission to upload its results
    permissions:
      actions: read
      contents: read
      security-events: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run security audit
        run: npm audit --audit-level=moderate
      - name: Run dependency check
        uses: dependency-check/Dependency-Check_Action@main
        with:
          project: 'my-app'
          path: '.'
          format: 'ALL'
      - name: Run SAST scan
        uses: github/codeql-action/init@v3
        with:
          languages: javascript
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3

  build-and-push:
    runs-on: ubuntu-latest
    needs: [test, security]
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Download build artifacts
        uses: actions/download-artifact@v4
        with:
          name: build-files
          path: dist/
      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-and-push
    if: github.ref == 'refs/heads/main'
    environment: staging
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Deploy to ECS
        run: |
          # Redeploy the service; this picks up the new image only if the
          # task definition references a mutable tag such as "latest"
          aws ecs update-service \
            --cluster staging-cluster \
            --service my-app-service \
            --force-new-deployment
      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster staging-cluster \
            --services my-app-service
      - name: Run health check
        run: |
          # Wait for service to be healthy
          for i in {1..30}; do
            if curl -f https://staging.myapp.com/health; then
              echo "Health check passed"
              exit 0
            fi
            echo "Waiting for service to be healthy..."
            sleep 10
          done
          echo "Health check failed"
          exit 1
      - name: Run smoke tests
        run: |
          npm run test:smoke -- --url=https://staging.myapp.com

  deploy-production:
    runs-on: ubuntu-latest
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Blue-Green Deployment
        run: |
          # Get the service that is currently running tasks. A service scaled
          # to zero still reports status ACTIVE, so filter on runningCount.
          CURRENT_SERVICE=$(aws ecs describe-services \
            --cluster production-cluster \
            --services my-app-blue my-app-green \
            --query 'services[?runningCount > `0`].serviceName' \
            --output text)

          # Determine target service and its target group.
          # Assumes one target group per color plus a listener ARN, stored as
          # the secrets BLUE_TARGET_GROUP_ARN, GREEN_TARGET_GROUP_ARN, LISTENER_ARN.
          if [ "$CURRENT_SERVICE" = "my-app-blue" ]; then
            TARGET_SERVICE="my-app-green"
            TARGET_GROUP_ARN="${{ secrets.GREEN_TARGET_GROUP_ARN }}"
          else
            TARGET_SERVICE="my-app-blue"
            TARGET_GROUP_ARN="${{ secrets.BLUE_TARGET_GROUP_ARN }}"
          fi
          echo "Deploying to $TARGET_SERVICE"

          # Scale up and update target service (desired count 2 is an example value)
          aws ecs update-service \
            --cluster production-cluster \
            --service "$TARGET_SERVICE" \
            --desired-count 2 \
            --force-new-deployment

          # Wait for deployment
          aws ecs wait services-stable \
            --cluster production-cluster \
            --services "$TARGET_SERVICE"

          # Update load balancer listener to forward to the new service
          aws elbv2 modify-listener \
            --listener-arn "${{ secrets.LISTENER_ARN }}" \
            --default-actions Type=forward,TargetGroupArn="$TARGET_GROUP_ARN"

          # Wait for health checks
          sleep 30

          # Stop old service, if there was one
          if [ -n "$CURRENT_SERVICE" ]; then
            aws ecs update-service \
              --cluster production-cluster \
              --service "$CURRENT_SERVICE" \
              --desired-count 0
          fi
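The color-selection step of the blue-green deployment is easy to get subtly wrong, so it can help to look at it in isolation. Here is a minimal sketch in JavaScript, assuming input shaped like a subset of the `aws ecs describe-services` output (`serviceName` and `runningCount`). The helper name `pickBlueGreenTarget` is hypothetical. It keys on `runningCount` because an ECS service scaled to zero still reports its status as ACTIVE.

```javascript
// Hypothetical helper: choose the idle color as the next deployment target.
// `services` mirrors a subset of `aws ecs describe-services` output.
function pickBlueGreenTarget(services) {
  const live = services
    .filter((service) => service.runningCount > 0)
    .map((service) => service.serviceName);

  // Two live colors means a previous deployment did not finish cleanly.
  if (live.length > 1) {
    throw new Error(`Expected at most one live service, found: ${live.join(', ')}`);
  }

  const current = live[0] ?? null;
  const target = current === 'my-app-blue' ? 'my-app-green' : 'my-app-blue';
  return { current, target };
}

// Usage: blue is serving traffic, so green becomes the target.
console.log(pickBlueGreenTarget([
  { serviceName: 'my-app-blue', runningCount: 2 },
  { serviceName: 'my-app-green', runningCount: 0 },
]));
```

Failing loudly when both colors are live is a deliberate choice in this sketch: guessing a target in that state risks replacing the service that is actually taking traffic.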
Jenkins Pipeline
Declarative Pipeline for Java Application
Jenkins Declarative Pipeline
// Jenkinsfile
pipeline {
  agent {
    kubernetes {
      yaml """
        apiVersion: v1
        kind: Pod
        spec:
          containers:
          - name: maven
            image: maven:3.8.6-openjdk-17
            command:
            - cat
            tty: true
            volumeMounts:
            - name: docker-sock
              mountPath: /var/run/docker.sock
          - name: docker
            image: docker:20.10.21-dind
            securityContext:
              privileged: true
            volumeMounts:
            - name: docker-sock
              mountPath: /var/run/docker.sock
          - name: kubectl
            image: bitnami/kubectl:1.25
            command:
            - cat
            tty: true
          volumes:
          - name: docker-sock
            hostPath:
              path: /var/run/docker.sock
      """
    }
  }

  environment {
    DOCKER_REGISTRY = 'your-registry.com'
    IMAGE_NAME = 'my-java-app'
    KUBECONFIG = credentials('kubeconfig')
    SONAR_TOKEN = credentials('sonar-token')
    SLACK_WEBHOOK = credentials('slack-webhook')
  }

  parameters {
    choice(
      name: 'DEPLOY_ENV',
      choices: ['dev', 'staging', 'production'],
      description: 'Environment to deploy to'
    )
    booleanParam(
      name: 'SKIP_TESTS',
      defaultValue: false,
      description: 'Skip running tests'
    )
    string(
      name: 'IMAGE_TAG',
      defaultValue: '',
      description: 'Custom image tag (optional)'
    )
  }

  stages {
    stage('Checkout') {
      steps {
        checkout scm
        script {
          env.GIT_COMMIT_SHORT = sh(
            script: 'git rev-parse --short HEAD',
            returnStdout: true
          ).trim()
          env.BUILD_TAG = params.IMAGE_TAG ?: "${env.BUILD_NUMBER}-${env.GIT_COMMIT_SHORT}"
        }
      }
    }

    stage('Build') {
      steps {
        container('maven') {
          sh '''
            mvn clean compile -DskipTests=true
            mvn versions:set -DnewVersion=${BUILD_TAG}
          '''
        }
      }
    }

    stage('Test') {
      // A bare parameter is not a valid `when` condition; wrap it in `expression`
      when {
        expression { !params.SKIP_TESTS }
      }
      parallel {
        stage('Unit Tests') {
          steps {
            container('maven') {
              sh 'mvn test'
            }
          }
          post {
            always {
              // `junit` is the step provided by the JUnit plugin
              junit testResults: 'target/surefire-reports/*.xml', allowEmptyResults: false
            }
          }
        }
        stage('Integration Tests') {
          steps {
            container('maven') {
              sh 'mvn verify -Dtest.profile=integration'
            }
          }
          post {
            always {
              junit testResults: 'target/failsafe-reports/*.xml', allowEmptyResults: false
            }
          }
        }
      }
    }

    stage('Code Quality') {
      parallel {
        stage('SonarQube Analysis') {
          steps {
            container('maven') {
              withSonarQubeEnv('SonarQube') {
                sh '''
                  mvn sonar:sonar \
                    -Dsonar.token=${SONAR_TOKEN} \
                    -Dsonar.projectKey=my-java-app \
                    -Dsonar.projectName="My Java App"
                '''
              }
            }
          }
        }
        stage('Security Scan') {
          steps {
            container('maven') {
              sh 'mvn org.owasp:dependency-check-maven:check'
            }
          }
          post {
            always {
              publishHTML([
                allowMissing: false,
                alwaysLinkToLastBuild: true,
                keepAll: true,
                reportDir: 'target',
                reportFiles: 'dependency-check-report.html',
                reportName: 'OWASP Dependency Check'
              ])
            }
          }
        }
      }
    }

    stage('Quality Gate') {
      steps {
        timeout(time: 5, unit: 'MINUTES') {
          waitForQualityGate abortPipeline: true
        }
      }
    }

    stage('Package') {
      steps {
        container('maven') {
          sh 'mvn package -DskipTests=true'
        }
        container('docker') {
          script {
            def image = docker.build("${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG}")
            docker.withRegistry("https://${DOCKER_REGISTRY}", 'docker-registry-credentials') {
              image.push()
              image.push('latest')
            }
          }
        }
      }
    }

    stage('Deploy') {
      when {
        anyOf {
          branch 'main'
          branch 'develop'
          expression { params.DEPLOY_ENV != null }
        }
      }
      steps {
        container('kubectl') {
          script {
            def deployEnv = params.DEPLOY_ENV ?: (env.BRANCH_NAME == 'main' ? 'production' : 'staging')
            sh """
              # Update deployment image
              kubectl set image deployment/my-java-app \
                my-java-app=${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG} \
                -n ${deployEnv}

              # Wait for rollout
              kubectl rollout status deployment/my-java-app -n ${deployEnv} --timeout=300s

              # Verify deployment
              kubectl get pods -n ${deployEnv} -l app=my-java-app
            """
          }
        }
      }
    }

    stage('Smoke Tests') {
      when {
        anyOf {
          branch 'main'
          branch 'develop'
        }
      }
      steps {
        container('maven') {
          script {
            def deployEnv = env.BRANCH_NAME == 'main' ? 'production' : 'staging'
            def baseUrl = deployEnv == 'production' ? 'https://api.myapp.com' : 'https://staging-api.myapp.com'
            sh """
              mvn test -Dtest.profile=smoke \
                -Dapi.base.url=${baseUrl} \
                -Dtest.timeout=60
            """
          }
        }
      }
    }
  }

  post {
    always {
      // Archive artifacts
      archiveArtifacts(
        artifacts: 'target/*.jar,target/dependency-check-report.html',
        allowEmptyArchive: true
      )
      // Clean workspace
      cleanWs()
    }
    success {
      script {
        def deployEnv = params.DEPLOY_ENV ?: (env.BRANCH_NAME == 'main' ? 'production' : 'staging')
        slackSend(
          channel: '#deployments',
          color: 'good',
          message: """
            ✅ Deployment Successful!
            • Project: ${env.JOB_NAME}
            • Build: ${env.BUILD_NUMBER}
            • Environment: ${deployEnv}
            • Image: ${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_TAG}
            • Duration: ${currentBuild.durationString}
          """.trim(),
          webhookUrl: env.SLACK_WEBHOOK
        )
      }
    }
    failure {
      slackSend(
        channel: '#deployments',
        color: 'danger',
        message: """
          ❌ Deployment Failed!
          • Project: ${env.JOB_NAME}
          • Build: ${env.BUILD_NUMBER}
          • Stage: ${env.STAGE_NAME}
          • Duration: ${currentBuild.durationString}
          • Logs: ${env.BUILD_URL}console
        """.trim(),
        webhookUrl: env.SLACK_WEBHOOK
      )
    }
    unstable {
      slackSend(
        channel: '#deployments',
        color: 'warning',
        message: """
          ⚠️ Deployment Unstable!
          • Project: ${env.JOB_NAME}
          • Build: ${env.BUILD_NUMBER}
          • Issues: Check test results
          • Logs: ${env.BUILD_URL}console
        """.trim(),
        webhookUrl: env.SLACK_WEBHOOK
      )
    }
  }
}
Infrastructure as Code
Terraform for AWS Infrastructure
Terraform Configuration
# terraform/main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}

# Variables
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "my-app"
}

# Data sources
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_caller_identity" "current" {}

# VPC Configuration
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-${var.environment}-igw"
  }
}

resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 1}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-${var.environment}-public-${count.index + 1}"
    Type = "public"
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 10}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "${var.project_name}-${var.environment}-private-${count.index + 1}"
    Type = "private"
  }
}

# NAT Gateway
resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-eip-${count.index + 1}"
  }
}

resource "aws_nat_gateway" "main" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# Route Tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-public-rt"
  }
}

resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-private-rt-${count.index + 1}"
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "${var.project_name}-${var.environment}"

  configuration {
    execute_command_configuration {
      logging = "OVERRIDE"

      log_configuration {
        cloud_watch_encryption_enabled = true
        cloud_watch_log_group_name     = aws_cloudwatch_log_group.ecs.name
      }
    }
  }

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-cluster"
  }
}

# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "ecs" {
  name              = "/ecs/${var.project_name}-${var.environment}"
  retention_in_days = 30

  tags = {
    Name = "${var.project_name}-${var.environment}-logs"
  }
}

# Application Load Balancer
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-${var.environment}-alb-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-alb-sg"
  }
}

resource "aws_lb" "main" {
  name               = "${var.project_name}-${var.environment}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = var.environment == "production"

  tags = {
    Name = "${var.project_name}-${var.environment}-alb"
  }
}

# Launch template for the ECS Auto Scaling Group.
# Note: data.aws_ami.ecs, aws_security_group.ecs_instance, and
# aws_iam_instance_profile.ecs_instance are referenced here but not defined
# in this excerpt; they must be declared elsewhere in the configuration.
resource "aws_launch_template" "ecs" {
  name_prefix   = "${var.project_name}-${var.environment}-"
  image_id      = data.aws_ami.ecs.id
  instance_type = var.environment == "production" ? "t3.medium" : "t3.small"

  vpc_security_group_ids = [aws_security_group.ecs_instance.id]

  iam_instance_profile {
    name = aws_iam_instance_profile.ecs_instance.name
  }

  user_data = base64encode(templatefile("${path.module}/user-data.sh", {
    cluster_name = aws_ecs_cluster.main.name
  }))

  tag_specifications {
    resource_type = "instance"

    tags = {
      Name = "${var.project_name}-${var.environment}-ecs-instance"
    }
  }
}

# Outputs
output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}

output "cluster_name" {
  description = "ECS Cluster name"
  value       = aws_ecs_cluster.main.name
}

output "load_balancer_dns" {
  description = "Load balancer DNS name"
  value       = aws_lb.main.dns_name
}
Docker and Containerization
Multi-stage Docker Build
Optimized Dockerfile
# Multi-stage Dockerfile for Node.js application
FROM node:18-alpine AS base

# Install security updates
RUN apk update && apk upgrade && apk add --no-cache dumb-init

# Create app directory
WORKDIR /usr/src/app

# Copy package files
COPY package*.json ./

# Development stage
FROM base AS development
ENV NODE_ENV=development
RUN npm ci --include=dev
COPY . .
EXPOSE 3000
CMD ["dumb-init", "npm", "run", "dev"]

# Build stage
FROM base AS build
ENV NODE_ENV=production

# Install all dependencies; the build step usually needs devDependencies
RUN npm ci --include=dev

# Copy source code
COPY . .

# Build the application
RUN npm run build

# Keep only production dependencies for the final image
RUN npm prune --omit=dev && npm cache clean --force

# Production stage
FROM node:18-alpine AS production

# Install security updates and create non-root user in the nodejs group
RUN apk update && apk upgrade && apk add --no-cache dumb-init && \
    addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001 -G nodejs

# Set working directory
WORKDIR /usr/src/app

# Copy built application and dependencies
COPY --from=build --chown=nextjs:nodejs /usr/src/app/dist ./dist
COPY --from=build --chown=nextjs:nodejs /usr/src/app/node_modules ./node_modules
COPY --from=build --chown=nextjs:nodejs /usr/src/app/package*.json ./
# The health check below needs its script in the image
# (assumes health-check.js lives in the project root)
COPY --from=build --chown=nextjs:nodejs /usr/src/app/health-check.js ./

# Create logs directory
RUN mkdir -p /usr/src/app/logs && chown nextjs:nodejs /usr/src/app/logs

# Switch to non-root user
USER nextjs

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node health-check.js

# Start application
CMD ["dumb-init", "node", "dist/index.js"]

# Multi-service Docker Compose
# docker-compose.yml
version: '3.8'

services:
  app:
    build:
      context: .
      target: production
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://postgres:password@db:5432/myapp
      - REDIS_URL=redis://redis:6379
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - app-logs:/usr/src/app/logs
    restart: unless-stopped
    networks:
      - app-network
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M

  db:
    image: postgres:14-alpine
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    networks:
      - app-network

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    networks:
      - app-network

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./nginx/ssl:/etc/nginx/ssl
      - app-logs:/var/log/app
    depends_on:
      - app
    restart: unless-stopped
    networks:
      - app-network

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    restart: unless-stopped
    networks:
      - app-network

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning
    restart: unless-stopped
    networks:
      - app-network

volumes:
  postgres-data:
  redis-data:
  app-logs:
  prometheus-data:
  grafana-data:

networks:
  app-network:
    driver: bridge
Monitoring and Observability
Metrics
- Application performance metrics
- Infrastructure monitoring
- Business metrics tracking
- Custom dashboard creation
Logging
- Centralized log aggregation
- Structured logging formats
- Log correlation and tracing
- Alerting on log patterns
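Structured logging and log correlation are easier to picture with a concrete shape in hand. The following is a minimal sketch, assuming one JSON object per line and a `requestId` field as the correlation key. `formatLogLine` and `createLogger` are hypothetical helpers written for this illustration, not part of any logging library.

```javascript
// Minimal structured logger sketch: one JSON object per line.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...fields,
  });
}

// baseFields (for example a request ID) are attached to every line, which is
// what lets a log aggregator correlate entries from the same request.
function createLogger(baseFields = {}, write = (line) => console.log(line)) {
  const log = (level) => (message, fields = {}) =>
    write(formatLogLine(level, message, { ...baseFields, ...fields }));
  return { info: log('info'), warn: log('warn'), error: log('error') };
}

// Usage: every line from this logger carries the same requestId.
const requestLog = createLogger({ service: 'my-app', requestId: 'req-123' });
requestLog.info('order created', { orderId: 42 });
requestLog.error('payment failed', { orderId: 42, reason: 'card_declined' });
```

Because each line is valid JSON, an alert on a log pattern becomes a field match (for example `level == "error"` and `reason == "card_declined"`) rather than a fragile text search.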
Tracing
- Distributed request tracing
- Performance bottleneck identification
- Service dependency mapping
- Error rate analysis
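Service dependency mapping falls out of trace data almost for free: whenever a span's parent belongs to a different service, one service called another. Here is a minimal sketch of that idea, assuming a simplified span shape (`spanId`, `parentSpanId`, `service`) invented for this example. Real tracing backends store richer records.

```javascript
// Derive service-to-service call counts from a flat list of spans.
// The span shape { spanId, parentSpanId, service } is an assumption for this sketch.
function buildDependencyMap(spans) {
  const byId = new Map(spans.map((span) => [span.spanId, span]));
  const edges = new Map();

  for (const span of spans) {
    const parent = byId.get(span.parentSpanId);
    // Skip root spans and calls that stay inside one service.
    if (!parent || parent.service === span.service) continue;

    const edge = `${parent.service} -> ${span.service}`;
    edges.set(edge, (edges.get(edge) ?? 0) + 1);
  }

  return Object.fromEntries(edges);
}

// Usage: one API request that calls the database twice and the cache once.
console.log(buildDependencyMap([
  { spanId: 'a', parentSpanId: null, service: 'api' },
  { spanId: 'b', parentSpanId: 'a', service: 'postgres' },
  { spanId: 'c', parentSpanId: 'a', service: 'postgres' },
  { spanId: 'd', parentSpanId: 'a', service: 'redis' },
]));
```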
Application Monitoring Setup
// Application monitoring with Prometheus and OpenTelemetry
const express = require('express');
const promClient = require('prom-client');
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

// Initialize OpenTelemetry (Resource constructor as exposed by the 1.x packages)
const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION || '1.0.0',
  }),
});
sdk.start();

// Prometheus metrics
const register = new promClient.Registry();

// Default metrics
promClient.collectDefaultMetrics({
  register,
  prefix: 'myapp_',
});

// Custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'myapp_http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5],
  registers: [register],
});

const httpRequestTotal = new promClient.Counter({
  name: 'myapp_http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});

const activeConnections = new promClient.Gauge({
  name: 'myapp_active_connections',
  help: 'Number of active connections',
  registers: [register],
});

// Business metrics
const userRegistrations = new promClient.Counter({
  name: 'myapp_user_registrations_total',
  help: 'Total number of user registrations',
  labelNames: ['source'],
  registers: [register],
});

const orderValue = new promClient.Histogram({
  name: 'myapp_order_value_dollars',
  help: 'Order value in dollars',
  buckets: [10, 50, 100, 500, 1000, 5000],
  registers: [register],
});

// Hypothetical placeholders for application logic; replace with real implementations
async function registerUser(email) {
  return { email };
}

async function createOrder({ amount, items }) {
  return { id: Date.now(), amount, items };
}

// Express app
const app = express();

// Parse JSON bodies so the handlers below can read req.body
app.use(express.json());

// Middleware for metrics
app.use((req, res, next) => {
  const start = Date.now();
  activeConnections.inc();

  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const route = req.route ? req.route.path : req.path;

    httpRequestDuration
      .labels(req.method, route, res.statusCode)
      .observe(duration);

    httpRequestTotal
      .labels(req.method, route, res.statusCode)
      .inc();

    activeConnections.dec();
  });

  next();
});

// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    version: process.env.APP_VERSION || '1.0.0',
  });
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Business logic with metrics
app.post('/api/register', async (req, res) => {
  try {
    const { email, source = 'web' } = req.body;

    // Registration logic here
    await registerUser(email);

    // Track metric
    userRegistrations.labels(source).inc();

    res.json({ success: true });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.post('/api/orders', async (req, res) => {
  try {
    const { amount, items } = req.body;

    // Order processing logic
    const order = await createOrder({ amount, items });

    // Track business metric
    orderValue.observe(amount);

    res.json(order);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

// Error handling middleware
app.use((error, req, res, next) => {
  console.error('Unhandled error:', error);

  // The 500 response is counted by the 'finish' handler in the metrics
  // middleware, so the counter is not incremented again here
  res.status(500).json({ error: 'Internal server error' });
});

const port = process.env.PORT || 3000;
app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});
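The `buckets` arrays in the histograms above are worth a closer look, because Prometheus histogram buckets are cumulative: each observation is counted in every bucket whose upper bound (`le`) is at least the observed value, plus an implicit `+Inf` bucket. The sketch below reproduces that counting in plain JavaScript. `cumulativeBuckets` is a hypothetical helper for illustration and does not use `prom-client`.

```javascript
// Count observations the way a Prometheus histogram does: cumulatively.
// Each returned entry says how many values were <= its upper bound `le`.
function cumulativeBuckets(bounds, values) {
  const buckets = [...bounds, Infinity].map((le) => ({ le, count: 0 }));

  for (const value of values) {
    for (const bucket of buckets) {
      if (value <= bucket.le) bucket.count += 1;
    }
  }

  return buckets;
}

// Usage: three request durations (in seconds) against three bucket bounds.
console.log(cumulativeBuckets([0.1, 0.5, 1], [0.05, 0.3, 2]));
```

This is why bucket bounds should bracket the latencies you care about: a 2-second request here lands only in the `+Inf` bucket, so any percentile above the last finite bound cannot be estimated with useful precision.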
Best Practices for DevOps
Security
- Secret management and rotation
- Container image scanning
- Infrastructure security scanning
- Compliance automation
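One small, practical piece of secret management is failing fast when a required secret is missing, instead of discovering it on the first request that needs it. This is a minimal sketch, assuming secrets arrive as environment variables (as `JWT_SECRET` does in the Compose file). The helper name `loadRequiredSecrets` is hypothetical.

```javascript
// Validate that every required secret is present before the app starts.
// `env` defaults to process.env but can be injected for testing.
function loadRequiredSecrets(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);

  if (missing.length > 0) {
    // Report names only; never log secret values.
    throw new Error(`Missing required secrets: ${missing.join(', ')}`);
  }

  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Usage with an injected environment object.
const secrets = loadRequiredSecrets(['JWT_SECRET'], { JWT_SECRET: 'example-only' });
console.log(Object.keys(secrets));
```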
Automation
- Everything as Code
- Automated testing strategies
- Self-healing systems
- Automated incident response
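A common building block for self-healing behavior is retrying a failed operation with exponential backoff, so that transient faults recover on their own without hammering a struggling dependency. The following is a minimal sketch under stated assumptions: `backoffDelays` and `retryWithBackoff` are hypothetical helpers, the default timings are example values, and production code would usually add jitter.

```javascript
// Delay schedule: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelays(retries, baseMs = 100, capMs = 5000) {
  return Array.from({ length: retries }, (_, i) => Math.min(capMs, baseMs * 2 ** i));
}

// Run `operation`, retrying up to `retries` times after the first failure.
// `sleep` is injectable so tests do not have to wait in real time.
async function retryWithBackoff(operation, options = {}) {
  const {
    retries = 3,
    baseMs = 100,
    capMs = 5000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  const delays = backoffDelays(retries, baseMs, capMs);
  let lastError;

  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) await sleep(delays[attempt]);
    }
  }

  // All attempts failed: surface the last error to the caller.
  throw lastError;
}

// Usage: the delay schedule for four retries with a 500 ms cap.
console.log(backoffDelays(4, 100, 500));
```

The cap matters as much as the doubling: without it, a long outage would push the wait between attempts to impractical lengths.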
Culture
- Shared responsibility model
- Continuous learning
- Blameless post-mortems
- Cross-functional collaboration
Conclusion
DevOps and automation are not just about tools and technology—they're about creating a culture of collaboration, continuous improvement, and shared responsibility. The goal is to deliver value to users faster and more reliably.
Start small, automate incrementally, and always prioritize reliability and security. Remember that the best automation is the one that makes your team more effective and your systems more reliable.