Introduction to Database Systems
Databases are the backbone of modern applications, storing and managing the vast amounts of data that power everything from simple websites to complex enterprise systems. Understanding database design and optimization is crucial for building scalable, efficient applications.
"Data is a precious thing and will last longer than the systems themselves." - Tim Berners-Lee
SQL vs NoSQL: Understanding the Landscape
The choice between SQL and NoSQL databases depends on your specific requirements, data structure, and scalability needs.
Aspect | SQL Databases | NoSQL Databases |
---|---|---|
Structure | Fixed schema, tables with rows and columns | Flexible schema, various data models |
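ACID Properties | Full ACID transactions by default | Often eventually consistent; some offer ACID transactions (varies by database) |
Scalability | Primarily vertical scaling (scale up); read replicas and sharding add horizontal capacity | Designed for horizontal scaling (scale out) |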
Query Language | Standardized SQL | Varies by database |
Use Cases | Complex queries, transactions, reporting | Big data, real-time applications, flexible schemas |
Relational Database Design
Database Normalization
Normalization is the process of organizing data to reduce redundancy and improve data integrity.
First Normal Form (1NF)
Each table cell contains a single value, and each column contains values of the same type.
Second Normal Form (2NF)
Must be in 1NF, and every non-key attribute must depend on the whole primary key, not on just part of a composite key.
Third Normal Form (3NF)
Must be in 2NF, and no non-key attribute may depend on the primary key transitively, that is, through another non-key attribute.
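To make the normal forms concrete, the sketch below splits a few denormalized order rows so that each fact is stored once. The rows, field names, and the `normalize` helper are made up for illustration and are not part of the schema in this article. In the flat rows, `productName` depends only on `productId` (part of the composite key, a 2NF problem) and `customerEmail` depends on `customerId` rather than on the order itself (a transitive dependency, a 3NF problem).

```javascript
// Denormalized rows: customer and product details repeat on every order line.
// All names and values here are hypothetical.
const orderLines = [
  { orderId: 1, customerId: 10, customerEmail: 'ann@example.com', productId: 100, productName: 'Laptop', quantity: 1 },
  { orderId: 1, customerId: 10, customerEmail: 'ann@example.com', productId: 101, productName: 'Mouse', quantity: 2 },
  { orderId: 2, customerId: 11, customerEmail: 'bob@example.com', productId: 100, productName: 'Laptop', quantity: 1 },
];

// Split the rows into four "tables", each keyed by its own identifier.
function normalize(rows) {
  const customers = new Map(); // customerId -> { customerId, email }
  const products = new Map();  // productId  -> { productId, name }
  const orders = new Map();    // orderId    -> { orderId, customerId }
  const orderItems = [];       // one row per (orderId, productId)

  for (const row of rows) {
    customers.set(row.customerId, { customerId: row.customerId, email: row.customerEmail });
    products.set(row.productId, { productId: row.productId, name: row.productName });
    orders.set(row.orderId, { orderId: row.orderId, customerId: row.customerId });
    orderItems.push({ orderId: row.orderId, productId: row.productId, quantity: row.quantity });
  }
  return { customers, products, orders, orderItems };
}

const tables = normalize(orderLines);
console.log(tables.customers.size, tables.products.size, tables.orders.size, tables.orderItems.length);
// prints: 2 2 2 3
```

After the split, changing a customer's email or a product's name touches exactly one record instead of every order line that mentions it.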
Database Schema Design Example
    -- E-commerce Database Schema
    CREATE DATABASE ecommerce;
    USE ecommerce;

    -- Users table
    CREATE TABLE users (
        user_id INT PRIMARY KEY AUTO_INCREMENT,
        username VARCHAR(50) UNIQUE NOT NULL,
        email VARCHAR(100) UNIQUE NOT NULL,
        password_hash VARCHAR(255) NOT NULL,
        first_name VARCHAR(50) NOT NULL,
        last_name VARCHAR(50) NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        INDEX idx_email (email),
        INDEX idx_username (username)
    );

    -- Categories table
    CREATE TABLE categories (
        category_id INT PRIMARY KEY AUTO_INCREMENT,
        name VARCHAR(100) NOT NULL,
        description TEXT,
        parent_id INT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        FOREIGN KEY (parent_id) REFERENCES categories(category_id) ON DELETE SET NULL
    );

    -- Products table
    CREATE TABLE products (
        product_id INT PRIMARY KEY AUTO_INCREMENT,
        name VARCHAR(255) NOT NULL,
        description TEXT,
        price DECIMAL(10, 2) NOT NULL,
        category_id INT NOT NULL,
        stock_quantity INT DEFAULT 0,
        sku VARCHAR(50) UNIQUE NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        FOREIGN KEY (category_id) REFERENCES categories(category_id) ON DELETE RESTRICT,
        INDEX idx_category (category_id),
        INDEX idx_sku (sku),
        INDEX idx_price (price)
    );

    -- Orders table
    CREATE TABLE orders (
        order_id INT PRIMARY KEY AUTO_INCREMENT,
        user_id INT NOT NULL,
        status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled') DEFAULT 'pending',
        total_amount DECIMAL(10, 2) NOT NULL,
        shipping_address TEXT NOT NULL,
        order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE RESTRICT,
        INDEX idx_user_id (user_id),
        INDEX idx_status (status),
        INDEX idx_order_date (order_date)
    );

    -- Order items table
    CREATE TABLE order_items (
        order_item_id INT PRIMARY KEY AUTO_INCREMENT,
        order_id INT NOT NULL,
        product_id INT NOT NULL,
        quantity INT NOT NULL,
        price_at_time DECIMAL(10, 2) NOT NULL,
        FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
        FOREIGN KEY (product_id) REFERENCES products(product_id) ON DELETE RESTRICT,
        INDEX idx_order_id (order_id),
        INDEX idx_product_id (product_id)
    );
SQL Optimization Techniques
Query Optimization
Index Optimization
Proper indexing is crucial for query performance:
    -- Create composite index for common query patterns
    CREATE INDEX idx_product_category_price ON products(category_id, price);

    -- Create partial index for in-stock products
    -- (PostgreSQL and SQLite syntax; MySQL does not accept a WHERE clause on CREATE INDEX)
    CREATE INDEX idx_active_products ON products(name) WHERE stock_quantity > 0;

    -- Analyze index usage
    EXPLAIN SELECT * FROM products WHERE category_id = 1 AND price BETWEEN 10.00 AND 50.00;
Query Performance Best Practices
    -- Bad: Using SELECT *
    SELECT *
    FROM products p
    JOIN categories c ON p.category_id = c.category_id
    WHERE p.price > 100;

    -- Good: Select only needed columns
    SELECT p.name, p.price, c.name as category_name
    FROM products p
    JOIN categories c ON p.category_id = c.category_id
    WHERE p.price > 100;

    -- Bad: Using LIKE with leading wildcard (an ordinary index on name cannot be used)
    SELECT * FROM products WHERE name LIKE '%phone%';

    -- Good: Using full-text search (MySQL; requires a FULLTEXT index on (name, description))
    SELECT * FROM products
    WHERE MATCH(name, description) AGAINST('phone' IN NATURAL LANGUAGE MODE);

    -- Optimized (keyset) pagination: 1000 is the last product_id seen on the previous page
    SELECT product_id, name, price
    FROM products
    WHERE product_id > 1000
    ORDER BY product_id
    LIMIT 20;
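The pagination query above can be driven from application code by feeding the last `product_id` of each page back in as the next cursor. The sketch below imitates that loop against an in-memory array; `fetchPage` and `fetchAll` are hypothetical helpers, and the array is a stand-in for the `products` table.

```javascript
// In-memory stand-in for the products table, already sorted by product_id.
const products = Array.from({ length: 45 }, (_, i) => ({
  product_id: i + 1,
  name: `Product ${i + 1}`,
}));

// Equivalent of: SELECT ... WHERE product_id > ? ORDER BY product_id LIMIT ?
function fetchPage(afterId, limit) {
  return products.filter((p) => p.product_id > afterId).slice(0, limit);
}

// Walk through all pages, using the last id of each page as the next cursor.
function fetchAll(limit) {
  const pages = [];
  let cursor = 0; // smaller than any real product_id
  for (;;) {
    const page = fetchPage(cursor, limit);
    if (page.length === 0) break;
    pages.push(page);
    cursor = page[page.length - 1].product_id;
  }
  return pages;
}

const pages = fetchAll(20);
console.log(pages.map((p) => p.length)); // [ 20, 20, 5 ]
```

Unlike `OFFSET`, the cursor lets the database seek straight to the next row through the primary key index, so later pages cost about the same as the first one.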
Advanced SQL Techniques
Window Functions
    -- Calculate running totals
    SELECT
        order_date,
        total_amount,
        SUM(total_amount) OVER (ORDER BY order_date) as running_total
    FROM orders
    ORDER BY order_date;

    -- Rank products by sales within each category
    SELECT
        p.name,
        c.name as category,
        SUM(oi.quantity * oi.price_at_time) as total_sales,
        RANK() OVER (
            PARTITION BY c.category_id
            ORDER BY SUM(oi.quantity * oi.price_at_time) DESC
        ) as sales_rank
    FROM products p
    JOIN categories c ON p.category_id = c.category_id
    JOIN order_items oi ON p.product_id = oi.product_id
    GROUP BY p.product_id, c.category_id;
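For readers new to window functions, it can help to see what the two queries compute, written out by hand. The following is a rough JavaScript imitation over made-up rows, not how a database evaluates them; it assumes every order has a distinct `order_date`, and the helper names are hypothetical.

```javascript
// Hypothetical input rows; amounts are whole numbers to keep the arithmetic exact.
const orders = [
  { order_date: '2023-06-01', total_amount: 100 },
  { order_date: '2023-06-02', total_amount: 250 },
  { order_date: '2023-06-03', total_amount: 50 },
];

// SUM(total_amount) OVER (ORDER BY order_date), assuming distinct order dates.
function withRunningTotal(rows) {
  const sorted = [...rows].sort((a, b) => a.order_date.localeCompare(b.order_date));
  let total = 0;
  return sorted.map((row) => {
    total += row.total_amount;
    return { ...row, running_total: total };
  });
}

const sales = [
  { name: 'Laptop', category: 'Electronics', total_sales: 500 },
  { name: 'Novel', category: 'Books', total_sales: 50 },
  { name: 'Phone', category: 'Electronics', total_sales: 500 },
  { name: 'Tablet', category: 'Electronics', total_sales: 300 },
];

// RANK() OVER (PARTITION BY category ORDER BY total_sales DESC):
// ties share a rank and the following rank is skipped (1, 1, 3).
function withSalesRank(rows) {
  const partitions = new Map();
  for (const row of rows) {
    if (!partitions.has(row.category)) partitions.set(row.category, []);
    partitions.get(row.category).push(row);
  }
  const ranked = [];
  for (const group of partitions.values()) {
    const sorted = [...group].sort((a, b) => b.total_sales - a.total_sales);
    sorted.forEach((row, i) => {
      const tiedWithPrevious = i > 0 && sorted[i - 1].total_sales === row.total_sales;
      const sales_rank = tiedWithPrevious ? ranked[ranked.length - 1].sales_rank : i + 1;
      ranked.push({ ...row, sales_rank });
    });
  }
  return ranked;
}

console.log(withRunningTotal(orders).map((r) => r.running_total)); // [ 100, 350, 400 ]
console.log(withSalesRank(sales).map((r) => `${r.name}:${r.sales_rank}`));
// [ 'Laptop:1', 'Phone:1', 'Tablet:3', 'Novel:1' ]
```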
NoSQL Database Types
Document Databases
Examples: MongoDB, CouchDB
Store data in documents (JSON-like). Great for content management, catalogs, and user profiles.
Graph Databases
Examples: Neo4j, Amazon Neptune
Store data as nodes and relationships. Perfect for social networks and recommendation engines.
Column-Family
Examples: Cassandra, HBase
Store data in column families. Ideal for time-series data and analytics.
Key-Value Stores
Examples: Redis, DynamoDB
Simple key-value pairs. Perfect for caching, session management, and real-time analytics.
MongoDB - Document Database
MongoDB Schema Design
Document Structure
    // User document with embedded addresses
    {
      "_id": ObjectId("60d5ecb74b1b4c001f3d5e8a"),
      "username": "john_doe",
      "email": "john@example.com",
      "profile": {
        "firstName": "John",
        "lastName": "Doe",
        "avatar": "https://example.com/avatar.jpg"
      },
      "addresses": [
        {
          "type": "home",
          "street": "123 Main St",
          "city": "New York",
          "zipCode": "10001"
        },
        {
          "type": "work",
          "street": "456 Business Ave",
          "city": "New York",
          "zipCode": "10002"
        }
      ],
      "preferences": {
        "newsletter": true,
        "notifications": {
          "email": true,
          "sms": false
        }
      },
      "createdAt": ISODate("2021-06-25T10:30:00Z"),
      "lastLogin": ISODate("2021-06-26T14:22:00Z")
    }
MongoDB Operations
    // Insert documents
    db.products.insertMany([
      {
        name: "Laptop",
        price: 999.99,
        category: "Electronics",
        specs: { processor: "Intel i7", memory: "16GB", storage: "512GB SSD" },
        tags: ["computers", "laptops", "electronics"],
        inStock: true
      },
      {
        name: "Smartphone",
        price: 699.99,
        category: "Electronics",
        specs: { os: "Android", storage: "128GB", camera: "48MP" },
        tags: ["phones", "mobile", "electronics"],
        inStock: true
      }
    ]);

    // Complex queries with aggregation
    db.products.aggregate([
      // Match products in stock
      { $match: { inStock: true } },
      // Group by category and calculate averages
      {
        $group: {
          _id: "$category",
          avgPrice: { $avg: "$price" },
          count: { $sum: 1 },
          maxPrice: { $max: "$price" },
          minPrice: { $min: "$price" }
        }
      },
      // Sort by average price
      { $sort: { avgPrice: -1 } }
    ]);

    // Text search
    db.products.createIndex({ name: "text", "specs.processor": "text" });
    db.products.find({ $text: { $search: "intel laptop" } });

    // Geospatial queries
    db.stores.createIndex({ location: "2dsphere" });
    db.stores.find({
      location: {
        $near: {
          $geometry: { type: "Point", coordinates: [-73.9857, 40.7484] },
          $maxDistance: 1000
        }
      }
    });
Redis - Key-Value Store
Redis Data Structures and Use Cases
Redis Operations
    # Strings - Simple key-value pairs
    SET user:1001:name "John Doe"
    GET user:1001:name
    INCR page:views:homepage

    # Lists - Ordered collections
    LPUSH queue:emails "email1@example.com"
    RPOP queue:emails

    # Sets - Unique collections
    SADD user:1001:interests "programming" "music" "travel"
    SISMEMBER user:1001:interests "programming"

    # Sorted Sets - Ordered by score
    ZADD leaderboard 1500 "player1" 1200 "player2"
    ZRANGE leaderboard 0 -1 WITHSCORES

    # Hashes - Field-value pairs
    HSET user:1001 name "John Doe" email "john@example.com" age 30
    HGETALL user:1001

    # Expiration and TTL
    SETEX session:abc123 3600 "user_data"
    TTL session:abc123
Redis for Caching
    // Node.js Redis caching example (node-redis v4+ API)
    const redis = require('redis');

    const client = redis.createClient();
    client.on('error', (err) => console.error('Redis client error', err));

    // node-redis v4+ needs an open connection before any command is sent
    const connected = client.connect();

    // `database` stands for the application's own data access layer

    // Cache-aside pattern
    async function getUser(userId) {
      await connected;
      const cacheKey = `user:${userId}`;

      // Try to get from cache first
      const cached = await client.get(cacheKey);
      if (cached) {
        return JSON.parse(cached);
      }

      // If not in cache, get from database
      const user = await database.getUser(userId);

      // Store in cache with expiration (skip missing users)
      if (user) {
        await client.setEx(cacheKey, 3600, JSON.stringify(user));
      }
      return user;
    }

    // Write-through pattern
    async function updateUser(userId, userData) {
      await connected;
      const cacheKey = `user:${userId}`;

      // Update database
      const updatedUser = await database.updateUser(userId, userData);

      // Update cache
      await client.setEx(cacheKey, 3600, JSON.stringify(updatedUser));
      return updatedUser;
    }
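To try the cache-aside flow without a Redis server, the sketch below swaps Redis for a small in-memory cache with expiry. `TtlCache`, `loadUserFromDb`, and the fake clock are hypothetical stand-ins chosen so that a miss, a hit, and an expiry can be observed in a few lines.

```javascript
// Minimal in-memory cache with per-key expiry, standing in for Redis SETEX/GET.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
}

// Hypothetical slow data source that counts how often it is hit.
let dbReads = 0;
function loadUserFromDb(userId) {
  dbReads += 1;
  return { id: userId, name: `User ${userId}` };
}

let clock = 0; // fake clock in milliseconds, so expiry can be shown without waiting
const cache = new TtlCache(() => clock);

// Cache-aside: look in the cache first, fall back to the data source, then populate.
function getUser(userId) {
  const key = `user:${userId}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const user = loadUserFromDb(userId);
  cache.set(key, user, 3600);
  return user;
}

getUser(1);           // miss: reads the data source
getUser(1);           // hit: served from the cache
clock += 3600 * 1000; // one hour later the entry has expired
getUser(1);           // miss again
console.log(dbReads); // 2
```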
Database Performance Optimization
Query Optimization Strategies
Indexing Strategy
- Create indexes on frequently queried columns
- Use composite indexes for multi-column queries
- Monitor and remove unused indexes
- Consider partial indexes for filtered queries
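Why column order matters in a composite index becomes clearer when the index is pictured as a list sorted by its columns. The sketch below is a simplified model (a sorted array with binary search rather than a real B-tree) of an index on `(category_id, price)`; the entries and helper names are invented for the example.

```javascript
// Index entries sorted by (category_id, price), like the leaf level of a B-tree.
const index = [
  { category_id: 1, price: 80, product_id: 1 },
  { category_id: 1, price: 20, product_id: 3 },
  { category_id: 2, price: 15, product_id: 4 },
  { category_id: 1, price: 5, product_id: 7 },
  { category_id: 2, price: 30, product_id: 8 },
  { category_id: 1, price: 45, product_id: 9 },
].sort((a, b) => a.category_id - b.category_id || a.price - b.price);

// Compare an index entry against a (categoryId, price) search key.
function compare(entry, categoryId, price) {
  return entry.category_id - categoryId || entry.price - price;
}

// Binary search for the first entry >= (categoryId, minPrice).
function lowerBound(categoryId, minPrice) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compare(index[mid], categoryId, minPrice) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// WHERE category_id = ? AND price BETWEEN ? AND ?
// Seek to the first match, then read neighbours until the range ends.
function rangeScan(categoryId, minPrice, maxPrice) {
  const matches = [];
  for (let i = lowerBound(categoryId, minPrice); i < index.length; i++) {
    const entry = index[i];
    if (entry.category_id !== categoryId || entry.price > maxPrice) break;
    matches.push(entry.product_id);
  }
  return matches;
}

console.log(rangeScan(1, 10, 50)); // [ 3, 9 ]
```

A filter on `price` alone gets no help from this ordering, because prices are only sorted within each category; that is the usual "leftmost prefix" rule for composite indexes.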
Query Design
- Select only necessary columns
- Use LIMIT for pagination
- Avoid N+1 query problems
- Consider EXISTS instead of IN for subqueries, and prefer NOT EXISTS over NOT IN when the subquery can return NULL
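The N+1 problem from the list above is easiest to see by counting queries. In this sketch a fake data layer (`db`, `ordersByUser`, and both loader functions are hypothetical) tallies each call; the list of user ids is assumed to have been fetched already by the initial "+1" query.

```javascript
// Fake data access layer that counts how many queries it receives.
const ordersByUser = { 1: ['ord_1', 'ord_2'], 2: ['ord_3'], 3: [] };

const db = {
  queryCount: 0,
  // Stand-in for: SELECT ... FROM orders WHERE user_id = ?
  ordersForUser(userId) {
    this.queryCount += 1;
    return ordersByUser[userId] ?? [];
  },
  // Stand-in for: SELECT ... FROM orders WHERE user_id IN (?, ?, ...)
  ordersForUsers(userIds) {
    this.queryCount += 1;
    return new Map(userIds.map((id) => [id, ordersByUser[id] ?? []]));
  },
};

// N+1 pattern: one extra query per user.
function loadOneByOne(userIds) {
  return userIds.map((id) => db.ordersForUser(id));
}

// Batched pattern: a single query covers every user.
function loadBatched(userIds) {
  const grouped = db.ordersForUsers(userIds);
  return userIds.map((id) => grouped.get(id));
}

const userIds = [1, 2, 3];

loadOneByOne(userIds);
console.log(db.queryCount); // 3

db.queryCount = 0;
loadBatched(userIds);
console.log(db.queryCount); // 1
```

With three users the difference is small, but the one-by-one version grows linearly with the number of rows, while the batched version stays at one round trip.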
Database Tuning
- Optimize buffer pool size
- Configure appropriate isolation levels
- Monitor and optimize slow queries
- Regular maintenance and statistics updates
Connection Pooling
    // Node.js connection pooling example
    const mysql = require('mysql2/promise');

    const pool = mysql.createPool({
      host: 'localhost',
      user: 'your_username',
      password: 'your_password',
      database: 'your_database',
      waitForConnections: true,
      connectionLimit: 10,
      queueLimit: 0
      // acquireTimeout and timeout belong to the older `mysql` package;
      // mysql2 does not recognize them, so they are omitted here
    });

    // Using the pool
    async function getProducts(categoryId) {
      const connection = await pool.getConnection();
      try {
        const [rows] = await connection.execute(
          'SELECT * FROM products WHERE category_id = ?',
          [categoryId]
        );
        return rows;
      } finally {
        // Always return the connection to the pool, even if the query fails
        connection.release();
      }
    }
Database Scaling Strategies
Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up)
- Increase CPU, RAM, or storage
- Simpler to implement
- Limited by hardware constraints
- Single point of failure
Horizontal Scaling (Scale Out)
- Add more database servers
- Better fault tolerance
- More complex to implement
- Requires data partitioning
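One common way to partition data across servers is to hash a key and map the hash to a shard. The sketch below assumes four shards with made-up names and uses FNV-1a only because it is short; real systems may pick a different hash or a lookup table.

```javascript
// Simple deterministic string hash (FNV-1a, 32-bit), chosen for brevity.
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Hypothetical shard names; in practice these would map to connection pools.
const shards = ['shard-0', 'shard-1', 'shard-2', 'shard-3'];

// Route a record to a shard based on its key.
function shardFor(userId) {
  return shards[fnv1a(String(userId)) % shards.length];
}

// The same key always lands on the same shard.
console.log(shardFor(1001) === shardFor(1001)); // true

// Count how 1,000 sequential ids spread across the shards.
const counts = {};
for (let id = 1; id <= 1000; id++) {
  const shard = shardFor(id);
  counts[shard] = (counts[shard] || 0) + 1;
}
console.log(counts);
```

Plain modulo routing has a known drawback: changing the number of shards remaps most keys. Consistent hashing or a directory of key ranges are the usual ways to soften that.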
Read Replicas and Sharding
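    -- Read replica configuration (MySQL)
    -- On master server
    CREATE USER 'replica_user'@'%' IDENTIFIED BY 'password';
    GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'%';
    FLUSH PRIVILEGES;

    -- On replica server
    -- Take the log file and position from SHOW MASTER STATUS on the master.
    -- MySQL 8.0.23 and later also accept CHANGE REPLICATION SOURCE TO and START REPLICA.
    CHANGE MASTER TO
        MASTER_HOST='master_server_ip',
        MASTER_USER='replica_user',
        MASTER_PASSWORD='password',
        MASTER_LOG_FILE='mysql-bin.000001',
        MASTER_LOG_POS=0;
    START SLAVE;

Application-level read/write splitting (Node.js):

    // masterDB and replicaDB stand for two connection pools created elsewhere.
    // Replication is asynchronous, so a replica may briefly return stale data.

    // Write operations go to master
    const writeResult = await masterDB.execute(
      'INSERT INTO users (username, email, password_hash, first_name, last_name) VALUES (?, ?, ?, ?, ?)',
      [username, email, passwordHash, firstName, lastName]
    );

    // Read operations go to replica
    const readResult = await replicaDB.execute(
      'SELECT user_id, username, email FROM users WHERE user_id = ?',
      [userId]
    );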
Data Modeling Best Practices
SQL Data Modeling
Relationship Design
- One-to-One: User → Profile (separate tables for optional data)
- One-to-Many: Category → Products (foreign key in products table)
- Many-to-Many: Users ↔ Roles (junction table with user_id and role_id)
NoSQL Data Modeling
Embedding vs Referencing
    // Embedding - Good for data accessed together
    {
      "user_id": "123",
      "name": "John Doe",
      "orders": [
        {
          "order_id": "ord_001",
          "date": "2023-06-15",
          "items": [
            { "product": "Laptop", "price": 999.99 }
          ]
        }
      ]
    }

    // Referencing - Good for data accessed independently
    // Users collection
    { "user_id": "123", "name": "John Doe" }

    // Orders collection
    {
      "order_id": "ord_001",
      "user_id": "123",
      "date": "2023-06-15",
      "total": 999.99
    }
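The practical difference between the two models shows up when reading a user together with their orders. In this sketch, plain arrays stand in for collections and `getUserWithOrders` is a hypothetical helper: the embedded document needs one lookup, while the referenced model needs a second lookup to join the orders in application code.

```javascript
// Embedded model: the orders travel with the user document.
const embeddedUsers = [
  {
    user_id: '123',
    name: 'John Doe',
    orders: [{ order_id: 'ord_001', date: '2023-06-15', total: 999.99 }],
  },
];

// Referenced model: orders live in their own collection and point back to the user.
const usersCollection = [{ user_id: '123', name: 'John Doe' }];
const ordersCollection = [
  { order_id: 'ord_001', user_id: '123', date: '2023-06-15', total: 999.99 },
  { order_id: 'ord_002', user_id: '456', date: '2023-06-16', total: 49.5 },
];

// Embedded read: a single lookup returns everything.
function getEmbeddedUser(userId) {
  return embeddedUsers.find((u) => u.user_id === userId) || null;
}

// Referenced read: two lookups, joined in application code.
function getUserWithOrders(userId) {
  const user = usersCollection.find((u) => u.user_id === userId);
  if (!user) return null;
  const orders = ordersCollection.filter((o) => o.user_id === userId);
  return { ...user, orders };
}

console.log(getEmbeddedUser('123').orders.length);   // 1
console.log(getUserWithOrders('123').orders.length); // 1
```

Embedding saves the second lookup but makes the user document grow with every order; referencing keeps documents small and lets orders be queried on their own.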
Database Security
Security Best Practices
Access Control
- Use principle of least privilege
- Create specific database users for applications
- Implement role-based access control
- Regular access audits
Data Encryption
- Encrypt data at rest
- Use TLS for data in transit
- Encrypt sensitive columns
- Secure key management
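As an illustration of encrypting a sensitive column in application code, the sketch below uses AES-256-GCM from Node's built-in `crypto` module. The `encryptField` and `decryptField` helpers and the dot-separated storage format are assumptions made for this example, and the key is generated inline only so the snippet runs; in practice it would come from a key management service or a secret store.

```javascript
const crypto = require('crypto');

// Example-only key; load a real key from a secret store instead.
const key = crypto.randomBytes(32); // 256-bit key

// Encrypt one column value and return a string that fits in a text column.
function encryptField(plaintext) {
  const iv = crypto.randomBytes(12); // fresh nonce for every value
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Store nonce, authentication tag, and ciphertext together.
  return [iv, tag, ciphertext].map((part) => part.toString('base64')).join('.');
}

// Reverse of encryptField; throws if the stored value was modified.
function decryptField(stored) {
  const [iv, tag, ciphertext] = stored.split('.').map((part) => Buffer.from(part, 'base64'));
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const stored = encryptField('555-0100');
console.log(decryptField(stored)); // 555-0100
```

Because each value gets its own random nonce, encrypting the same plaintext twice yields different stored strings, which also means an encrypted column cannot be searched or indexed by its plaintext.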
SQL Injection Prevention
- Use parameterized queries
- Input validation and sanitization
- Stored procedures when appropriate
- Regular security testing
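The value of parameterized queries is visible even without a database. The sketch below compares what each approach hands to the driver; both builder functions are hypothetical, and the `{ sql, params }` shape simply mirrors calls such as `connection.execute(sql, params)` used earlier in this article.

```javascript
// UNSAFE: user input is pasted directly into the SQL text.
function buildUnsafeQuery(email) {
  return `SELECT user_id FROM users WHERE email = '${email}'`;
}

// SAFER: the SQL text is fixed and the input travels separately as a parameter.
function buildParameterizedQuery(email) {
  return { sql: 'SELECT user_id FROM users WHERE email = ?', params: [email] };
}

const malicious = "x' OR '1'='1";

// The injected quote closes the string literal and adds a condition that is always true.
console.log(buildUnsafeQuery(malicious));
// SELECT user_id FROM users WHERE email = 'x' OR '1'='1'

// The statement is unchanged; the input is only ever treated as a value.
console.log(buildParameterizedQuery(malicious).sql);
// SELECT user_id FROM users WHERE email = ?
```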
Backup and Recovery
Backup Strategies
    # MySQL backup strategies

    # Full backup
    mysqldump -u root -p --all-databases --single-transaction --routines --triggers > full_backup.sql

    # Incremental backup using binary logs
    mysqlbinlog --start-datetime="2023-06-15 00:00:00" \
        --stop-datetime="2023-06-15 23:59:59" \
        mysql-bin.000001 > incremental_backup.sql

    # Point-in-time recovery: restore the full backup, then replay binary log
    # events up to the target time (add --start-datetime or --start-position so
    # that only events logged after the full backup are replayed)
    mysql -u root -p < full_backup.sql
    mysqlbinlog --stop-datetime="2023-06-15 14:30:00" mysql-bin.000001 | mysql -u root -p

    # MongoDB backup and restore
    mongodump --db ecommerce --out /backup/mongodb/
    mongorestore --db ecommerce /backup/mongodb/ecommerce/
Modern Database Trends
Cloud Databases
Managed database services such as AWS RDS, Google Cloud SQL, and Azure Database take over backups, patching, and routine maintenance, and make it easier to scale capacity.
NewSQL Databases
Combine ACID properties of SQL with horizontal scalability of NoSQL. Examples: CockroachDB, TiDB.
Time Series Databases
Optimized for time-stamped data such as IoT readings and monitoring metrics. Examples: InfluxDB, TimescaleDB.
Conclusion
Database design and optimization are critical skills for building scalable applications. Whether you choose SQL or NoSQL depends on your specific requirements, but understanding both paradigms gives you the flexibility to make informed decisions.
Focus on proper data modeling, query optimization, and monitoring. As your application grows, be prepared to implement scaling strategies like read replicas, sharding, or transitioning to distributed database systems.