Topic 7: Performance, Scalability, and Reliability

Indexing, Query Optimization, Sharding, Replication, and High Availability

Topic Overview

Database performance, scalability, and reliability determine system capacity and user experience in production. Indexing strategies balance query speed against write performance and storage overhead: B-tree indexes support range queries but require maintenance, while hash indexes provide O(1) lookups but only support equality. Query optimization requires understanding execution plans, identifying full table scans, and choosing appropriate join algorithms. Scaling approaches include vertical scaling (larger machines), horizontal scaling via sharding (partitioning data across machines), and read replicas (distributing read load). High availability patterns use replication and failover to maintain service during machine failures. Students must evaluate performance bottlenecks, design scalable architectures that handle growth, and implement reliability patterns that maintain data consistency during failures.

Student Presentation Assignments

Student 1:Indexing Strategies

Required Coverage:

Must compare B-tree vs hash indexes, analyzing structure differences, use cases, and performance characteristics with quantitative examples, using execution plan reasoning, not merely listing index types
Must explain composite indexes, specifying design principles and how they optimize multi-column queries
Must explain covering indexes, specifying how they eliminate table lookups and analyzing storage trade-offs
Must evaluate index maintenance costs, quantifying write performance impact and storage overhead
Must identify at least three index misuse examples (over-indexing, wrong types, missing indexes), explaining performance consequences
Must explain index selection strategies, specifying when to create indexes and what columns to include, analyzing design constraints and query patterns, not providing generic tuning advice

Student 2:Query Performance Anti-Patterns

Required Coverage:

Must analyze SELECT * performance impact, specifying when it matters (network transfer, covering indexes) and quantifying costs
Must explain missing WHERE clauses, analyzing full table scan costs and how to identify them in execution plans, emphasizing understanding execution plans, not SQL style advice
Must explain unbounded queries, specifying pagination strategies and result set limits to prevent memory exhaustion
Must identify bad join patterns (Cartesian products, missing conditions), explaining how they cause performance degradation
Must evaluate at least one real-world performance failure (incident, postmortem, or case study), analyzing root causes and lessons learned, focusing on design constraints that led to failure
Must explain query profiling and optimization techniques, specifying methodologies for identifying bottlenecks and analyzing execution plans, focusing on methodology, not tool screenshots

Student 3:Scalability & Resource Management

Required Coverage:

Must explain connection pooling, analyzing benefits, configuration parameters, and common pitfalls (connection leaks, pool exhaustion)
Must compare thread vs async models, analyzing blocking vs non-blocking I/O trade-offs in database access, remaining conceptual and system-level, not language-specific
Must explain sharding strategies, specifying horizontal partitioning approaches and key selection criteria
Must analyze read replicas, evaluating how they scale reads, replication lag implications, and consistency trade-offs
Must identify horizontal scaling limits, specifying when scaling out becomes difficult (cross-shard queries, transactions)
Must evaluate resource management, identifying bottlenecks and analyzing design constraints, not providing generic tuning lists

Student 4:Reliability & High Availability

Required Coverage:

Must compare replication models (master-slave, master-master, multi-master), analyzing consistency guarantees and failure modes, including concrete failure scenarios, not just topology descriptions
Must evaluate failover strategies, comparing automatic vs manual approaches and analyzing detection mechanisms
Must explain backup and recovery, specifying strategies and quantifying RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
Must analyze multi-region deployments, evaluating consistency trade-offs, latency implications, and disaster recovery capabilities
Must explain designing for failure, specifying circuit breakers, retry strategies, and graceful degradation patterns
Must evaluate monitoring and alerting, specifying key metrics (latency, throughput, errors) and how they inform architectural decisions, not SRE tool tutorials

Presentation Requirements

All presentations must be 17–20 minutes in duration and include the following components:

Problem Context: What problem this technology solves and why traditional databases struggle
Core Concepts: Clear explanation with correct technical terminology
System Details: How it works in practice with concrete examples
Trade-offs: Strengths, limitations, and when it is appropriate vs not appropriate
Real-World Perspective: At least one realistic application scenario and production considerations

Note: Presentations that only summarize definitions, list features, or copy diagrams without interpretation will receive low marks. Each presentation must demonstrate analytical reasoning through comparisons, trade-off analysis, and justification of design decisions. Reading slides verbatim or presenting material that could be satisfied by reading documentation will be penalized.

Report Requirement: In addition to the presentation, each student must submit an individual PDF report. See Seminar Report Requirements for format, content, and submission details.

Evaluation Criteria

Criterion	Weight	Description
Technical Correctness	30%	Accuracy of technical content, correct use of terminology, absence of errors
Depth of Understanding	25%	Goes beyond surface-level definitions, demonstrates system-level comprehension
Clarity and Structure	20%	Logical flow, clear explanations, appropriate use of examples and visuals
Use of Examples and Trade-offs	15%	Concrete examples, discussion of limitations, comparison with alternatives
Slide Quality and Time Management	10%	Professional formatting, appropriate pacing, stays within time limit

Recommended References

Books:

Kleppmann, Martin. Designing Data-Intensive Applications. O'Reilly Media, 2017. (Chapters 3: Storage and Retrieval, 5: Replication, 6: Partitioning, 8: The Trouble with Distributed Systems, 9: Consistency and Consensus)

Documentation:

Database-specific performance tuning guides (PostgreSQL, MySQL, MongoDB)
Query optimization and indexing best practices

Academic / Technical:

Database indexing and query optimization research papers
High availability and replication case studies