Topic Overview
Graph databases implement a data model where relationships between entities are stored as first-class
objects, enabling efficient traversal of connected data without multi-table joins. The property graph
model represents entities as nodes with properties and connections as typed, directed edges with their
own properties. This model addresses query patterns where relationship traversal depth and path analysis
are central operations, which relational databases handle inefficiently through repeated joins. Neo4j
implements native graph storage using index-free adjacency: each node references its relationship
records directly, so the cost of one traversal step is independent of total graph size. This comes at
the cost of write performance and limits horizontal scaling. The Cypher query language provides
pattern-matching syntax designed for graph traversal, in contrast to SQL's set-based operations.
Students must evaluate when graph databases provide performance and modeling advantages over relational
or document stores, and identify scenarios where graph databases are inappropriate.
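As a rough illustration of the contrast drawn in the overview, the following Python sketch models a tiny property graph in memory and answers a friends-of-friends query two ways: by following per-node adjacency lists, and by repeatedly scanning a flat edge list in the way an unindexed self-join would. The names (`Node`, `Edge`, `PropertyGraph`, `KNOWS`, `WORKS_WITH`) and the sample data are hypothetical, and the code makes no claim about how any real database engine executes such queries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: int          # start node id (edges are directed)
    dst: int          # end node id
    rel_type: str     # relationship type, e.g. "KNOWS"
    props: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node id -> Node
        self.edges = []   # flat list, comparable to a relational edge table
        self.out = {}     # node id -> list of outgoing Edge objects

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = Node(node_id, set(labels), props)
        self.out[node_id] = []

    def add_edge(self, src, dst, rel_type, **props):
        edge = Edge(src, dst, rel_type, props)
        self.edges.append(edge)
        self.out[src].append(edge)

    def fof_by_adjacency(self, start, rel_type):
        """Two hops via adjacency lists: only edges of visited nodes are read."""
        direct = {e.dst for e in self.out[start] if e.rel_type == rel_type}
        second = {e.dst for n in direct for e in self.out[n] if e.rel_type == rel_type}
        return second - direct - {start}

    def fof_by_scan(self, start, rel_type):
        """Two hops via full scans of the edge list, like an unindexed self-join."""
        direct = {e.dst for e in self.edges if e.src == start and e.rel_type == rel_type}
        second = {e.dst for e in self.edges if e.src in direct and e.rel_type == rel_type}
        return second - direct - {start}


g = PropertyGraph()
for i, name in enumerate(["Ann", "Bob", "Cy", "Dee"], start=1):
    g.add_node(i, ["Person"], name=name)
g.add_edge(1, 2, "KNOWS", since=2019)
g.add_edge(2, 3, "KNOWS", since=2021)
g.add_edge(2, 4, "KNOWS", since=2022)
g.add_edge(1, 4, "WORKS_WITH")

friends_of_friends = sorted(g.nodes[n].props["name"] for n in g.fof_by_adjacency(1, "KNOWS"))
print(friends_of_friends)  # ['Cy', 'Dee']
```

Both methods return the same answer. The difference is that the scan version reads every edge once per hop, while the adjacency version reads only the edges attached to nodes it has already reached. A relational system would normally narrow this gap with indexes, so the sketch shows the shape of the argument rather than a benchmark.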
Student Presentation Assignments
Student 1: Graph Data Models & When to Use Them
Required Coverage:
- Must formally define the property graph model, specifying how nodes, relationships, and properties differ from relational tuples and document fields
- Must compare graph databases with relational and document models, analyzing expressiveness vs performance trade-offs with concrete query examples
- Must justify graph database selection for at least two use cases (e.g., social networks, fraud detection) by demonstrating why relational joins would be inefficient
- Must identify and explain at least three scenarios where graph databases are inappropriate, justifying why alternative models are preferable
- Must discuss trade-offs: query complexity for deep traversals, horizontal scaling limitations, and consistency model constraints
Student 2: Neo4j Architecture & Storage
Required Coverage:
- Must explain Neo4j's architecture components (transaction log, page cache, storage engine) and how data flows through the system
- Must explain index-free adjacency: how each node record references its relationships directly, so a traversal step follows pointers instead of index lookups and its cost does not grow with total graph size
- Must analyze Neo4j's transaction model, specifying ACID guarantees and how they differ from distributed graph databases
- Must explain clustering and replication design, identifying single-writer limitations and replication lag implications
- Must identify specific scalability bottlenecks (e.g., write contention, memory requirements) with quantitative constraints where available
- Must compare Neo4j's native storage with graph databases layered on non-native storage backends (e.g., Apache AGE on PostgreSQL, JanusGraph on Cassandra), analyzing performance trade-offs
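One way to picture index-free adjacency is as pointer chasing over fixed-size records. The Python sketch below is a deliberately simplified model, not Neo4j's actual record format: list positions stand in for record offsets, only the outgoing chain of each start node is kept, and all names (`NodeRecord`, `RelRecord`, `Store`, `NO_REL`) are hypothetical. It shows that the number of records read to list a node's neighbours depends on that node's degree, not on how large the rest of the store has grown.

```python
from dataclasses import dataclass

NO_REL = -1  # sentinel meaning "end of chain"


@dataclass
class NodeRecord:
    first_rel: int = NO_REL   # position of the first relationship record


@dataclass
class RelRecord:
    start: int                # position of the start node record
    end: int                  # position of the end node record
    rel_type: str
    next_for_start: int       # next relationship in the start node's chain


class Store:
    def __init__(self):
        self.nodes = []       # list position stands in for a record offset
        self.rels = []

    def create_node(self):
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def create_rel(self, start, end, rel_type):
        # Prepend to the start node's chain: one record write plus one pointer update.
        rel_id = len(self.rels)
        self.rels.append(RelRecord(start, end, rel_type, self.nodes[start].first_rel))
        self.nodes[start].first_rel = rel_id
        return rel_id

    def neighbours(self, node_id):
        """Walk the chain by pointer; returns (neighbour ids, records read)."""
        found, reads = [], 0
        rel_id = self.nodes[node_id].first_rel
        while rel_id != NO_REL:
            rel = self.rels[rel_id]
            reads += 1
            found.append(rel.end)
            rel_id = rel.next_for_start
        return found, reads


store = Store()
hub = store.create_node()
a, b = store.create_node(), store.create_node()
store.create_rel(hub, a, "KNOWS")
store.create_rel(hub, b, "KNOWS")

# Grow the rest of the graph with 10,000 unrelated node pairs.
for _ in range(10_000):
    x, y = store.create_node(), store.create_node()
    store.create_rel(x, y, "KNOWS")

found, reads = store.neighbours(hub)
print(found, reads)  # [2, 1] 2
```

The hub still needs exactly two record reads after 10,000 further relationships are added. A presentation could contrast this with a lookup through a B-tree index on an edge table, whose cost grows logarithmically with the table size.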
Student 3: Cypher Query Language
Required Coverage:
- Must explain Cypher's pattern-matching paradigm, demonstrating core constructs (MATCH, CREATE, WHERE, RETURN) with executable query examples
- Must compare Cypher pattern matching with equivalent SQL joins, analyzing conceptual differences and performance implications
- Must demonstrate variable-length path queries (e.g., *1..5) and explain traversal cost differences compared to fixed-depth queries
- Must analyze performance considerations: when pattern matching becomes expensive and how query planning differs from SQL
- Must identify at least three query anti-patterns (e.g., unbounded traversals, missing constraints) and explain optimization strategies
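To make the cost of variable-length patterns such as `*1..5` concrete, the sketch below enumerates every path of between `min_hops` and `max_hops` edges from a start node in plain Python. It assumes edge-uniqueness semantics (no edge reused within one path), similar in spirit to what Cypher applies within a single pattern, and it assumes at most one edge per ordered node pair. The function name `paths` and both sample graphs are hypothetical; this is not how a query planner evaluates the pattern.

```python
def paths(adj, start, min_hops, max_hops):
    """Enumerate paths of min_hops..max_hops edges from start, never reusing an edge."""
    results = []

    def extend(node, path_nodes, used_edges):
        hops = len(path_nodes) - 1
        if hops >= min_hops:
            results.append(tuple(path_nodes))
        if hops == max_hops:
            return  # upper bound reached: stop expanding this branch
        for nxt in adj.get(node, ()):
            edge = (node, nxt)
            if edge not in used_edges:
                extend(nxt, path_nodes + [nxt], used_edges | {edge})

    extend(start, [start], frozenset())
    return results


# A tree in which every internal node has three outgoing edges, five levels deep.
tree = {n: [3 * n + 1, 3 * n + 2, 3 * n + 3] for n in range(121)}
counts = [len(paths(tree, 0, 1, k)) for k in range(1, 6)]
print(counts)  # [3, 12, 39, 120, 363]

# A three-node cycle: edge uniqueness ends the walk even with a generous bound.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(paths(cycle, "a", 1, 10))
# [('a', 'b'), ('a', 'b', 'c'), ('a', 'b', 'c', 'a')]
```

With a branching factor of three, each extra hop roughly triples the number of matched paths, which is why the upper bound in a variable-length pattern matters far more than it does in a fixed-depth query.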
Student 4: Graph Analytics & Real-World Applications
Required Coverage:
- Must explain at least three graph algorithms from Neo4j GDS (e.g., PageRank, shortest path, community detection), specifying their computational complexity and use cases
- Must compare analytical vs transactional workloads in graph databases, analyzing how Neo4j handles each and identifying performance trade-offs
- Must analyze a real-world application (e.g., LinkedIn connections, recommendation engines), explaining how graph structure enables the use case
- Must explain integration patterns with ML or recommendation systems, specifying how graph data feeds into ML pipelines
- Must evaluate production strengths and weaknesses: when graph analytics outperform alternatives and when they do not
- Must justify when to use graph analytics vs matrix-based or relational approaches for similar problems
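For the algorithm requirement, it can help to see one algorithm written out. The following is a minimal power-iteration PageRank in plain Python, offered as a teaching sketch under simple assumptions (a damping factor of 0.85, a fixed iteration count, and rank from nodes without outgoing links spread evenly). It is not the Neo4j GDS implementation, and the function name `pagerank` and the sample graph are hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each node to its list of link targets."""
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
ordering = sorted(rank, key=rank.get, reverse=True)
print(ordering)  # ['C', 'A', 'B', 'D']
```

Each iteration visits every node and every edge once, so the per-iteration cost is proportional to the number of nodes plus the number of edges. The same computation can be expressed as repeated sparse matrix-vector multiplication, which is a useful starting point for the graph versus matrix-based comparison.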
Presentation Requirements
All presentations must be 17–20 minutes in duration and include the following components:
- Problem Context: What problem this technology solves and why traditional databases struggle
- Core Concepts: Clear explanation with correct technical terminology
- System Details: How it works in practice with concrete examples
- Trade-offs: Strengths, limitations, and when it is appropriate vs not appropriate
- Real-World Perspective: At least one realistic application scenario and production considerations
Note: Presentations that only summarize definitions, list features, or copy diagrams without
interpretation will receive low marks. Each presentation must demonstrate analytical reasoning through
comparisons, trade-off analysis, and justification of design decisions. Reading slides verbatim or
presenting material that merely restates documentation will be penalized.
Report Requirement: In addition to the presentation, each student must submit an individual PDF report.
See Seminar Report Requirements for format, content, and submission details.
Evaluation Criteria
| Criterion | Weight | Description |
| --- | --- | --- |
| Technical Correctness | 30% | Accuracy of technical content, correct use of terminology, absence of errors |
| Depth of Understanding | 25% | Goes beyond surface-level definitions, demonstrates system-level comprehension |
| Clarity and Structure | 20% | Logical flow, clear explanations, appropriate use of examples and visuals |
| Use of Examples and Trade-offs | 15% | Concrete examples, discussion of limitations, comparison with alternatives |
| Slide Quality and Time Management | 10% | Professional formatting, appropriate pacing, stays within time limit |
Recommended References
Books:
- Robinson, Ian, Jim Webber, and Emil Eifrem. Graph Databases. 2nd ed. O'Reilly Media, 2015.
- Kleppmann, Martin. Designing Data-Intensive Applications. O'Reilly Media, 2017. (Chapter 2: Data Models and Query Languages, section "Graph-Like Data Models")
Documentation:
Academic / Technical:
- Survey papers on graph data management (SIGMOD, VLDB proceedings)
- Neo4j Graph Data Science algorithms reference and performance guides