A modular, six-layer framework for managing interlinked CloudRAN data using Knowledge Graphs, enabling efficient analytics and decision-making in telecommunications.
Carleton University, Ottawa · Telefonaktiebolaget LM Ericsson, Stockholm
In Cloud Radio Access Networks (CloudRAN), data originates from numerous heterogeneous sources: relational databases storing test configurations, JSON files containing execution results, text logs documenting system behavior, and metrics tracking operational performance. This diversity creates a fundamental challenge for telecommunications companies like Ericsson.
When engineers need to diagnose a test failure, they must manually traverse multiple disconnected systems. A test case resides in one database, its execution results in another, software version details in a third system, and deployment logs in yet another location. This fragmentation transforms routine investigations into hours-long data archaeology expeditions.
Ericsogate addresses this problem by constructing a unified Knowledge Graph—a semantic network where entities from disparate sources are automatically identified, linked, and made queryable through their relationships.
What previously required manual correlation across five systems now resolves through a single traversal: click a failed test and immediately access its linked software version, the engineer who executed it, the hardware configuration, and related historical failures.
A Knowledge Graph represents information as a network of entities (nodes) and their relationships (edges). Unlike traditional databases that store isolated records, Knowledge Graphs explicitly encode semantic connections.
Consider three independent data sources at Ericsson:
Ericsogate identifies that cell_id 123 and Ottawa appear across sources, then creates unified representations:
Queries can now traverse these connections: starting from a cell tower, follow edges to discover its technology, location, and demographic context—information that was previously siloed.
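The linking step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ericsogate's implementation: the three source tables, their field names (`technology`, `city`, `population_band`), and the `Cell:`/`City:` node-naming scheme are all hypothetical, and alignment here is simply an exact match on the shared `cell_id` and city values.

```python
from collections import defaultdict

# Hypothetical rows from three independent sources (illustrative values only).
cell_inventory = [{"cell_id": 123, "technology": "LTE"}]
site_registry = [{"cell_id": 123, "city": "Ottawa"}]
city_profiles = [{"city": "Ottawa", "population_band": "large"}]


def build_graph(inventory, registry, profiles):
    """Merge rows into one adjacency map; shared identifiers collapse to one node."""
    graph = defaultdict(list)  # node -> [(predicate, target_node), ...]
    for row in inventory:
        graph[f"Cell:{row['cell_id']}"].append(("hasTechnology", f"Tech:{row['technology']}"))
    for row in registry:
        # Same cell_id yields the same node key, so this edge joins the node above.
        graph[f"Cell:{row['cell_id']}"].append(("locatedIn", f"City:{row['city']}"))
    for row in profiles:
        graph[f"City:{row['city']}"].append(("hasPopulationBand", f"Band:{row['population_band']}"))
    return graph


def describe(graph, start):
    """Collect every triple reachable from `start` by following outgoing edges."""
    facts, frontier, seen = [], [start], {start}
    while frontier:
        node = frontier.pop()
        for predicate, target in graph.get(node, []):
            facts.append((node, predicate, target))
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return facts


graph = build_graph(cell_inventory, site_registry, city_profiles)
for fact in describe(graph, "Cell:123"):
    print(fact)
```

Starting from `Cell:123`, the traversal reaches the technology, the city, and the city's demographic attribute, even though no single source holds all three.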
Ericsogate uses the Resource Description Framework (RDF) to model knowledge. Each fact becomes a triple: <subject, predicate, object>. For example:
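As a minimal sketch of what such triples look like on disk, the snippet below serializes two facts in the W3C N-Triples syntax (`<subject> <predicate> <object> .`). The `http://example.org/ericsogate/` namespace and the resource names are hypothetical placeholders, and the small `IRI` wrapper is only a convenience for telling IRIs apart from string literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Marks a term as an IRI rather than a plain string literal."""
    value: str


EX = "http://example.org/ericsogate/"  # hypothetical namespace

triples = [
    (IRI(EX + "TestCase_1"), IRI(EX + "executedOn"), IRI(EX + "SoftwareVersion_2")),
    (IRI(EX + "TestCase_1"), IRI(EX + "hasStatus"), "failed"),
]


def term(t):
    """Render one term in N-Triples syntax: <iri> or "escaped literal"."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    escaped = str(t).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """One line per triple: subject predicate object, terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(triples))
```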
The Knowledge Graph consists of two layers: Instance Data (specific facts about entities) and Ontology (class hierarchies and relationship schemas). The ontology defines that "Functional Test" and "Performance Test" are subclasses of "Test Case," enabling queries like "find all test cases" to automatically include both types without explicit enumeration.
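The "find all test cases" behaviour comes from following subclass links transitively. In SPARQL 1.1 this is typically written with a property path such as `?x rdf:type/rdfs:subClassOf* ex:TestCase`. The plain-Python sketch below shows the same inference over an in-memory triple list; the class and instance names are illustrative, and `subClassOf`/`type` stand in for the RDFS/RDF predicates.

```python
# Ontology (schema) triples and instance triples; names are illustrative.
SUBCLASS_OF = "subClassOf"
TYPE = "type"

ontology = [
    ("FunctionalTest", SUBCLASS_OF, "TestCase"),
    ("PerformanceTest", SUBCLASS_OF, "TestCase"),
]
instances = [
    ("TC_1", TYPE, "FunctionalTest"),
    ("TC_2", TYPE, "FunctionalTest"),
    ("TC_3", TYPE, "PerformanceTest"),
]


def subclasses(ontology, cls):
    """Return cls plus all direct and indirect subclasses (transitive closure)."""
    result, stack = {cls}, [cls]
    while stack:
        parent = stack.pop()
        for child, pred, sup in ontology:
            if pred == SUBCLASS_OF and sup == parent and child not in result:
                result.add(child)
                stack.append(child)
    return result


def instances_of(instances, ontology, cls):
    """Instances typed with cls or any of its subclasses, without enumerating them."""
    classes = subclasses(ontology, cls)
    return sorted(s for s, p, o in instances if p == TYPE and o in classes)


print(instances_of(instances, ontology, "TestCase"))  # ['TC_1', 'TC_2', 'TC_3']
```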
Ericsogate is organized into six layers, each addressing specific requirements: data provenance, quality assurance, entity alignment, horizontal scalability, access control, and data freshness.
Figure 1. The six-layer architecture showing data flow from ingestion through transformation, control, storage, API exposure, and application consumption.
Ingestion Layer: Interfaces with heterogeneous data sources via APIs, extracting entities and properties from structured, semi-structured, and unstructured content.
Transformation Layer: Converts heterogeneous formats into standardized JSON triples (subject-predicate-object), enriched with local ontologies defining source-specific class hierarchies.
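A minimal sketch of this transformation for one semi-structured record is shown below. The record's field names, the `Class:id` node-naming convention, and `FIELD_CLASSES` (a stand-in for a local ontology that says which fields refer to entities of which class) are all assumptions for illustration; the section does not specify Ericsogate's exact JSON triple layout.

```python
import json

# Hypothetical execution-result record from one source.
raw = '{"run_id": "R42", "test_case": "TC_1", "verdict": "FAIL", "sw_version": "SW_1"}'

# Hypothetical local ontology fragment: which fields point at entities of which class.
FIELD_CLASSES = {"test_case": "TestCase", "sw_version": "SoftwareVersion"}


def record_to_triples(record, id_field, subject_class, field_classes):
    """Turn one flat record into JSON-ready subject-predicate-object dicts."""
    subject = f"{subject_class}:{record[id_field]}"
    triples = [{"subject": subject, "predicate": "type", "object": subject_class}]
    for field, value in record.items():
        if field == id_field:
            continue
        cls = field_classes.get(field)
        # Entity-valued fields become node references; everything else stays a literal.
        obj = f"{cls}:{value}" if cls else value
        triples.append({"subject": subject, "predicate": field, "object": obj})
    return triples


triples = record_to_triples(json.loads(raw), "run_id", "TestRun", FIELD_CLASSES)
print(json.dumps(triples, indent=2))
```

Turning `test_case` and `sw_version` into node references, rather than leaving them as strings, is what later allows the Control Layer to align them with the same entities coming from other sources.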
Control Layer: Orchestrates the data pipeline by scheduling ingestion tasks, aligning entities across sources, and integrating local ontologies into a global schema.
Storage Layer: Stores the Knowledge Graph in Apache Jena triple stores, distributed across multiple engines for horizontal scalability and queried through federated SPARQL.
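The distribution idea can be illustrated without Jena. In an actual deployment, cross-engine queries would be expressed with the SPARQL 1.1 `SERVICE` keyword; the in-memory `Shard` and `FederatedStore` classes and the subject-hash partitioning scheme below are hypothetical stand-ins that only show how triples can be spread over several engines and how a query is fanned out and merged.

```python
import hashlib


class Shard:
    """In-memory stand-in for one triple-store engine."""

    def __init__(self):
        self.triples = set()

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return {
            t for t in self.triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
        }


class FederatedStore:
    """Partitions triples by subject across shards and fans queries out to them."""

    def __init__(self, n_shards):
        self.shards = [Shard() for _ in range(n_shards)]

    def _shard_for(self, subject):
        # Stable hash: a subject's triples always land on the same shard.
        digest = hashlib.sha256(subject.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:4], "big") % len(self.shards)]

    def add(self, s, p, o):
        self._shard_for(s).triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        if s is not None:
            # A bound subject lives on exactly one shard, so only that shard is asked.
            return self._shard_for(s).match(s, p, o)
        results = set()
        for shard in self.shards:
            results |= shard.match(s, p, o)
        return results


store = FederatedStore(n_shards=3)
store.add("Run:R1", "hasStatus", "failed")
store.add("Run:R2", "hasStatus", "passed")
store.add("Run:R3", "hasStatus", "failed")
print(sorted(store.match(p="hasStatus", o="failed")))
```

Partitioning by subject keeps each entity's outgoing edges together, so lookups on a known entity touch one engine while open-ended patterns still see the whole graph.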
API Layer: Exposes REST APIs that return JSON, abstracting away SPARQL complexity, and provides granular access control through credential-protected endpoints.
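One way such an abstraction can work is for the API to accept a simple JSON filter and build the SPARQL query on the server side. The sketch below assumes a hypothetical request shape (`{"class": ..., "filters": {...}}`) and a hypothetical `ex:` vocabulary; it is not Ericsogate's actual endpoint contract. Identifiers are checked against a strict pattern and values are escaped so request data cannot alter the query structure.

```python
import re

IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
BASE = "http://example.org/ericsogate/"  # hypothetical vocabulary namespace


def _literal(value):
    """Escape a value as a SPARQL double-quoted string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def filters_to_sparql(entity_class, filters):
    """Translate a JSON-style filter request into a SPARQL SELECT query string."""
    for name in [entity_class, *filters]:
        # Only plain identifiers may reach the query text (guards against injection).
        if not IDENT.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    lines = [
        f"PREFIX ex: <{BASE}>",
        "SELECT ?item WHERE {",
        f"  ?item a ex:{entity_class} .",
    ]
    for field, value in sorted(filters.items()):
        lines.append(f"  ?item ex:{field} {_literal(value)} .")
    lines.append("}")
    return "\n".join(lines)


# e.g. a request body {"class": "TestRun", "filters": {"verdict": "FAIL"}}
print(filters_to_sparql("TestRun", {"verdict": "FAIL"}))
```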
Application Layer: Hosts user-facing applications built on the Knowledge Graph: semantic search dashboards, summarization tools, recommendation systems, and error detection.
Ericsogate addresses eight critical requirements for CloudRAN data management, each mapped to specific architectural components.
R1 - Data Provenance: Ingestion Layer embeds origin metadata from source files, recording the complete data lineage for auditability and transparency.
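A minimal sketch of lineage tagging at ingestion time is shown below, assuming each triple carries its source file, the reader that produced it, and a UTC timestamp. These field names, the file path, and the reader id are hypothetical. In RDF, the same information is often attached via named graphs or the W3C PROV-O vocabulary; here it is kept as a plain JSON field for readability.

```python
from datetime import datetime, timezone


def with_provenance(triples, source_file, reader_name):
    """Wrap each (s, p, o) triple with a record of where and when it was ingested."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "subject": s,
            "predicate": p,
            "object": o,
            "provenance": {
                "source_file": source_file,
                "reader": reader_name,
                "ingested_at": ingested_at,
            },
        }
        for s, p, o in triples
    ]


records = with_provenance(
    [("TestRun:R42", "verdict", "FAIL")],
    source_file="results/run_42.json",  # hypothetical path
    reader_name="json_reader",          # hypothetical reader id
)
print(records[0]["provenance"]["source_file"])
```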
R2 - Data Quality: Ingestion Layer enforces validation (format, type, value specifications) and cleaning (duplicates, irregularities, missing entries) before downstream processing.
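The validation and cleaning rules named above (format, type, and value checks; duplicates, irregularities, and missing entries) can be sketched as a small filter. The field specification, allowed verdicts, and duplicate key below are hypothetical examples, not Ericsogate's actual rules.

```python
# Hypothetical field specification for one source: field -> type converter.
SCHEMA = {"run_id": str, "verdict": str, "duration_s": float}
ALLOWED_VERDICTS = {"PASS", "FAIL", "SKIP"}


def check(record, seen_ids):
    """Return (cleaned_record, None) or (None, reason) for a single input record."""
    if any(record.get(field) is None for field in SCHEMA):
        return None, "missing field"
    try:
        typed = {field: cast(record[field]) for field, cast in SCHEMA.items()}
    except (TypeError, ValueError):
        return None, "bad type"
    typed["verdict"] = typed["verdict"].strip().upper()  # normalize irregular casing
    if typed["verdict"] not in ALLOWED_VERDICTS or typed["duration_s"] < 0:
        return None, "bad value"
    if typed["run_id"] in seen_ids:
        return None, "duplicate"
    return typed, None


def clean(records):
    """Split records into accepted rows and (record, reason) rejections."""
    seen_ids, accepted, rejected = set(), [], []
    for record in records:
        cleaned, reason = check(record, seen_ids)
        if reason:
            rejected.append((record, reason))
        else:
            seen_ids.add(cleaned["run_id"])
            accepted.append(cleaned)
    return accepted, rejected


accepted, rejected = clean([
    {"run_id": "R1", "verdict": " pass", "duration_s": "12.5"},
    {"run_id": "R1", "verdict": "PASS", "duration_s": 12.5},
    {"run_id": "R2", "verdict": "FAIL"},
    {"run_id": "R3", "verdict": "MAYBE", "duration_s": 1},
    {"run_id": "R4", "verdict": "FAIL", "duration_s": "n/a"},
])
print(accepted)
print([reason for _, reason in rejected])
```

Keeping rejected records together with a reason, rather than silently dropping them, preserves an audit trail that complements the provenance requirement.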
R3 - Data Heterogeneity: Transformation Layer standardizes formats into JSON triples. Control Layer performs local matching (within sources) and global matching (across sources) using entity alignment algorithms.
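The section does not name the entity alignment algorithm, so the sketch below swaps in a deliberately simple rule: names are normalized and compared with the standard library's `difflib.SequenceMatcher`, first within each source (local matching) and then across sources (global matching). The threshold, greedy clustering strategy, and example identifiers are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and drop punctuation/whitespace so formatting noise does not matter."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def same_entity(a, b, threshold):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def align(names, threshold=0.95):
    """Greedy clustering: each name joins the first cluster whose first member it matches."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if same_entity(name, cluster[0], threshold):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Local matching: resolve duplicates inside each source first.
source_a = ["TestCase_001", "testcase-001", "TestCase_002"]
source_b = ["TESTCASE 001", "TestCase_003"]
local_a, local_b = align(source_a), align(source_b)

# Global matching: align one representative per local cluster across sources.
global_clusters = align([c[0] for c in local_a] + [c[0] for c in local_b])
print(global_clusters)
```

The high threshold matters: after normalization, `TestCase_001` and `TestCase_002` already score about 0.91, so a looser cutoff would wrongly merge distinct test cases. Production alignment would usually also compare properties and neighbours, not just names.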
R4 - Schema Flexibility: RDF's schema-later approach allows fluid data addition through triples without predefined schemas, enabling rapid integration of new data structures.
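The schema-later property is easy to see with a bare set of triples: a property that no one declared in advance can be added and queried immediately, where a relational table would first need its schema altered. The predicate `radioUnitTemperature_C` and its value are hypothetical.

```python
# A graph is just a set of (subject, predicate, object) triples; no table definition exists.
graph = {
    ("TestRun:R42", "verdict", "FAIL"),
    ("TestRun:R42", "executedOn", "SoftwareVersion:SW_1"),
}

# Later, a new source reports a property nobody declared in advance (hypothetical name).
graph.add(("TestRun:R42", "radioUnitTemperature_C", "71"))


def properties(graph, subject):
    """All predicate/object pairs for a subject, including ones added after the fact."""
    return sorted((p, o) for s, p, o in graph if s == subject)


print(properties(graph, "TestRun:R42"))
```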
R5 - Scalability: Modular data readers (source scalability) and DM Controller distribution across multiple graph engines (horizontal scalability) let capacity grow by adding readers and engines rather than by enlarging a single store.
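"Modular data readers" can be pictured as a registry keyed by format, where supporting a new source means registering one more function and leaving the rest of the pipeline unchanged. The decorator-based registry, the three formats, and the log-line pattern below are hypothetical choices for illustration.

```python
import csv
import io
import json
import re

READERS = {}  # format name -> function(text) -> list of dict records


def reader(fmt):
    """Decorator that registers a reader function for one input format."""
    def register(fn):
        READERS[fmt] = fn
        return fn
    return register


@reader("csv")
def read_csv(text):
    return list(csv.DictReader(io.StringIO(text)))


@reader("json")
def read_json(text):
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Hypothetical log format: "<timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


@reader("log")
def read_log(text):
    matches = (LOG_LINE.match(line) for line in text.splitlines())
    return [m.groupdict() for m in matches if m]  # unparseable lines are skipped


def ingest(fmt, text):
    if fmt not in READERS:
        raise ValueError(f"no reader registered for {fmt!r}")
    return READERS[fmt](text)


print(ingest("log", "2024-01-01T00:00:00Z ERROR link down\nnoise"))
```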
R6 - Security: API Layer generates unique, credential-protected endpoints per user or group, ensuring users access only authorized data subsets.
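The section does not describe how endpoints are generated or enforced. As a minimal sketch, assuming each endpoint is bound to a set of named graphs, per-group scoping could look like the code below. The `EndpointRegistry` class, the graph names, and the token scheme are assumptions; only the standard library's `secrets`, `hashlib`, and `hmac` modules are used.

```python
import hashlib
import hmac
import secrets


def _digest(credential):
    return hashlib.sha256(credential.encode("utf-8")).hexdigest()


class EndpointRegistry:
    """Issues one endpoint per user/group, each bound to the named graphs it may read."""

    def __init__(self):
        self._endpoints = {}  # endpoint_id -> (credential digest, allowed graph names)

    def create(self, allowed_graphs):
        endpoint_id = secrets.token_urlsafe(8)
        credential = secrets.token_urlsafe(16)  # returned once; only its hash is stored
        self._endpoints[endpoint_id] = (_digest(credential), frozenset(allowed_graphs))
        return endpoint_id, credential

    def query(self, endpoint_id, credential, quads):
        """Return only quads (s, p, o, graph) from graphs this endpoint may see."""
        entry = self._endpoints.get(endpoint_id)
        if entry is None or not hmac.compare_digest(entry[0], _digest(credential)):
            raise PermissionError("unknown endpoint or invalid credential")
        return [q for q in quads if q[3] in entry[1]]


quads = [
    ("TestRun:R42", "verdict", "FAIL", "test_results"),
    ("Engineer:E7", "email", "e7@example.com", "personnel"),
]
registry = EndpointRegistry()
endpoint, secret = registry.create({"test_results"})
print(registry.query(endpoint, secret, quads))
```

Filtering by graph inside the endpoint, rather than trusting the client to ask only for permitted data, is what keeps each group confined to its authorized subset.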
R7 - Data Freshness: Scheduler in Control Layer automatically requests updates from data sources at configured intervals (milliseconds to days), maintaining current state.
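Interval-based refresh can be modelled with a priority queue of next-due times. The sketch below only simulates which pulls would happen within a time window (no real I/O); the source names and intervals are hypothetical, and intervals are given in seconds, so fractional values cover millisecond schedules.

```python
import heapq


def simulate_schedule(intervals_s, horizon_s):
    """List (time, source) pull events up to horizon_s for fixed per-source intervals."""
    for name, interval in intervals_s.items():
        if interval <= 0:
            raise ValueError(f"interval for {name!r} must be positive")
    queue = [(0.0, name) for name in sorted(intervals_s)]  # every source pulls at start
    heapq.heapify(queue)
    pulls = []
    while queue and queue[0][0] <= horizon_s:
        t, name = heapq.heappop(queue)
        pulls.append((t, name))
        heapq.heappush(queue, (t + intervals_s[name], name))  # next due time
    return pulls


# Hypothetical intervals: a log feed every 60 s, a test database hourly.
print(simulate_schedule({"exec_logs": 60, "test_db": 3600}, horizon_s=120))
```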
R8 - Cost-Effectiveness: Deployment on open-source Apache Jena eliminates licensing costs while meeting functional requirements.
Comparative analysis of Ericsogate against traditional data management systems at Ericsson.
Table 1. Performance comparison across key operational metrics
| Metric | Traditional System | Ericsogate (KG) | Improvement |
|---|---|---|---|
| Update Time | 20H hours | H hours | 20× faster |
| Feature Development | F features/month | 10F features/month | 10× throughput |
| Navigation Depth | 2 hops | Unlimited | No fixed hop limit |
| Summarization Compression | ~80% | 99.90% | +19.9 percentage points |
Note: H and F are normalization constants to preserve proprietary performance baselines.
A critical outcome is the reduction in Knowledge Graph expertise requirements. Only developers working on the Control and API layers require SPARQL knowledge. Data ingestion developers work with standard JSON, application developers consume REST APIs—dramatically reducing training overhead and enabling faster team onboarding.
Stakeholders needed to understand test execution patterns across 100,000+ nodes representing test cases, runs, software versions, and configurations. Traditional grouping methods obscured critical distinctions, bit compression optimized storage without facilitating analysis, and simplification techniques risked omitting important patterns.
The Pattern-Statistics-based Method leverages the ontology structure to abstract instances by their class types. Instead of showing TestCase_1, TestCase_2, and TestCase_3 individually, the system displays the "TestCase" class with aggregated statistics: "3 instances, 2 functional, 1 performance, executed 4 times total." Because the statistics follow the class hierarchy, stakeholders see test type distributions, execution frequencies, and failure concentrations without being overwhelmed by raw data.
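A simplified sketch of this class-level aggregation, reproducing the three-test-case example above, is shown below. It handles a single level of the class hierarchy, and the representativeness measure and failure statistics mentioned in the results are not modelled. The data structures and names are assumptions, not the method's actual implementation.

```python
from collections import Counter

# Hypothetical instances reproducing the example above.
instance_class = {
    "TestCase_1": "FunctionalTest",
    "TestCase_2": "FunctionalTest",
    "TestCase_3": "PerformanceTest",
}
superclass = {"FunctionalTest": "TestCase", "PerformanceTest": "TestCase"}
executions = ["TestCase_1", "TestCase_1", "TestCase_2", "TestCase_3"]  # one entry per run


def summarize(instance_class, superclass, executions):
    """Replace individual instances with per-class counts (one hierarchy level)."""
    runs = Counter(executions)
    summary = {}
    for instance, cls in instance_class.items():
        top = superclass.get(cls, cls)
        entry = summary.setdefault(top, {"instances": 0, "by_subclass": Counter(), "executions": 0})
        entry["instances"] += 1
        entry["by_subclass"][cls] += 1
        entry["executions"] += runs[instance]
    return summary


print(summarize(instance_class, superclass, executions))
```

Three instance nodes collapse into one class node carrying the same "3 instances, 2 functional, 1 performance, executed 4 times" statistics; at 100K nodes, the same collapse is what yields the large compression.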
Achieved 99.90% compression (100K to ~100 nodes) while maintaining 94% representativeness. Stakeholders gained immediate pattern visibility: which test types dominate, execution frequency distributions, and failure concentration areas—enabling informed strategic decisions without manual analysis.
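Assuming compression is measured as the fraction of nodes removed (the formula is not stated explicitly here), the reported figure follows from the node counts. The same arithmetic shows why the +19.9 percentage-point gain over the ~80% baseline in Table 1 corresponds to a summary that is roughly 200 times smaller:

```latex
\begin{aligned}
\text{compression} &= 1 - \frac{N_{\text{summary}}}{N_{\text{original}}}
  \approx 1 - \frac{100}{100\,000} = 0.999 = 99.90\% \\[4pt]
\frac{\text{summary size at } {\sim}80\%}{\text{summary size at } 99.90\%}
  &\approx \frac{1 - 0.80}{1 - 0.999} = \frac{0.20}{0.001} = 200
\end{aligned}
```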
Engineers investigating test failures needed to manually query multiple disconnected systems: test case definitions in one database, execution results in another, software versions in a third, deployment logs in a fourth. This fragmentation transformed routine diagnoses into multi-hour investigations requiring specialized knowledge of each system's query interface.
Semantic Search enables meaning-based queries across linked data. Users filter by fields from multiple sources through a unified dashboard. Clicking a failed test run triggers traversal through the Knowledge Graph, instantly retrieving the linked software version, engineer who executed it, hardware configuration, and related historical failures—data previously requiring correlation across five independent queries.
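The click-to-traverse behaviour is essentially a breadth-first search from the selected node. The sketch below uses a small hypothetical neighbourhood around one failed run; setting `max_depth=2` mimics the earlier two-hop limit, while `None` removes the limit, which is how the distant configuration node becomes reachable. Node names and edges are illustrative only.

```python
from collections import deque

# Hypothetical slice of the graph around one failed run.
edges = {
    "Run:R42": ["SoftwareVersion:SW_1", "Engineer:E7", "Hardware:HW_3"],
    "SoftwareVersion:SW_1": ["Run:R17"],
    "Run:R17": ["Failure:F9"],
    "Failure:F9": ["Config:C2"],
}


def reachable(edges, start, max_depth=None):
    """Breadth-first traversal returning {node: hop_count}; max_depth=None means no limit."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] >= max_depth:
            continue  # do not expand beyond the hop limit
        for nxt in edges.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return {n: d for n, d in depth.items() if n != start}


two_hop = reachable(edges, "Run:R42", max_depth=2)
unlimited = reachable(edges, "Run:R42")
print(sorted(set(unlimited) - set(two_hop)))
```

With the two-hop limit the search stops at the other run on the same software version; without it, the traversal continues to that run's failure and the configuration linked to it.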
Investigation time reduced from hours to seconds. Feature development accelerated 10× (F to 10F features/month) due to unified data access. Engineers now perform unlimited-depth navigation compared to the previous 2-hop limitation, enabling discovery of non-obvious relationships between test configurations and failure modes.
Please cite Ericsogate when using the framework, methodology, or applications in research.