Bring Your Own Cloud

Components

Speedscale Components

| Component | Description |
| --- | --- |
| nettap | Network tap that intercepts traffic from customer applications. It captures requests/responses flowing through the cluster for recording and replay. |
| forwarder | Receives captured traffic from nettap and forwards it downstream to the OTel collector. Acts as a buffering/routing layer between capture and storage. |
| operator | Kubernetes operator that manages the lifecycle of Speedscale resources (CRDs), orchestrates replay, and coordinates the other Speedscale components. |

Customer Components

| Component | Description |
| --- | --- |
| apps-to-capture | The customer's own application workloads whose traffic is being captured by nettap. These are the services under test. |
| otel-collector | An OpenTelemetry Collector instance that receives forwarded traffic data as logs and routes it to both short-term and long-term storage backends. Examples include FluentBit, OTel Collector, Datadog agent, etc. |

Customer Cloud

Short Term Storage

Hot/queryable storage for recent traffic data. Feeds visualization and EMR processing. Stored records may be the entire RRPair JSON object or a selection of fields. The recommended lifecycle is 7-30 days, depending on the observability/queryability requirements.

| Cloud | Options |
| --- | --- |
| AWS | Amazon OpenSearch, Amazon Timestream, Amazon DynamoDB, Amazon ElastiCache (Redis) |
| GCP | Cloud Bigtable, Memorystore (Redis), Firestore, Elasticsearch on GCE |
| Azure | Azure Data Explorer (ADX), Azure Cache for Redis, Azure Cosmos DB, Azure AI Search |

Long Term Storage

Cold/archival storage for full RRPair objects and EMR output, optimized for cost and durability. The recommended lifecycle is 365 days or indefinite, depending on whether the use cases skew toward security or observability. One strategy is to retain raw data for X days and keep saved artifacts (i.e. snapshots from EMR jobs) indefinitely.
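The retain-raw-for-X-days, keep-snapshots-forever strategy can be expressed directly as storage lifecycle rules. Below is a minimal sketch using AWS S3 lifecycle configuration as one example from the table; the prefixes, day counts, and bucket name are placeholders, not Speedscale defaults.

```python
# Sketch of a long-term retention policy using S3 lifecycle rules.
# Prefixes ("raw/", "snapshots/") and day counts are illustrative placeholders.
lifecycle_configuration = {
    "Rules": [
        {
            # Raw RRPair objects: move to Glacier after 30 days, expire at 365.
            "ID": "raw-rrpairs",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        },
        {
            # Saved artifacts (e.g. EMR snapshots): no Expiration action, so they
            # are kept indefinitely; only a cost-saving storage-class transition.
            "ID": "emr-snapshots",
            "Filter": {"Prefix": "snapshots/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        },
    ]
}

# Applied with boto3, for example:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-speedscale-archive",
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

The equivalent knobs exist on GCP (Object Lifecycle Management) and Azure (blob lifecycle management policies).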

| Cloud | Options |
| --- | --- |
| AWS | Amazon S3, S3 Glacier |
| GCP | Cloud Storage (Standard, Nearline, Coldline, Archive) |
| Azure | Azure Blob Storage, Azure Data Lake Storage Gen2 |

Visualization

Dashboards and UI for exploring captured traffic and test results stored in short-term storage.

| Cloud | Options |
| --- | --- |
| AWS | Amazon Managed Grafana, Amazon QuickSight, OpenSearch Dashboards |
| GCP | Looker, Grafana on GKE, Google Cloud Console custom dashboards |
| Azure | Azure Managed Grafana, Azure Monitor Workbooks, Power BI |

EMR

Batch or stream processing engine that reads from short-term or long-term storage, transforms/aggregates traffic data, and writes snapshots back to long-term storage.
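The transform/aggregate step can be sketched independently of the engine. Below is a minimal, engine-agnostic example that aggregates raw RRPair records into a per-service/status snapshot; the field names (`service`, `status`) are illustrative, not Speedscale's exact schema, and a real EMR/Dataproc/Databricks job would express the same shape as a Spark or Beam pipeline.

```python
import json
from collections import Counter

def summarize_rrpairs(lines):
    """Aggregate raw RRPair records into per-(service, status) counts.

    `lines` is an iterable of JSON strings, one record per line. The
    field names used here are placeholders for the real RRPair schema.
    """
    counts = Counter()
    for line in lines:
        rec = json.loads(line)
        counts[(rec["service"], rec["status"])] += 1
    # One snapshot row per (service, status) pair, ready to write back
    # to long-term storage as an EMR artifact.
    return [
        {"service": svc, "status": status, "count": n}
        for (svc, status), n in sorted(counts.items())
    ]

raw_lines = [
    '{"service": "checkout", "status": 200}',
    '{"service": "checkout", "status": 404}',
    '{"service": "checkout", "status": 200}',
]
snapshot = summarize_rrpairs(raw_lines)
# → [{"service": "checkout", "status": 200, "count": 2},
#    {"service": "checkout", "status": 404, "count": 1}]
```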

| Cloud | Options |
| --- | --- |
| AWS | Amazon EMR (Spark/Hadoop), AWS Glue, Amazon Kinesis Data Analytics, AWS Lambda |
| GCP | Cloud Dataproc (Spark/Hadoop), Cloud Dataflow (Apache Beam), Cloud Functions |
| Azure | Azure HDInsight (Spark/Hadoop), Azure Databricks, Azure Stream Analytics, Azure Synapse Analytics |

Using Data Effectively

Observability

Depending on your choice of short-term storage, you will want to index certain dimensions for quick lookups:

  • Time - choose a data store that handles time-series data efficiently for quick partitioning and filtering.
  • Service/Namespace/Cluster - information about where the RRPair came from.
  • Command/Location/Status - for HTTP RRPairs this looks like "POST /endpoint 404"; for non-HTTP RRPairs it is protocol dependent.
  • Direction - always a simple IN/OUT relative to the service being captured.
  • Long Term Storage Location - the full RRPair JSON may be too large to index completely, depending on your choice of short-term storage; in that case it is useful to index where the full object can be retrieved from.
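Concretely, ingestion can project each full RRPair onto just these dimensions before writing to the short-term index. The sketch below assumes illustrative field names (`ts`, `service`, `direction`, etc.), not Speedscale's exact RRPair schema:

```python
def to_index_doc(rrpair: dict, archive_uri: str) -> dict:
    """Project a full RRPair onto the dimensions worth indexing.

    Field names are placeholders; adapt them to the real RRPair schema.
    `archive_uri` points at the full object in long-term storage.
    """
    return {
        "ts": rrpair["ts"],                 # time, for partitioning/filtering
        "service": rrpair["service"],       # where the RRPair came from
        "namespace": rrpair.get("namespace"),
        "cluster": rrpair.get("cluster"),
        "command": rrpair.get("command"),   # e.g. "POST /endpoint 404" for HTTP
        "direction": rrpair["direction"],   # IN or OUT, relative to the service
        "archive_uri": archive_uri,         # where to retrieve the full object
    }

doc = to_index_doc(
    {"ts": "2024-01-01T00:00:00Z", "service": "checkout",
     "direction": "IN", "command": "POST /checkout 200"},
    archive_uri="s3://archive/raw/2024/01/01/abc.json",
)
```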

Using Traffic for Tests/Mocks

Speedscale's native processes operate on a 'raw' file, usually named raw.jsonl: a newline-delimited file of JSON objects with no particular guarantees about ordering or grouping.
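Because the format makes no ordering or grouping guarantees, a reader only needs to handle one JSON object per non-empty line. A minimal sketch:

```python
import json
import os
import tempfile

def read_raw(path):
    """Yield one JSON object per non-empty line of a raw.jsonl-style file.

    No ordering or grouping is assumed, matching the format described above.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Demo against a throwaway file with placeholder records.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write('{"a": 1}\n{"a": 2}\n')
    path = f.name
records = list(read_raw(path))
os.unlink(path)
# → records == [{"a": 1}, {"a": 2}]
```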

To use raw traffic, two main patterns can be implemented:

  • Use short-term storage as a data source: Construct a raw file by grabbing full objects from short-term storage either directly or by using the indexed long-term storage location.
  • Use long-term storage as a data source: Point your EMR job at the long-term storage with the correct set of filters to construct output. Usually the output of an EMR job will be several chunks that need to be combined to form a working raw file.
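For the second pattern, combining EMR output chunks is just concatenation of newline-delimited JSON. The sketch below re-parses each line so a corrupt chunk fails fast; paths and record shapes are placeholders:

```python
import json
import os
import tempfile

def combine_chunks(chunk_paths, out_path):
    """Concatenate EMR output chunks into one working raw.jsonl file.

    Each chunk is itself newline-delimited JSON; blank lines are dropped
    and each record is parsed once so a corrupt chunk fails fast.
    """
    with open(out_path, "w") as out:
        for chunk in chunk_paths:
            with open(chunk) as f:
                for line in f:
                    line = line.strip()
                    if line:
                        json.loads(line)  # validate before writing through
                        out.write(line + "\n")

# Demo with two throwaway chunk files standing in for EMR output.
tmp = tempfile.mkdtemp()
chunks = []
for i, body in enumerate(['{"id": 1}\n', '{"id": 2}\n\n{"id": 3}\n']):
    p = os.path.join(tmp, f"chunk-{i}.jsonl")
    with open(p, "w") as f:
        f.write(body)
    chunks.append(p)
raw_path = os.path.join(tmp, "raw.jsonl")
combine_chunks(chunks, raw_path)
with open(raw_path) as f:
    ids = [json.loads(line)["id"] for line in f]
# → ids == [1, 2, 3]
```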

Rehydration

It may be useful to load a slice of captured traffic that has aged out of short-term storage back in so that it can be inspected and visualized again. You may be able to configure your collector to read from long-term storage and ingest into short-term again (e.g. S3 input -> FluentBit -> short-term), or you may want a separate job that ingests into short-term storage directly.
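The separate-job approach amounts to filtering archived records by time slice and re-ingesting the matches. A minimal sketch, assuming ISO-8601 timestamps (so plain string comparison orders correctly) and a hypothetical `ingest` callable that wraps your short-term storage client:

```python
import json

def rehydrate(archive_lines, start, end, ingest):
    """Re-ingest archived RRPairs whose timestamp falls in [start, end).

    `archive_lines`: iterable of JSONL lines read from long-term storage.
    `ingest`: callable that writes one record into short-term storage
    (e.g. a wrapper around an OpenSearch or Bigtable client - hypothetical).
    ISO-8601 timestamps compare correctly as strings, so no date parsing.
    """
    count = 0
    for line in archive_lines:
        rec = json.loads(line)
        if start <= rec["ts"] < end:
            ingest(rec)
            count += 1
    return count

# Demo: only the January record falls inside the requested slice.
loaded = []
count = rehydrate(
    ['{"ts": "2024-01-01T00:00:00Z", "service": "a"}',
     '{"ts": "2024-02-01T00:00:00Z", "service": "b"}'],
    start="2024-01-01",
    end="2024-01-31",
    ingest=loaded.append,
)
# → count == 1
```

A production job would also want idempotent writes (stable document IDs) so re-running a rehydration does not duplicate records in the short-term index.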