Bring Your Own Cloud

Components

Speedscale Components

| Component | Description |
| --- | --- |
| nettap | Network tap that intercepts traffic from customer applications. It captures requests/responses flowing through the cluster for recording and replay. |
| forwarder | Receives captured traffic from nettap and forwards it downstream to the OTel collector. Acts as a buffering/routing layer between capture and storage. |
| operator | Kubernetes operator that manages the lifecycle of Speedscale resources (CRDs), orchestrates replay, and coordinates the other Speedscale components. |

Customer Components

| Component | Description |
| --- | --- |
| apps-to-capture | The customer's own application workloads whose traffic is being captured by nettap. These are the services under test. |
| otel-collector | An OpenTelemetry Collector instance that receives forwarded traffic data as logs and routes it to both short-term and long-term storage backends. Examples include FluentBit, OTel Collector, Datadog agent, etc. |

Customer Cloud

Short Term Storage

Hot/queryable storage for recent traffic data. Feeds visualization and EMR processing. Stored records may be the entire RRPair JSON object or a selection of fields. The recommended lifecycle is 7-30 days, depending on the observability/queryability requirements.

| Cloud | Options |
| --- | --- |
| AWS | Amazon OpenSearch, Amazon Timestream, Amazon DynamoDB, Amazon ElastiCache (Redis) |
| GCP | Cloud Bigtable, Memorystore (Redis), Firestore, Elasticsearch on GCE |
| Azure | Azure Data Explorer (ADX), Azure Cache for Redis, Azure Cosmos DB, Azure AI Search |

Long Term Storage

Cold/archival storage for full RRPair objects and EMR output, optimized for cost and durability. The recommended lifecycle is 365 days or indefinite, depending on whether the use cases skew toward security or observability. One strategy is to retain raw data for X days and keep saved artifacts (i.e. snapshots from EMR jobs) indefinitely.
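The retain-raw-for-X-days, keep-snapshots-forever strategy can be expressed directly as storage lifecycle rules. Below is a minimal sketch using AWS S3 lifecycle configuration as one example from the table; the prefixes, day counts, and bucket name are placeholders, not Speedscale defaults.

```python
# Sketch of a long-term retention policy using S3 lifecycle rules.
# Prefixes ("raw/", "snapshots/") and day counts are illustrative placeholders.
lifecycle_configuration = {
    "Rules": [
        {
            # Raw RRPair objects: move to Glacier after 30 days, expire at 365.
            "ID": "raw-rrpairs",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        },
        {
            # Saved artifacts (e.g. EMR snapshots): no Expiration action, so they
            # are kept indefinitely; only a cost-saving storage-class transition.
            "ID": "emr-snapshots",
            "Filter": {"Prefix": "snapshots/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        },
    ]
}

# Applied with boto3, for example:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-speedscale-archive",
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

The equivalent knobs exist on GCP (Object Lifecycle Management) and Azure (blob lifecycle management policies).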

| Cloud | Options |
| --- | --- |
| AWS | Amazon S3, S3 Glacier |
| GCP | Cloud Storage (Standard, Nearline, Coldline, Archive) |
| Azure | Azure Blob Storage, Azure Data Lake Storage Gen2 |

Visualization

Dashboards and UI for exploring captured traffic and test results stored in short-term storage.

| Cloud | Options |
| --- | --- |
| AWS | Amazon Managed Grafana, Amazon QuickSight, OpenSearch Dashboards |
| GCP | Looker, Grafana on GKE, Google Cloud Console custom dashboards |
| Azure | Azure Managed Grafana, Azure Monitor Workbooks, Power BI |

EMR

Batch or stream processing engine that reads from short-term or long-term storage, transforms/aggregates traffic data, and writes snapshots back to long-term storage.
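The transform/aggregate step can be sketched independently of the engine. Below is a minimal, engine-agnostic example that aggregates raw RRPair records into a per-service/status snapshot; the field names (`service`, `status`) are illustrative, not Speedscale's exact schema, and a real EMR/Dataproc/Databricks job would express the same shape as a Spark or Beam pipeline.

```python
import json
from collections import Counter

def summarize_rrpairs(lines):
    """Aggregate raw RRPair records into per-(service, status) counts.

    `lines` is an iterable of JSON strings, one record per line. The
    field names used here are placeholders for the real RRPair schema.
    """
    counts = Counter()
    for line in lines:
        rec = json.loads(line)
        counts[(rec["service"], rec["status"])] += 1
    # One snapshot row per (service, status) pair, ready to write back
    # to long-term storage as an EMR artifact.
    return [
        {"service": svc, "status": status, "count": n}
        for (svc, status), n in sorted(counts.items())
    ]

raw_lines = [
    '{"service": "checkout", "status": 200}',
    '{"service": "checkout", "status": 404}',
    '{"service": "checkout", "status": 200}',
]
snapshot = summarize_rrpairs(raw_lines)
# → [{"service": "checkout", "status": 200, "count": 2},
#    {"service": "checkout", "status": 404, "count": 1}]
```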

| Cloud | Options |
| --- | --- |
| AWS | Amazon EMR (Spark/Hadoop), AWS Glue, Amazon Kinesis Data Analytics, AWS Lambda |
| GCP | Cloud Dataproc (Spark/Hadoop), Cloud Dataflow (Apache Beam), Cloud Functions |
| Azure | Azure HDInsight (Spark/Hadoop), Azure Databricks, Azure Stream Analytics, Azure Synapse Analytics |

Using Data Effectively

Observability

Depending on your choice of short-term storage, you will want to index certain dimensions for quick lookups:

  • Time - choose a data store that handles time-series data efficiently for quick partitioning and filtering.
  • Service/Namespace/Cluster - information about where the RRPair came from.
  • Command/Location/Status - for HTTP RRPairs this looks like "POST /endpoint 404"; for non-HTTP RRPairs it is protocol dependent.
  • Direction - always a simple IN/OUT relative to the service being captured.
  • Long Term Storage Location - the full RRPair JSON may be too large to index completely, depending on your choice of short-term storage; in that case it is useful to index where the full object can be retrieved from.
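Concretely, ingestion can project each full RRPair onto just these dimensions before writing to the short-term index. The sketch below assumes illustrative field names (`ts`, `service`, `direction`, etc.), not Speedscale's exact RRPair schema:

```python
def to_index_doc(rrpair: dict, archive_uri: str) -> dict:
    """Project a full RRPair onto the dimensions worth indexing.

    Field names are placeholders; adapt them to the real RRPair schema.
    `archive_uri` points at the full object in long-term storage.
    """
    return {
        "ts": rrpair["ts"],                 # time, for partitioning/filtering
        "service": rrpair["service"],       # where the RRPair came from
        "namespace": rrpair.get("namespace"),
        "cluster": rrpair.get("cluster"),
        "command": rrpair.get("command"),   # e.g. "POST /endpoint 404" for HTTP
        "direction": rrpair["direction"],   # IN or OUT, relative to the service
        "archive_uri": archive_uri,         # where to retrieve the full object
    }

doc = to_index_doc(
    {"ts": "2024-01-01T00:00:00Z", "service": "checkout",
     "direction": "IN", "command": "POST /checkout 200"},
    archive_uri="s3://archive/raw/2024/01/01/abc.json",
)
```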

Using Traffic for Tests/Mocks

Speedscale's native processes operate on a 'raw' file, usually named raw.jsonl: a newline-delimited file of JSON objects with no particular guarantees about ordering or grouping.
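Because the format makes no ordering or grouping guarantees, a reader only needs to handle one JSON object per non-empty line. A minimal sketch:

```python
import json
import os
import tempfile

def read_raw(path):
    """Yield one JSON object per non-empty line of a raw.jsonl-style file.

    No ordering or grouping is assumed, matching the format described above.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Demo against a throwaway file with placeholder records.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write('{"a": 1}\n{"a": 2}\n')
    path = f.name
records = list(read_raw(path))
os.unlink(path)
# → records == [{"a": 1}, {"a": 2}]
```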

To use raw traffic, two main patterns can be implemented:

  • Use short-term storage as a data source: Construct a raw file by grabbing full objects from short-term storage either directly or by using the indexed long-term storage location.
  • Use long-term storage as a data source: Point your EMR job at the long-term storage with the correct set of filters to construct output. Usually the output of an EMR job will be several chunks that need to be combined to form a working raw file.
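For the second pattern, combining EMR output chunks is just concatenation of newline-delimited JSON. The sketch below re-parses each line so a corrupt chunk fails fast; paths and record shapes are placeholders:

```python
import json
import os
import tempfile

def combine_chunks(chunk_paths, out_path):
    """Concatenate EMR output chunks into one working raw.jsonl file.

    Each chunk is itself newline-delimited JSON; blank lines are dropped
    and each record is parsed once so a corrupt chunk fails fast.
    """
    with open(out_path, "w") as out:
        for chunk in chunk_paths:
            with open(chunk) as f:
                for line in f:
                    line = line.strip()
                    if line:
                        json.loads(line)  # validate before writing through
                        out.write(line + "\n")

# Demo with two throwaway chunk files standing in for EMR output.
tmp = tempfile.mkdtemp()
chunks = []
for i, body in enumerate(['{"id": 1}\n', '{"id": 2}\n\n{"id": 3}\n']):
    p = os.path.join(tmp, f"chunk-{i}.jsonl")
    with open(p, "w") as f:
        f.write(body)
    chunks.append(p)
raw_path = os.path.join(tmp, "raw.jsonl")
combine_chunks(chunks, raw_path)
with open(raw_path) as f:
    ids = [json.loads(line)["id"] for line in f]
# → ids == [1, 2, 3]
```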

Rehydration

It may be useful to load a slice of captured traffic that has aged out of short-term storage back in so that it can be inspected and visualized again. You may be able to configure your collector to read from long-term storage and ingest into short-term again (e.g. S3 input -> FluentBit -> short-term), or you may want a separate job that ingests into short-term storage directly.
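The separate-job approach amounts to filtering archived records by time slice and re-ingesting the matches. A minimal sketch, assuming ISO-8601 timestamps (so plain string comparison orders correctly) and a hypothetical `ingest` callable that wraps your short-term storage client:

```python
import json

def rehydrate(archive_lines, start, end, ingest):
    """Re-ingest archived RRPairs whose timestamp falls in [start, end).

    `archive_lines`: iterable of JSONL lines read from long-term storage.
    `ingest`: callable that writes one record into short-term storage
    (e.g. a wrapper around an OpenSearch or Bigtable client - hypothetical).
    ISO-8601 timestamps compare correctly as strings, so no date parsing.
    """
    count = 0
    for line in archive_lines:
        rec = json.loads(line)
        if start <= rec["ts"] < end:
            ingest(rec)
            count += 1
    return count

# Demo: only the January record falls inside the requested slice.
loaded = []
count = rehydrate(
    ['{"ts": "2024-01-01T00:00:00Z", "service": "a"}',
     '{"ts": "2024-02-01T00:00:00Z", "service": "b"}'],
    start="2024-01-01",
    end="2024-01-31",
    ingest=loaded.append,
)
# → count == 1
```

A production job would also want idempotent writes (stable document IDs) so re-running a rehydration does not duplicate records in the short-term index.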