Group 9: Real-Time Stream Processing

This group covers continuous ingestion and near-real-time transformation and querying of event data. The Azure services involved are Event Hubs (ingestion), Kafka on HDInsight (legacy / OSS alignment), Azure Stream Analytics (managed, SQL-like stream processing), and Azure Data Explorer (Kusto) (low-latency analytics and materialized transformations). Common patterns pair an ingestion layer (Event Hubs or Kafka) with a processing layer (Stream Analytics, Spark, or Kusto) and downstream sinks (ADLS, Synapse, Databricks).

Latency Budgeting: Partition the overall end-to-end SLO into ingest buffering, transformation, and query serving. Choose the smallest toolchain that fits within the budget while remaining maintainable.
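
The budgeting step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage names mirror the three stages named in the text, while the 5-second SLO, the 40/40/20 split, the measured latencies, and the helper names (`StageBudget`, `split_budget`, `over_budget`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageBudget:
    name: str
    share: float  # fraction of the end-to-end SLO allotted to this stage


def split_budget(slo_ms: float, stages: list[StageBudget]) -> dict[str, float]:
    """Divide an end-to-end latency SLO (in ms) across pipeline stages."""
    total = sum(s.share for s in stages)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"stage shares must sum to 1.0, got {total}")
    return {s.name: slo_ms * s.share for s in stages}


def over_budget(budget: dict[str, float], measured_ms: dict[str, float]) -> list[str]:
    """Return the stages whose measured latency exceeds their allotment."""
    return [name for name, limit in budget.items()
            if measured_ms.get(name, 0.0) > limit]


# Hypothetical numbers: a 5-second SLO split 40/40/20 across the three stages.
stages = [
    StageBudget("ingest_buffering", 0.4),
    StageBudget("transformation", 0.4),
    StageBudget("query_serving", 0.2),
]
budget = split_budget(5000, stages)

# Hypothetical measurements; only the transformation stage overruns its share.
measured = {"ingest_buffering": 1200, "transformation": 2600, "query_serving": 300}
violations = over_budget(budget, measured)
```

A stage that shows up in `violations` is a candidate for a faster tool or a larger share of the budget; if no split fits, the toolchain probably has too many hops for the SLO.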

Services & Roles

Key Differences

Service          | Primary Role                 | Strengths                             | When to Prefer
Event Hubs       | High-scale ingestion         | Managed partitions, protocol gateways | Massive telemetry/firehose
Kafka (HDI)      | OSS-compatible broker        | Ecosystem plugins                     | Strict Kafka API parity
Stream Analytics | Declarative temporal queries | SQL-like windowing                    | Low-ops rapid streaming jobs
Data Explorer    | Low-latency analytics        | KQL, materialized views               | Ad-hoc + time-series blend

Selection Model

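Each criterion C_x is a slider rated 0–10 that sets how much that requirement matters. Each score is a weighted linear combination of the sliders, with the weights in each formula summing to 1.00. A term of the form (10 - C_x) rewards a low value of that slider; for example, a low need for custom code favors the managed services.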

Score_EventHubs = 0.24*C_ingest + 0.20*C_throughput + 0.16*C_protocol + 0.14*C_latency + 0.14*C_consumerFanout + 0.12*(10 - C_customBrokers)
Score_Kafka   = 0.26*C_customBrokers + 0.20*C_protocol + 0.18*C_throughput + 0.14*C_partitionCtrl + 0.12*C_latency + 0.10*(10 - C_opsSimp)
Score_StreamAn = 0.26*C_declTransform + 0.20*C_temporal + 0.16*C_latency + 0.14*C_opsSimp + 0.14*C_sinkVar + 0.10*(10 - C_customCode)
Score_Kusto   = 0.25*C_adHoc + 0.20*C_timeSeries + 0.18*C_latency + 0.14*C_material + 0.13*C_scaleQuery + 0.10*(10 - C_customCode)
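
The four formulas can be evaluated with a small table-driven sketch. The weights and criterion names are taken from the formulas above. Everything else is an assumption made for illustration: the helper names (`score`, `rank`), the neutral default of 5 for any slider that is not supplied, and the sample slider values.

```python
# Each entry: (criterion, weight, inverted). An inverted term contributes
# weight * (10 - value), matching the "(10 - C_x)" terms in the formulas.
WEIGHTS = {
    "Event Hubs": [
        ("C_ingest", 0.24, False), ("C_throughput", 0.20, False),
        ("C_protocol", 0.16, False), ("C_latency", 0.14, False),
        ("C_consumerFanout", 0.14, False), ("C_customBrokers", 0.12, True),
    ],
    "Kafka": [
        ("C_customBrokers", 0.26, False), ("C_protocol", 0.20, False),
        ("C_throughput", 0.18, False), ("C_partitionCtrl", 0.14, False),
        ("C_latency", 0.12, False), ("C_opsSimp", 0.10, True),
    ],
    # C_opsSimp: operational-simplicity slider, assumed here to be shared
    # by the Kafka and Stream Analytics formulas.
    "Stream Analytics": [
        ("C_declTransform", 0.26, False), ("C_temporal", 0.20, False),
        ("C_latency", 0.16, False), ("C_opsSimp", 0.14, False),
        ("C_sinkVar", 0.14, False), ("C_customCode", 0.10, True),
    ],
    "Kusto": [
        ("C_adHoc", 0.25, False), ("C_timeSeries", 0.20, False),
        ("C_latency", 0.18, False), ("C_material", 0.14, False),
        ("C_scaleQuery", 0.13, False), ("C_customCode", 0.10, True),
    ],
}

NEUTRAL = 5.0  # assumption: sliders that are not supplied sit at the midpoint


def score(service: str, sliders: dict[str, float]) -> float:
    """Weighted linear score for one service on the 0-10 slider scale."""
    total = 0.0
    for criterion, weight, inverted in WEIGHTS[service]:
        value = sliders.get(criterion, NEUTRAL)
        if not 0 <= value <= 10:
            raise ValueError(f"{criterion} must be in [0, 10], got {value}")
        total += weight * ((10 - value) if inverted else value)
    return total


def rank(sliders: dict[str, float]) -> list[tuple[str, float]]:
    """Return (service, score) pairs, highest score first."""
    scored = [(name, score(name, sliders)) for name in WEIGHTS]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical ingestion-heavy profile.
sliders = {"C_ingest": 9, "C_throughput": 8, "C_consumerFanout": 7, "C_customBrokers": 1}
ranking = rank(sliders)
```

Because each weight row sums to 1.00, every score stays on the same 0–10 scale as the sliders, which keeps the four results directly comparable.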
        

Interpretation

  • Event Hubs: Favor when managed ingestion scale and multi-consumer patterns dominate.
  • Kafka: Choose for custom broker plugins or strict ecosystem parity.
  • Stream Analytics: Choose for rapid, SQL-like temporal processing without managing clusters.
  • Kusto: Choose to combine streaming ingestion with powerful time-series and ad-hoc queries.
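
To make "temporal processing" concrete, the sketch below shows what a tumbling-window count computes, written in plain Python. It is a conceptual illustration only, not how Stream Analytics or Kusto implement windowing: it runs over a finite in-memory list rather than a live stream, it assigns events to half-open intervals [start, start + size), and a real engine's boundary convention may differ. The function name, event shape, and sample data are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Floor the timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical telemetry: (timestamp in seconds, device id).
events = [
    (0.5, "sensor-a"),
    (3.2, "sensor-a"),
    (9.9, "sensor-b"),
    (10.0, "sensor-a"),
    (14.7, "sensor-a"),
]
result = tumbling_window_counts(events, 10)
```

A declarative engine expresses the same grouping in a query and also handles late or out-of-order events, which this sketch ignores.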

When NOT to Use

  • Microsecond-level latency requirements, such as trading systems (consider specialized infrastructure).
  • Simple nightly batch jobs (big data batch tools are cheaper and simpler).
  • A single event source with a trivial transform (the transform can often be inlined in a function).