Problem Statement
Current Behavior: SevOne Data Publisher (SDP) currently publishes one data point per message through its Kafka, Pulsar, and Azure Event Hub publishers. Kafka batches records at the protocol level for network efficiency, but consumers still receive each data point as an individual JSON message.
Example - Current Output:
Message 1: {"deviceId": 1, "objectId": 2, "value": 100, "timestamp": 1234567890}
Message 2: {"deviceId": 1, "objectId": 3, "value": 200, "timestamp": 1234567891}
Message 3: {"deviceId": 1, "objectId": 4, "value": 300, "timestamp": 1234567892}
Business Impact:
- Higher Consumer Processing Overhead: Consumers must process each message individually
- Increased Network Calls: More frequent consumer polling and processing
- Higher Kafka Partition Load: More messages = more partition metadata overhead
- Inefficient for Bulk Processing: Downstream analytics systems prefer batch processing
- Cost Impact: More messages = higher cloud messaging costs (especially Azure Event Hub)
Proposed Solution
Feature Request: Add configurable application-level batching that groups multiple data points into a single message as a JSON array.
Desired Output:
Message 1: [
  {"deviceId": 1, "objectId": 2, "value": 100, "timestamp": 1234567890},
  {"deviceId": 1, "objectId": 3, "value": 200, "timestamp": 1234567891},
  {"deviceId": 1, "objectId": 4, "value": 300, "timestamp": 1234567892}
]
Configuration Parameters:
sdp:
  enable-batching: true
  batch-size: 100           # Max messages per batch (X)
  batch-max-wait-ms: 1000   # Max wait time from first message (Y milliseconds)
Batching Logic:
- Collect up to X data points OR wait up to Y milliseconds after the first data point in the batch (whichever comes first)
- Send the batch as a single message whose payload is a JSON array
- Apply to all publisher types: Kafka, Pulsar, Azure Event Hub
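The batching logic above can be sketched as follows. This is a minimal illustration, not SDP's implementation: the `Batcher` class, its `send` callback, and the `poll` method are hypothetical names, and the clock is injectable only so the timer trigger can be exercised deterministically.

```python
import json
import time
from typing import Callable, List, Optional


class Batcher:
    """Buffers data points and emits them as one JSON array (hypothetical sketch)."""

    def __init__(self, send: Callable[[bytes], None], batch_size: int = 100,
                 max_wait_ms: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send                      # publisher-specific send function
        self._batch_size = batch_size          # X: max data points per batch
        self._max_wait_s = max_wait_ms / 1000  # Y: max wait from first data point
        self._clock = clock
        self._buffer: List[dict] = []
        self._first_at: Optional[float] = None  # arrival time of first buffered point

    def add(self, point: dict) -> None:
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(point)
        if len(self._buffer) >= self._batch_size:  # size trigger
            self.flush()

    def poll(self) -> None:
        """Call periodically; flushes once the oldest point has waited Y ms."""
        if self._buffer and self._clock() - self._first_at >= self._max_wait_s:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        payload = json.dumps(self._buffer).encode("utf-8")
        self._buffer = []
        self._first_at = None
        self._send(payload)
```

A real implementation would likely drive `poll` from a background timer and guard the buffer with a lock; both are left out here to keep the two triggers visible.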
Business Benefits
- Reduced Consumer Load: Process 100 messages in one operation vs 100 separate operations
- Lower Consumer Latency: Fewer network round-trips per data point for consumers (at the cost of up to batch-max-wait-ms of added publish delay)
- Cost Savings:
  - Azure Event Hub: Where billing is per message, 100 data points = 1 message instead of 100
  - Kafka: Reduced partition metadata overhead
- Better Analytics Performance: Bulk inserts into databases/data lakes
- Backward Compatible: Can be disabled for existing deployments
- Universal: Works across Kafka, Pulsar, and Event Hub publishers
Key Components:
- BatchProducer Wrapper: Wraps existing producers with batching logic
- Configurable Parameters: batch-size and batch-max-wait-ms
- Flush Triggers:
  - Batch full (X messages)
  - Timer expired (Y milliseconds)
  - Graceful shutdown
- Error Handling: Batch-level errors propagated to all constituent messages
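One way the wrapper's error handling and shutdown flush could look is sketched below. The `BatchProducer` interface shown here (a `produce` method taking a per-data-point callback, plus `close`) is an assumption for illustration, not the existing SDP producer API; the timer trigger is omitted for brevity.

```python
import json
from typing import Callable, List, Optional, Tuple

# Hypothetical callback type: receives None on success or the batch-level error.
Callback = Callable[[Optional[Exception]], None]


class BatchProducer:
    """Wraps a send function and reports one batch outcome to every data point."""

    def __init__(self, send: Callable[[bytes], None], batch_size: int = 100) -> None:
        self._send = send
        self._batch_size = batch_size
        self._pending: List[Tuple[dict, Callback]] = []

    def produce(self, point: dict, on_done: Callback) -> None:
        self._pending.append((point, on_done))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        error: Optional[Exception] = None
        try:
            self._send(json.dumps([p for p, _ in batch]).encode("utf-8"))
        except Exception as exc:  # batch-level failure
            error = exc
        for _, on_done in batch:
            on_done(error)  # every constituent data point sees the same outcome

    def close(self) -> None:
        """Graceful shutdown: flush whatever is still buffered."""
        self.flush()
```

Because a batch succeeds or fails as a unit, a single bad send marks all of its data points as failed; whether those points are retried together or individually is a design decision this sketch leaves open.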
Use Cases
Use Case 1: High-Volume Monitoring
- Scenario: 10,000 devices × 100 metrics = 1M data points/minute
- Current: 1M Kafka messages/minute
- With Batching: 10K Kafka messages/minute at batch-size 100 (100x reduction)
- Benefit: Massive cost savings on Azure Event Hub
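The arithmetic behind this use case can be checked with a short calculation. The function names are illustrative, and the figures assume every batch fills completely; timer-triggered flushes produce partial batches, so the reduction seen in practice will be lower.

```python
def batched_message_count(data_points: int, batch_size: int) -> int:
    """Messages needed when every batch is full except possibly the last."""
    if data_points < 0 or batch_size < 1:
        raise ValueError("data_points must be >= 0 and batch_size >= 1")
    return (data_points + batch_size - 1) // batch_size  # integer ceiling


def reduction_percent(data_points: int, batch_size: int) -> float:
    """Percentage drop in message count compared with one message per data point."""
    if data_points < 1:
        raise ValueError("data_points must be >= 1")
    messages = batched_message_count(data_points, batch_size)
    return 100.0 * (1 - messages / data_points)


# 10,000 devices x 100 metrics per minute, batch-size 100
print(batched_message_count(10_000 * 100, 100))  # 10000
```

With an average batch of only 2 data points the reduction is 50%, and with an average of 10 it is 90%, which is one way to read targets stated as a percentage range.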
Use Case 2: Analytics Pipeline
- Scenario: Streaming data to Elasticsearch/Splunk
- Current: Individual inserts (slow)
- With Batching: Bulk inserts (estimated 10-100x faster, to be validated in performance testing)
- Benefit: Real-time dashboards with lower latency
Use Case 3: Cloud Cost Optimization
- Scenario: Azure Event Hub charges per message
- Current: $X per million messages
- With Batching: $X/100 per million data points (at batch-size 100, assuming full batches)
- Benefit: Direct cost reduction
Competitive Analysis
Industry Standard:
- Telegraf: Supports batching with metric_batch_size and metric_buffer_limit
- Logstash: Has pipeline.batch.size and pipeline.batch.delay pipeline settings
- Fluentd: Supports buffering and batching
- Datadog Agent: Batches metrics before sending
SDP Gap: SDP currently lacks application-level batching that emits data points as a JSON array
Customer Impact
Priority: High
Affected Customers:
- All customers using SDP with high-volume data collection
- Customers using Azure Event Hub (cost-sensitive)
- Customers with analytics pipelines requiring bulk processing
- Customers with downstream systems that prefer batched data
Workaround Complexity: High
- Requires custom consumer-side batching logic
- Increases consumer complexity and maintenance
- No control over batch size from SDP side
Testing Requirements
Functional Testing:
- Batch fills to X messages → flushes
- Timer expires at Y ms → flushes
- Graceful shutdown → flushes remaining batch
- Error handling → batch-level error is reported to every message in the failed batch
Performance Testing:
- Throughput: Compare batched vs non-batched
- Latency: Measure end-to-end delay
- Memory: Monitor batch buffer usage
Integration Testing:
- Kafka producer with batching
- Pulsar producer with batching
- Event Hub producer with batching
Backward Compatibility:
- Existing configs work without changes
- Batching disabled by default
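A sketch of how the backward-compatibility requirement could be met when reading configuration: missing keys fall back to defaults, with batching off. The `BatchingConfig` type and `load_batching_config` function are hypothetical, and the defaults of 100 and 1000 are assumptions taken from the example values in this request, not confirmed product defaults.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class BatchingConfig:
    enabled: bool = False    # batching stays off unless explicitly enabled
    batch_size: int = 100    # assumed default, from the example config
    max_wait_ms: int = 1000  # assumed default, from the example config


def load_batching_config(sdp_section: Mapping[str, Any]) -> BatchingConfig:
    """Builds the config from the parsed `sdp:` section; missing keys use defaults."""
    cfg = BatchingConfig(
        enabled=bool(sdp_section.get("enable-batching", False)),
        batch_size=int(sdp_section.get("batch-size", 100)),
        max_wait_ms=int(sdp_section.get("batch-max-wait-ms", 1000)),
    )
    if cfg.enabled and (cfg.batch_size < 1 or cfg.max_wait_ms < 0):
        raise ValueError("batch-size must be >= 1 and batch-max-wait-ms must be >= 0")
    return cfg
```

An existing config with no batching keys yields `enabled=False`, so the publisher would keep sending one data point per message.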
Documentation Requirements
Configuration Guide:
- How to enable batching
- Parameter tuning guidelines
- Performance considerations
Migration Guide:
- Consumer changes needed (parse JSON array)
- Rollback procedure
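For the migration guide, a consumer can be made tolerant of both payload shapes so it keeps working before, during, and after a rollout or rollback. This is a minimal sketch; `decode_data_points` is an illustrative name, and it assumes each message payload is UTF-8 JSON as in the examples above.

```python
import json
from typing import Any, Dict, List


def decode_data_points(payload: bytes) -> List[Dict[str, Any]]:
    """Returns the data points in a message, batched (array) or unbatched (object)."""
    decoded = json.loads(payload)
    if isinstance(decoded, list):
        return decoded    # batched: JSON array of data points
    if isinstance(decoded, dict):
        return [decoded]  # unbatched: single data point
    raise ValueError("payload is neither a JSON object nor a JSON array")
```

Deploying a consumer like this first means batching can be enabled or disabled on the SDP side without a coordinated consumer release.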
Best Practices:
- Recommended batch sizes for different scenarios
- Latency vs throughput tradeoffs
Success Metrics
KPIs to Track:
- Message Reduction: % reduction in Kafka messages sent
- Consumer Performance: Processing time improvement
- Cost Savings: Azure Event Hub cost reduction
- Adoption Rate: % of customers enabling batching
- Customer Satisfaction: Feedback scores
Target Goals:
- 50-90% reduction in message count (depending on batch size)
- 30-50% improvement in consumer throughput
- 50-90% cost reduction for Event Hub customers