Metering Design Principles
One of the core design principles of a Cloud Metering Service Platform is guaranteed accuracy.
The metering system needs to be built from the ground up with the core design principles of accuracy, idempotency, and data deduplication.
The data must be completely accurate to allow the metering system to be the single source of truth for metering and consumption data. This means other systems can depend on the accuracy and completeness of the metering data in Amberflo for mission-critical functions like invoicing and billing.
A record is processed once, and once only: it cannot go unprocessed, and it cannot be processed more than once.
Example: If a valid record is ingested and processing fails mid-stream, the record still needs to be accounted for correctly (counted and processed once, and once only: no drops and no double counting). The Amberflo Cloud Metering Service Platform provides this guarantee out of the box.
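To make the property concrete, here is a minimal, purely illustrative sketch (not Amberflo's internal pipeline) of how exactly-once counting can be enforced on top of at-least-once delivery by guarding processing with an idempotency store keyed on the record's attributes; the aggregate_usage function is a hypothetical stand-in for the real processing step.

```python
# Illustrative sketch only (not Amberflo's implementation): a source that delivers
# records at least once may redeliver a record after a mid-stream failure, so
# processing is guarded by an idempotency store to keep the count exact.
import hashlib
import json

processed_keys = set()  # in a real system this would be a durable, transactional store


def record_key(record: dict) -> str:
    """Deterministic key derived from all attributes of the meter record."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def aggregate_usage(record: dict) -> None:
    """Stand-in for the real processing step (aggregation, storage, and so on)."""
    print("counted", record["meterApiName"], record["meterValue"])


def process_once(record: dict) -> None:
    key = record_key(record)
    if key in processed_keys:
        return  # already counted: a retry or duplicate delivery is ignored
    aggregate_usage(record)
    # In practice this update and the processing step must commit together,
    # otherwise a crash between them could still cause a drop or a double count.
    processed_keys.add(key)


# The same record delivered twice (for example, retried after a failure) is counted once.
meter = {"meterApiName": "api-calls", "customerId": "customer-123", "meterValue": 1}
process_once(meter)
process_once(meter)
```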
Data deduplication is a technique for eliminating duplicate copies of repeated data. Duplicates can be a common occurrence, depending on the source generating the meters, and deduplication is hard to guarantee when the processing system (the Cloud Metering Service Platform) is itself a highly distributed, stateless system. In a distributed system, any node can fail at any time, and if it fails mid-stream while data is being transformed or processed, it becomes very difficult to guarantee data deduplication.
In cloud metering, it is common for the source (client generating the meters) to send duplicate meters. Because the cloud metering service serves as the system of record and the single source of truth for usage and consumption data, duplicate records must be deduped.
Data deduplication is a core platform tenet to deliver accurate usage data to downstream systems such as pricing, billing, and others.
Amberflo Cloud Metering Service Platform provides you with an out-of-the-box data deduplication guarantee. Here are some of the ways you can enforce data deduplication using Amberflo:
The Amberflo.io platform will not store duplicate data for the same meter record. If you call the ingest API with the same record, the meter repository keeps only the first record and discards any subsequent copies. The deduplication key is based on all of the meter attributes, so if any attribute differs (the record time or any dimension value), it is treated as a different record. You can control the dedup key in two ways:
1. Unique ID: if the source system generated a unique ID for the record, it can be used as the dedup key.
2. Unique timestamp: to record only one meter for the current hour/minute/second, set the timestamp of the request accordingly (both options are illustrated in the sketch below).
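As a concrete illustration of the two options above, the following sketch sends the same payload to the ingest API twice; the endpoint, header, and field names (meterApiName, meterTimeInMillis, uniqueId, and so on) follow Amberflo's published ingest examples but should be verified against the current API reference.

```python
# Hypothetical sketch of calling the ingest API twice with an identical payload.
# Endpoint, header, and field names should be checked against the API reference.
import time

import requests

API_KEY = "<your-amberflo-api-key>"  # placeholder

record = {
    "meterApiName": "text-blocks-processed",
    "customerId": "customer-123",
    "meterValue": 1,
    # Option 2: a timestamp truncated to the hour makes repeated sends identical,
    # so only one meter is recorded for that hour.
    "meterTimeInMillis": int(time.time() // 3600) * 3600 * 1000,
    "dimensions": {"region": "us-west-2"},
    # Option 1: reuse the source system's own ID so it becomes part of the dedup key.
    "uniqueId": "block-42",
}

headers = {"X-API-KEY": API_KEY, "Content-Type": "application/json"}

# Both calls carry exactly the same attributes, so the meter repository keeps
# the first record and discards the second.
requests.post("https://app.amberflo.io/ingest", json=record, headers=headers)
requests.post("https://app.amberflo.io/ingest", json=record, headers=headers)
```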
Amberflo also offers the ability to define the logic used to identify duplicate events. Users can define a uniqueness criterion: by default it is the system-generated unique-id, but with the configurable logic the user can select any dimension value to be used as the unique-id.
If Amberflo ingests an event whose unique-id has already been encountered, the incoming event is rejected.
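The rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Amberflo's implementation; the Deduplicator class and the event field names are hypothetical.

```python
# Illustration of the rejection rule described above, not Amberflo's implementation.
from typing import Dict, Optional, Set


class Deduplicator:
    """Rejects events whose uniqueness key has already been encountered."""

    def __init__(self, key_dimension: Optional[str] = None) -> None:
        # Which dimension to use as the dedup key; None means the system unique-id.
        self.key_dimension = key_dimension
        self.seen: Set[str] = set()  # a durable store in a real system

    def key_for(self, event: Dict) -> str:
        if self.key_dimension is not None:
            return event["dimensions"][self.key_dimension]
        return event["uniqueId"]

    def accept(self, event: Dict) -> bool:
        """Return True for a new event; False if its key has been seen before."""
        key = self.key_for(event)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```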
Consider the following customer example:
Problem:
Each time a vendor ingests customer data (mainly text data from support cases: customer cases, notes, etc.), the data is processed (using the text-blocks-processed meter) to find NLP signals. Each customer is charged for all text blocks that are processed (every comment, customer case, etc.).
When onboarding customers, or if something goes wrong in the data pipeline (it is an ML/analytics workflow, so there are multiple potential failure points), the vendor sometimes has to reprocess this text data for the customers (i.e., re-analyze the text using the ML models). Customers should not be billed for this reprocessing.
Solution:
Amberflo delivered a solution to address this by adding configurable, more granular deduplication logic. In this case, each text block processed is treated as an event with a unique block-id. The pipeline looks back at past events (using the block-id as the uniqueness criterion) to check whether a text block has already been processed in a previous meter event. If it has, the pipeline discards the duplicate event instead of counting it, so the customer is not billed for re-processed events.
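A rough sketch of that pipeline check, reusing the hypothetical Deduplicator from the previous section with block-id as the uniqueness criterion (the ingest helper and meter fields are likewise illustrative):

```python
# Continues the Deduplicator sketch above, using block-id as the uniqueness criterion.
dedup = Deduplicator(key_dimension="block-id")


def ingest(event: dict) -> None:
    """Stand-in for a call to the ingest API (see the earlier sketch)."""
    print("metered block", event["dimensions"]["block-id"])


def meter_text_block(block_id: str, customer_id: str) -> None:
    event = {
        "meterApiName": "text-blocks-processed",
        "customerId": customer_id,
        "meterValue": 1,
        "dimensions": {"block-id": block_id},
        "uniqueId": block_id,
    }
    if dedup.accept(event):
        ingest(event)
    # Otherwise the block was already metered by a past event, so the
    # re-processing run is not billed to the customer.


# The first run bills the customer; re-processing the same block later does not.
meter_text_block("block-42", "customer-123")
meter_text_block("block-42", "customer-123")  # dropped as a duplicate
```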