Usage Metering
Overview
Metering Design Principles
One of the core design principles of a cloud metering service platform is guaranteed accuracy. The metering system needs to be built from the ground up on the core design principles of accuracy, idempotency, and data deduplication.

1. Accuracy

The data must be completely accurate so that the metering system can serve as the single source of truth for metering and consumption data. This means other systems can depend on the accuracy and completeness of the metering data in Amberflo for mission-critical functions like invoicing and billing.

2. Idempotency

A record is processed once, and once only. That is, it cannot go unprocessed, and it cannot be processed twice (or more). For example, if a valid record is ingested and the system fails midstream through processing, the record still needs to be accounted for correctly: counted and processed exactly once, with no drops and no double counting. The Amberflo cloud metering service platform provides this guarantee out of the box.

3. Data deduplication

Data deduplication is a technique for eliminating duplicate copies of repeated data. Duplicates can be a common occurrence, depending on the source generating the meters, and deduplication is challenging to guarantee if the processing system (the cloud metering service platform) is itself a highly distributed, stateless system. In a distributed system, any node can fail at any time, and if it fails midstream while data is undergoing transformation or processing, it becomes very difficult to guarantee data deduplication.

In cloud metering, it is common for the source (the client generating the meters) to send duplicate meters. Because the cloud metering service serves as the system of record and the single source of truth for usage and consumption data, duplicate records must be deduped. Data deduplication is a core platform tenet for delivering accurate usage data to downstream systems such as pricing, billing, and others. The Amberflo cloud metering service platform provides an out-of-the-box data deduplication guarantee.

Here are some of the different ways you can enforce data deduplication using Amberflo.

The Amberflo.io platform will not store duplicate data for the same meter record. If you call the ingest API with the same record, the meter repository will only hold the first record and discard any subsequent records. The deduplication key is based on all the meter attributes: if any attribute of the meter record differs (either the record time or any dimension value), it is considered a different record.

1. Unique ID

If the source system generated a unique ID for the record, we can use that for the dedup key:

```python
from uuid import uuid4

dimensions_with_unique_id.append({'name': 'unique_id', 'value': str(uuid4())})
metering.meter(options.tenant, options.meter_name, int(options.meter_value), dimensions=dimensions_with_unique_id)
```

2. Unique timestamp

If we want to make sure we use one meter for the current hour/minute/second, we can set the timestamp of the request:

```python
import time
metering.meter(options.tenant, options.meter_name, int(options.meter_value), dimensions=dimensions, timestamp=str(int(round(time.time() * 1000))))
```

Configurable deduplication logic

Amberflo also offers the ability to define the logic used to identify duplicate events. Users can define a uniqueness criterion: by default it is the system-generated unique ID, but with the configurable logic, the user can select any dimension value to be used as the unique ID. If Amberflo has ingested an event whose unique ID has already been encountered, the incoming event will be rejected.
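To make the two modes concrete, here is a minimal, self-contained sketch of deduplication logic along these lines. It is an illustration, not Amberflo's implementation: the `MeterRepository` class, the `dedup_key` helper, and the record layout are hypothetical, but the behavior mirrors the guarantees described above (by default the key covers all meter attributes; a configured uniqueness criterion keys on a single dimension value instead).

```python
import hashlib
import json

def dedup_key(record: dict, unique_dimension: str | None = None) -> str:
    """Derive a deduplication key for a meter record.

    By default the key covers all meter attributes, so a record is a
    duplicate only if every field (meter name, value, time, dimensions)
    matches. If a uniqueness dimension is configured, that dimension's
    value alone identifies the record.
    """
    if unique_dimension is not None:
        return record["dimensions"][unique_dimension]
    # Canonical JSON over all attributes: any differing field yields a new key.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

class MeterRepository:
    """Toy ingest store that discards records whose key was already seen."""

    def __init__(self, unique_dimension: str | None = None):
        self.unique_dimension = unique_dimension
        self.records: dict[str, dict] = {}

    def ingest(self, record: dict) -> bool:
        key = dedup_key(record, self.unique_dimension)
        if key in self.records:
            return False  # duplicate: keep the first record, drop this one
        self.records[key] = record
        return True

# Default mode: an identical record is deduped, but changing any attribute
# (here the record time) produces a distinct record.
repo = MeterRepository()
r1 = {"meter_name": "api_calls", "value": 1, "time": 1700000000000,
      "dimensions": {"region": "us-east-1"}}
assert repo.ingest(r1) is True
assert repo.ingest(dict(r1)) is False                       # exact duplicate: rejected
assert repo.ingest({**r1, "time": 1700000001000}) is True   # new record time: accepted

# Configured mode: the 'request_id' dimension alone decides uniqueness.
repo2 = MeterRepository(unique_dimension="request_id")
r2 = {"meter_name": "api_calls", "value": 1, "time": 1700000000000,
      "dimensions": {"request_id": "abc-123"}}
assert repo2.ingest(r2) is True
assert repo2.ingest({**r2, "time": 1700000009999}) is False  # same request_id: rejected
```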
Consider the following customer example.

Problem: Each time a vendor ingests customer data (mainly text data from support cases, customer cases, notes, etc.), the data is processed (using the text blocks processed meter) to find NLP signals. Each customer is charged for all text blocks that are processed (every comment, customer case, etc.). When onboarding customers, or if something goes wrong in the data pipeline (it is an ML/analytics workflow, so there are multiple potential failure points), the vendor sometimes has to reprocess this text data for the customers (i.e., re-analyze the text using the ML models). Customers should not be billed for these reprocessings.

Solution: Amberflo delivered a solution to address this by adding configurable, more granular deduplication logic. In this case, each text block processed is seen as an event with a unique block ID. The pipeline peers into the past (using the block ID as the uniqueness criterion) to check whether a text block has been processed in a previous meter event. If it has, the pipeline throws out the duplicate event instead of counting it; thus the customer will not be billed for reprocessed events.
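A sketch of how this scenario plays out is below. Everything in it is illustrative rather than Amberflo's API: `BlockDedupMeter` is a hypothetical stand-in that applies the block-ID uniqueness criterion in-process, and the meter name, customer ID, and block IDs are made up. The point it demonstrates is that re-running the pipeline over the same blocks adds no new billable events.

```python
from uuid import uuid4

class BlockDedupMeter:
    """Stand-in for an ingest path with block_id as the uniqueness criterion."""

    def __init__(self):
        self.seen_block_ids: set[str] = set()
        self.billable_events = 0

    def meter(self, customer_id: str, meter_name: str, value: int, dimensions: dict) -> None:
        block_id = dimensions["block_id"]
        if block_id in self.seen_block_ids:
            return  # duplicate block: thrown out, not counted toward the bill
        self.seen_block_ids.add(block_id)
        self.billable_events += value

metering = BlockDedupMeter()
blocks = [str(uuid4()) for _ in range(3)]  # three text blocks for one customer

# First pass through the NLP pipeline: all three blocks are billed.
for block_id in blocks:
    metering.meter("customer-42", "text_blocks_processed", 1, {"block_id": block_id})

# The pipeline fails and the same blocks are reprocessed: no extra charges.
for block_id in blocks:
    metering.meter("customer-42", "text_blocks_processed", 1, {"block_id": block_id})

assert metering.billable_events == 3
```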