Usage Metering
Meter Corrections (undo events)
Introduction

There are occasions where a meter event source may send or report incorrect meter events. For example, a bug may cause a system to send incorrect meter events to Amberflo, or you may simply wish to cancel a specific event (see additional examples below). How do you correct an invoice or bill based on inaccurate usage data? How do you undo (delete) incorrect meters?

Amberflo provides the ability to cancel (undo) one or more meter events as needed. A metering system can claim to be an accurate system of record for usage and consumption data only if it provides built-in artifacts (idempotency, data deduplication) and tooling (undoing ingested meters) to guarantee accuracy.

Amberflo provides an easy-to-use API that you can invoke to cancel a single usage event or a large set of events. The system also takes care of propagating the changes to the usage aggregations, as well as to the invoicing or rewards payment systems if those are affected by the cancellation.

Examples where meter events need to be undone (and corrected)

System errors

These errors are data problems or bugs in the account's system that caused the account to send false or inaccurate data to Amberflo. They are called system errors because their source is the origin system that sent the events to Amberflo.

Example

Let's assume that ‘smart ml’ uses Amberflo for usage-based pricing. ‘smart ml’ sends Amberflo a meter called ‘api calls’, which has the following dimensions: ‘region’, ‘cluster’, and ‘host type’. Now let's assume that on 2/2/2022 at 1:00am there was a “bad deployment” and, because of some bug, all of the ‘api calls’ events sent since then have no values for the “cluster” and “host type” dimensions. On 2/15/2022 at 4pm, George, a PM with ‘smart ml’, discovers something wrong with the meters sent to Amberflo, and after a short investigation the ‘smart ml’ on-call engineer concludes that all of the ‘api calls’ meters sent to Amberflo since 2/2/2022 are
invalid.

Unintentional or unsuccessful usage

Unlike system errors, which stem from bad data or a bug in the source system, unintentional or unsuccessful usage is a more natural occurrence that happens in the course of normal use. In this scenario, a customer starts or intends to start using a resource, but for some reason the action the customer wanted to take is canceled, so ultimately the resource ends up not being used.

Example

‘rabbit io’ is a company that allows users to develop and run big data pipelines on its clusters (a sort of PaaS company). ‘smart ml’ is a customer of ‘rabbit io’. On 3/3/2022 at 3:21am, ‘smart ml’ sends a request to ‘rabbit io’ to start a certain data pipeline job. ‘rabbit io’ assigns cluster x machines to this job and sends Amberflo an event stating that ‘smart ml’ started using machines on cluster x on 3/3/2022 at 3:21am:

    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 500, // let's assume the job takes 500 cpus
      "metertimeinmillis": 1646277660000, // 3/3/2022 at 3:21am (UTC), in milliseconds
      "dimensions": {
        "cluster": "x",
        "tenant type": "tech"
      }
    }

But then 15 minutes later, at 3:36am, cluster x goes down and the job fails. ‘rabbit io’ doesn't want to charge ‘smart ml’ for failed resource usage, so they want to cancel that usage event, since it never successfully took place.

Coping with system errors

To help cope with ‘system errors’, Amberflo exposes a ‘filtering rules’ API. This API allows you to define rules for filtering out and removing sets of previously ingested meter events.

Example

In the ‘smart ml’ example, to filter out the bad ‘api calls’ events, all ‘smart ml’ needs to do is send the following rule:

    https://app.amberflo.io/ingest-snapshot/custom-filtering-rules

    {
      "type": "by property filter out",
      "id": "2 2 2022 api calls cancelation",
      "ingestiontimerange": {
        "starttimeinseconds": 1643763600, // 2/2/2022 1am
        "endtimeinseconds": 1644940800 // 2/15/2022 4pm
      },
      "meterapiname": "api calls"
    }

Advanced filters

Let's assume ‘smart ml’ only had a bad deployment in region ‘us west 1’, so there is no need to filter out all of the ‘api calls’ events that were ingested between 2/2/2022 1am and 2/15/2022 4pm, only those where the ‘region’ dimension is ‘us west 1’. To define such a filter, populate the ‘dimensionvaluesmap’ property with the dimension keys and values you wish to filter out. For example:

    https://app.amberflo.io/ingest-snapshot/custom-filtering-rules

    {
      "type": "by property filter out",
      "id": "2 2 2022 api calls cancelation",
      "ingestiontimerange": {
        "starttimeinseconds": 1643763600, // 2/2/2022 1am
        "endtimeinseconds": 1644940800 // 2/15/2022 4pm
      },
      "meterapiname": "api calls",
      "dimensionvaluesmap": {
        "region": [ "us west 1" ]
      }
    }

For more information, please refer to our Filtering Rules API: Create or update a filtering rule.

📘 Notice

A few things to keep in mind regarding the filtering tool:

Time limitations: events can only be canceled within one year of ingestion.

Impact on payments: while canceling usage and replacing it with “corrected” usage is possible and easy, canceling events cannot affect invoices that are already locked (past their grace period). Locked invoices should be considered to have already been paid. In such cases, an account can issue a refund or give a future discount instead.

Canceling unintentional or unsuccessful events

You can use the filtering mechanism described above to filter out specific events by their unique ID (just add the unique ID of the event you want to cancel to the ‘dimensionvaluesmap’ property mentioned above). This provides an accurate way of canceling events, assuming that you have the unique ID of the event you wish to cancel and that you did not reuse the same unique ID. If you do not have the exact unique ID of the event you wish to cancel,
yet at the point where you want to cancel it you do have its “resource related properties”, there is an alternative approach.

First, let's define “resource related properties”. These properties depend on the meter type, as described below.

As you can see, the “resource related properties” allow you to identify the resource that was used to serve the customer.

Now, to cancel the events for a given resource, just send an ingest event that includes the relevant resource related properties and the dimension “aflo cancel previous resource event” with “true” as its value.

For the ‘rabbit io’ example mentioned above, ‘rabbit io’ needs to send the following ingest event in order to cancel the usage:

    https://app.amberflo.io/ingest

    [{
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 0, // the value of the cancellation event doesn't matter
      "metertimeinmillis": 1666278270000, // 10/20/2022 at 15:04:30
      "dimensions": {
        "cluster": "x",
        "aflo cancel previous resource event": "true"
      }
    }]

📘 Limitations

A few things to keep in mind regarding this way of canceling events:

9-hour time limit: from the moment a resource starts being used, you have 9 hours to cancel the usage event.

Cancellation event timestamp: make sure the timestamp you assign to the cancellation event is no later than 9 hours after the original event. In fact, try to set the timestamp of the cancellation event as close to the original event as possible, or even use the exact time of the event you want to cancel.

Why these limitations?
Notice that when defining a cancellation event we send the “resource related properties” and nothing else that identifies the event. Amberflo assumes that the event we want to cancel is a very recent one (within the last 9 hours) and cancels the most recent event of the specific resource, if such an event exists. So, for example, if we used the same resource twice in the last 9 hours, Amberflo will always cancel the later of the two events. This is also why the value of the cancellation event doesn't matter: the cancellation event's instruction is simply to drop the most recent event, regardless of its value.

Canceling unintentional or unsuccessful usage (advanced)

The "aflo cancel previous resource event" dynamic cancellation described above cancels events of any kind: events that indicate starting to use a resource, stopping the use of a resource, or an update to the usage rate. For 'max' or 'total duration' meters, Amberflo allows you to send a secondary flag, "aflo ignore cancellation if no usage" (this additional flag has a meaning only when it comes together with "aflo cancel previous resource event"). This flag, if set to 'true', tells Amberflo to ignore the dynamic cancellation of "stop events".

To see what this secondary flag means, let's look at an example. Let's assume that we have a total duration meter and that the Amberflo system ingested the following 3 events:

    // start usage
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 10,
      "metertimeinmillis": 1646278560000, // 3/3/2022 at 3:36am (UTC), in milliseconds
      "dimensions": {
        "cluster": "x"
      }
    },
    // stop usage
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 0,
      "metertimeinmillis": 1646278620000, // 3/3/2022 at 3:37am (UTC)
      "dimensions": {
        "cluster": "x"
      }
    },
    // cancellation
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 0,
      "metertimeinmillis": 1646278680000, // 3/3/2022 at 3:38am (UTC)
      "dimensions": {
        "cluster": "x",
        "aflo cancel previous resource event": "true"
      }
    }

As mentioned, event cancellation cancels the latest event regardless of what it indicates. So in the example above, after applying the dynamic cancellation, we are left with only the start event:

    // start usage
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 10,
      "metertimeinmillis": 1646278560000, // 3/3/2022 at 3:36am (UTC)
      "dimensions": {
        "cluster": "x"
      }
    }

Now, if we want the dynamic cancellation to cancel only existing usage events (as opposed to canceling any type of event), we need to add "aflo ignore cancellation if no usage", with "true" as its value, to the 3rd event. The sequence of events sent to Amberflo will then look like this:

    // start usage
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 10,
      "metertimeinmillis": 1646278560000, // 3/3/2022 at 3:36am (UTC)
      "dimensions": {
        "cluster": "x"
      }
    },
    // stop usage
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 0,
      "metertimeinmillis": 1646278620000, // 3/3/2022 at 3:37am (UTC)
      "dimensions": {
        "cluster": "x"
      }
    },
    // cancellation
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 0,
      "metertimeinmillis": 1646278680000, // 3/3/2022 at 3:38am (UTC)
      "dimensions": {
        "cluster": "x",
        "aflo cancel previous resource event": "true",
        "aflo ignore cancellation if no usage": "true" // <== notice the additional secondary flag
      }
    }

In this case the result of the system run will be:

    // start usage
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 10,
      "metertimeinmillis": 1646278560000, // 3/3/2022 at 3:36am (UTC)
      "dimensions": {
        "cluster": "x"
      }
    },
    // stop usage
    {
      "customerid": "smart ml 123", // the actual customer id of smart ml
      "meterapiname": "cpu used",
      "metervalue": 0,
      "metertimeinmillis": 1646278620000, // 3/3/2022 at 3:37am (UTC)
      "dimensions": {
        "cluster": "x"
      }
    }

This is because the most recent event before the 3rd event was a stop event, so there was no usage for it to cancel.

📘 Notice

"aflo ignore cancellation if no usage" has a meaning only in the context of "max" or "total duration" meters.
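
To make the cancellation semantics described above easier to reason about, here is a small local simulation in Python. It is only a sketch under stated assumptions and not Amberflo's implementation: it assumes that a "stop event" is an event whose meter value is 0, that a resource is identified by the customer, the meter, and all dimensions other than the two flags, and that the 9-hour window is measured between the two event timestamps. The function names are hypothetical, and the field and flag names are copied as they appear on this page.

```python
from typing import Any, Dict, List, Optional, Tuple

# Flag names copied as they appear on this page.
CANCEL_FLAG = "aflo cancel previous resource event"
IGNORE_IF_NO_USAGE_FLAG = "aflo ignore cancellation if no usage"
NINE_HOURS_MS = 9 * 60 * 60 * 1000

Event = Dict[str, Any]


def resource_key(event: Event) -> Tuple:
    """Identify the resource: customer, meter, and the non-flag dimensions (assumption)."""
    dims = {
        k: v
        for k, v in event["dimensions"].items()
        if k not in (CANCEL_FLAG, IGNORE_IF_NO_USAGE_FLAG)
    }
    return (event["customerid"], event["meterapiname"], tuple(sorted(dims.items())))


def apply_cancellations(events: List[Event]) -> List[Event]:
    """Return the events that remain after applying dynamic cancellations."""
    kept: List[Event] = []
    for event in sorted(events, key=lambda e: e["metertimeinmillis"]):
        dims = event["dimensions"]
        if dims.get(CANCEL_FLAG) != "true":
            kept.append(event)
            continue
        # Look only at the most recent earlier event of the same resource.
        for i in range(len(kept) - 1, -1, -1):
            prev = kept[i]
            if resource_key(prev) != resource_key(event):
                continue
            in_window = event["metertimeinmillis"] - prev["metertimeinmillis"] <= NINE_HOURS_MS
            # Assumption: a stop event is one that reports a meter value of 0.
            is_stop = prev["metervalue"] == 0
            ignore = dims.get(IGNORE_IF_NO_USAGE_FLAG) == "true" and is_stop
            if in_window and not ignore:
                del kept[i]
            break
    return kept


def make_event(value: float, time_ms: int, extra_dims: Optional[Dict[str, str]] = None) -> Event:
    """Build an event for the 'cpu used' example on cluster x."""
    return {
        "customerid": "smart ml 123",
        "meterapiname": "cpu used",
        "metervalue": value,
        "metertimeinmillis": time_ms,
        "dimensions": {"cluster": "x", **(extra_dims or {})},
    }


start = make_event(10, 1646278560000)
stop = make_event(0, 1646278620000)
cancel = make_event(0, 1646278680000, {CANCEL_FLAG: "true"})
cancel_usage_only = make_event(
    0, 1646278680000, {CANCEL_FLAG: "true", IGNORE_IF_NO_USAGE_FLAG: "true"}
)

print(len(apply_cancellations([start, stop, cancel])))             # 1 (only the start event remains)
print(len(apply_cancellations([start, stop, cancel_usage_only])))  # 2 (the stop event is kept)
```

Running a planned sequence of events through a model like this before sending it can help confirm which event a cancellation would drop, which matters because the cancellation carries no event identifier of its own.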