Metrics SDK
Status: Mixed
Users of OpenTelemetry need a way for instrumentation interactions with the OpenTelemetry API to actually produce telemetry. The OpenTelemetry SDK (henceforth referred to as the SDK) is an implementation of the OpenTelemetry API that provides users with this functionality.
All language implementations of OpenTelemetry MUST provide an SDK.
MeterProvider
Status: Stable
A MeterProvider MUST provide a way to allow a Resource to be specified. If a Resource is specified, it SHOULD be associated with all the metrics produced by any Meter from the MeterProvider. The tracing SDK specification has provided some suggestions regarding how to implement this efficiently.
MeterProvider Creation
The SDK SHOULD allow the creation of multiple independent MeterProviders.
Meter Creation
It SHOULD only be possible to create Meter instances through a MeterProvider (see API).
The MeterProvider MUST implement the Get a Meter API.
The input provided by the user MUST be used to create an InstrumentationScope instance which is stored on the created Meter.
In the case where an invalid name (null or empty string) is specified, a working Meter MUST be returned as a fallback rather than returning null or throwing an exception; its name SHOULD keep the original invalid value, and a message reporting that the specified value is invalid SHOULD be logged.
When a Schema URL is passed as an argument when creating a Meter, the emitted telemetry for that Meter MUST be associated with the Schema URL, provided that the emitted data format is capable of representing such association.
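The Meter creation rules above can be illustrated with a minimal Python sketch. The class and method names (MeterProvider, Meter, InstrumentationScope, get_meter) are hypothetical stand-ins chosen for this example, not the API of any particular SDK.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class InstrumentationScope:
    # Hypothetical scope record built from the user's input.
    name: Optional[str]
    version: Optional[str] = None
    schema_url: Optional[str] = None


class Meter:
    def __init__(self, scope: InstrumentationScope):
        self.scope = scope


class MeterProvider:
    def get_meter(self, name, version=None, schema_url=None) -> Meter:
        if not name:  # None or empty string counts as invalid
            logger.warning("Invalid Meter name %r; returning a working Meter", name)
        # The original (possibly invalid) name is kept on the scope.
        return Meter(InstrumentationScope(name, version, schema_url))
```

In this sketch an invalid name never raises and never yields None; the caller always receives a usable Meter.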
Configuration
Configuration (i.e. MetricExporters, MetricReaders and Views) MUST be owned by the MeterProvider. The configuration MAY be applied at the time of MeterProvider creation if appropriate.
The MeterProvider MAY provide methods to update the configuration. If configuration is updated (e.g., adding a MetricReader), the updated configuration MUST also apply to all already returned Meters (i.e. it MUST NOT matter whether a Meter was obtained from the MeterProvider before or after the configuration change). Note: Implementation-wise, this could mean that Meter instances have a reference to their MeterProvider and access configuration only via this reference.
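One way to read the implementation note above is sketched below, assuming hypothetical MeterProvider and Meter classes: the Meter stores no configuration of its own and looks it up through its provider on every access.

```python
class MeterProvider:
    def __init__(self):
        self._readers = []  # configuration is owned by the provider

    def add_metric_reader(self, reader):
        self._readers.append(reader)

    def get_meter(self, name):
        return Meter(name, self)


class Meter:
    def __init__(self, name, provider):
        self.name = name
        self._provider = provider  # reference back to the owner

    @property
    def readers(self):
        # Always read the provider's current configuration.
        return tuple(self._provider._readers)
```

A Meter obtained before add_metric_reader is called therefore sees the new reader as well.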
Shutdown
This method provides a way for the provider to do any cleanup required.
Shutdown MUST be called only once for each MeterProvider instance. After the call to Shutdown, subsequent attempts to get a Meter are not allowed. SDKs SHOULD return a valid no-op Meter for these calls, if possible.
Shutdown SHOULD provide a way to let the caller know whether it succeeded, failed or timed out.
Shutdown SHOULD complete or abort within some timeout. Shutdown MAY be implemented as a blocking API or an asynchronous API which notifies the caller via a callback or an event. OpenTelemetry SDK authors MAY decide if they want to make the shutdown timeout configurable.
Shutdown MUST be implemented at least by invoking Shutdown on all registered MetricReader and MetricExporter instances.
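A blocking Shutdown could look roughly like the following sketch. All names are hypothetical, the 30 second default timeout is an assumption, and each reader is assumed to shut down its own exporter and to return a truthy value on success.

```python
import logging
import time

logger = logging.getLogger(__name__)


class NoOpMeter:
    """Meter that accepts calls but records nothing."""


class Meter:
    def __init__(self, name):
        self.name = name


class MeterProvider:
    def __init__(self, readers=()):
        self._readers = list(readers)
        self._shut_down = False

    def get_meter(self, name):
        # After Shutdown, hand out a valid no-op Meter instead of failing.
        return NoOpMeter() if self._shut_down else Meter(name)

    def shutdown(self, timeout_s=30.0):
        """Return True on success; False on failure, timeout or repeated call."""
        if self._shut_down:
            logger.warning("shutdown() called more than once")
            return False
        self._shut_down = True
        deadline = time.monotonic() + timeout_s
        ok = True
        for reader in self._readers:
            # Every reader is still invoked, with whatever time is left.
            remaining = max(0.0, deadline - time.monotonic())
            try:
                ok = bool(reader.shutdown(timeout_s=remaining)) and ok
            except Exception:
                logger.exception("reader shutdown failed")
                ok = False
        return ok
```

A boolean result is the simplest way to report the outcome; an implementation that needs to distinguish failure from timeout could return an enum instead.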
ForceFlush
This method provides a way for the provider to notify the registered MetricReader instances that have an associated Push Metric Exporter, so they can do as much as they can to collect and send the metrics. Note: a Pull Metric Exporter can only send data when it is asked to by the scraper, so ForceFlush would not make much sense for it.
ForceFlush MUST invoke ForceFlush on all registered MetricReader instances that implement ForceFlush.
ForceFlush SHOULD provide a way to let the caller know whether it succeeded, failed or timed out. ForceFlush SHOULD return some ERROR status if there is an error condition; if there is no error condition, it should return some NO ERROR status. Language implementations MAY decide how to model ERROR and NO ERROR.
ForceFlush SHOULD complete or abort within some timeout. ForceFlush MAY be implemented as a blocking API or an asynchronous API which notifies the caller via a callback or an event. OpenTelemetry SDK authors MAY decide if they want to make the flush timeout configurable.
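As an illustration of the status reporting described above, the sketch below models the outcome as a small enum. FlushResult and force_flush are hypothetical names, and readers are assumed to expose an optional force_flush(timeout_s) method that returns a truthy value on success.

```python
import time
from enum import Enum


class FlushResult(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


def force_flush(readers, timeout_s=10.0):
    """Flush every reader that implements force_flush, within one shared deadline."""
    deadline = time.monotonic() + timeout_s
    result = FlushResult.SUCCESS
    for reader in readers:
        flush = getattr(reader, "force_flush", None)
        if flush is None:
            continue  # e.g. a pull-based reader with nothing to push
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return FlushResult.TIMEOUT
        try:
            if not flush(timeout_s=remaining):
                result = FlushResult.FAILURE
        except Exception:
            result = FlushResult.FAILURE
    return result
```

A failing reader does not stop the loop here, so the remaining readers still get a chance to export their data.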
View
A View provides SDK users with the flexibility to customize the metrics that are output by the SDK. Here are some examples when a View might be needed:
- Customize which Instruments are to be processed/ignored. For example, an instrumented library can provide both temperature and humidity, but the application developer might only want temperature.
- Customize the aggregation - if the default aggregation associated with the Instrument does not meet the needs of the user. For example, an HTTP client library might expose HTTP client request duration as Histogram by default, but the application developer might only want the total count of outgoing requests.
- Customize which attribute(s) are to be reported on metrics. For example, an HTTP server library might expose HTTP verb (e.g. GET, POST) and HTTP status code (e.g. 200, 301, 404). The application developer might only care about HTTP status code (e.g. reporting the total count of HTTP requests for each HTTP status code). There could also be extreme scenarios in which the application developer does not need any attributes (e.g. just get the total count of all incoming requests).
The SDK MUST provide functionality for a user to create Views for a MeterProvider. This functionality MUST accept as inputs the Instrument selection criteria and the resulting stream configuration.
The SDK MUST provide the means to register Views with a MeterProvider.
Instrument selection criteria
Instrument selection criteria are the predicates that determine if a View will be applied to an Instrument or not.
Criteria SHOULD be treated as additive. This means an Instrument has to match all the provided criteria for the View to be applied. For example, if the criteria are instrument name == “Foobar” and instrument type is Histogram, it will be treated as (instrument name == “Foobar”) AND (instrument type is Histogram).
The SDK MUST accept the following criteria:
- name: The name of the Instrument(s) to match. This name is evaluated to match an Instrument in the following manner.
  - If the value of name is *, the criterion matches all Instruments.
  - If the value of name is exactly the same as an Instrument, then the criterion matches that instrument.
  Additionally, the SDK MAY support wildcard pattern matching for the name criterion using the following characters.
  - A question mark (?): matches any single character
  - An asterisk (*): matches any number of any characters including none
  If wildcard pattern matching is supported, the name criterion will match if the wildcard pattern is evaluated to match the Instrument name. If the SDK does not support wildcards in general, it MUST still recognize the special single asterisk (*) character as matching all Instruments.
  Users can provide a name, but it is up to their discretion. Therefore, the instrument selection criteria parameter needs to be structured to accept a name, but MUST NOT obligate a user to provide one.
- type: The type of Instruments to match. If the value of type is the same as an Instrument’s type, then the criterion matches that Instrument.
  Users can provide a type, but it is up to their discretion. Therefore, the instrument selection criteria parameter needs to be structured to accept a type, but MUST NOT obligate a user to provide one.
- unit: If the value of unit is the same as an Instrument’s unit, then the criterion matches that Instrument.
  Users can provide a unit, but it is up to their discretion. Therefore, the instrument selection criteria parameter needs to be structured to accept a unit, but MUST NOT obligate a user to provide one.
- meter_name: If the value of meter_name is the same as the Meter that created an Instrument, then the criterion matches that Instrument.
  Users can provide a meter_name, but it is up to their discretion. Therefore, the instrument selection criteria parameter needs to be structured to accept a meter_name, but MUST NOT obligate a user to provide one.
- meter_version: If the value of meter_version is the same version as the Meter that created an Instrument, then the criterion matches that Instrument.
  Users can provide a meter_version, but it is up to their discretion. Therefore, the instrument selection criteria parameter needs to be structured to accept a meter_version, but MUST NOT obligate a user to provide one.
- meter_schema_url: If the value of meter_schema_url is the same schema URL as the Meter that created an Instrument, then the criterion matches that Instrument.
  Users can provide a meter_schema_url, but it is up to their discretion. Therefore, the instrument selection criteria parameter needs to be structured to accept a meter_schema_url, but MUST NOT obligate a user to provide one.
The SDK MAY accept additional criteria. For example, a strongly typed language may support point type criterion (e.g. allow the users to select Instruments based on whether the underlying number is integral or rational). Users can provide these additional criteria the SDK accepts, but it is up to their discretion. Therefore, the instrument selection criteria can be structured to accept the criteria, but MUST NOT obligate a user to provide them.
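The additive matching rule and the optional wildcard support can be sketched as follows. Instrument and InstrumentSelector are hypothetical types for this example, and treating a selector with no criteria as matching everything is an assumption of the sketch rather than something stated above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instrument:
    name: str
    type: str
    unit: str = ""
    meter_name: str = ""
    meter_version: Optional[str] = None
    meter_schema_url: Optional[str] = None


def _wildcard_to_regex(pattern: str) -> re.Pattern:
    # '*' matches any run of characters (including none), '?' exactly one.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


@dataclass(frozen=True)
class InstrumentSelector:
    # Every criterion is optional; None means "not specified".
    name: Optional[str] = None
    type: Optional[str] = None
    unit: Optional[str] = None
    meter_name: Optional[str] = None
    meter_version: Optional[str] = None
    meter_schema_url: Optional[str] = None

    def matches(self, instrument: Instrument) -> bool:
        if self.name is not None:
            if not _wildcard_to_regex(self.name).fullmatch(instrument.name):
                return False
        for field in ("type", "unit", "meter_name", "meter_version", "meter_schema_url"):
            wanted = getattr(self, field)
            if wanted is not None and wanted != getattr(instrument, field):
                return False
        return True  # all provided criteria matched (logical AND)
```

For example, InstrumentSelector(name="Foobar", type="Histogram") matches only a Histogram named Foobar, while InstrumentSelector(name="*") matches every Instrument.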
Stream configuration
Stream configuration consists of the parameters that define the metric stream a MeterProvider will use to define telemetry pipelines.
The SDK MUST accept the following stream configuration parameters:
- name: The metric stream name that SHOULD be used.
  In order to avoid conflicts, if a name is provided the View SHOULD have an instrument selector that selects at most one instrument. If the Instrument selection criteria for a View with a stream configuration name parameter can select more than one instrument (i.e. wildcards) the SDK MAY fail fast in accordance with initialization error handling principles.
  Users can provide a name, but it is up to their discretion. Therefore, the stream configuration parameter needs to be structured to accept a name, but MUST NOT obligate a user to provide one. If the user does not provide a name value, name from the Instrument the View matches MUST be used by default.
- description: The metric stream description that SHOULD be used.
  Users can provide a description, but it is up to their discretion. Therefore, the stream configuration parameter needs to be structured to accept a description, but MUST NOT obligate a user to provide one. If the user does not provide a description value, the description from the Instrument a View matches MUST be used by default.
- attribute_keys: This is, at a minimum, an allow-list of attribute keys for measurements captured in the metric stream. The allow-list contains attribute keys that identify the attributes that MUST be kept, and all other attributes MUST be ignored.
  Implementations MAY accept additional attribute filtering functionality for this parameter.
  Users can provide attribute_keys, but it is up to their discretion. Therefore, the stream configuration parameter needs to be structured to accept attribute_keys, but MUST NOT obligate a user to provide them. If the user does not provide any value, the SDK SHOULD use the Attributes advisory parameter configured on the instrument instead. If the Attributes advisory parameter is absent, all attributes MUST be kept.
- aggregation: The name of an aggregation function to use in aggregating the metric stream data.
  Users can provide an aggregation, but it is up to their discretion. Therefore, the stream configuration parameter needs to be structured to accept an aggregation, but MUST NOT obligate a user to provide one. If the user does not provide an aggregation value, the MeterProvider MUST apply a default aggregation configurable on the basis of instrument type according to the MetricReader instance.
- Status: Feature-freeze - exemplar_reservoir: A functional type that generates an exemplar reservoir a MeterProvider will use when storing exemplars. This functional type needs to be a factory or callback similar to aggregation selection functionality which allows different reservoirs to be chosen by the aggregation.
  Users can provide an exemplar_reservoir, but it is up to their discretion. Therefore, the stream configuration parameter needs to be structured to accept an exemplar_reservoir, but MUST NOT obligate a user to provide one. If the user does not provide an exemplar_reservoir value, the MeterProvider MUST apply a default exemplar reservoir.
- Status: Experimental - aggregation_cardinality_limit: A positive integer value defining the maximum number of data points allowed to be emitted in a collection cycle by a single instrument. See cardinality limits, below.
  Users can provide an aggregation_cardinality_limit, but it is up to their discretion. Therefore, the stream configuration parameter needs to be structured to accept an aggregation_cardinality_limit, but MUST NOT obligate a user to provide one. If the user does not provide an aggregation_cardinality_limit value, the MeterProvider MUST apply the default aggregation cardinality limit the MetricReader is configured with.
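The defaulting rules for name, description and attribute_keys can be summarized in a short sketch. StreamConfig, InstrumentInfo, resolve_stream and filter_attributes are hypothetical names introduced only for illustration; the other stream parameters are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class StreamConfig:
    name: Optional[str] = None
    description: Optional[str] = None
    attribute_keys: Optional[Sequence[str]] = None  # None: not provided


@dataclass(frozen=True)
class InstrumentInfo:
    name: str
    description: str = ""
    advisory_attributes: Optional[Sequence[str]] = None


def resolve_stream(config: StreamConfig, instrument: InstrumentInfo) -> StreamConfig:
    """Fill every parameter the user left out from the matched Instrument."""
    keys = config.attribute_keys
    if keys is None:
        # Fall back to the advisory parameter; None still means "keep all".
        keys = instrument.advisory_attributes
    return StreamConfig(
        name=config.name if config.name is not None else instrument.name,
        description=(config.description if config.description is not None
                     else instrument.description),
        attribute_keys=keys,
    )


def filter_attributes(attributes: Mapping[str, object],
                      keys: Optional[Sequence[str]]) -> dict:
    """Apply the allow-list: keep listed keys, ignore the rest."""
    if keys is None:
        return dict(attributes)
    allowed = set(keys)
    return {k: v for k, v in attributes.items() if k in allowed}
```

With attribute_keys=["http.status_code"], a measurement recorded with both an HTTP method and a status code is aggregated by status code only.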
Measurement processing
The SDK SHOULD use the following logic to determine how to process Measurements made with an Instrument:
- Determine the MeterProvider which “owns” the Instrument.
- If the MeterProvider has no View registered, take the Instrument and apply the default Aggregation on the basis of instrument kind according to the MetricReader instance’s aggregation property.
- If the MeterProvider has one or more View(s) registered:
  - For each View, if the Instrument could match the instrument selection criteria:
    - Try to apply the View’s stream configuration. If applying the View results in conflicting metric identities the implementation SHOULD apply the View and emit a warning. If it is not possible to apply the View without producing semantic errors (e.g. the View sets an asynchronous instrument to use the Explicit bucket histogram aggregation) the implementation SHOULD emit a warning and proceed as if the View did not exist.
  - If the Instrument could not match with any of the registered View(s), the SDK SHOULD enable the instrument using the default aggregation and temporality. Users can configure match-all Views using Drop aggregation to disable instruments by default.
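The processing steps above can be sketched as a single function that returns the metric streams an Instrument feeds. The types, the string aggregation names and the simplified semantic check are assumptions made for this example; selectors are modeled as plain predicates.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

ASYNC_KINDS = {"AsynchronousCounter", "AsynchronousUpDownCounter", "AsynchronousGauge"}


@dataclass(frozen=True)
class Instrument:
    name: str
    kind: str


@dataclass(frozen=True)
class View:
    selector: Callable[[Instrument], bool]
    name: Optional[str] = None
    aggregation: Optional[str] = None


def resolve_streams(instrument: Instrument, views: Sequence[View],
                    default_aggregation: str) -> List[Tuple[str, str]]:
    """Return (stream name, aggregation) pairs for one Instrument."""
    streams: List[Tuple[str, str]] = []
    for view in views:
        if not view.selector(instrument):
            continue
        aggregation = view.aggregation or default_aggregation
        if instrument.kind in ASYNC_KINDS and aggregation == "ExplicitBucketHistogram":
            # Semantic error: warn and proceed as if the View did not exist.
            logger.warning("View cannot be applied to %s", instrument.name)
            continue
        name = view.name or instrument.name
        if any(existing == name for existing, _ in streams):
            # Conflicting identity: warn, but still apply the View.
            logger.warning("Conflicting metric identity %s", name)
        streams.append((name, aggregation))
    if not streams:
        # No View registered, or none applicable: use the default.
        return [(instrument.name, default_aggregation)]
    return streams
```

A match-all View with the Drop aggregation yields a single (name, "Drop") stream for every otherwise unmatched Instrument, which is how instruments can be disabled by default.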
View examples
The following are examples of an SDK’s functionality to create Views for a MeterProvider.
# Python
'''
+------------------+
| MeterProvider    |
|   Meter A        |
|     Counter X    |
|     Histogram Y  |
|   Meter B        |
|     Gauge Z      |
+------------------+
'''

# metrics from X and Y (reported as Foo and Bar) will be exported
meter_provider
    .add_view("X")
    .add_view("Foo", instrument_name="Y")
    .add_view(
        "Bar",
        instrument_name="Y",
        aggregation=HistogramAggregation(buckets=[5.0, 10.0, 25.0, 50.0, 100.0]))
    .add_metric_reader(PeriodicExportingMetricReader(ConsoleExporter()))

# all the metrics will be exported using the default configuration
meter_provider.add_metric_reader(PeriodicExportingMetricReader(ConsoleExporter()))

# all the metrics will be exported using the default configuration
meter_provider
    .add_view("*") # a wildcard view that matches everything
    .add_metric_reader(PeriodicExportingMetricReader(ConsoleExporter()))

# Counter X will be exported as cumulative sum
meter_provider
    .add_view("X", aggregation=SumAggregation())
    .add_metric_reader(PeriodicExportingMetricReader(ConsoleExporter()))

# Counter X will be exported as delta sum
# Histogram Y and Gauge Z will be exported with 2 attributes (a and b)
meter_provider
    .add_view("X", aggregation=SumAggregation())
    .add_view("*", attribute_keys=["a", "b"])
    .add_metric_reader(PeriodicExportingMetricReader(ConsoleExporter()),
        temporality=lambda kind: Delta if kind in [Counter, AsyncCounter, Histogram] else Cumulative)
Aggregation
An Aggregation, as configured via the View, informs the SDK on the ways and means to compute Aggregated Metrics from incoming Instrument Measurements.
Note: the term aggregation is used instead of aggregator. It is RECOMMENDED that implementors reserve the “aggregator” term for the future when the SDK allows custom aggregation implementations.
An Aggregation specifies an operation (i.e. decomposable aggregate function like Sum, Histogram, Min, Max, Count) and optional configuration parameter overrides. The operation’s default configuration parameter values will be used unless overridden by optional configuration parameter overrides.
Note: Implementors MAY choose the best idiomatic practice for their language to represent the semantic of an Aggregation and optional configuration parameters.
e.g. The View specifies an Aggregation by string name (i.e. “ExplicitBucketHistogram”).
# Use Histogram with custom boundaries
meter_provider
    .add_view(
        "X",
        aggregation="ExplicitBucketHistogram",
        aggregation_params={"Boundaries": [0, 10, 100]}
    )
e.g. The View specifies an Aggregation by class/type instance.
// Use Histogram with custom boundaries
meterProviderBuilder
    .AddView(
        instrumentName: "X",
        aggregation: new ExplicitBucketHistogramAggregation(
            boundaries: new double[] { 0.0, 10.0, 100.0 }
        )
    );
TODO: after we release the initial Stable version of Metrics SDK specification, we will explore how to allow configuring custom ExemplarReservoirs with the View API.
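The SDK MUST provide the following Aggregation to support the Metric Points in the Metrics Data Model.
- Drop
- Default
- Sum
- Last Value
- Explicit Bucket Histogram
The SDK SHOULD provide the following Aggregation:
- Base2 Exponential Bucket Histogram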
Drop Aggregation
The Drop Aggregation informs the SDK to ignore/drop all Instrument Measurements for this Aggregation.
This Aggregation does not have any configuration parameters.
Default Aggregation
The Default Aggregation informs the SDK to use the Instrument kind to select an aggregation and advisory parameters to influence aggregation configuration parameters (as noted in the “Selected Aggregation” column).
Instrument Kind | Selected Aggregation |
---|---|
Counter | Sum Aggregation |
Asynchronous Counter | Sum Aggregation |
UpDownCounter | Sum Aggregation |
Asynchronous UpDownCounter | Sum Aggregation |
Gauge | Last Value Aggregation |
Asynchronous Gauge | Last Value Aggregation |
Histogram | Explicit Bucket Histogram Aggregation, with the ExplicitBucketBoundaries advisory parameter if provided |
This Aggregation does not have any configuration parameters.
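The selection table above maps naturally onto a lookup. The sketch below uses hypothetical string identifiers for instrument kinds and aggregations, and a hypothetical advisory_boundaries argument standing in for the ExplicitBucketBoundaries advisory parameter.

```python
DEFAULT_AGGREGATION_BY_KIND = {
    "Counter": "Sum",
    "AsynchronousCounter": "Sum",
    "UpDownCounter": "Sum",
    "AsynchronousUpDownCounter": "Sum",
    "Gauge": "LastValue",
    "AsynchronousGauge": "LastValue",
    "Histogram": "ExplicitBucketHistogram",
}


def select_default_aggregation(kind, advisory_boundaries=None):
    """Return (aggregation name, configuration overrides) for an instrument kind."""
    aggregation = DEFAULT_AGGREGATION_BY_KIND[kind]
    params = {}
    if aggregation == "ExplicitBucketHistogram" and advisory_boundaries is not None:
        # Advisory boundaries only influence the histogram configuration.
        params["Boundaries"] = list(advisory_boundaries)
    return aggregation, params
```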
Sum Aggregation
The Sum Aggregation informs the SDK to collect data for the Sum Metric Point.
The monotonicity of the aggregation is determined by the instrument type:
Instrument Kind | SumType |
---|---|
Counter | Monotonic |
UpDownCounter | Non-Monotonic |
Histogram | Monotonic |
Asynchronous Gauge | Non-Monotonic |
Asynchronous Counter | Monotonic |
Asynchronous UpDownCounter | Non-Monotonic |
This Aggregation does not have any configuration parameters.
This Aggregation informs the SDK to collect:
- The arithmetic sum of Measurement values.
Last Value Aggregation
The Last Value Aggregation informs the SDK to collect data for the Gauge Metric Point.
This Aggregation does not have any configuration parameters.
This Aggregation informs the SDK to collect:
- The last Measurement.
- The timestamp of the last Measurement.
Histogram Aggregations
All histogram Aggregations inform the SDK to collect:
- Count of Measurement values in population.
- Arithmetic sum of Measurement values in population. This SHOULD NOT be collected when used with instruments that record negative measurements (e.g. UpDownCounter or ObservableGauge).
- Min (optional) Measurement value in population.
- Max (optional) Measurement value in population.
Explicit Bucket Histogram Aggregation
The Explicit Bucket Histogram Aggregation informs the SDK to collect data for the Histogram Metric Point using a set of explicit boundary values for histogram bucketing.
This Aggregation honors the following configuration parameters:
Key | Value | Default Value | Description |
---|---|---|---|
Boundaries | double[] | [ 0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000 ] | Array of increasing values representing explicit bucket boundary values. The Default Value represents the following buckets (heavily influenced by the default buckets of Prometheus clients, e.g. Java and Go): (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, 25.0], (25.0, 50.0], (50.0, 75.0], (75.0, 100.0], (100.0, 250.0], (250.0, 500.0], (500.0, 750.0], (750.0, 1000.0], (1000.0, 2500.0], (2500.0, 5000.0], (5000.0, 7500.0], (7500.0, 10000.0], (10000.0, +∞). SDKs SHOULD use the default value when boundaries are not explicitly provided, unless they have good reasons to use something different (e.g. for backward compatibility reasons in a stable SDK release). |
RecordMinMax | true, false | true | Whether to record min and max. |
Explicit buckets are stated in terms of their upper boundary. Buckets are exclusive of their lower boundary and inclusive of their upper bound (except at positive infinity). A measurement is defined to fall into the lowest-numbered bucket whose upper boundary is greater than or equal to the measurement.
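Because buckets are exclusive of their lower boundary and inclusive of their upper boundary, the bucket for a measurement can be found with a left-biased binary search over the boundaries. The sketch below uses Python's standard bisect module; the function name bucket_index is hypothetical.

```python
from bisect import bisect_left

DEFAULT_BOUNDARIES = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750,
                      1000, 2500, 5000, 7500, 10000]


def bucket_index(value, boundaries=DEFAULT_BOUNDARIES):
    """Index of the bucket (lower, upper] containing value.

    Index len(boundaries) is the final bucket that extends to +infinity.
    """
    # bisect_left returns the first position whose boundary is >= value.
    return bisect_left(boundaries, value)
```

With the default boundaries, a measurement of exactly 5 lands in bucket 1, i.e. (0, 5], while 5.1 lands in bucket 2, i.e. (5, 10].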
Base2 Exponential Bucket Histogram Aggregation
The Base2 Exponential Histogram Aggregation informs the SDK to collect data for the Exponential Histogram Metric Point, which uses a base-2 exponential formula to determine bucket boundaries and an integer scale parameter to control resolution. Implementations adjust scale as necessary given the data.
This Aggregation honors the following configuration parameters:
Key | Value | Default Value | Description |
---|---|---|---|
MaxSize | integer | 160 | Maximum number of buckets in each of the positive and negative ranges, not counting the special zero bucket. |
MaxScale | integer | 20 | Maximum scale factor. |
RecordMinMax | true, false | true | Whether to record min and max. |
The default of 160 buckets is selected to establish default support for a high-resolution histogram able to cover a long-tail latency distribution from 1ms to 100s with less than 5% relative error. Because 160 can be factored into 10 * 2**K, maximum contrast is relatively simple to derive for scale K:
Scale | Maximum data contrast at 10 * 2**K buckets |
---|---|
K+2 | 5.657 (2**(10/4)) |
K+1 | 32 (2**(10/2)) |
K | 1024 (2**10) |
K-1 | 1048576 (2**20) |
The following table shows how the ideal scale for 160 buckets is calculated as a function of the input range:
Input range | Contrast | Ideal Scale | Base | Relative error |
---|---|---|---|---|
1ms - 4ms | 4 | 6 | 1.010889 | 0.542% |
1ms - 20ms | 20 | 5 | 1.021897 | 1.083% |
1ms - 1s | 10**3 | 4 | 1.044274 | 2.166% |
1ms - 100s | 10**5 | 3 | 1.090508 | 4.329% |
1μs - 10s | 10**7 | 2 | 1.189207 | 8.643% |
Note that relative error is calculated as half of the bucket width divided by the bucket midpoint, which is the same in every bucket. Using the bucket from [1, base), we have (bucketWidth / 2) / bucketMidpoint = ((base - 1) / 2) / ((base + 1) / 2) = (base - 1) / (base + 1).
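The Base, Relative error and Ideal Scale columns of the table above can be reproduced with a few lines of Python. This sketch assumes the data model's relation base = 2**(2**-scale), and it estimates the number of buckets needed from the contrast alone (log2(contrast) * 2**scale), which mirrors how the table is derived but ignores exact bucket alignment. The function names and the min_scale default of -10 are hypothetical.

```python
import math


def base_for_scale(scale: int) -> float:
    # Each power of two is split into 2**scale buckets.
    return 2.0 ** (2.0 ** -scale)


def relative_error(scale: int) -> float:
    # (bucketWidth / 2) / bucketMidpoint for the bucket [1, base).
    base = base_for_scale(scale)
    return (base - 1.0) / (base + 1.0)


def ideal_scale(contrast: float, max_size: int = 160,
                max_scale: int = 20, min_scale: int = -10) -> int:
    """Largest scale whose bucket count for this contrast fits in max_size."""
    if contrast <= 1:
        return max_scale  # a single value is not constrained by size
    for scale in range(max_scale, min_scale - 1, -1):
        if math.log2(contrast) * 2.0 ** scale <= max_size:
            return scale
    return min_scale
```

For the 1ms to 100s range (contrast 10**5), ideal_scale returns 3, and relative_error(3) is roughly 4.33%, in line with the "less than 5% relative error" claim for the default size.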
This Aggregation uses the notion of “ideal” scale. The ideal scale is either:
- The MaxScale (see configuration parameters), generally used for single-value histogram Aggregations where scale is not otherwise constrained.
- The largest value of scale such that no more than the maximum number of buckets are needed to represent the full range of input data in either of the positive or negative ranges.
Handle all normal values
Implementations are REQUIRED to accept the entire normal range of IEEE floating point values (i.e., all values except for +Inf, -Inf and NaN values).
Implementations SHOULD NOT incorporate non-normal values (i.e., +Inf, -Inf, and NaNs) into the sum, min, and max fields, because these values do not map into a valid bucket.
Implementations MAY round subnormal values away from zero to the nearest normal value.
Support a minimum and maximum scale
The implementation MUST maintain reasonable minimum and maximum scale parameters that the automatic scale parameter will not exceed. The maximum scale is defined by the MaxScale configuration parameter.
Use the maximum scale for single measurements
When the histogram contains not more than one value in either of the positive or negative ranges, the implementation SHOULD use the maximum scale.
Maintain the ideal scale
Implementations SHOULD adjust the histogram scale as necessary to maintain the best resolution possible, within the constraint of maximum size (max number of buckets). Best resolution (highest scale) is achieved when the number of positive or negative range buckets exceeds half the maximum size, such that increasing scale by one would not be possible given the size constraint.
Observations inside asynchronous callbacks
Callback functions MUST be invoked for the specific MetricReader performing collection, such that observations made or produced by executing callbacks only apply to the intended MetricReader during collection.
The implementation SHOULD disregard the use of asynchronous instrument APIs outside of registered callbacks.
The implementation SHOULD use a timeout to prevent indefinite callback execution.
The implementation MUST complete the execution of all callbacks for a given instrument before starting a subsequent round of collection.
The implementation SHOULD NOT produce aggregated metric data for a previously-observed attribute set which is not observed during a successful callback. See MetricReader for more details on the persistence of metrics across successive collections.
Cardinality limits
Status: Experimental
SDKs SHOULD support being configured with a cardinality limit. A cardinality limit is the hard limit on the number of metric streams that can be collected.
Configuration
The cardinality limit for an aggregation is defined in one of three ways:
- If a view with criteria matching the instrument an aggregation is created for has an aggregation_cardinality_limit value defined for the stream, that value SHOULD be used.
- If there is no matching view, but the MetricReader defines a default cardinality limit value based on the instrument an aggregation is created for, that value SHOULD be used.
- If none of the previous values are defined, the default value of 2000 SHOULD be used.
Overflow attribute
An overflow attribute set is defined, containing a single attribute otel.metric.overflow having (boolean) value true, which is used to report a synthetic aggregation of the metric events that could not be independently aggregated because of the limit.
The SDK MUST create an Aggregator with the overflow attribute set prior to reaching the cardinality limit and use it to aggregate events for which the correct Aggregator could not be created. The maximum number of distinct, non-overflow attributes is one less than the limit, as a result.
Synchronous instrument cardinality limits
Aggregators for synchronous instruments with cumulative temporality MUST continue to export all attribute sets that were observed prior to the beginning of overflow. Metric events corresponding with attribute sets that were not observed prior to the overflow will be reflected in a single data point described by (only) the overflow attribute.
Aggregators of synchronous instruments with delta aggregation temporality MAY choose an arbitrary subset of attribute sets to output to maintain the stated cardinality limit.
Regardless of aggregation temporality, the SDK MUST ensure that every metric event is reflected in exactly one Aggregator, which is either an Aggregator associated with the correct attribute set or an aggregator associated with the overflow attribute set.
Events MUST NOT be double-counted or dropped during an overflow.
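The limit resolution order and the overflow behavior for a cumulative synchronous sum can be sketched as follows. LimitedSum and resolve_cardinality_limit are hypothetical names, and for brevity the overflow entry is created lazily on first use rather than ahead of time; one slot is always reserved for it.

```python
DEFAULT_CARDINALITY_LIMIT = 2000
OVERFLOW_ATTRIBUTES = frozenset({("otel.metric.overflow", True)})


def resolve_cardinality_limit(view_limit=None, reader_limit=None):
    """View value first, then the MetricReader default, then 2000."""
    if view_limit is not None:
        return view_limit
    if reader_limit is not None:
        return reader_limit
    return DEFAULT_CARDINALITY_LIMIT


class LimitedSum:
    """Cumulative sum per attribute set, capped at `limit` data points."""

    def __init__(self, limit=DEFAULT_CARDINALITY_LIMIT):
        self._limit = limit
        self.sums = {}

    def record(self, value, attributes):
        key = frozenset(attributes.items())
        regular = len(self.sums) - (OVERFLOW_ATTRIBUTES in self.sums)
        if key not in self.sums and regular >= self._limit - 1:
            # New attribute set past the limit: fold into the overflow point.
            key = OVERFLOW_ATTRIBUTES
        # Every event lands in exactly one entry, so none are lost or doubled.
        self.sums[key] = self.sums.get(key, 0) + value
```

With a limit of 3, the first two distinct attribute sets keep their own data points and every later new set is reported under otel.metric.overflow, while previously seen sets continue to be updated.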
Asynchronous instrument cardinality limits
Aggregators of asynchronous instruments SHOULD prefer the first-observed attributes in the callback when limiting cardinality, regardless of temporality.