.NET's ActivityListener sampling API

Distributed tracing can generate a lot of data, and sampling is the most established method to keep data volumes manageable. In .NET, the System.Diagnostics.ActivityListener class exposes two properties to control sampling: Sample and SampleUsingParentId.

How do these work?

If you just switched browser tabs to check the documentation, I know why you’re back. There’s barely any useful information to be found about these, either in the .NET documentation or elsewhere online. Even the original API design proposal is a dead end. The canonical ActivityListener example, copied everywhere, includes a Sample implementation that’s something like this:

var listener = new ActivityListener();
listener.Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                                        ActivitySamplingResult.AllData;
// ...

You see, you need to specify a Sample function when creating an activity listener. The default sampling decision is ActivitySamplingResult.None, and if all registered ActivityListeners return this value, no activities will be created at all.
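To see this in action, here is a minimal sketch that starts an activity before and after a listener is registered. The source name "Example.Source" is made up for illustration, and the ShouldListenTo filter is included because a listener without one won't be attached to any source.

```csharp
using System;
using System.Diagnostics;

// "Example.Source" is a hypothetical source name used only in this sketch.
var source = new ActivitySource("Example.Source");

// No listener is registered yet, so nothing requests the activity
// and StartActivity returns null.
using var before = source.StartActivity("no-listener");
Console.WriteLine(before is null); // True

var listener = new ActivityListener
{
    // Attach this listener only to the source created above.
    ShouldListenTo = s => s.Name == "Example.Source",
    // Ask for every activity to be created with all data.
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllData
};
ActivitySource.AddActivityListener(listener);

// Now a listener returns something other than None, so an Activity exists.
using var after = source.StartActivity("with-listener");
Console.WriteLine(after is not null); // True
```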

If you want to use the sampling function to do something more sophisticated than simply capture all traces, it’s assumed you’ll plug in the OpenTelemetry SDK and use its sampling APIs to achieve this, and there really isn’t any guidance out there otherwise. Depending on your circumstances, the OpenTelemetry SDK might be the right tool for the job, but it’s still deeply unsatisfying to rely on a core .NET diagnostics API that’s practically undocumented.

This year I’ve spent some time bridging System.Diagnostics.Activity and Serilog, and in the process had to dig deeper into how ActivityListener sampling works. Here are my conclusions, wrapped up in a tiny but non-trivial sampler. I’m fully aware that some of my conclusions and assumptions might be wrong; if you’re kind enough to send corrections I’ll make sure this article is updated.

`IntervalSampler`

The sampler presented here is called IntervalSampler. Its source code lives in a SerilogTracing example project on GitHub.

static class IntervalSampler
{
    public static SampleActivity<ActivityContext> Create(ulong interval)
    {
        ArgumentOutOfRangeException.ThrowIfZero(interval);
        var next = interval - 1;
        
        return (ref ActivityCreationOptions<ActivityContext> options) =>
        {
            if (options.Parent != default)
            {
                return (options.Parent.TraceFlags & ActivityTraceFlags.Recorded) ==
                                                    ActivityTraceFlags.Recorded ?
                    ActivitySamplingResult.AllDataAndRecorded :
                    options.Parent.IsRemote ?
                        ActivitySamplingResult.PropagationData :
                        ActivitySamplingResult.None;
            }

            var n = Interlocked.Increment(ref next) % interval;
            return n == 0
                ? ActivitySamplingResult.AllDataAndRecorded
                : ActivitySamplingResult.PropagationData;
        };
    }
}

IntervalSampler aims to collect one in every N possible traces (the “interval”), selected using modulo arithmetic. A more robust sampler might introduce some randomness into this process to avoid skewing the sample when an application produces the same types of traces in a very regular sequence, but those kinds of details would obscure the parts of the sampler that are important for our current purposes.
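For comparison, a randomized variant might look something like the sketch below. ProbabilitySampler and its rate parameter are hypothetical names invented here, not part of SerilogTracing or .NET; the parent-handling branch is copied from IntervalSampler, and only the root decision differs.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Example.Source"); // hypothetical source name

var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.Source",
    Sample = ProbabilitySampler.Create(0.25)
};
ActivitySource.AddActivityListener(listener);

var sampled = 0;
for (var i = 0; i < 1000; i++)
{
    // The previous root has been disposed, so each iteration starts a new trace.
    using var root = source.StartActivity("root");
    if (root is { Recorded: true }) sampled++;
}
Console.WriteLine($"Sampled {sampled} of 1000 root activities");

// Hypothetical sampler: records each new trace with probability `rate`.
static class ProbabilitySampler
{
    public static SampleActivity<ActivityContext> Create(double rate)
    {
        // Written this way so that NaN is rejected too.
        if (!(rate >= 0.0 && rate <= 1.0))
            throw new ArgumentOutOfRangeException(nameof(rate));

        return (ref ActivityCreationOptions<ActivityContext> options) =>
        {
            if (options.Parent != default)
            {
                // Same parent-based logic as IntervalSampler.
                if ((options.Parent.TraceFlags & ActivityTraceFlags.Recorded) ==
                    ActivityTraceFlags.Recorded)
                    return ActivitySamplingResult.AllDataAndRecorded;
                return options.Parent.IsRemote
                    ? ActivitySamplingResult.PropagationData
                    : ActivitySamplingResult.None;
            }

            // Random.Shared is safe to call from multiple threads.
            return Random.Shared.NextDouble() < rate
                ? ActivitySamplingResult.AllDataAndRecorded
                : ActivitySamplingResult.PropagationData;
        };
    }
}
```

The trade-off is that the sample size is only approximately rate × total, whereas the interval approach yields an exact one-in-N count.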

The sampler creates a sampling function that is wired up like so:

var listener = new ActivityListener();
listener.Sample = IntervalSampler.Create(7);
// ...
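Filling in the elided parts, a complete program might look like the following sketch. The source name "Example.Source" and the interval of 3 are arbitrary choices for illustration; IntervalSampler is reproduced from above so the example compiles on its own (ArgumentOutOfRangeException.ThrowIfZero needs .NET 8 or later).

```csharp
using System;
using System.Diagnostics;
using System.Threading;

var source = new ActivitySource("Example.Source"); // hypothetical source name

var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.Source",
    Sample = IntervalSampler.Create(3)
};
ActivitySource.AddActivityListener(listener);

var recorded = 0;
for (var i = 0; i < 9; i++)
{
    // The previous root was disposed at the end of the last iteration,
    // so Activity.Current is null and this starts a new trace.
    using var root = source.StartActivity("root");
    if (root is { Recorded: true }) recorded++;
}
Console.WriteLine(recorded); // 3: roots 1, 4 and 7 are recorded

static class IntervalSampler
{
    public static SampleActivity<ActivityContext> Create(ulong interval)
    {
        ArgumentOutOfRangeException.ThrowIfZero(interval);
        var next = interval - 1;

        return (ref ActivityCreationOptions<ActivityContext> options) =>
        {
            if (options.Parent != default)
            {
                return (options.Parent.TraceFlags & ActivityTraceFlags.Recorded) ==
                                                    ActivityTraceFlags.Recorded ?
                    ActivitySamplingResult.AllDataAndRecorded :
                    options.Parent.IsRemote ?
                        ActivitySamplingResult.PropagationData :
                        ActivitySamplingResult.None;
            }

            var n = Interlocked.Increment(ref next) % interval;
            return n == 0
                ? ActivitySamplingResult.AllDataAndRecorded
                : ActivitySamplingResult.PropagationData;
        };
    }
}
```

Because next starts at interval - 1, the very first root activity is the one that gets recorded.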

`Sample` vs `SampleUsingParentId`

The first thing you’ll encounter when setting up a sampler is the apparent duplication of the sampling function between

ActivityListener.Sample, which describes the parent of the sampled activity using ActivityContext, and
ActivityListener.SampleUsingParentId, which describes the parent using string.

public SampleActivity<string>? SampleUsingParentId { get; set; }
public SampleActivity<ActivityContext>? Sample { get; set; }

Both APIs were added in .NET 5, so one isn’t an obsolete alternative to the other. When should each be used?

It turns out that SampleUsingParentId supports both W3C and Microsoft’s legacy “hierarchical” tracing schemes. If a listener has both SampleUsingParentId and Sample configured, then SampleUsingParentId will be used. Otherwise, if the activity is using the W3C tracing scheme, Sample will be used.

So this suggests SampleUsingParentId is the best, most general thing to implement? No, not really. Non-W3C tracing is on its way to extinction, and within SampleUsingParentId you can’t directly access the modern, fundamental properties describing the parent activity, such as its trace id, span id, or trace flags.

IntervalSampler supports the Sample delegate signature:

return (ref ActivityCreationOptions<ActivityContext> options) =>
{
    // ...
};

TL;DR: unless you’re writing code that has to work in a legacy tracing scheme, Sample is the way to go, and you can safely ignore SampleUsingParentId.

Sampling traces, vs sampling activities

The next thing to confront is the subtle difference between the purpose of the Sample API — to determine whether or not to create an Activity — and the reason that you’re interested in it, which is to determine whether the trace to which the Activity belongs should be recorded.

An Activity is just one single span within a hierarchical trace. Sampling generally aims to either create all of the spans in a trace, or none of them. Once a decision has been made for the Activity corresponding to the root span in a trace, then all of its child activities should be included in the sample, too.

That’s what the first condition in our sampling delegate is concerned with:

if (options.Parent != default)
{
    return (options.Parent.TraceFlags & ActivityTraceFlags.Recorded) == 
                                        ActivityTraceFlags.Recorded ?
        ActivitySamplingResult.AllDataAndRecorded :
        options.Parent.IsRemote ?
            ActivitySamplingResult.PropagationData :
            ActivitySamplingResult.None;
}

If the activity we’re being asked to make a decision about would become the child of an existing activity, then we use the sampling decision already made for that activity.

If the parent activity is recorded (included in the sample), then the child is too, and ActivitySamplingResult.AllDataAndRecorded is the correct result.

Take care: the very similarly-named ActivitySamplingResult.AllData causes an Activity to be created, but it doesn’t mark the trace as being recorded. If you return ActivitySamplingResult.AllData from your sampler, activities likely won’t show up in your tracing system, and the sampling decision won’t be propagated downstream to other services and systems you call.
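The difference is easiest to see on the created Activity itself. The sketch below returns each ActivitySamplingResult in turn for a root activity and prints the resulting IsAllDataRequested and Recorded values; the Probe helper and the "Example.Source" name are invented for this illustration.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Example.Source"); // hypothetical source name

// The decision the listener will return; changed between probes.
var current = ActivitySamplingResult.None;

var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.Source",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => current
};
ActivitySource.AddActivityListener(listener);

// Hypothetical helper: starts a root activity under the given decision
// and describes what was created.
string Probe(ActivitySamplingResult decision)
{
    current = decision;
    using var activity = source.StartActivity("probe");
    return activity is null
        ? "not created"
        : $"IsAllDataRequested={activity.IsAllDataRequested}, Recorded={activity.Recorded}";
}

Console.WriteLine(Probe(ActivitySamplingResult.None));
// not created
Console.WriteLine(Probe(ActivitySamplingResult.PropagationData));
// IsAllDataRequested=False, Recorded=False
Console.WriteLine(Probe(ActivitySamplingResult.AllData));
// IsAllDataRequested=True, Recorded=False
Console.WriteLine(Probe(ActivitySamplingResult.AllDataAndRecorded));
// IsAllDataRequested=True, Recorded=True
```

Only AllDataAndRecorded sets the Recorded trace flag, which is the flag the parent check in IntervalSampler looks for.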

In the case that the parent isn’t included in the sample, we return ActivitySamplingResult.PropagationData to ensure a local activity is still created when the parent is remote, and otherwise return ActivitySamplingResult.None to save allocation of a new Activity instance.
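The three parent cases can be exercised with a small sketch like the one below. It assumes a sampler that leaves every root out of the sample, so that the parent branch is the only interesting part; the RemoteParent helper and the "Example.Source" name are made up for illustration.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Example.Source"); // hypothetical source name

var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.Source",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
    {
        if (options.Parent != default)
        {
            if ((options.Parent.TraceFlags & ActivityTraceFlags.Recorded) ==
                ActivityTraceFlags.Recorded)
                return ActivitySamplingResult.AllDataAndRecorded;
            return options.Parent.IsRemote
                ? ActivitySamplingResult.PropagationData
                : ActivitySamplingResult.None;
        }
        // Assumption for this sketch: no root is ever included in the sample.
        return ActivitySamplingResult.PropagationData;
    }
};
ActivitySource.AddActivityListener(listener);

// Hypothetical helper: simulates a parent context received from another service.
ActivityContext RemoteParent(ActivityTraceFlags flags) =>
    new(ActivityTraceId.CreateRandom(), ActivitySpanId.CreateRandom(), flags, isRemote: true);

bool recordedRemoteChild, unrecordedRemoteChildCreated, localChildCreated;

// 1. Remote parent recorded upstream: the child is created and recorded.
using (var child = source.StartActivity("child", ActivityKind.Server,
           RemoteParent(ActivityTraceFlags.Recorded)))
    recordedRemoteChild = child is { Recorded: true };

// 2. Remote parent not recorded: the child is still created, for propagation only.
using (var child = source.StartActivity("child", ActivityKind.Server,
           RemoteParent(ActivityTraceFlags.None)))
    unrecordedRemoteChildCreated = child is { Recorded: false, IsAllDataRequested: false };

// 3. Local parent not recorded: the sampler returns None and no child is allocated.
using (var root = source.StartActivity("root"))
using (var child = source.StartActivity("child"))
    localChildCreated = child is not null;

Console.WriteLine(recordedRemoteChild);          // True
Console.WriteLine(unrecordedRemoteChildCreated); // True
Console.WriteLine(localChildCreated);            // False
```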

At the root of the trace

The next, and final, part of IntervalSampler is concerned with root activities. These don’t have a parent, so when we make a sampling decision for them, we’re really making a decision about the whole trace: this Activity, and its (potential) future children.

var n = Interlocked.Increment(ref next) % interval;
return n == 0
    ? ActivitySamplingResult.AllDataAndRecorded
    : ActivitySamplingResult.PropagationData;

That’s why, when an activity isn’t included in the sample, we return ActivitySamplingResult.PropagationData instead of ActivitySamplingResult.None. If we returned ActivitySamplingResult.None, no activity would be created, and so later on we’d have no way to remember our decision when looking at more deeply-nested activities. The ActivitySamplingResult.PropagationData option does cause creation of an activity, but it’ll be marked in such a way that only minimal processing is performed on it, and it will ultimately be discarded.

So there you have it

Hopefully the information here helps you to skip some of the digging I’ve had to do, and sheds some light on what ActivityListener.Sample is all about. Corrections and errata welcome - and if you spot other examples or documentation surrounding ActivityListener.Sample that I’ve missed, I’d love to know about those, too.

Happy tracing! 👋

2024-10-05: added the IsRemote check when sampling by parent, to ensure an activity is always created for propagation purposes.
