Overview

Core Concepts

How TokenSense intercepts calls

TokenSense wraps your LLM client using Python’s __getattr__ proxy pattern. When you call client.messages.create(...), TokenSense:

1. Forwards the call to the original client, unchanged
2. Waits for the response
3. Extracts metadata from the response (tokens, model, cost)
4. Emits a CallEvent to a background thread
5. Returns the original response to your code

Your code receives the same response object it would get from the unwrapped client. The background thread handles the event asynchronously, so writing the event does not block your call.
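The steps above can be sketched with a small stand-alone proxy. This is an illustration of the general __getattr__ proxy technique, not TokenSense's actual implementation: the Proxy class, the fake client, the worker function, and the CallEvent fields shown here are all assumptions made for the example.

```python
import queue
import threading
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class CallEvent:
    """Hypothetical event shape; the real CallEvent fields may differ."""
    model: str
    input_tokens: int
    output_tokens: int


class Proxy:
    """Forwards attribute access to a target and records call metadata."""

    def __init__(self, target, events):
        self._target = target
        self._events = events

    def __getattr__(self, name):
        # Called only for attributes that are not defined on the proxy itself.
        attr = getattr(self._target, name)
        if callable(attr):
            def wrapper(*args, **kwargs):
                response = attr(*args, **kwargs)  # forward the call unchanged
                usage = getattr(response, "usage", None)
                if usage is not None:
                    # Hand the event to the background thread via the queue.
                    self._events.put(
                        CallEvent(response.model, usage.input_tokens, usage.output_tokens)
                    )
                return response  # the caller gets the original object
            return wrapper
        if isinstance(attr, (str, int, float, bool, type(None))):
            return attr
        # Nested namespaces such as client.messages are wrapped recursively.
        return Proxy(attr, self._events)


class FakeMessages:
    """Stand-in for a provider SDK namespace; makes no network calls."""

    def create(self, model, messages):
        usage = SimpleNamespace(input_tokens=3, output_tokens=5)
        return SimpleNamespace(model=model, usage=usage, content="hi")


class FakeClient:
    def __init__(self):
        self.messages = FakeMessages()


def worker(events, sink):
    """Background consumer: drains the queue until it sees the None sentinel."""
    while True:
        event = events.get()
        if event is None:
            break
        sink.append(event)


events = queue.Queue()
sink = []
thread = threading.Thread(target=worker, args=(events, sink), daemon=True)
thread.start()

client = Proxy(FakeClient(), events)
response = client.messages.create(
    model="demo-model", messages=[{"role": "user", "content": "Hello"}]
)

events.put(None)  # ask the worker to stop
thread.join()
```

The caller's code path only pays for a queue put; everything that happens to the event afterwards runs on the worker thread.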

observe()

The core function. Wraps any supported LLM client and returns a drop-in replacement.

Signature

def observe(
    client: Any,
    output: BaseOutput | None = None,
    user_id: str | None = None,
    session_id: str | None = None,
    tags: list[str] | None = None,
    log_prompts: bool = False,
    log_responses: bool = False,
    on_event: Callable[[CallEvent], None] | None = None,
) -> ObservedClient

Parameters

Parameter	Type	Default	Description
`client`	Any	required	LLM client to wrap
`output`	BaseOutput | None	None (auto)	Where events are sent. If not set, it is auto-detected from the environment
`user_id`	str | None	None	Identifier attached to every event from this client
`session_id`	str | None	None	Groups multiple calls into a session
`tags`	list[str] | None	None	Labels for filtering and segmentation
`log_prompts`	bool	False	Include prompt content in events (opt-in)
`log_responses`	bool	False	Include response content in events (opt-in)
`on_event`	Callable[[CallEvent], None] | None	None	Function called after each event is written
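As an illustration of the on_event parameter, the sketch below defines a callback that matches the Callable[[CallEvent], None] type from the signature. The CallEvent class here is a stand-in defined for the example, and its field names (model, total_tokens) are assumptions rather than the library's documented fields. The callback is invoked by hand so the snippet runs without TokenSense installed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CallEvent:
    # Stand-in for the library's CallEvent; these field names are assumptions.
    model: str
    total_tokens: int


tokens_by_model = Counter()


def tally(event: CallEvent) -> None:
    """Takes one event and returns nothing, as the on_event type requires."""
    tokens_by_model[event.model] += event.total_tokens


# With the real library, the callback would be passed as
# observe(client, on_event=tally). Here it is called directly.
for ev in [CallEvent("model-a", 120), CallEvent("model-b", 40), CallEvent("model-a", 30)]:
    tally(ev)
```

Keeping the callback small and side-effect-only seems a sensible default, since the table states it is called after each event is written.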

Examples

Minimal — just observe:

import anthropic
from tokensense import observe

client = observe(anthropic.Anthropic())

With output:

import anthropic
from tokensense import observe
from tokensense.outputs import SQLite

client = observe(anthropic.Anthropic(), output=SQLite("./usage.db"))

With user context:

client = observe(
    anthropic.Anthropic(),
    user_id="user_123",
    session_id="chat_session_456",
    tags=["production", "chat-feature"],
)

Wrapping OpenAI:

import openai
from tokensense import observe

client = observe(openai.OpenAI())
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)

Wrapping Groq:

import groq
from tokensense import observe

client = observe(groq.Groq())
response = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": "Hello"}]
)

Async client:

import anthropic
from tokensense import observe

client = observe(anthropic.AsyncAnthropic())
# await is only valid inside an async function
response = await client.messages.create(...)
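This section does not describe how the wrapper handles async clients internally. One plausible approach, sketched here under that assumption, is to check whether the wrapped method is a coroutine function and return a matching async wrapper, so the caller still awaits the result as usual. The names wrap_call, fake_create, and fake_create_sync are hypothetical and exist only for this example.

```python
import asyncio
import inspect
from types import SimpleNamespace


def wrap_call(fn, record):
    """Return a wrapper that records fn's result, for sync or async fn."""
    if inspect.iscoroutinefunction(fn):
        async def async_wrapper(*args, **kwargs):
            response = await fn(*args, **kwargs)  # await the original coroutine
            record(response)
            return response
        return async_wrapper

    def sync_wrapper(*args, **kwargs):
        response = fn(*args, **kwargs)
        record(response)
        return response
    return sync_wrapper


async def fake_create(model):
    """Stand-in for an async SDK method; makes no network calls."""
    await asyncio.sleep(0)
    return SimpleNamespace(model=model)


def fake_create_sync(model):
    """Stand-in for a sync SDK method."""
    return SimpleNamespace(model=model)


seen = []
observed_async = wrap_call(fake_create, lambda r: seen.append(r.model))
observed_sync = wrap_call(fake_create_sync, lambda r: seen.append(r.model))

result_async = asyncio.run(observed_async("demo-async"))
result_sync = observed_sync("demo-sync")
```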

With explicit prompt logging:

# only do this when you specifically need prompt content in your logs
client = observe(
    anthropic.Anthropic(),
    log_prompts=True,
    log_responses=True,
)
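To make the opt-in behaviour concrete, here is a minimal sketch of how content fields could be gated by the two flags. The build_event function and the dictionary keys are hypothetical and are not taken from the library; only the flag names and their False defaults come from the signature above.

```python
def build_event(model, prompt, response_text, log_prompts=False, log_responses=False):
    """Build an event dict; content fields are added only when opted in."""
    event = {"model": model}
    if log_prompts:
        event["prompt"] = prompt
    if log_responses:
        event["response"] = response_text
    return event


# Default: metadata only, no prompt or response content.
default_event = build_event("demo-model", "secret question", "secret answer")

# Both flags set: content is included alongside the metadata.
verbose_event = build_event(
    "demo-model", "secret question", "secret answer",
    log_prompts=True, log_responses=True,
)
```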
