Core Concepts

How TokenSense intercepts calls

TokenSense wraps your LLM client using Python’s __getattr__ proxy pattern. When you call client.messages.create(...), TokenSense:
  1. Forwards the call to the original client — unchanged
  2. Waits for the response
  3. Extracts metadata from the response (tokens, model, cost)
  4. Emits a CallEvent to a background thread
  5. Returns the original response to your code
Your code receives the exact same response object as before. The background thread handles the event asynchronously, so event processing does not block your call.
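The five steps above can be sketched as a small `__getattr__` proxy. This is a minimal illustration, not TokenSense's actual implementation: `ObservedProxy`, the `CallEvent` fields shown here, the `worker` function, and the fake client are all hypothetical names chosen for the example, and the response is assumed to expose `model` and `usage` attributes.

```python
import queue
import threading
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class CallEvent:
    # Hypothetical event shape; the real CallEvent fields may differ.
    model: str
    input_tokens: int
    output_tokens: int


class ObservedProxy:
    """Forwards attribute access to the wrapped object and records calls."""

    def __init__(self, target: Any, events: queue.Queue) -> None:
        self._target = target
        self._events = events

    def __getattr__(self, name: str) -> Any:
        # Only invoked for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if callable(attr):
            return self._wrap_call(attr)
        if hasattr(attr, "__dict__"):
            # Nested namespaces such as client.messages are proxied too.
            return ObservedProxy(attr, self._events)
        return attr

    def _wrap_call(self, fn):
        def wrapper(*args, **kwargs):
            response = fn(*args, **kwargs)  # steps 1-2: forward and wait
            usage = response.usage          # step 3: extract metadata
            self._events.put(               # step 4: hand off to the worker
                CallEvent(response.model, usage.input_tokens, usage.output_tokens)
            )
            return response                 # step 5: same object back
        return wrapper


def worker(events: queue.Queue, sink: list) -> None:
    # Drains events off the caller's thread; None is the stop signal.
    while True:
        event = events.get()
        if event is None:
            break
        sink.append(event)


def fake_create(**kwargs):
    # Stand-in for an SDK call so the sketch runs without network access.
    usage = SimpleNamespace(input_tokens=3, output_tokens=5)
    return SimpleNamespace(model=kwargs["model"], usage=usage)


fake_client = SimpleNamespace(messages=SimpleNamespace(create=fake_create))

events: queue.Queue = queue.Queue()
sink: list = []
thread = threading.Thread(target=worker, args=(events, sink), daemon=True)
thread.start()

client = ObservedProxy(fake_client, events)
response = client.messages.create(model="demo-model")

events.put(None)  # stop the worker and wait for it to drain
thread.join()
```

The caller only pays for a `queue.put`, which is why this kind of design keeps event handling off the request path.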

observe()

The core function. Wraps any supported LLM client and returns a drop-in replacement.

Signature

def observe(
    client: Any,
    output: BaseOutput | None = None,
    user_id: str | None = None,
    session_id: str | None = None,
    tags: list[str] | None = None,
    log_prompts: bool = False,
    log_responses: bool = False,
    on_event: Callable[[CallEvent], None] | None = None,
) -> ObservedClient

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| client | Any | required | LLM client to wrap |
| output | BaseOutput | None | Where events are sent. Auto-detected from environment variables if not set |
| user_id | str | None | Identifier attached to every event from this client |
| session_id | str | None | Groups multiple calls into a session |
| tags | list[str] | None | Labels for filtering and segmentation |
| log_prompts | bool | False | Include prompt content in events (opt-in) |
| log_responses | bool | False | Include response content in events (opt-in) |
| on_event | Callable[[CallEvent], None] | None | Function called after each event is written |
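As an illustration of the `on_event` parameter, the sketch below shows a callback that accumulates token totals per model. `FakeCallEvent` is a stand-in with assumed field names (the real `CallEvent` attributes are not listed in this section), and `TokenTotals` is a hypothetical helper, so treat this as a pattern, not as the library's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeCallEvent:
    # Stand-in for tokensense's CallEvent; field names are assumptions.
    model: str
    input_tokens: int
    output_tokens: int


class TokenTotals:
    """Accumulates total tokens per model from on_event callbacks."""

    def __init__(self) -> None:
        self.by_model = defaultdict(int)

    def record(self, event: FakeCallEvent) -> None:
        self.by_model[event.model] += event.input_tokens + event.output_tokens


totals = TokenTotals()

# With the real library, the callback would be passed as:
#   client = observe(anthropic.Anthropic(), on_event=totals.record)
# Here the events are fed in by hand so the sketch runs on its own.
for event in (
    FakeCallEvent("model-a", 10, 20),
    FakeCallEvent("model-a", 1, 2),
    FakeCallEvent("model-b", 5, 5),
):
    totals.record(event)
```

If the callback is invoked from the background thread (this section does not say which thread calls it), shared state like `by_model` would likely need a lock.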

Examples

Minimal (just observe):
import anthropic
from tokensense import observe

client = observe(anthropic.Anthropic())
With output:
from tokensense import observe
from tokensense.outputs import SQLite

client = observe(anthropic.Anthropic(), output=SQLite("./usage.db"))
With user context:
client = observe(
    anthropic.Anthropic(),
    user_id="user_123",
    session_id="chat_session_456",
    tags=["production", "chat-feature"],
)
Wrapping OpenAI:
import openai
from tokensense import observe

client = observe(openai.OpenAI())
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
Wrapping Groq:
import groq
from tokensense import observe

client = observe(groq.Groq())
response = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": "Hello"}]
)
Async client:
import anthropic
from tokensense import observe

client = observe(anthropic.AsyncAnthropic())
# await is only valid inside a coroutine (async def)
response = await client.messages.create(...)
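For an async client, a wrapper has to await the underlying coroutine before it can inspect the response. The sketch below shows one way a proxy might handle both sync and async methods. `wrap_call` and `fake_async_create` are hypothetical names for illustration, and this section does not confirm that TokenSense detects async methods this way.

```python
import asyncio
import inspect
from types import SimpleNamespace


def wrap_call(fn, on_response):
    """Wrap fn so on_response sees each result, for sync or async callables."""
    if inspect.iscoroutinefunction(fn):
        async def async_wrapper(*args, **kwargs):
            response = await fn(*args, **kwargs)  # wait for the real call
            on_response(response)                 # observe the result
            return response                       # hand back the same object
        return async_wrapper

    def sync_wrapper(*args, **kwargs):
        response = fn(*args, **kwargs)
        on_response(response)
        return response
    return sync_wrapper


async def fake_async_create(**kwargs):
    # Stand-in for an async SDK method; yields to the event loop once.
    await asyncio.sleep(0)
    return SimpleNamespace(model=kwargs["model"])


seen = []
observed_create = wrap_call(fake_async_create, seen.append)
response = asyncio.run(observed_create(model="demo-model"))
```

The wrapped function stays awaitable, so calling code keeps using `await` exactly as it did with the unwrapped client.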
With explicit prompt logging:
# only do this when you specifically need prompt content in your logs
client = observe(
    anthropic.Anthropic(),
    log_prompts=True,
    log_responses=True,
)