Tracking Token Usage in Azure OpenAI (PAYG vs. PTU)
How Azure OpenAI Tracks Token Usage
Azure OpenAI measures usage in terms of tokens processed, counting both prompt tokens (input) and completion tokens (output) for each API call. Every request's response includes a usage breakdown in the JSON (showing prompt tokens, completion tokens, and total tokens). Under the hood, Azure OpenAI aggregates these token counts as metrics. Importantly, this usage tracking works the same way regardless of billing model, whether you are on Pay-As-You-Go or using Provisioned Throughput Units (PTUs). The service doesn't "count" tokens differently for different billing plans; it always tallies the number of tokens consumed by your requests in a consistent manner (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). In other words, token consumption is recorded identically; only the billing interpretation of that usage differs (as explained below).
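As a concrete illustration, here is a minimal Python sketch using the openai package's AzureOpenAI client to read that usage breakdown; the endpoint, API version, and deployment name below are placeholder assumptions for your own resource:

```python
import os

from openai import AzureOpenAI

# Standard AzureOpenAI client from the openai Python package.
# Endpoint, key, API version, and deployment name are placeholders.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="my-gpt4-deployment",  # your deployment name, not the base model name
    messages=[{"role": "user", "content": "Hello!"}],
)

# Every response carries the same usage breakdown, PAYG or PTU alike.
usage = response.usage
print(f"prompt tokens:     {usage.prompt_tokens}")
print(f"completion tokens: {usage.completion_tokens}")
print(f"total tokens:      {usage.total_tokens}")
```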
- Pay-As-You-Go (PAYG): On the PAYG model, you are charged per token consumed (with different rates for input vs. output tokens, varying by model). The Azure OpenAI resource still tracks how many tokens you use, but in this model those token counts directly translate into costs on your bill. Azure imposes certain rate limits (tokens-per-minute quotas) on PAYG deployments since they run on shared infrastructure. For example, API responses for PAYG calls include headers like `x-ratelimit-remaining-tokens`, indicating how many tokens you have left in the current time window (Azure OpenAI PTU utilization – Microsoft Q&A). These headers help you gauge usage against the rate limit but do not change how tokens are counted; they are purely throttling feedback.
- Provisioned Throughput Units (PTU): PTU (a provisioned capacity model) means you reserve a dedicated throughput (measured in token processing units per second/minute) for a fixed hourly or monthly fee. Token usage is still counted in the same way (the service logs how many tokens your calls used), but you aren't charged per token. Instead, you pay for the reserved capacity (whether you fully utilize it or not). Because capacity is prepaid, there is no pay-per-token charge to meter in real time; however, Azure provides metrics to show how much of your capacity you are using. For instance, API responses on PTU deployments include an `azure-openai-deployment-utilization` header, which indicates the current utilization percentage of your reserved throughput (Azure OpenAI PTU utilization – Microsoft Q&A). This header tells you how close the deployment is to its maximum PTU capacity at that moment (unlike PAYG, where the focus is on remaining tokens before throttling). Again, the internal token counting is the same; it is the billing that differs (PTU is a flat rate for capacity, so those token counts aren't billed directly, but they are used to calculate utilization). A short sketch of reading both headers follows this list.
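To inspect these headers from code, the openai SDK's with_raw_response wrapper exposes the raw HTTP response. A hedged sketch follows; the header names are as reported in the Microsoft Q&A thread cited above, and their presence can vary by deployment type and API version:

```python
# Reusing the AzureOpenAI `client` from the previous sketch.
raw = client.chat.completions.with_raw_response.create(
    model="my-gpt4-deployment",  # placeholder deployment name
    messages=[{"role": "user", "content": "Hello!"}],
)

# PAYG deployments: tokens left in the current rate-limit window.
print(raw.headers.get("x-ratelimit-remaining-tokens"))

# PTU deployments: current utilization % of the reserved throughput.
print(raw.headers.get("azure-openai-deployment-utilization"))

completion = raw.parse()  # recover the usual typed response object
print(completion.usage.total_tokens)
```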
Consistent Token Counting Across Billing Models
No matter the billing model, Azure OpenAI's usage metrics count tokens uniformly. Each call's prompt and completion tokens are summed as "tokens processed," and these get recorded in Azure's monitoring system. Microsoft's documentation confirms that core usage metrics like "Processed Prompt Tokens" (input tokens) and "Generated Completion Tokens" (output tokens) apply to both PTU and Pay-As-You-Go deployments (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). In other words, the same metric definitions are used whether you're on a PTU (provisioned) deployment or a standard PAYG deployment. The billing model does not change how the service measures token usage; it only changes how you pay for that usage.
To be clear, using PTUs doesn't give you any "different kind" of token count; it simply means you have purchased a certain throughput. You can imagine that under both models, an internal counter adds up tokens in the same way. The PAYG model converts those counts into a dollar cost per 1,000 tokens, whereas the PTU model converts them into a percentage of your reserved capacity used. Microsoft's official metrics reference shows that metrics like "Processed Inference Tokens" (which counts total tokens = prompt + completion) are reported for all deployment types (Standard PAYG, PTU, and PTU-managed) (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). This confirms that usage reporting is consistent across billing models; the system counts tokens the same, it's just that with PTU you won't see a running monetary charge for each token.
Additionally, the Azure OpenAI monitoring dashboards provided in Azure illustrate both scenarios side by side. The out-of-box dashboard for an Azure OpenAI resource has a "Tokens-Based Usage" section (showing token consumption over time) and a "PTU Utilization" section for those with provisioned throughput (Monitor Azure OpenAI Service – Azure AI services | Microsoft Learn). The presence of both categories indicates that token usage is tracked universally, while PTU customers get an extra view of capacity usage. In summary, you can trust that a "token" is a token, counted the same way, regardless of whether you pay per token (PAYG) or via reserved capacity (PTU). The billing model only affects how costs are calculated, not how usage is measured.
PTU-Specific Metrics (Utilization and Throughput)
While the fundamental usage metrics are the same for all billing models, Azure provides additional metrics for PTU deployments to help you monitor your reserved capacity utilization. If you are using PTUs, you'll want to pay attention to metrics that reflect throughput and utilization of your provisioned units:
- Utilization (%) Metrics: The key PTU-specific metric is "Provisioned-Managed Utilization V2", which measures what percentage of your allocated throughput is being used over time (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). This metric essentially tracks (tokens consumed / tokens capacity) in each time interval to show how close you are to saturating your PTUs. Microsoft documentation describes this metric as "Utilization % for a provisioned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100", reported in 1-minute increments (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn) (Azure OpenAI Service provisioned throughput – Azure AI services | Microsoft Learn). When this utilization hits 100%, your deployment is at full capacity; further requests will be throttled with HTTP 429 errors until utilization drops. The Azure portal's PTU Utilization dashboard graphs this percentage so you can see your usage vs. capacity at a glance. (For PAYG deployments, this metric isn't applicable, since there's no fixed capacity; PAYG instead uses rate-limit policies at the service level.)
- Active Tokens (Throughput) Metric: Another PTU-related metric is "Active Tokens", which represents the number of tokens processed minus any tokens served from cache (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). This metric is used for PTU and PTU-managed deployments to gauge the actual token throughput hitting the model (excluding cached reuse). In practice, "Active Tokens" helps PTU customers understand their TPS/TPM (tokens per second or per minute) against the provisioned capacity (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). You can compare this to your expected throughput to see if you're within bounds. (This metric isn't as relevant for PAYG, because on PAYG you're typically more concerned with total tokens for cost, whereas with PTU you're concerned with tokens per time interval for utilization.)
- Tokens per Second: Azure Monitor also offers a "Tokens Per Second" metric (a real-time throughput rate) and related timing metrics, but note that Microsoft currently reports Tokens/sec and some latency metrics only for PTU deployments, not for pay-as-you-go (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). This is likely because in shared (PAYG) mode those performance metrics can vary unpredictably, whereas with dedicated PTU capacity they can measure consistent throughput. So if you are on PTU, you have a richer set of performance metrics (e.g., latency, time between tokens) to analyze; on PAYG these specific metrics are not populated (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn).
In short, PTU customers get extra metrics to manage their reserved capacity (utilization % and throughput rates), which are accessible via Azure Monitor. These are in addition to the standard token consumption metrics that everyone gets. The billing model doesn't affect the counting of tokens, but with PTU you'll use these metrics to ensure you're using what you paid for efficiently (and not consistently hitting 100% utilization, for example).
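As a sketch of pulling the utilization metric programmatically rather than from the portal, the azure-monitor-query package can query metrics on the resource. The resource ID below is a placeholder, and the metric identifier follows the AzureOpenAIProvisionedManagedUtilizationV2 name mentioned later in this article (verify the exact identifier against the monitoring data reference):

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Placeholder resource ID for your Azure OpenAI (Cognitive Services) account.
RESOURCE_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.CognitiveServices/accounts/<aoai-resource>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# PTU utilization % over the last hour, at the metric's 1-minute granularity.
result = client.query_resource(
    RESOURCE_ID,
    metric_names=["AzureOpenAIProvisionedManagedUtilizationV2"],  # assumed ID
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.AVERAGE],
)

for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            if point.average is not None:
                print(f"{point.timestamp}  utilization: {point.average:.1f}%")
```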
Viewing Usage Metrics in the Azure Portal
You can validate and monitor token usage for your Azure OpenAI resource directly in the Azure Portal. Microsoft provides multiple ways to see both your token consumption and, if applicable, your PTU utilization:
- Azure OpenAI Resource Dashboards: Navigate to your Azure OpenAI resource in the Azure Portal. On the Overview blade, you'll typically see some high-level metrics. For a deeper look, Microsoft's documentation mentions an AI Foundry metrics dashboard accessible via the Azure OpenAI resource page (there's a "Go to Azure AI Foundry portal" link on the overview pane) (Monitor Azure OpenAI Service – Azure AI services | Microsoft Learn). The built-in metrics dashboard is grouped into categories like "HTTP Requests", "Tokens-Based Usage", "PTU Utilization", and "Fine-tuning" (Monitor Azure OpenAI Service – Azure AI services | Microsoft Learn). To check token usage, focus on the Tokens-Based Usage graphs, which display the number of tokens used over time. If you have PTU deployments, the PTU Utilization section will show how much of your capacity is being used (often as a percentage or as active tokens vs. allocated tokens). These out-of-box dashboards provide a convenient at-a-glance view. For example, you might see a chart of "Total Tokens per hour" and a chart of "Utilization % of PTU deployment X" on the same page.
- Metrics Explorer (Custom Metrics): For more control, use the Metrics blade under Monitoring for your Azure OpenAI resource. Here you can plot and filter specific metrics. In the Metrics explorer:
- Select your Azure OpenAI resource and the metric namespace (it may default to Azure OpenAI metrics).
- For token usage, choose metrics such as "Processed Prompt Tokens" (input tokens), "Generated Completion Tokens" (output tokens), or "Processed Inference Tokens" (total tokens). These metrics are recorded automatically for all deployments (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn) (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). You can view them as a sum over time or as a rate, and adjust the time range to your needs.
- Apply splits or filters by deployment or model, if desired. The metrics include dimensions like ModelDeploymentName and ModelName. For instance, you can filter the metric to a specific deployment (if you have multiple model deployments under the same Azure OpenAI resource) to see token usage for that particular model endpoint (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). This effectively gives you a per-deployment breakdown of token consumption. Similarly, splitting by ModelName could show separate lines for GPT-4 vs. GPT-3.5 deployments, and so on (see the query sketch after this list).
- If you are using PTU, select metrics like `AzureOpenAIProvisionedManagedUtilizationV2` (the utilization % discussed above) or `ActiveTokens` to monitor capacity usage (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn) (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). These can be filtered by deployment as well (in case you have multiple PTU deployments).
- You can also set up alerts on these metrics (for example, an alert if utilization % goes above 90% or if token usage spikes beyond a certain rate).
- Cost Analysis (Billing): For PAYG users, you can cross-check cost and usage via Azure Cost Management. In the Azure Portal, go to your subscription's Cost Analysis (or the resource's Cost Analysis if supported) to see charges. Token usage charges appear under Cognitive Services for the Azure OpenAI resource. For example, you can filter by your resource or by service name to see how much you spent on input and output tokens in a given period. This is more about dollars, but it indirectly reflects the token counts (since cost is proportional to tokens in PAYG). For PTU, your cost will be a fixed amount (for the reserved capacity hours) rather than per-token charges, so Cost Analysis will show the reservation costs. The token metrics in Azure Monitor are the better way to see actual token volumes in PTU scenarios (since cost won't fluctuate with usage on a fixed PTU plan).
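To illustrate the per-deployment split described in the Metrics explorer steps above, here is a minimal azure-monitor-query sketch that splits on the ModelDeploymentName dimension; the metric IDs and the dimension-key casing are assumptions to verify against the monitoring data reference:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

RESOURCE_ID = "<your Azure OpenAI resource ID>"  # placeholder, as before

client = MetricsQueryClient(DefaultAzureCredential())

# "eq '*'" asks Azure Monitor to split the time series by every value
# of the ModelDeploymentName dimension.
result = client.query_resource(
    RESOURCE_ID,
    metric_names=["ProcessedPromptTokens", "GeneratedTokens"],  # assumed IDs
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
    aggregations=[MetricAggregationType.TOTAL],
    filter="ModelDeploymentName eq '*'",
)

for metric in result.metrics:
    for series in metric.timeseries:
        # Dimension keys are returned by the service; casing may differ.
        deployment = series.metadata_values.get("modeldeploymentname", "?")
        total = sum(p.total or 0 for p in series.data)
        print(f"{metric.name} / {deployment}: {total:.0f} tokens")
```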
In summary, the Azure Portal's Monitoring -> Metrics section is the primary place to validate token usage numbers. The Metrics dashboard gives a friendly overview, and the Metrics explorer allows detailed queries and breakdowns (e.g., per deployment or model). All of these are consistent across billing models; you'll see token counts in both cases, and if you have PTU, you'll also have extra metrics like utilization available.
Breakdown by Deployment, Model, or API Key
By default, Azure OpenAI's built-in metrics allow you to break down usage per deployment and model, but not by individual end user or API key (the service doesn't inherently know about multiple callers if you're just using the single resource key). Here's how to achieve various breakdowns:
- Per Deployment / Model: As noted, metrics can be filtered by ModelDeploymentName (the name you gave the deployment in Azure) or ModelName (the base model, e.g., "gpt-4" or "gpt-35-turbo"). This means if you have multiple deployments (for example, one deployment of gpt-4 and another of gpt-3.5 in your resource), you can see token usage for each separately (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). In the Azure Portal metrics interface, you would either apply a filter for a specific deployment or use the "Split" function on the deployment dimension to get a chart with one line per deployment. This is very useful for monitoring which model is consuming how many tokens; it answers questions like "Which of my deployments is driving most of the usage?" directly from the Azure metrics, no external instrumentation needed.
- Per API Key / Consumer: Azure OpenAI doesn't natively report metrics by API key or caller identity if you're using the resource access keys directly; all usage with a given resource key aggregates under that resource. If you need to track usage by different users or applications, you have a couple of options:
- Use Azure API Management (APIM): Microsoft recommends using APIM as a front end to Azure OpenAI if you want to expose it to multiple internal or external consumers with separate credentials. By importing the Azure OpenAI API into APIM, you can issue separate subscription keys to different consumers, and APIM can then emit custom metrics per subscription. In fact, there is a built-in APIM policy called `azure-openai-emit-token-metric` which records token usage metrics to Application Insights, and it allows adding dimensions such as the APIM Subscription ID (which maps to an individual API key/consumer) (Azure API Management policy reference – azure-openai-emit-token-metric | Microsoft Learn). This way, you can get a breakdown of tokens used per client. Essentially, APIM captures the usage from each caller separately and forwards the calls to your Azure OpenAI resource. Azure OpenAI itself will still see total tokens, but APIM's metrics or logs will attribute which subscription (user) was responsible. This is a recommended approach for multi-tenant scenarios or chargeback models. (Microsoft's documentation and samples confirm you can include dimensions like Subscription ID or User ID in the token metric policy to achieve per-consumer tracking (Azure API Management policy reference – azure-openai-emit-token-metric | Microsoft Learn).)
- Custom Logging in Application Code: Alternatively, you can parse the usage from each API response (the usage JSON mentioned earlier) and log it along with an identifier of the user/request in your own database or analytics tool. This requires more custom work, but it's another way to get per-user token counts if APIM is not used. Each response provides `total_tokens`, so your application can sum those per user over time (a minimal sketch follows this list).
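As a hedged sketch of that custom-logging option (all names here are illustrative, not part of any Azure API), an application can tally each response's total_tokens per user:

```python
from collections import defaultdict

# Illustrative in-memory tally; a real application would persist this
# to a database or telemetry pipeline instead.
tokens_by_user: dict[str, int] = defaultdict(int)

def record_usage(user_id: str, response) -> None:
    """Add one completion's token usage to the per-user tally."""
    tokens_by_user[user_id] += response.usage.total_tokens

# After each call made on behalf of a user:
# response = client.chat.completions.create(model="my-deployment", messages=msgs)
# record_usage("user-42", response)
```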
It's worth noting that Azure Monitor's default metrics do include dimensions called `UsageChannel` and `ApiName`, which indicate how the call was made (for example, which API operation or channel, such as ChatCompletion vs. Completion) (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn). But they do not include a caller ID by default. Thus, for token breakdown by API key or user, you will need to implement a solution like APIM or custom logging. The billing itself (especially for PAYG) is at the resource level, so Azure's own cost reports won't split by user; that's something you'd build via the above methods if needed for internal accounting.
Summary
In conclusion, Azure OpenAI tracks token usage uniformly across both Pay-As-You-Go and PTU billing models. All usage is measured in tokens (input and output) and surfaced through Azure Monitor metrics. The billing model only affects how you are charged (per token in PAYG vs. per hour of capacity in PTU); it does not change the underlying token counting. PAYG and PTU deployments both report into metrics like "Processed Prompt Tokens" and "Generated Completion Tokens" (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn), ensuring consistent usage reporting. PTU deployments simply have additional metrics (like utilization percentage) to help you gauge your usage of the reserved capacity (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn).
To monitor your usage, use the Azure Portal: check the Metrics (or the provided dashboards) for token counts and, if applicable, utilization stats. You can see breakdowns per model deployment easily in the metrics view. For more granular per-user or per-key insights, consider fronting the service with API Management and using its token metrics capability, or implement custom logging. All official guidance (Microsoft Learn docs and Azure Portal tools) confirms that token usage is counted the same regardless of PTU vs. PAYG; the difference lies only in cost calculation and capacity management (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn) (Azure OpenAI Service provisioned throughput – Azure AI services | Microsoft Learn). By regularly checking the "Tokens-Based Usage" metrics and (for PTU) the "PTU Utilization" metrics in the Azure Portal, you can validate exactly how many tokens are being used and ensure that aligns with your expectations and billing model (Monitor Azure OpenAI Service – Azure AI services | Microsoft Learn).
Sources:
- Microsoft Azure OpenAI Monitoring Reference, showing token metrics apply to both PAYG and PTU deployments (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn) (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn).
- Azure OpenAI PTU Utilization and Throughput, covering the utilization metric definition and its usage in Azure Monitor (Monitoring data reference for Azure OpenAI – Azure AI services | Microsoft Learn) (Azure OpenAI Service provisioned throughput – Azure AI services | Microsoft Learn).
- Azure documentation on built-in dashboards for OpenAI (Tokens-Based Usage and PTU Utilization categories) (Monitor Azure OpenAI Service – Azure AI services | Microsoft Learn).
- Microsoft Q&A and Azure API Management docs, confirming PAYG vs. PTU headers and methods for tracking usage per subscription (APIM) (Azure OpenAI PTU utilization – Microsoft Q&A) (Azure API Management policy reference – azure-openai-emit-token-metric | Microsoft Learn).