OpenAI API Performance

OpenAI has created a large API library, which is being used by millions of developers and companies around the world. We ran performance tests for every available OpenAI API - all the way from chat to image generation and fine tuning own models. The following is a comprehensive report of all of our findings, and also suggestions for improvements in case your application is suffering from slow requests to OpenAI, ChatGPT, DALL-E, and all the rest.

 

Complete OpenAI API Performance Benchmarks

Below is our complete summary of the benchmarks we performed on the OpenAI APIs. In all cases, we use the following base URL: https://api.openai.com/v1

For each test, we ran a set of 10 calls, and took note of the average response times, using PostMan to make all of the connections and calls.

When testing APIs like images or chat, we made sure to create several different inputs to try an capture variability in performance. We can safely state that the OpenAI team is very nicely using parallel threads, and therefore there isn't much difference between generating one image or three.

 

APIAPI FunctionTypeEndpointTime ms (avg)Model Used For TestsResponse Size
Models List Models GET /models 318    
Models Retrieve Model GET /models/babbage 198    
Completions (text) Create Completion POST /completions 636 gpt-3.5-turbo-instruct  
Chat (text) Chat Completion POST /chat/completions 22,484 gpt-3.5-turbo 250 tokens
Chat (text) Chat Completion POST /chat/completions 31,431 gpt-3.5-turbo 1000 tokens
Chat (text) Chat Completion POST /chat/completions 14,457 gpt-4 250 tokens
Chat (text) Chat Completion POST /chat/completions 18,974 gpt-4 1000 tokens
Images Create Image POST /images/generations 7,882 dall-e-2 1 image 1024x1024px
Images Create Image POST /images/generations 8,594 dall-e-2 2 images 1024x1024px
Images Create Image POST /images/generations 8,833 dall-e-2 3 images 1024x1024px
Images Create Image POST /images/generations 7,126 dall-e-2 1 image 512x512px
Images Create Image POST /images/generations 7,391 dall-e-2 2 images 512x512px
Images Create Image POST /images/generations 7,617 dall-e-2 3 images 512x512px
Images Create Image POST /images/generations 6,569 dall-e-2 1 image 256x256px
Images Create Image POST /images/generations 6,775 dall-e-2 2 images 256x256px
Images Create Image POST /images/generations 7,003 dall-e-2 3 images 256x256px
Images Create Image POST /images/generations 12,795 dall-e-3 1 image 1024x1024px
Images Create Image POST /images/generations 16,962 dall-e-3 1 image 1024x1792px
Images Create Image POST /images/generations 15,977 dall-e-3 1 image 1792x1024px
Images Create Image Edit POST /images/edits 12,546   1 image 1024x1024px
Images Create Image Edit POST /images/edits 13,441   2 images 1024x1024px
Images Create Image Edit POST /images/edits 13,587   3 images 1024x1024px
Images Create Image Edit POST /images/edits 9,901   1 image 512x512px
Images Create Image Edit POST /images/edits 10,622   2 images 512x512px
Images Create Image Edit POST /images/edits 10,978   3 images 512x512px
Images Create Image Edit POST /images/edits 9,150   1 image 256x256px
Images Create Image Edit POST /images/edits 9,093   2 images 256x256px
Images Create Image Edit POST /images/edits 9,438   3 images 256x256px
Images Create Image Variation POST /images/variations 9,885   1 image 1024x1024px
Images Create Image Variation POST /images/variations 10,231   2 images 1024x1024px
Images Create Image Variation POST /images/variations 10,654   3 images 1024x1024px
Images Create Image Variation POST /images/variations 8,139   1 image 512x512px
Images Create Image Variation POST /images/variations 8,456   2 images 512x512px
Images Create Image Variation POST /images/variations 8,861   3 images 512x512px
Images Create Image Variation POST /images/variations 7,302   1 image 256x256px
Images Create Image Variation POST /images/variations 7,564   2 images 256x256px
Images Create Image Variation POST /images/variations 7,847   3 images 256x256px
Speech To Text Create Transcription POST /audio/transcriptions 4,011 whisper-1 (60s audio mp3 192kbps)  
Speech To Text Create Transcription POST /audio/transcriptions 1,295 whisper-1 (10s audio mp3 192kbps)  
Embeddings Create Embedding POST /embeddings 205 text-embedding-ada-002 30-100 tokens
Files Upload File POST /files 711 10kb JSONL file  
Files Get Files GET /files 185   list of files
Files Get Files GET /files/:file_id 106   single file description
Files Get File Contents GET /files/:file_id/content 578 10kb JSONL file single file content
Fine-Tunes Create Fine Tune POST /fine-tunes 247 davinci  
Fine-Tunes List Fine Tunes GET /fine-tunes 231    
Fine-Tunes Retrieve Fine Tune GET /fine-tunes/:fine_tune_id 141    
Fine-Tunes Cancel Fine Tune POST /fine-tunes/:fine_tune_id/cancel 198    
Fine-Tunes List Fine Tune Events GET /fine-tunes/:fine_tune_id/events 128    
Fine-Tunes Delete Fine Tuned Model DEL /models/:model 139    
Fine-Tunes Use Fine Tuned Model POST /completions 655 similar to normal /completions call  
Moderations Check Moderation POST /moderations 331 text-moderation 10-500 tokens
Engines List Engines GET /engines 175    
Engines Retrieve Engine GET /engines/:engineId 134    

OpenAI API Average Response Times For Different Inputs and Models

  

How To Use The Above Performance Tests

When developing an application, it is often the case that there is slow performance in some specific area, and the dev team would need to start profiling and logging to see exactly what's happening. With the above performance metrics, hopefully this task will be easier, giving the application developers the chance to compare observed speeds with another team.

For example, if a certain API response is taking a long time, it is nice to see if anyone else has the same issue.

 

How The Benchmarks Can Change

We noticed that over time, OpenAI APIs are becoming faster. This is most likely due to additional hardware being commissioned to run the services, and an improving infrastructure overall. 

Likewise, the exact opposite can happen in case the APIs experience a spike in requests, such as when a new product is released or big news comes out.

 

Why Is the OpenAI Chat API Slow?

The worst performing OpenAI API is definitely the chat completion with the conversation object. This API comes has an average response time of 22 seconds - which is a lot! 

We believe the main reason for this is, the conversation object has to go through a lot of processing just to prepare the LLM context, which has to be referenced on every newly generated token. Chances are, the OpenAI team does not have much chance to improve on this, aside for getting faster machines.

 

How Can We Improve Our ChatGPT API Call Performance?

There are really only two options available to improve performance on the client side:

  • Use buffering - this will also give the same effect like ChatGPT has, to give the result one word or token at a time
  • Use the alternative API (base_url/completions) which only works on a single question/answer basis

 

The second option is a hidden gem, which may save many applications out there! We have a tutorial devoted to creating a fine-tuned model, and using this endpoint we got the GPT API to respond in under 0.25 seconds per request.

First, it works very fast (0.6sec per request on average).  Secondly, the results are quite good. Not as good as the complete model with the full conversation - but still very respectable.

We also ran some tests of just the completions endpoint and compared the results to the chat completions endpoint, and given the performance difference we will have a serious look at changing some of our applications.

 

Image Generation Performance Benchmarks

On average, with the OpenAI API, we can roughly expect the following performance speeds for images:

  • Generating a new image from text: ~7.5 sec
  • Generating a new image, from an existing image: ~8.7 sec
  • Filling an image cutout: ~10.9 sec

 

Depending on the output image size, performance will vary slightly, but not by a lot. So, we recommend to just work with the 1024x1024 pixel image sizes at all times. 

To get the best possible performance out of the API, stick to generating just one image and in the smaller possible resolution necessary (there are three options: 256x256px, 512x512px and 1024x1024px).

As a sidenote, we can make some assumptions about how the APIs work to create images.  There is one main Image Generation AI, which takes a prompt and a canvas, and fills it with content. The other two capabilities of filling cutouts and generating variations are just preparsing the input into this AI. This is only a guess, as the actual algorithms are kept in secret.

 

OpenAI API Unexpected Server Error

While performing tests, we can across an unexpected error, and here is what it looks like:

openai unknown api error

 

We accidentally sent an optional parameter with an empty value into the API. Hopefully this helps both the developers to catch this case, and also the OpenAI team to better handle such cases.

 

Which OpenAI APIs Are The Best Performers?

We firmly believe the best APIs are these: fine tuning and text completion

These APIs specifically perform very well, under 1 second per call, and given the work done by these functions, this level of performance was actually pleasant surprise! 

With these APIs, we can train custom models, and run them at amazing speeds - even faster than the main ChatGPT LLM.

 

What Is The Single Best OpenAI API?

We believe the Moderations API is highly underrated, and will eventually be used by all applications that use AI. It takes a single text input, and tells us if the text violates any terms of service for being unsafe for the user. Besides doing a good task, it also runs very fast and just around 0.3 sec per request.

 

Conclusion

OpenAI APIs are overall performing very well given the tasks at hand. It is no easy matter to generate new content on the fly, while making sure it is valuable and safe at the same time!

If you think you are ready for developing some OpenAI plugins, join the OpenAI developer waitlist, and make some plugins for ChatGPT.




The fields marked with * are required.

I have read the privacy policy.