OpenAI API Performance

OpenAI has created a large API library, which is being used by millions of developers and companies around the world. We ran performance tests for every available OpenAI API - all the way from chat to image generation and fine tuning own models. The following is a comprehensive report of all of our findings, and also suggestions for improvements in case your application is suffering from slow requests to OpenAI, ChatGPT, DALL-E, and all the rest.

Complete OpenAI API Performance Benchmarks

Below is our complete summary of the benchmarks we performed on the OpenAI APIs. In all cases, we use the following base URL: https://api.openai.com/v1

For each test, we ran a set of 10 calls, and took note of the average response times, using PostMan to make all of the connections and calls.

When testing APIs like images or chat, we made sure to create several different inputs to try an capture variability in performance. We can safely state that the OpenAI team is very nicely using parallel threads, and therefore there isn't much difference between generating one image or three.

API	API Function	Type	Endpoint	Time ms (avg)	Model Used For Tests	Response Size
Models	List Models	GET	/models	318
Models	Retrieve Model	GET	/models/babbage	198
Completions (text)	Create Completion	POST	/completions	636	gpt-3.5-turbo-instruct
Chat (text)	Chat Completion	POST	/chat/completions	22,484	gpt-3.5-turbo	250 tokens
Chat (text)	Chat Completion	POST	/chat/completions	31,431	gpt-3.5-turbo	1000 tokens
Chat (text)	Chat Completion	POST	/chat/completions	14,457	gpt-4	250 tokens
Chat (text)	Chat Completion	POST	/chat/completions	18,974	gpt-4	1000 tokens
Images	Create Image	POST	/images/generations	7,882	dall-e-2	1 image 1024x1024px
Images	Create Image	POST	/images/generations	8,594	dall-e-2	2 images 1024x1024px
Images	Create Image	POST	/images/generations	8,833	dall-e-2	3 images 1024x1024px
Images	Create Image	POST	/images/generations	7,126	dall-e-2	1 image 512x512px
Images	Create Image	POST	/images/generations	7,391	dall-e-2	2 images 512x512px
Images	Create Image	POST	/images/generations	7,617	dall-e-2	3 images 512x512px
Images	Create Image	POST	/images/generations	6,569	dall-e-2	1 image 256x256px
Images	Create Image	POST	/images/generations	6,775	dall-e-2	2 images 256x256px
Images	Create Image	POST	/images/generations	7,003	dall-e-2	3 images 256x256px
Images	Create Image	POST	/images/generations	12,795	dall-e-3	1 image 1024x1024px
Images	Create Image	POST	/images/generations	16,962	dall-e-3	1 image 1024x1792px
Images	Create Image	POST	/images/generations	15,977	dall-e-3	1 image 1792x1024px
Images	Create Image Edit	POST	/images/edits	12,546		1 image 1024x1024px
Images	Create Image Edit	POST	/images/edits	13,441		2 images 1024x1024px
Images	Create Image Edit	POST	/images/edits	13,587		3 images 1024x1024px
Images	Create Image Edit	POST	/images/edits	9,901		1 image 512x512px
Images	Create Image Edit	POST	/images/edits	10,622		2 images 512x512px
Images	Create Image Edit	POST	/images/edits	10,978		3 images 512x512px
Images	Create Image Edit	POST	/images/edits	9,150		1 image 256x256px
Images	Create Image Edit	POST	/images/edits	9,093		2 images 256x256px
Images	Create Image Edit	POST	/images/edits	9,438		3 images 256x256px
Images	Create Image Variation	POST	/images/variations	9,885		1 image 1024x1024px
Images	Create Image Variation	POST	/images/variations	10,231		2 images 1024x1024px
Images	Create Image Variation	POST	/images/variations	10,654		3 images 1024x1024px
Images	Create Image Variation	POST	/images/variations	8,139		1 image 512x512px
Images	Create Image Variation	POST	/images/variations	8,456		2 images 512x512px
Images	Create Image Variation	POST	/images/variations	8,861		3 images 512x512px
Images	Create Image Variation	POST	/images/variations	7,302		1 image 256x256px
Images	Create Image Variation	POST	/images/variations	7,564		2 images 256x256px
Images	Create Image Variation	POST	/images/variations	7,847		3 images 256x256px
Speech To Text	Create Transcription	POST	/audio/transcriptions	4,011	whisper-1 (60s audio mp3 192kbps)
Speech To Text	Create Transcription	POST	/audio/transcriptions	1,295	whisper-1 (10s audio mp3 192kbps)
Embeddings	Create Embedding	POST	/embeddings	205	text-embedding-ada-002	30-100 tokens
Files	Upload File	POST	/files	711	10kb JSONL file
Files	Get Files	GET	/files	185		list of files
Files	Get Files	GET	/files/:file_id	106		single file description
Files	Get File Contents	GET	/files/:file_id/content	578	10kb JSONL file	single file content
Fine-Tunes	Create Fine Tune	POST	/fine-tunes	247	davinci
Fine-Tunes	List Fine Tunes	GET	/fine-tunes	231
Fine-Tunes	Retrieve Fine Tune	GET	/fine-tunes/:fine_tune_id	141
Fine-Tunes	Cancel Fine Tune	POST	/fine-tunes/:fine_tune_id/cancel	198
Fine-Tunes	List Fine Tune Events	GET	/fine-tunes/:fine_tune_id/events	128
Fine-Tunes	Delete Fine Tuned Model	DEL	/models/:model	139
Fine-Tunes	Use Fine Tuned Model	POST	/completions	655	similar to normal /completions call
Moderations	Check Moderation	POST	/moderations	331	text-moderation	10-500 tokens
Engines	List Engines	GET	/engines	175
Engines	Retrieve Engine	GET	/engines/:engineId	134

OpenAI API Average Response Times For Different Inputs and Models

How To Use The Above Performance Tests

When developing an application, it is often the case that there is slow performance in some specific area, and the dev team would need to start profiling and logging to see exactly what's happening. With the above performance metrics, hopefully this task will be easier, giving the application developers the chance to compare observed speeds with another team.

For example, if a certain API response is taking a long time, it is nice to see if anyone else has the same issue.

How The Benchmarks Can Change

We noticed that over time, OpenAI APIs are becoming faster. This is most likely due to additional hardware being commissioned to run the services, and an improving infrastructure overall.

Likewise, the exact opposite can happen in case the APIs experience a spike in requests, such as when a new product is released or big news comes out.

Why Is the OpenAI Chat API Slow?

The worst performing OpenAI API is definitely the chat completion with the conversation object. This API comes has an average response time of 22 seconds - which is a lot!

We believe the main reason for this is, the conversation object has to go through a lot of processing just to prepare the LLM context, which has to be referenced on every newly generated token. Chances are, the OpenAI team does not have much chance to improve on this, aside for getting faster machines.

How Can We Improve Our ChatGPT API Call Performance?

There are really only two options available to improve performance on the client side:

Use buffering - this will also give the same effect like ChatGPT has, to give the result one word or token at a time
Use the alternative API (base_url/completions) which only works on a single question/answer basis

The second option is a hidden gem, which may save many applications out there! We have a tutorial devoted to creating a fine-tuned model, and using this endpoint we got the GPT API to respond in under 0.25 seconds per request.

First, it works very fast (0.6sec per request on average). Secondly, the results are quite good. Not as good as the complete model with the full conversation - but still very respectable.

We also ran some tests of just the completions endpoint and compared the results to the chat completions endpoint, and given the performance difference we will have a serious look at changing some of our applications.

Image Generation Performance Benchmarks

On average, with the OpenAI API, we can roughly expect the following performance speeds for images:

Generating a new image from text: ~7.5 sec
Generating a new image, from an existing image: ~8.7 sec
Filling an image cutout: ~10.9 sec

Depending on the output image size, performance will vary slightly, but not by a lot. So, we recommend to just work with the 1024x1024 pixel image sizes at all times.

To get the best possible performance out of the API, stick to generating just one image and in the smaller possible resolution necessary (there are three options: 256x256px, 512x512px and 1024x1024px).

As a sidenote, we can make some assumptions about how the APIs work to create images. There is one main Image Generation AI, which takes a prompt and a canvas, and fills it with content. The other two capabilities of filling cutouts and generating variations are just preparsing the input into this AI. This is only a guess, as the actual algorithms are kept in secret.

OpenAI API Unexpected Server Error

While performing tests, we can across an unexpected error, and here is what it looks like:

openai unknown api error

We accidentally sent an optional parameter with an empty value into the API. Hopefully this helps both the developers to catch this case, and also the OpenAI team to better handle such cases.

Which OpenAI APIs Are The Best Performers?

We firmly believe the best APIs are these: fine tuning and text completion.

These APIs specifically perform very well, under 1 second per call, and given the work done by these functions, this level of performance was actually pleasant surprise!

With these APIs, we can train custom models, and run them at amazing speeds - even faster than the main ChatGPT LLM.

What Is The Single Best OpenAI API?

We believe the Moderations API is highly underrated, and will eventually be used by all applications that use AI. It takes a single text input, and tells us if the text violates any terms of service for being unsafe for the user. Besides doing a good task, it also runs very fast and just around 0.3 sec per request.

Conclusion

OpenAI APIs are overall performing very well given the tasks at hand. It is no easy matter to generate new content on the fly, while making sure it is valuable and safe at the same time!

If you think you are ready for developing some OpenAI plugins, join the OpenAI developer waitlist, and make some plugins for ChatGPT.