OpenAI Moderation API Example

Moderating our applications becomes a real problem once we open them up to the world. Some users will try all kinds of tricks to cause trouble, and others simply paste problematic text by accident.

If your application passes user input to the OpenAI API, it is vital to moderate all of that input first. OpenAI's usage policies apply to everything your application submits, and violating them can get the application, or the account behind it, treated as undesirable in the ecosystem. This article provides example code, along with API requests and responses, to demonstrate how moderation works.

Moderation is often the missing piece in AI applications and is rarely mentioned by online resources, which is why we cover it here.

What Is Content Moderation?

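One of OpenAI's main objectives is to make AI safe to use. Hence, misuse of its tools (such as ChatGPT, DALL-E, Whisper, etc.) can lead to direct action from OpenAI. We do not know exactly what that action is, but it most likely includes suspending or banning the account in question. Content moderation means checking text before your application processes it and rejecting anything that violates these policies. OpenAI's Moderation API does this check for you: it reports whether a piece of text is flagged, with per-category results for categories such as hate, harassment, self-harm, sexual content, and violence.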

A Simple PHP Client for the OpenAI Moderation API

Below is a very simple PHP client that you can use to call OpenAI's Moderation API. We assume you already have a valid API key (if not, this tutorial will guide you).

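The Moderation API is a single endpoint (POST https://api.openai.com/v1/moderations). There is no hard 2,000-character limit per request; rather, OpenAI's moderation guide recommends splitting long text into chunks of fewer than 2,000 characters for higher accuracy.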
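To make the request format concrete, here is a minimal sketch of the JSON body this endpoint expects: an "input" field holding either a single string or a list of strings, and an optional "model" field. The helper name buildModerationPayload is hypothetical; the client below builds the same body from a stdClass object.

```php
<?php
/**
 * Hypothetical helper: build the JSON body for POST /v1/moderations.
 * $input may be a single string or a list of strings.
 */
function buildModerationPayload($input, string $model = 'text-moderation-latest'): string {
    return json_encode(array('input' => $input, 'model' => $model));
}

// Prints {"input":"Hello, how are you?","model":"text-moderation-latest"}
echo buildModerationPayload('Hello, how are you?'), PHP_EOL;
```

Passing a list such as array('first chunk', 'second chunk') produces {"input":["first chunk","second chunk"],"model":"text-moderation-latest"}, which lets you check several pieces of text in one request.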


<?php
class OpenAIModerationSimpleClient {
    //replace with your own key (ideally loaded from configuration kept outside the web root)
    private static $open_ai_key = 'your-openai-chatgpt-api-key-goes-here';
    
    private static $open_ai_url = 'https://api.openai.com/v1'; //current version of the API endpoint
    
    /**
     * Doc: https://platform.openai.com/docs/api-reference/moderations/create
     * @param string|string[] $input text (or list of texts) to classify
     * @param string $model valid options are "text-moderation-latest", "text-moderation-stable"
     * @return array decoded JSON response, or an empty array if the request failed
     */
    public static function moderate($input, $model = 'text-moderation-latest') {
        //create message to post
        $message = new stdClass();
        $message->input = $input;
        $message->model = $model;
        
        return self::_sendMessage('/moderations', json_encode($message));
    }
    
    private static function _sendMessage($endpoint, $data = '', $method = 'post') {
        $apiEndpoint = self::$open_ai_url . $endpoint;
        
        //options shared by GET and POST requests
        $params = array(
            CURLOPT_SSL_VERIFYHOST => 2,     //keep TLS verification on: the API key travels in the headers
            CURLOPT_SSL_VERIFYPEER => true,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_MAXREDIRS => 10,
            CURLOPT_TIMEOUT => 90,
            CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
            CURLOPT_NOBODY => false,
            CURLOPT_HTTPHEADER => array(
              "content-type: application/json",
              "accept: application/json",
              "authorization: Bearer " . self::$open_ai_key
            )
        );
        
        if($method == 'post') {
            $params[CURLOPT_URL] = $apiEndpoint;
            $params[CURLOPT_CUSTOMREQUEST] = "POST";
            $params[CURLOPT_POSTFIELDS] = $data;
        } else if($method == 'get') {
            $params[CURLOPT_URL] = $apiEndpoint . ($data != '' ? ('?' . $data) : '');
            $params[CURLOPT_CUSTOMREQUEST] = "GET";
        }
        
        $curl = curl_init();
        curl_setopt_array($curl, $params);
        $response = curl_exec($curl);
        curl_close($curl);
        
        //curl_exec returns false on network errors
        if($response === false) return array();
        
        $data = json_decode($response, true);
        if(!is_array($data)) return array();
        
        return $data;
    }
}
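Because OpenAI recommends moderating long text in chunks of fewer than 2,000 characters, and the endpoint accepts a list of strings as input, a long message can be split first and sent in one call. This is a minimal sketch that splits at fixed character counts (so it may cut a word or sentence in half); the helper name splitForModeration is hypothetical.

```php
<?php
/**
 * Hypothetical helper: split $text into chunks of at most $maxChars
 * Unicode characters (not bytes), staying under the ~2,000-character
 * size OpenAI recommends for moderation accuracy.
 */
function splitForModeration(string $text, int $maxChars = 1999): array {
    if ($text === '') {
        return array();
    }
    // Split into single UTF-8 characters; the /u flag keeps multi-byte characters intact.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split fails on invalid UTF-8, so fall back to splitting by bytes.
        return str_split($text, $maxChars);
    }
    $chunks = array();
    foreach (array_chunk($chars, $maxChars) as $group) {
        $chunks[] = implode('', $group);
    }
    return $chunks;
}
```

Since moderate() passes its $input straight to json_encode(), you can call OpenAIModerationSimpleClient::moderate(splitForModeration($longText)); the response then contains one entry in "results" per chunk, and the text should be rejected if any of them is flagged.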

Sending an Example Message to Moderate

Here is a simple example that uses the client:


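<?php
//load the client from the same directory as this script, regardless of the working directory
include_once(__DIR__ . '/OpenAIModerationSimpleClient.php');
$response = OpenAIModerationSimpleClient::moderate("Hello, how are you?");
print_r($response);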

Remember to save the client as OpenAIModerationSimpleClient.php in the same directory as this test script.

In the response from OpenAI below, note the different moderation categories. As expected, the text passes the moderation check: "flagged" is false, which print_r displays as an empty value.


Array
(
    [id] => modr-8Gz8koDZPdAr69gYO3tTV9CeUlJBQ
    [model] => text-moderation-006
    [results] => Array
        (
            [0] => Array
                (
                    [flagged] =>
                    [categories] => Array
                        (
                            [sexual] =>
                            [hate] =>
                            [harassment] =>
                            [self-harm] =>
                            [sexual/minors] =>
                            [hate/threatening] =>
                            [violence/graphic] =>
                            [self-harm/intent] =>
                            [self-harm/instructions] =>
                            [harassment/threatening] =>
                            [violence] =>
                        )
                    [category_scores] => Array
                        (
                            [sexual] => 1.3550932635553E-5
                            [hate] => 2.3837439755425E-7
                            [harassment] => 3.7592290027533E-6
                            [self-harm] => 1.7858912571E-8
                            [sexual/minors] => 8.3438955300608E-8
                            [hate/threatening] => 6.2585550075767E-9
                            [violence/graphic] => 2.2840293922854E-7
                            [self-harm/intent] => 3.4014193683873E-9
                            [self-harm/instructions] => 4.2348307083273E-9
                            [harassment/threatening] => 9.8226074385366E-8
                            [violence] => 1.6770583215475E-6
                        )
                )
        )
)
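Since "flagged" is the field that matters, a small check on the decoded array is usually all an application needs. This is a minimal sketch assuming the response shape shown above; the function name isFlagged is hypothetical, and treating an empty response (which the simple client returns when the request fails) as flagged is our own fail-closed choice, not something the API requires.

```php
<?php
/**
 * Hypothetical helper: true if any result in a decoded moderation
 * response is flagged. A missing or malformed "results" array is
 * treated as flagged so that unchecked input is never let through.
 */
function isFlagged(array $response): bool {
    if (empty($response['results']) || !is_array($response['results'])) {
        return true; // no verdict from the API: fail closed
    }
    foreach ($response['results'] as $result) {
        // "flagged" is a JSON boolean; print_r shows true as 1 and false as empty.
        if (!empty($result['flagged'])) {
            return true;
        }
    }
    return false;
}
```

For the response above, isFlagged($response) returns false, so the input can be passed on for further processing. If you would rather fail open when the moderation call itself errors, return false in the first branch instead.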

What Does a Response Flagged for Moderation Look Like?

We will not show the actual request here and leave it to your imagination, but below is an example of a flagged response. Notice that "flagged" is true (print_r displays it as 1); this is the value that matters. The "categories" array shows which specific categories triggered the flag, and "category_scores" gives a numeric score for each category.

The recommended way to handle such input is to not send it to OpenAI for further processing and to show the user a warning message instead.


Array
(
    [id] => modr-8Gzba9trAgc4CNsnAYmZ3PSnc0UMb
    [model] => text-moderation-006
    [results] => Array
        (
            [0] => Array
                (
                    [flagged] => 1
                    [categories] => Array
                        (
                            [sexual] =>
                            [hate] =>
                            [harassment] => 1
                            [self-harm] => 1
                            [sexual/minors] =>
                            [hate/threatening] =>
                            [violence/graphic] =>
                            [self-harm/intent] => 1
                            [self-harm/instructions] => 1
                            [harassment/threatening] => 1
                            [violence] => 1
                        )
                    [category_scores] => Array
                        (
                            [sexual] => 0.0056698704138398
                            [hate] => 0.0047370679676533
                            [harassment] => 0.99877089262009
                            [self-harm] => 0.9786776304245
                            [sexual/minors] => 1.9241338122811E-6
                            [hate/threatening] => 0.00058177195023745
                            [violence/graphic] => 0.00713293813169
                            [self-harm/intent] => 0.99002480506897
                            [self-harm/instructions] => 0.98893576860428
                            [harassment/threatening] => 0.51154780387878
                            [violence] => 0.66159009933472
                        )
                )
        )
)
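To show the user a warning instead of processing flagged input, the categories marked true can be collected from the response. This is a minimal sketch; the names flaggedCategories and moderationWarning and the message wording are hypothetical, and whether to reveal the categories to the user at all is a product decision.

```php
<?php
/**
 * Hypothetical helper: names of the categories marked true in any
 * result of a decoded moderation response, without duplicates.
 */
function flaggedCategories(array $response): array {
    $names = array();
    foreach ($response['results'] ?? array() as $result) {
        // array_filter() without a callback keeps only truthy values, i.e. categories set to true.
        foreach (array_keys(array_filter($result['categories'] ?? array())) as $name) {
            $names[$name] = true; // the name as a key removes duplicates across results
        }
    }
    return array_keys($names);
}

/**
 * Hypothetical helper: user-facing warning for input that will not be
 * forwarded to OpenAI.
 */
function moderationWarning(array $response): string {
    $names = flaggedCategories($response);
    if (count($names) === 0) {
        return 'Your message could not be processed.';
    }
    return 'Your message was not sent because it was flagged for: ' . implode(', ', $names) . '.';
}
```

For the flagged response above, flaggedCategories() returns harassment, self-harm, self-harm/intent, self-harm/instructions, harassment/threatening and violence, in that order. Call moderationWarning() only when the response is flagged, and skip the follow-up request to OpenAI in that case.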

How Much Does the OpenAI Moderation API Cost?

The OpenAI Moderation API is free. We applaud the OpenAI team for this decision, which makes such a critical part of our applications not only fast but also free. There is no excuse not to use it.

Conclusion

Going through OpenAI's documentation, we have learned the following about moderation:

The moderation classification model is continuously being improved
Support for non-English text is currently limited, but this is also being improved
The "flagged" attribute is what matters most; custom thresholds built on the numeric category_scores may need recalibration whenever the underlying model is upgraded

Hopefully this article has been helpful, especially in the pursuit of building a safe AI application for your users. If you feel like you are ready to create some awesome OpenAI plugins, have a look at how to join the OpenAI developer waitlist.