8 May 2025 / Tutorials

Transcribing Pipe's Video Recordings With Replicate and OpenAI's Speech Recognition Model (Whisper)

This tutorial is part of a series of tutorials for integrating the Pipe Recording Platform with transcription services. We previously covered integrating with Amazon Transcribe and with ElevenLabs.

We'll guide you through integrating Pipe's audio, video, and screen recording platform with OpenAI's Whisper model hosted on Replicate.

Pipe is a powerful platform for adding audio, video, or screen recording to your website or app. It provides a robust recording client together with full infrastructure and storage.

Replicate allows you to run open-source AI models via a cloud API. This means you don't have to be a machine learning expert or manage your own server infrastructure to be able to access and use powerful AI models from a broad range of types: image models, video models, audio models, and text models.

You can use open-source models that others have published or, if you are more tech-savvy, bring your own training data to create and fine-tune your own models, or build and publish custom models altogether.

Our tutorial will focus on open-source audio models, specifically OpenAI's Whisper, as we will be tackling transcribing recordings made with Pipe.

Whisper is an automatic speech recognition (ASR) open-source AI model trained and released by OpenAI in various versions. It can be run locally or through Replicate.

The multilingual versions of the model support transcription in 57 languages and translation from these languages into English.

Replicate provides multiple versions of the Whisper model, each with pros and cons. Some are faster at transcribing but only in a few languages; others are cheaper but slower. A quick search on Replicate for 'whisper' will show all of them. You can browse and choose which one fits your needs. For our tutorial, we will use the original model released by OpenAI in its Large-v3 variant.

OpenAI recently released newer AI models for speech-to-text: gpt-4o-transcribe and gpt-4o-mini-transcribe, but these are not open source, and therefore not available through Replicate.

To get started, you will need the following:

A Pipe account (sign up here if you don't already have one). Both an active trial and any paid subscription will work for the purposes of this tutorial.
A Replicate account (sign up here). At the time of writing, the only way to sign up is via GitHub. Replicate's pricing is based on the hardware used to run the AI model. It is usage-based and billed by the second.

The tutorial is split into the following steps:

Configuring the Pipe Platform to extract the audio from the recordings.
Getting your Replicate API token.
Implementing the Replicate API request in a custom PHP script.
Triggering the custom script through Pipe's video_copied_pipe_s3 webhook.

Step 1. Configure the Pipe Platform to Extract Audio Data from Recordings

The Pipe Platform will handle audio and video recording for you. In my testing, video files can be fed directly to the Replicate API. However, Replicate does not specify a file size limit, so you may hit an undisclosed one, especially with screen recordings.

So let's set up Pipe to extract the audio as a separate file from the video files, ensuring we use smaller audio files as input. Pipe will automatically convert the audio to AAC in an MP4 container while preserving the original number of channels and sample rate.

Setting it up is very easy:

Sign in to your account dashboard and go to the transcoding section
Scroll down to Extract Audio and enable Extract audio data into a separate file.
Scroll down and Save.

That's all there is to it. Each time you make a new recording, a separate file containing just the audio will be created. The filename will have the name pattern STREAMNAME_audio.mp4.
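As a small illustration of that naming pattern, a handler could check that the URL it receives points at the extracted audio file before passing it on. This is only a sketch: the helper name isExtractedAudioUrl and the example URL are made up for illustration and are not part of Pipe's API.

```php
<?php
// Hypothetical helper: returns true when the URL path ends with "_audio.mp4",
// the naming pattern Pipe uses for the extracted audio file.
function isExtractedAudioUrl(string $url): bool
{
    $suffix = '_audio.mp4';
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // malformed URL or no path component
    }
    return substr($path, -strlen($suffix)) === $suffix;
}

// Example with a made-up URL
var_dump(isExtractedAudioUrl('https://example.com/recordings/STREAMNAME_audio.mp4')); // bool(true)
```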

It is worth mentioning that, for the purposes of this tutorial, we will be using Pipe's Complimentary Storage, so make sure the Do not store files option is disabled (default) in the storage section of the dashboard. The transcription also works when you use your own storage.

Step 2. Get your Replicate API Token

We will be making requests to a specific Replicate API endpoint, which requires a valid API token to be accessed:

Sign in to your Replicate account.
Go to https://replicate.com/account/api-tokens.
Enter a token name and click Create token.
Copy the API token and save it in a secure location. We will use it later in our code implementation.

Creating a Replicate API token to be used when transcribing Pipe recordings
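One way to keep the token out of the source code is to read it from an environment variable. The sketch below assumes a variable named REPLICATE_API_TOKEN set on your web server; the helper name loadReplicateToken is hypothetical.

```php
<?php
// Reads the Replicate API token from an environment variable instead of
// hardcoding it in the script. REPLICATE_API_TOKEN is an assumed variable name.
function loadReplicateToken(string $envName = 'REPLICATE_API_TOKEN'): string
{
    $token = getenv($envName);
    if ($token === false || trim($token) === '') {
        throw new RuntimeException("Environment variable {$envName} is not set.");
    }
    return trim($token);
}
```

In the webhook handler, the hardcoded value could then be replaced with $replicate_api_token = loadReplicateToken();.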

Step 3. Making a Transcribe Audio Request

With a valid API token, we can now make requests to the Replicate API endpoint to transcribe our MP4 audio recordings.

Each AI model has its own playground page in the Replicate dashboard, which exposes all the knobs you can control. The same settings are also accessible via the API. Here is the one for OpenAI's Whisper.

One important note: every AI model formats its API response differently and returns its own kinds of data for the transcription result. Depending on the Whisper model version you choose, you may need to change the custom code to correctly extract the transcribed text from the API response. You can see the response format on the Replicate playground, on the right side of the page.

Replicate provides extensive documentation that can be consulted for more information. They also offer various API access methods, such as NodeJS and Python client libraries.

For our tutorial, we will use Replicate's HTTP API. We will implement the call in a PHP script that is triggered by one of Pipe's webhooks.

The API call is a simple POST request. Its payload provides the version (the model ID assigned by Replicate) and, as part of the input object, the audio, which is the URL of the audio recording hosted by Pipe on the complimentary storage. The Pipe Platform includes this URL in the payload of its storage webhooks.
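To make the shape of that payload concrete, here is a minimal sketch of how the JSON body could be built. The function name buildPredictionBody and its $extraInput parameter are hypothetical conveniences; the version value and the URL are placeholders.

```php
<?php
// Builds the JSON body for a POST request to https://api.replicate.com/v1/predictions.
// $extraInput is a hypothetical convenience parameter for additional model inputs.
function buildPredictionBody(string $version, string $audioUrl, array $extraInput = []): string
{
    $body = [
        'version' => $version,
        // 'audio' is merged last so it cannot be overwritten by $extraInput
        'input'   => array_merge($extraInput, ['audio' => $audioUrl]),
    ];
    return json_encode($body, JSON_UNESCAPED_SLASHES);
}

echo buildPredictionBody('MODEL_VERSION_ID', 'https://example.com/STREAMNAME_audio.mp4') . "\n";
// {"version":"MODEL_VERSION_ID","input":{"audio":"https://example.com/STREAMNAME_audio.mp4"}}
```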

Of course, you can use any other server-side script language; we chose PHP, and the code is available below. Here is a summary of what it does:

Defines the main variables you must fill in, such as the API token.
Verifies the webhook request's validity by ensuring it came from the Pipe platform via the webhook signature.
Checks and retrieves the data received from the Pipe Platform. We are interested in the audioFileUrl field.
Sends the audio file URL to the Replicate API via a POST request.
Receives the transcription result as a JSON object and writes the transcription result to disk.

The complete code is available below and is documented to be easily understood (webhook-handler.php):

<?php
$replicate_api_token  = "YOUR_REPLICATE_API_TOKEN";
$modelToUse = "8099696689d249cf8b122d833c36ac3f75505c666a395ca40ef26f68e7d3d16e"; // openai/whisper model ID on Replicate
$webhook_key = "YOUR_WEBHOOK_KEY";
$webhookURL = "YOUR_WEBHOOK_HANDLER_URL"; // e.g. https://YOUR_DOMAIN.com/webhook-handler.php

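// Read the raw payload; the header may carry a suffix such as "; charset=utf-8"
$contentType = $_SERVER['CONTENT_TYPE'] ?? '';
$webhookData = '';
if (strpos($contentType, 'application/json') === 0) {
  $webhookData = file_get_contents("php://input");
} else if (strpos($contentType, 'application/x-www-form-urlencoded') === 0) {
  $webhookData = $_POST["payload"] ?? '';
}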

$localSignature = generateSignature($webhook_key, $webhookURL , $webhookData);

$log = "\n--------------------------- " . date('d-m-Y H:i:s') . " --------------------------\n";
$log .= "---------------------------------------------------------------------\n";
$log .= "X-Pipe-Signature received from Pipe = ". $_SERVER['HTTP_X_PIPE_SIGNATURE']."\n";
$log .= "signature generated locally = ". $localSignature."\n";


// Check if the signature matches (hash_equals compares in constant time)
$receivedSignature = $_SERVER['HTTP_X_PIPE_SIGNATURE'] ?? '';
if (!hash_equals($localSignature, $receivedSignature)) {
    $log .= "Error: Signature mismatch. Possible tampering detected.\n";
    file_put_contents(dirname(__FILE__) . '/transcribe-replicate.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);
    die("Signature mismatch. Possible tampering detected.");
}

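// Decode the JSON data from the webhook
$webhookData = json_decode($webhookData, true);
if ($webhookData === null) {
    $log .= "Error: Failed to decode JSON data from the webhook.\n";
    // Write the log before exiting, otherwise the error is lost
    file_put_contents(dirname(__FILE__) . '/transcribe-replicate.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);
    die();
}

// Check if the required keys are present in the webhook data
if(!isset($webhookData['data']['audioFileUrl'])) {
    $log .= "Error: 'audioFileUrl' not found in the webhook data.\n";
    file_put_contents(dirname(__FILE__) . '/transcribe-replicate.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);
    die();
}

// Only process the webhook if the event is 'video_copied_pipe_s3'
if(($webhookData['event'] ?? '') !== 'video_copied_pipe_s3') {
    die();
}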

//Get the audio file URL from the webhook data
$myAudioS3Url = $webhookData['data']['audioFileUrl'];

$log .= "Attempting to transcribe audio from URL '{$myAudioS3Url}'...\n";
$log .= "Using Model: {$modelToUse}\n";

// Call the function to transcribe the audio file
$result = transcribeAudio($replicate_api_token , $myAudioS3Url,$modelToUse);

$log .= "\n--- Result ---\n";
if ($result['success']) {
    $log .= "Transcription Request Successful:\n";
    $log .= "---------------------------------\n";
    $log .= $result['data'] . "\n";
    $log .= "---------------------------------\n";
} else {
    $log .= "Transcription Failed:\n";
    $log .= $result['data'] . "\n";
}
$log .= "--------------\n";

// Log the result to a file
file_put_contents(dirname(__FILE__) . '/transcribe-replicate.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);

echo json_encode($result, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

function transcribeAudio($apiKey, $audioFileUrl, $modelId){
    
    // Set the Replicate API endpoint URL
    $apiUrl = "https://api.replicate.com/v1/predictions";

    // Validate inputs
    if (empty($apiKey)) {
        return ['success' => false, 'data' => 'API key is required.'];
    }
    if (empty($audioFileUrl) || !filter_var($audioFileUrl, FILTER_VALIDATE_URL)) {
         return ['success' => false, 'data' => "Invalid or empty audio file URL provided."];
    }
    
    if (empty($modelId)) {
        return ['success' => false, 'data' => 'Model ID is required.'];
    }
    
    // Prepare the data to be sent in the POST request
    $postData = [
        'version' => $modelId,
        'input' => [
            'audio' => $audioFileUrl
        ]
    ];

    // Initialize cURL session
    $ch = curl_init($apiUrl);

    // Set cURL options
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
    curl_setopt($ch, CURLOPT_POST, true);          
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($postData));
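    curl_setopt($ch, CURLOPT_HTTPHEADER, [
      'Authorization: Bearer ' . $apiKey,
      'Content-Type: application/json',
      // Ask Replicate to hold the request open until the prediction finishes
      // (or until its server-side wait limit is reached)
      'Prefer: wait'
    ]);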
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 180);        
    curl_setopt($ch, CURLOPT_FAILONERROR, false); 
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    
    $response = curl_exec($ch);
    $httpStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $curlErrorNo = curl_errno($ch);
    $curlError = curl_error($ch);

    curl_close($ch);
    
    // Check for cURL errors
    if ($curlErrorNo > 0) {
        return ['success' => false, 'data' => "cURL Error ({$curlErrorNo}): {$curlError}"];
    }

    // Check for HTTP errors
    if ($httpStatusCode >= 200 && $httpStatusCode < 300) {
      $responseData = json_decode($response, true);

      if (json_last_error() !== JSON_ERROR_NONE) {
          return ['success' => false, 'data' => "Failed to decode JSON response. Error: " . json_last_error_msg() . ". Response: " . $response];
      }

      // Check if the expected 'output' key exists with transcription data
      if (isset($responseData['output']['transcription'])) {
          return ['success' => true, 'data' => $responseData['output']['transcription']];
      } else {
          // The prediction may still be running (e.g. a cold model), or the model may use a different output format
          $status = $responseData['status'] ?? 'unknown';
          return ['success' => false, 'data' => "Received successful status code ({$httpStatusCode}) but no transcription in the response (prediction status: {$status}): " . $response];
      }

    } else {
      // Handle API errors (4xx, 5xx)
      $errorData = json_decode($response, true);
      $errorMessage = "API Error (HTTP {$httpStatusCode}): ";

      if ($errorData && isset($errorData['detail'])) {
          if (is_array($errorData['detail'])) {
              $errorMessage .= json_encode($errorData['detail']);
          } elseif (is_string($errorData['detail'])) {
              $errorMessage .= $errorData['detail'];
          } else {
              $errorMessage .= $response; // Fallback
          }
      } else {
          $errorMessage .= $response; // Fallback if JSON decoding fails or structure is different
      }
      return ['success' => false, 'data' => $errorMessage];
    }
}


function generateSignature($key, $url,  $jsonData){
  $data_to_sign = $url . $jsonData;
  return base64_encode(hash_hmac('sha1', $data_to_sign, $key, true));
}

?>

What you need to do:

Host the above script on a web server so that the Pipe processing servers can trigger it.
Add your Replicate API token, the Pipe webhook key (available when creating or editing a Pipe webhook), and the Pipe webhook URL (as we will see in Step 4).

Things to note here:

The language will be automatically detected, but can also be specified via the language attribute of the input object.
The recording file passed on to Replicate via the URL provided by the Pipe webhook must be accessible through HTTPS. When using Pipe's complimentary storage powered by Amazon S3 and Scaleway, we'll give you proper links to the files on our storage. The links are private for as long as you keep them private.
The API's response time depends on whether the model is warm or cold, in Replicate's terms. You'll get a faster response if the AI model is warm and running, and a slower response if it is cold and needs to start up first. Generally, the more popular a model is, the more it is used and the higher the chance that it is warm and responds quickly.
The response time also varies based on the recording length and the amount of speech in that particular recording: the longer the recording, the longer it takes to transcribe. In my tests, a 10-second recording took around 2-3 seconds to transcribe.
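Because a cold model can take a while to start, the response may come back before the transcription is ready. The sketch below classifies a decoded prediction response by its status field. The status values used here (starting, processing, succeeded, failed, canceled) are the ones I understand Replicate's API documentation to list, so verify them against the current docs; the helper name classifyPrediction is hypothetical.

```php
<?php
// Hypothetical helper: maps a decoded prediction response to
// 'done', 'pending', or 'error' based on its 'status' field.
function classifyPrediction(array $prediction): string
{
    $status = $prediction['status'] ?? '';
    if ($status === 'succeeded') {
        return 'done';
    }
    if ($status === 'starting' || $status === 'processing') {
        return 'pending'; // e.g. a cold model that is still booting
    }
    return 'error'; // failed, canceled, or an unrecognized status
}
```

For a pending prediction, the handler could log the prediction id and fetch the result later rather than treating the response as a failure.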

With the code implemented, we only need to ensure it will be triggered.

3.1 Audio Transcription Details

The Replicate API returns a detailed JSON object, but as I mentioned before, it differs from AI model to AI model. This Large-v3 variant of the AI model lets you choose between three transcription formats: plain text, srt, and vtt. Whichever you choose, the transcription is returned as a string inside the JSON object, formatted according to the chosen format. For example, the SRT format will be a string that includes timestamps.

In our PHP webhook script, we write the transcription (the value of the transcription attribute of the output object) to disk via the local execution log, and we also return it to the Pipe Platform as the webhook response. Therefore, you can view part of the result of your transcription in the webhook logs in your Pipe account dashboard.

Moving forward, it is up to your imagination what you can do with the transcription and how you choose to implement it further.
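As one possible starting point, the transcription could be written to its own file per recording instead of only to the log. This is a minimal sketch under my own assumptions: the helper name saveTranscription, the .txt extension, and the target directory are all illustrative choices.

```php
<?php
// Hypothetical helper: writes a transcription to a file named after the
// audio file (STREAMNAME_audio.mp4 -> STREAMNAME_audio.txt) inside $dir.
function saveTranscription(string $audioUrl, string $text, string $dir): string
{
    $path = parse_url($audioUrl, PHP_URL_PATH);
    // pathinfo() drops the directory part and the extension
    $base = pathinfo(is_string($path) ? $path : '', PATHINFO_FILENAME);
    if ($base === '') {
        $base = 'transcription'; // fallback when the URL has no usable file name
    }
    $target = rtrim($dir, '/') . '/' . $base . '.txt';
    if (file_put_contents($target, $text, LOCK_EX) === false) {
        throw new RuntimeException("Could not write {$target}");
    }
    return $target;
}
```

In the handler, this could be called right after a successful result, for example saveTranscription($myAudioS3Url, $result['data'], dirname(__FILE__)).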

The full JSON object can be seen on the Replicate playground page.

You can also view all your API requests in the Replicate dashboard's predictions section.

Step 4. Configure the Pipe Webhook

Let's configure Pipe to execute our PHP webhook handler implemented above whenever a new recording (together with all associated files, including the extracted audio file) finishes uploading to Pipe's complimentary storage.

Go to https://dashboard.addpipe.com/webhooks and click on Add New Webhook.
In the Webhook URL field, enter the URL for where you hosted the webhook-handler.php file (e.g., https://your_domain.com/webhook/webhook-handler.php).
For the event, make sure you select video_copied_pipe_s3.
For content type, it is recommended that you leave the default: application/json.
Copy and paste the webhook key from here into the code above.
Make sure the Active box is checked.
Click Save Webhook.

If you need more help, check out the Pipe docs on setting up a webhook and handling webhook data.

Now you have all the necessary pieces to transcribe audio, video, and screen recordings.

Go ahead and make a new recording using Pipe. Once the recording files are pushed to storage, the Pipe webhook triggers and requests the Replicate API endpoint to transcribe the extracted audio using Whisper.
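If you would like to exercise the handler without making a recording first, you can craft a test request yourself. The sketch below computes the signature the same way generateSignature() in the handler does and prints the header and body to send. All values are placeholders, and the payload contains only the two fields the handler reads; real Pipe payloads carry more.

```php
<?php
// Same signing scheme as generateSignature() in the handler:
// base64( HMAC-SHA1( webhook URL + raw payload, webhook key ) ).
function signPayload(string $key, string $url, string $payload): string
{
    return base64_encode(hash_hmac('sha1', $url . $payload, $key, true));
}

// Placeholder values for a local test request
$key     = 'YOUR_WEBHOOK_KEY';
$url     = 'https://your_domain.com/webhook/webhook-handler.php';
$payload = json_encode([
    'event' => 'video_copied_pipe_s3',
    'data'  => ['audioFileUrl' => 'https://example.com/STREAMNAME_audio.mp4'],
], JSON_UNESCAPED_SLASHES);

echo "X-Pipe-Signature: " . signPayload($key, $url, $payload) . "\n";
echo $payload . "\n";
```

Send the printed payload unchanged as the request body, with Content-Type: application/json and the printed X-Pipe-Signature header. The key and URL must match the values set in the handler, and the audio URL must point to a real, reachable file for the Replicate call itself to succeed.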

Pushing Recordings From Your Own S3 Storage to Replicate

As a Pipe PRO subscriber, you can push your recordings to your Amazon S3 bucket or other S3-compatible services.

If you do so, you need to use the video_copied_s3 webhook instead:

When creating the webhook, choose the video_copied_s3 event instead of video_copied_pipe_s3.
In your webhook handler script (PHP code), listen for the video_copied_s3 webhook event.
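The event check in the handler could be adapted along these lines. This sketch accepts both events so that one script can serve either storage option; the constant and function names are hypothetical, and it assumes the video_copied_s3 payload exposes the audio file URL in a field your handler reads, which you should confirm in the Pipe webhook docs.

```php
<?php
// Events that should trigger a transcription. Listing both lets one handler
// serve Pipe's complimentary storage and your own S3 bucket.
const TRANSCRIBE_EVENTS = ['video_copied_pipe_s3', 'video_copied_s3'];

// Returns true when the decoded webhook payload carries one of the accepted events.
function shouldTranscribe(array $webhookData): bool
{
    return isset($webhookData['event'])
        && in_array($webhookData['event'], TRANSCRIBE_EVENTS, true);
}
```

In the handler, the existing comparison against 'video_copied_pipe_s3' would become if (!shouldTranscribe($webhookData)) { die(); }.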