30 April 2025 / Tutorials

Transcribing Pipe's Video Recordings With ElevenLabs' New Speech-to-Text Model (Scribe)

In this tutorial, we will cover integrating the Pipe audio, video, and screen recording platform with ElevenLabs' new speech-to-text model, Scribe.

The Pipe Platform allows you to add an audio, video or screen recorder to your website or web app. It has a powerful recording client and it will take care of ingestion, processing and storage.

ElevenLabs is a leading AI voice technology company specializing in ultra-realistic speech synthesis, voice cloning, multilingual audio generation, and speech-to-text. We will focus solely on their new automatic speech recognition model, Scribe.

Scribe was recently launched. It supports 99 languages, speaker diarization, character-level timestamps, and non-speech events such as laughing.

This is what you'll need to get started:

A Pipe account (sign up here if you don't already have one). Both an active trial and any paid subscription will work for the purposes of this tutorial.
An ElevenLabs account (sign up here). Their free tier includes 2 hours and 30 minutes of audio transcription through their API (which is what we'll be using). Check out their pricing page for more details.

There are a few steps in this tutorial:

configure the Pipe Platform to extract the audio from recordings
get your ElevenLabs API key
integrate the ElevenLabs API speech-to-text endpoint in a custom PHP script
trigger the script through Pipe's video_copied_pipe_s3 webhook.

Step 1. Configure the Pipe Platform to Extract Audio Data from Recordings

The Pipe Platform will handle audio and video recording for you. We could feed videos directly to the ElevenLabs API, but by doing so we risk hitting their 1 GB file size limit sooner, especially with screen recordings.

So, we'll set it up so that, for video files, Pipe extracts the audio into a separate audio-only file. Pipe will automatically convert the audio to AAC in an MP4 container while preserving the original number of channels and sample rate.

To set up the Pipe Platform to extract the audio, follow these steps:

Sign in to your account dashboard and go to the transcoding section
Scroll down to Extract Audio and enable Extract audio data into a separate file.
Scroll down and Save

That's it! Now, with each new recording, a separate file containing just the audio will be created. The filename will have the format STREAMNAME_audio.mp4.
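If you need to map between a recording and its extracted audio file in your own code, the naming convention can be captured in two small helpers. This is a minimal sketch: `audioFileNameFor()` and `streamNameFromAudioFile()` are hypothetical names, not part of any Pipe library, and the only thing they rely on is the STREAMNAME_audio.mp4 format described above.

```php
<?php
// Hypothetical helpers built on the STREAMNAME_audio.mp4 naming convention.
const AUDIO_SUFFIX = '_audio.mp4';

// Build the expected audio filename for a given stream name.
function audioFileNameFor(string $streamName): string
{
    return $streamName . AUDIO_SUFFIX;
}

// Recover the stream name from an audio file URL or bare filename.
// Returns null when the name does not end with the expected suffix.
function streamNameFromAudioFile(string $urlOrFilename): ?string
{
    $path = parse_url($urlOrFilename, PHP_URL_PATH);
    $filename = basename(is_string($path) ? $path : $urlOrFilename);
    $suffixLength = strlen(AUDIO_SUFFIX);

    if (strlen($filename) <= $suffixLength
        || substr($filename, -$suffixLength) !== AUDIO_SUFFIX) {
        return null;
    }
    // Strip the suffix only at the end of the name.
    return substr($filename, 0, -$suffixLength);
}
```

Unlike a plain `str_replace()`, the second helper only strips the suffix when it appears at the very end of the filename.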

One more small thing: for the purposes of this tutorial, we will be using Pipe's Complimentary Storage, so make sure the Do not store files option is disabled (default) in the storage section of the dashboard. The tutorial/transcription also works when you use your own storage.

Step 2. Obtain an ElevenLabs API Key

To make requests to any API endpoint, we need to have a valid ElevenLabs API key, so let's generate one:

Sign in to your ElevenLabs account
Go to https://elevenlabs.io/app/settings/api-keys
Click on Create API Key
Name your API key so you can easily recognize it later
You can also choose to restrict the kind of access the key has. In our case, it should only have access to Speech to Text
Click Create
Save the generated API key in a secure location locally until we use it later in the code

Creating a new ElevenLabs API key to be used for transcribing Pipe recordings

Step 3. Making a Transcribe Audio Request

Now that we have our API key, we can make requests to the ElevenLabs API endpoint to take our mp4 audio recordings and transcribe them.

The API call against the ElevenLabs API will be made by a PHP script which will be triggered by one of Pipe's webhooks.

The API call is a simple POST request that specifies only the model_id (only scribe_v1 and scribe_v1_experimental are available) and the cloud_storage_url, which is the URL to the recording as hosted by us on our complimentary storage. The Pipe Platform includes this URL in the storage webhook data. For more information about the ElevenLabs API, check out their speech-to-text documentation.

The PHP script we used is available below. You can use any kind of server-side code, such as JavaScript under Node or an AWS Lambda function; we'll just use PHP. Here's what it does:

starts by defining a few variables that you have to fill in
verifies that the webhook request came from our platform using the webhook key and that the request is of type video_copied_pipe_s3
reads and verifies the data received from Pipe (we're interested in the audioFileUrl field)
sends the audio file URL to the ElevenLabs transcription API as a new POST request
receives the transcription result as a JSON object and writes it to disk

Here is the complete webhook handler code. It is commented throughout to make it easy to follow:

<?php
$elevenLabsAPIKey = '';
$pipeWebhookKey = "";
$pipeWebhookURL = ""; //https://YOUR_DOMAIN.com/webhook-handler.php (must match the Webhook URL set in Pipe, since it is part of the signed data)

$modelToUse = 'scribe_v1';

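// Read the raw payload; the Content-Type header may be missing or carry a charset suffix
$contentType = $_SERVER['CONTENT_TYPE'] ?? '';
$webhookData = '';
if (strpos($contentType, 'application/json') === 0) {
  $webhookData = file_get_contents("php://input");
} else if (strpos($contentType, 'application/x-www-form-urlencoded') === 0) {
  $webhookData = $_POST["payload"] ?? '';
}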

$localSignature = generateSignature($pipeWebhookKey, $pipeWebhookURL , $webhookData);

$log = "\n--------------------------- " . date('d-m-Y H:i:s') . " --------------------------\n";
$log .= "---------------------------------------------------------------------\n";
$receivedSignature = $_SERVER['HTTP_X_PIPE_SIGNATURE'] ?? '';
$log .= "X-Pipe-Signature received from Pipe = ". $receivedSignature."\n";
$log .= "signature generated locally = ". $localSignature."\n";

// Check if the signature matches (hash_equals() compares in constant time)
if (!hash_equals($localSignature, $receivedSignature)) {
    $log .= "Error: Signature mismatch. Possible tampering detected.\n";
    file_put_contents(dirname(__FILE__) . '/transcribe.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);
    die("Signature mismatch. Possible tampering detected.");
}

// Decode the JSON data from the webhook
$webhookData = json_decode($webhookData, true);
if ($webhookData === null) {
    $log .= "Error: Failed to decode JSON data from the webhook.\n";
    file_put_contents(dirname(__FILE__) . '/transcribe.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);
    die();
}

// Only process the webhook if the event is 'video_copied_pipe_s3'
if($webhookData['event'] !== 'video_copied_pipe_s3') {
    $log .= "Error: Webhook script triggered but not for the video_copied_pipe_s3 event.\n";
    file_put_contents(dirname(__FILE__) . '/transcribe.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);
    die();
}

// Check if the required keys are present in the webhook data
if(!isset($webhookData['data']['audioFileUrl'])) {
    $log .= "Error: 'audioFileUrl' not found in the webhook data.\n";
    file_put_contents(dirname(__FILE__) . '/transcribe.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);
    die();
}

//Get the audio file URL from the webhook payload
$myAudioS3Url = $webhookData['data']['audioFileUrl'];

$log .= "Attempting to transcribe audio from URL '{$myAudioS3Url}'...\n";
$log .= "Using Model: {$modelToUse}\n";

// Call the function to transcribe the audio file
$result = transcribeRecording($elevenLabsAPIKey, $myAudioS3Url,$modelToUse);

$log .= "\n--- Result ---\n";
if ($result['success']) {
    $log .= "Transcription Request Successful:\n";
    $log .= "---------------------------------\n";
    $log .= $result['data'] . "\n";
    $log .= "---------------------------------\n";
} else {
    $log .= "Transcription Failed:\n";
    $log .= $result['data'] . "\n";
}
$log .= "--------------\n";

// Log the result to a file
file_put_contents(dirname(__FILE__) . '/transcribe.log', $log . PHP_EOL, LOCK_EX | FILE_APPEND);

echo json_encode($result, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

function transcribeRecording($apiKey, $recordingFileURL, $modelId = 'scribe_v1'){
    
    // Set the API endpoint URL
    $apiUrl = "https://api.elevenlabs.io/v1/speech-to-text";

    // Validate inputs
    if (empty($apiKey)) {
        return ['success' => false, 'data' => 'API key is required.'];
    }
    if (empty($recordingFileURL) || !filter_var($recordingFileURL, FILTER_VALIDATE_URL)) {
         return ['success' => false, 'data' => "Invalid or empty audio file URL provided."];
    }
    
    if (empty($modelId)) {
        return ['success' => false, 'data' => 'Model ID is required.'];
    }
    
    // Prepare the data to be sent in the POST request
    $postData = [
        'model_id' => $modelId,
        'cloud_storage_url' => $recordingFileURL,
    ];

    // Initialize cURL session
    $ch = curl_init();

    // Set cURL options
    curl_setopt($ch, CURLOPT_URL, $apiUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
    curl_setopt($ch, CURLOPT_POST, true);          
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData); 
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
      'Accept: application/json',
      'xi-api-key: ' . $apiKey
    ]);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 180);        
    curl_setopt($ch, CURLOPT_FAILONERROR, false); 
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    
    $response = curl_exec($ch);
    $httpStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $curlErrorNo = curl_errno($ch);
    $curlError = curl_error($ch);

    curl_close($ch);

    // Check for cURL errors
    if ($curlErrorNo > 0) {
        return ['success' => false, 'data' => "cURL Error ({$curlErrorNo}): {$curlError}"];
    }

    // Calculate the filename: extract the path component of the URL
    $path = parse_url($recordingFileURL, PHP_URL_PATH);

    // Calculate the filename: Get the filename from the path
    $filename = basename($path); // u5GV4dQwbThOApzBBP55xYmnulza9aAR_audio.mp4

    // Calculate the filename: Remove the '_audio.mp4' suffix
    $baseName = str_replace('_audio.mp4', '', $filename);

    // Check for HTTP errors
    if ($httpStatusCode >= 200 && $httpStatusCode < 300) {
      $responseData = json_decode($response, true);

      if (json_last_error() !== JSON_ERROR_NONE) {
          return ['success' => false, 'data' => "Failed to decode JSON response. Error: " . json_last_error_msg() . ". Response: " . $response];
      }

      // Check if the expected 'text' key exists. This is the resulted transcribed audio.
      if (isset($responseData['text'])) {
          // Write the JSON response from ElevenLabs to disk
          file_put_contents($baseName.'.json', $response);
          
          //Return the transcribed text
          return ['success' => true, 'data' => $responseData['text']];
      } else {  
          return ['success' => false, 'data' => "Received successful status code ({$httpStatusCode}) but response format is unexpected (missing 'text' key?): " . $response];
      }

    } else {
      // Handle API errors (4xx, 5xx)
      $errorData = json_decode($response, true);
      $errorMessage = "API Error (HTTP {$httpStatusCode}): ";

      if ($errorData && isset($errorData['detail'])) {
          if (is_array($errorData['detail'])) {
              $errorMessage .= json_encode($errorData['detail']);
          } elseif (is_string($errorData['detail'])) {
              $errorMessage .= $errorData['detail'];
          } else {
              $errorMessage .= $response; // Fallback
          }
      } else {
          $errorMessage .= $response; // Fallback if JSON decoding fails or structure is different
      }
      return ['success' => false, 'data' => $errorMessage];
    }
}

//generate local webhook signature
function generateSignature($key, $url,  $jsonData){
  $data_to_sign = $url . $jsonData;
  return base64_encode(hash_hmac('sha1', $data_to_sign, $key, true));
}

?>

TODO:

Host this script on a web server in a location where it can be triggered by the Pipe processing servers.
Add your own ElevenLabs API key (see above), the Pipe webhook key (available after creating the webhook or when editing it), and the Pipe webhook URL (see below) at the top of the script.
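Before wiring the handler to real recordings, it can help to replay a hand-made request against it. The sketch below recomputes the signature the same way generateSignature() does in the script and prints a curl command you could run. The key, the URLs, and the payload are placeholders: the payload contains only the two fields the handler reads (event and data.audioFileUrl), and real Pipe payloads will carry more.

```php
<?php
// Same algorithm as generateSignature() in the handler:
// base64(HMAC-SHA1(webhook URL + raw payload, webhook key)).
function pipeSignature(string $key, string $url, string $rawPayload): string
{
    return base64_encode(hash_hmac('sha1', $url . $rawPayload, $key, true));
}

// Placeholder values: substitute your own key and URL.
$webhookKey = 'YOUR_PIPE_WEBHOOK_KEY';
$webhookUrl = 'https://your_domain.com/webhook/webhook-handler.php';

// Minimal made-up payload with only the fields the handler reads.
$payload = json_encode([
    'event' => 'video_copied_pipe_s3',
    'data'  => ['audioFileUrl' => 'https://example.com/STREAMNAME_audio.mp4'],
], JSON_UNESCAPED_SLASHES);

$signature = pipeSignature($webhookKey, $webhookUrl, $payload);

// Print a curl command that replays the payload against the handler.
echo "curl -X POST " . escapeshellarg($webhookUrl)
   . " -H 'Content-Type: application/json'"
   . " -H " . escapeshellarg("X-Pipe-Signature: {$signature}")
   . " --data " . escapeshellarg($payload) . PHP_EOL;
```

Because the URL is part of the signed string, the value of $pipeWebhookURL in the handler has to be identical to the URL used here, or the check will fail.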

Things to note here:

We're not specifying a language code, which means the language will be automatically detected. However, you can specify one by adding the language_code parameter to the request.
The recording file (passed to ElevenLabs through cloud_storage_url) must be accessible through an HTTPS call. When using Pipe's complimentary storage, powered by Amazon S3 and Scaleway, we'll give you proper links to the files on our storage. The links are private for as long as you keep them private. With this tutorial, the links are passed to ElevenLabs.
ElevenLabs will accept "files up to 1 GB in size and up to 4.5 hours in duration".
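As a rough illustration of the language note above, the POST fields could be built by a small helper that adds language_code only when you supply one. `buildScribePostFields()` is a hypothetical name, and the sketch assumes language_code is accepted as a form field alongside model_id; check the ElevenLabs speech-to-text reference before relying on it.

```php
<?php
// Build the POST fields for the speech-to-text request.
// When $languageCode is null or empty, the field is omitted so the
// language is detected automatically.
function buildScribePostFields(string $modelId, string $audioUrl, ?string $languageCode = null): array
{
    $fields = [
        'model_id' => $modelId,
        'cloud_storage_url' => $audioUrl,
    ];
    if ($languageCode !== null && $languageCode !== '') {
        // Assumption: language_code is sent as a form field like model_id.
        $fields['language_code'] = $languageCode;
    }
    return $fields;
}

// Example: force English instead of relying on auto-detection.
$fields = buildScribePostFields('scribe_v1', 'https://example.com/STREAMNAME_audio.mp4', 'en');
```

The resulting array can be handed to CURLOPT_POSTFIELDS in the same way the handler passes $postData.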

With the code implemented, all we need to do now is ensure that the webhook handler is called.

3.1 Audio Transcription Details

The ElevenLabs API returns a detailed JSON object.

Our PHP code will write the JSON object to disk but it will also save the transcription (the value of the text attribute) in the local execution log file and return it to the Pipe Platform. As a result, when viewing the webhook response headers and bodies in the Pipe account dashboard you'll get a glimpse of the transcription.

It is up to you what you want to do with the transcription at this point.

Here is an example of what the full response JSON object contains:

{
  "language_code": "en",
  "language_probability": 0.98,
  "text": "Hello world!",
  "words": [
    {
      "text": "Hello",
      "type": "word",
      "start": 0,
      "end": 0.5,
      "speaker_id": "speaker_1",
      "characters": [
        {
          "text": "text",
          "start": 0,
          "end": 0.1
        }
      ]
    },
    {
      "text": " ",
      "type": "spacing",
      "start": 0.5,
      "end": 0.5,
      "speaker_id": "speaker_1",
      "characters": [
        {
          "text": "text",
          "start": 0,
          "end": 0.1
        }
      ]
    },
    {
      "text": "world!",
      "type": "word",
      "start": 0.5,
      "end": 1.2,
      "speaker_id": "speaker_1",
      "characters": [
        {
          "text": "text",
          "start": 0,
          "end": 0.1
        }
      ]
    }
  ],
  "additional_formats": [
    {
      "requested_format": "requested_format",
      "file_extension": "file_extension",
      "content_type": "content_type",
      "is_base64_encoded": true,
      "content": "content"
    }
  ]
}
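One possible use of the word-level data is to collapse it into one entry per speaker turn. The sketch below is only an illustration: `speakerTurns()` is a hypothetical helper, and it relies solely on the words, text, start, end, and speaker_id fields shown in the example response.

```php
<?php
// Collapse the word-level output into one entry per speaker turn.
// Expects the decoded response array, e.g. json_decode($response, true).
function speakerTurns(array $response): array
{
    $turns = [];
    foreach ($response['words'] ?? [] as $word) {
        $speaker = $word['speaker_id'] ?? 'unknown';
        $last = count($turns) - 1;

        if ($last >= 0 && $turns[$last]['speaker'] === $speaker) {
            // Same speaker as the previous item: extend the current turn.
            $turns[$last]['text'] .= $word['text'] ?? '';
            $turns[$last]['end'] = $word['end'] ?? $turns[$last]['end'];
        } else {
            // Speaker changed (or first item): start a new turn.
            $turns[] = [
                'speaker' => $speaker,
                'start'   => $word['start'] ?? null,
                'end'     => $word['end'] ?? null,
                'text'    => $word['text'] ?? '',
            ];
        }
    }
    // Trim spacing items left at the edges of each turn.
    foreach ($turns as &$turn) {
        $turn['text'] = trim($turn['text']);
    }
    unset($turn);
    return $turns;
}
```

For the example response, where every item belongs to speaker_1, this yields a single turn containing the full sentence.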

Step 4. Configure the Pipe Webhook

We will configure Pipe to execute our PHP webhook handler above whenever a new recording (together with all associated files, including the extracted audio file) finishes uploading to Pipe's complimentary storage.

Go to https://dashboard.addpipe.com/webhooks and click on Add New Webhook.
In the Webhook URL field, enter the URL for where you hosted the webhook-handler.php file (e.g., https://your_domain.com/webhook/webhook-handler.php).
For the event, make sure you select video_copied_pipe_s3.
For content type, it is recommended that you leave the default application/json.
Copy and paste the webhook key from here into the code above.
Make sure the Active box is checked.
Click Save Webhook.

If you need more help, check out the Pipe docs on how to set up a webhook and how to handle webhook data.

Now you have all the necessary pieces to transcribe audio, video and screen recordings.

Go ahead and make a new recording using Pipe. Once the recording files are pushed to storage, the Pipe webhook triggers and makes a request to the ElevenLabs API endpoint to transcribe the extracted audio using Scribe v1.

Pushing Recordings From Your Own S3 Storage to ElevenLabs

As a Pipe PRO subscriber, you can push your recordings to your own Amazon S3 bucket or other S3-compatible services.

If you do so, you need to use the video_copied_s3 webhook instead:

when creating the webhook, choose the video_copied_s3 event instead of video_copied_pipe_s3
in your webhook handler script (PHP code) listen for the video_copied_s3 webhook event
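If you want a single handler to serve both storage setups, the event check could accept either event name. This is a sketch under one loud assumption: that the video_copied_s3 payload exposes the extracted audio under the same data.audioFileUrl key as video_copied_pipe_s3. Inspect a real payload from your account before relying on that. `audioUrlFromWebhook()` is a hypothetical helper name.

```php
<?php
// Events that signal the recording files have reached storage.
const STORAGE_EVENTS = ['video_copied_pipe_s3', 'video_copied_s3'];

// Return the audio URL if the decoded payload is a storage event that
// carries one, or null otherwise.
function audioUrlFromWebhook(array $webhookData): ?string
{
    $event = $webhookData['event'] ?? null;
    if (!in_array($event, STORAGE_EVENTS, true)) {
        return null;
    }
    // Assumption: both events use the data.audioFileUrl key.
    $url = $webhookData['data']['audioFileUrl'] ?? null;
    return (is_string($url) && $url !== '') ? $url : null;
}
```

In the handler, the two separate checks on the event name and on audioFileUrl could then be replaced by one call that logs an error and exits when the result is null.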