September 2020 update: we’ve updated this blog post to make it more clear that:
- Up to 12 retries can be made for each of the 3 big processing steps (copy, conversion, all storage)
- The total processing time for a recording cannot exceed 75 hours, counted from the moment the recording was inserted in the database
- Every retry will be made X minutes after the point in time we became aware of the last failed retry
- There are 12 retry attempts across all storage options (ours + yours)
After finishing work on the new pages for browsing and searching storage logs, one thing became clear: the retry mechanism for pushing videos to storage needed an overhaul as well.
Old Retry Mechanism
Until now, any kind of failure in our pipeline (copying, transcoding or storing) triggered a retry every 2 minutes for a period of between 3 and 4 calendar days.
If, after all those retries, the recording still could not be copied between our servers, transcoded to .mp4 or pushed to your (S)FTP, S3 or Dropbox storage, it was removed from the retry queue and marked as failed.
Any failure would thus trigger between 2160 and 2880 retries over the course of 3 to 4 calendar days, generating a lot of CPU and network usage and, when the push to storage kept failing, a lot of log entries. There was also the chance of hosts banning our servers’ IPs for too many (S)FTP connection attempts.
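The 2160 and 2880 figures follow directly from one retry every 2 minutes. A quick check of the arithmetic (the function name is made up for this illustration):

```python
RETRY_INTERVAL_MINUTES = 2  # the old mechanism retried every 2 minutes


def old_retry_count(days: int) -> int:
    """Retries fired over `days` full calendar days at one retry per interval."""
    minutes_in_period = days * 24 * 60
    return minutes_in_period // RETRY_INTERVAL_MINUTES


print(old_retry_count(3))  # 2160
print(old_retry_count(4))  # 2880
```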
New Retry Mechanism
Today we have updated our server-side code with a revised retry mechanism.
When any of the 3 big processing steps (copy, transcoding or push to storage) fails, that step will be retried up to 12 times in the following sequence:
- 2 minutes after the 1st attempt failed
- 5 minutes after the last attempt failed
- 10 minutes after the last attempt failed
- 15 minutes after the last attempt failed
- 30 minutes after the last attempt failed
- 60 minutes after the last attempt failed
- 2 hours after the last attempt failed
- 4 hours after the last attempt failed
- 6 hours after the last attempt failed
- 12 hours after the last attempt failed
- 24 hours after the last attempt failed
- 24 hours after the last attempt failed
This leads to a maximum of 12 retry attempts (on top of the initial live attempt to process the recording) for every failed step in the pipeline (copy, convert, store).
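To make the sequence easier to reason about, here is a small sketch that writes the 12 delays as data and adds them up. It is only an illustration, not our server code: the names RETRY_DELAYS and offsets are invented, and the running totals assume that every attempt fails instantly, which real attempts do not.

```python
from datetime import timedelta
from itertools import accumulate

# Wait before retries 1..12, copied from the list above.
RETRY_DELAYS = [
    timedelta(minutes=2),
    timedelta(minutes=5),
    timedelta(minutes=10),
    timedelta(minutes=15),
    timedelta(minutes=30),
    timedelta(minutes=60),
    timedelta(hours=2),
    timedelta(hours=4),
    timedelta(hours=6),
    timedelta(hours=12),
    timedelta(hours=24),
    timedelta(hours=24),
]

# Running total: earliest moment of each retry, measured from the first
# failure, under the simplifying assumption that attempts take no time.
offsets = list(accumulate(RETRY_DELAYS))

print(len(RETRY_DELAYS))  # 12
print(offsets[5])         # 2:02:00
print(offsets[-1])        # 3 days, 2:02:00
```

The waiting time alone adds up to 74 hours and 2 minutes; the time each attempt itself takes comes on top of that.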
There's also a ceiling of 75 hours. If a recording is not fully processed within 75 hours of being inserted in our database, we will not make any more attempts to process it. This is important because some processing steps can take a long time (we've seen situations where the push to clients' storage is very slow).
For the storage step there are 12 retry attempts across all storage options combined. So, including the live attempt, there is a maximum of 13 attempts to push to both our storage and all of the clients' configured storage solutions.
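The ceiling can be pictured as a simple age check made before scheduling another attempt. This is a minimal sketch under assumptions: the function and parameter names are hypothetical, and whether a recording that is exactly 75 hours old still qualifies is a guess on our part for the sake of the example.

```python
from datetime import datetime, timedelta

PROCESSING_CEILING = timedelta(hours=75)


def can_still_retry(inserted_at: datetime, now: datetime) -> bool:
    """True while the recording is younger than the 75 hour ceiling.

    inserted_at is when the recording was inserted in the database.
    The strict comparison (exactly 75 hours no longer qualifies) is an
    assumption made for this sketch.
    """
    return now - inserted_at < PROCESSING_CEILING


inserted = datetime(2020, 9, 1, 8, 0, 0)  # arbitrary example timestamp
print(can_still_retry(inserted, inserted + timedelta(hours=74)))  # True
print(can_still_retry(inserted, inserted + timedelta(hours=76)))  # False
```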
Thus, if there's a problem with one of the storage solutions, a recording will have a maximum of 13 failed storage logs, one for each attempt (you can view failed storage logs in the Pipe account area). In such cases you can now easily check when the next retry attempt will be made by looking at the date of the last failed (S)FTP, S3 or Dropbox log entry for that recording and taking into account the number of failed attempts. For example, if a recording failed to be pushed to (S)FTP at 12:31:06 and it is the 6th time it has failed, the next attempt (the 7th overall, which is the 6th retry) will be made 60 minutes later, at roughly 13:32.
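If you prefer to compute this rather than count down the list, the lookup can be sketched as follows. The function name and its parameters are invented for the illustration, the date is arbitrary, and the sketch ignores both the 75 hour ceiling and the fact that retries start at the beginning of a minute.

```python
from datetime import datetime, timedelta
from typing import Optional

# Wait in minutes before retries 1..12, as listed in this post.
RETRY_DELAYS_MIN = [2, 5, 10, 15, 30, 60, 120, 240, 360, 720, 1440, 1440]


def next_retry_due(last_failed_at: datetime, failures_so_far: int) -> Optional[datetime]:
    """Nominal time of the next retry, or None once all 12 retries are used.

    failures_so_far includes the initial live attempt, so it matches the
    number of failed storage log entries for the recording (1 to 13).
    """
    if not 1 <= failures_so_far <= len(RETRY_DELAYS_MIN) + 1:
        raise ValueError("failures_so_far must be between 1 and 13")
    if failures_so_far == len(RETRY_DELAYS_MIN) + 1:
        return None  # live attempt plus 12 retries have all failed
    delay = timedelta(minutes=RETRY_DELAYS_MIN[failures_so_far - 1])
    return last_failed_at + delay


# The example from the text: 6th failure logged at 12:31:06.
print(next_retry_due(datetime(2020, 9, 1, 12, 31, 6), 6))  # 2020-09-01 13:31:06
```

The nominal result is 13:31:06; since retries are picked up at the start of a minute, the attempt actually runs at about 13:32.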
As an added benefit, the CPU and network usage for our transcoding servers will be lower, resulting in higher availability for processing newly made recordings.
Our retry mechanism looks for failed recordings to retry at the beginning of each minute, so the actual retry intervals can be up to 1 minute longer than the ones listed above. For example, if a video fails to be pushed to your FTP at 12:47:02, the 1st retry will be at 12:50:00, 2 minutes and 58 seconds after.
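The effect of the once-a-minute check amounts to rounding the due time up to the next whole minute. A minimal sketch, assuming a retry that falls due exactly on a minute boundary is picked up on that same tick (the function name and the date are made up for the example):

```python
from datetime import datetime, timedelta


def next_scheduler_tick(due: datetime) -> datetime:
    """Round a due time up to the start of the next whole minute.

    A due time already on a minute boundary is returned unchanged,
    which is an assumption made for this sketch.
    """
    floored = due.replace(second=0, microsecond=0)
    return due if due == floored else floored + timedelta(minutes=1)


failed_at = datetime(2020, 9, 1, 12, 47, 2)  # arbitrary date, time from the text
due = failed_at + timedelta(minutes=2)       # 1st retry is due 2 minutes later
run_at = next_scheduler_tick(due)

print(run_at.time(), run_at - failed_at)  # 12:50:00 0:02:58
```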
Storage Errors Notification Emails
Because the maximum number of attempts for storage is now just 13, we now send an email to the Pipe account email address for every storage error that occurs for a recording while Pipe tries to push the files to the client’s storage.
You’ll now be immediately aware of any kind of issues related to your storage, while easy access to the storage logs helps you quickly pinpoint the issue.
These emails are enabled by default, but you can disable them from the environment settings page in your dashboard.