The No.1 feature request we got this year is the ability to record audio instead of video so we set out to deliver just that, hopefully by the end of the year.

We started by researching just what kind of audio options are available with HTML5’s new promise based getUserMedia(), the almighty gateway to accessing the user’s webcam and microphone.

We were well aware of the audio recording capabilities in our legacy Flash based recorder. Ever since doing our first video recorder back in the 2010s we were able to record single channel audio using NellyMoser’s ASAO codec at up to 44kHz and with Speex at 16kHz using various bitrates.

We were not so sure about HTML5’s audio recording capabilities in Chrome & Firefox when using getUserMedia() and the Media Recorder API as we only saw Opus audio at 48kHz (and Vorbis @ 44.1 kHz in Firefox 47 and older) while developing both our 1st (WebRTC based) and 2nd generation HTML5 video recorder (Media Recorder API based).

Opus Audio Codec

Opus is now the default codec for encoding sound in Chrome, Firefox and Safari 11.

Opus is a great audio codec. It’s open, royalty free and so flexible that it can encode (fullband) music better than the AAC encoder in iTunes, yet also encode narrowband voice with a latency that’s lower than any other codec’s. Tsahi Levent-Levi has a great blog post on why Opus is so great.

getUserMedia() Audio Constraints

The Media Capture and Streams spec governs the cross browser audio options that should be implemented by all browsers and, in its latest Candidate Recommendation incarnation, it defines quite a few audio constraints.

Here’s the full list with explanations from both the spec and from my knowledge around how digital sound works:

  • sampleRate: specifies a desired sample rate; I’m not sure whether it should be used as an encoding setting or as a hardware requirement. Higher is better (Audio CDs, for example, have 44,100 samples/s, or 44.1 kHz)
  • sampleSize: each sample’s size in bits, higher is better (Audio CDs have a sample size of 16 bits/sample)
  • volume: it takes values between 0.0 (silence) and 1.0 (max volume), it’s used as a multiplier for each sample’s value
  • echoCancellation: whether or not to use echo cancellation to try and remove the audio that went to the speakers from the input that comes through the mic
  • autoGainControl: whether or not to modify the volume of the input from the mic
  • noiseSuppression: whether or not to try and remove the background noise from the audio signal
  • latency: specified in seconds, controls the time between the start of the sound processing and the data being made available to the next step, not so sure why you’d want higher latency but audio codecs do differ in latency
  • channelCount: specifies the number of channels to use with 1 being mono, 2 being stereo. It works with some webcams and laptops with dual mics.
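
To make sampleRate, sampleSize and channelCount a bit more concrete, here’s a small sketch that computes the raw (uncompressed PCM) bitrate they imply. The function name pcmBitrate is made up for illustration, and the result says nothing about what a codec such as Opus produces after encoding; it’s just the arithmetic.

```javascript
// Hypothetical helper: raw PCM bitrate in bits per second.
// sampleRate is in samples/s, sampleSize in bits/sample,
// channelCount is the number of channels (1 = mono, 2 = stereo).
function pcmBitrate(sampleRate, sampleSize, channelCount) {
    return sampleRate * sampleSize * channelCount;
}

// Audio CD: 44100 samples/s, 16 bits/sample, 2 channels.
var cdBitrate = pcmBitrate(44100, 16, 2);
console.log(cdBitrate); // 1411200 bits/s, roughly 1411 kbit/s
```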

A surprisingly comprehensive list (at least compared with the options Flash gave us) but, just as surprisingly, most of these constraints are not widely supported by browsers. Understandably, some of them are new (autoGainControl and noiseSuppression), but others (sampleRate, sampleSize and echoCancellation) were added to the spec as early as February 2014. Adding support for these constraints would open up Opus’ flexibility to developers.

Browser Support (Including Safari 11)

I’ve used MediaStreamTrack.getSettings() to log to the console the settings supported by each browser, as MediaDevices.getSupportedConstraints() returns too many false positives. I also tested the latest versions of Chrome, Firefox and Safari to make sure changing the constraints has an effect.

| Constraint | Chrome 62 | Firefox 56 | Safari 11 |
|---|---|---|---|
| echoCancellation | supported, on by default | supported, on by default | supported, on by default |
| sampleRate | fixed: 48000 | fixed: 48000 | fixed: 44100 |
| noiseSuppression | on by default, constraint not supported | supported, true by default | no |
| autoGainControl | on by default, constraint not supported | supported, false by default | no |
| channelCount | no | supported, records max available channels by default | no |
| volume | no | no | supported, defaults to 1 (the maximum) |

Use this CodePen to check future browser support (it does not work in Safari because Safari does not allow getUserMedia() in cross-origin iframes).
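
The check described at the start of this section can be sketched roughly as follows: ask the browser what it claims to support, open an audio stream, then see which settings the track actually reports. compareSupport and logAudioSupport are hypothetical names, and the list of constraint names is simply the one from this post.

```javascript
var AUDIO_CONSTRAINT_NAMES = [
    "sampleRate", "sampleSize", "volume", "echoCancellation",
    "autoGainControl", "noiseSuppression", "latency", "channelCount"
];

// Hypothetical helper: classify each constraint name as
// "reported"        -> present in track.getSettings()
// "advertised only" -> listed by getSupportedConstraints() but missing from the settings
// "not supported"   -> found in neither
function compareSupport(supported, settings) {
    var report = {};
    AUDIO_CONSTRAINT_NAMES.forEach(function(name) {
        if (name in settings) {
            report[name] = "reported";
        } else if (supported[name]) {
            report[name] = "advertised only";
        } else {
            report[name] = "not supported";
        }
    });
    return report;
}

// Browser-only part: needs a page that is allowed to call getUserMedia().
// Nothing runs until logAudioSupport() is called.
function logAudioSupport() {
    var supported = navigator.mediaDevices.getSupportedConstraints();
    return navigator.mediaDevices.getUserMedia({ audio: true }).then(function(stream) {
        var track = stream.getAudioTracks()[0];
        var report = compareSupport(supported, track.getSettings());
        console.table(report);
        track.stop(); // release the microphone
        return report;
    });
}
```

A setting showing up in getSettings() still doesn’t prove that changing the constraint does anything, which is why the manual testing is needed as well.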

Using Audio Constraints With getUserMedia()

All constraints can be sent to getUserMedia() as a property of the audio object inside the constraints object. Here’s an example using the newer promise based getUserMedia():

var constraints = {
    audio: {
        sampleRate: 48000,
        channelCount: 2,
        volume: 1.0
    },
    video: true
}
navigator.mediaDevices.getUserMedia(constraints).then(function(stream) {
    /* use the stream */
}).catch(function(err) {
    /* handle the error */
});
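
Because a constraint passed as a plain value is only a preference, the browser can quietly skip it. It can help to compare what was requested with what the track reports. A minimal sketch, assuming the constraints are plain values rather than min/max/exact/ideal objects; findIgnored and checkAudioConstraints are hypothetical names:

```javascript
// Hypothetical helper: list the requested audio constraints whose value
// differs from (or is missing in) the settings the track reports.
function findIgnored(requestedAudio, settings) {
    return Object.keys(requestedAudio).filter(function(name) {
        return settings[name] !== requestedAudio[name];
    });
}

// Browser-only usage; nothing runs until checkAudioConstraints() is called.
function checkAudioConstraints(requestedAudio) {
    return navigator.mediaDevices.getUserMedia({ audio: requestedAudio })
        .then(function(stream) {
            var track = stream.getAudioTracks()[0];
            var ignored = findIgnored(requestedAudio, track.getSettings());
            console.log("Not applied:", ignored);
            return stream;
        });
}
```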

If you just want to use whatever defaults are set on the browser just pass true for the audio object:

var constraints = { audio: true, video: true };

Stereo Audio Recordings in Firefox 55 and 56

Firefox 55 added support for stereo recordings so I was able to record a video with dual channel (stereo) audio with:

  • the Logitech C925e (dual mics)
  • the older Logitech C920 (dual mics)
  • a 15″ Mid 2017 MacBook Pro (3 mics)

webm recording with 2 channel sound created using the media stream recorder api

The channel separation was very clear with the Logitech C925e, where the mics are spaced widely apart, as opposed to the MacBook Pro’s 3 microphones, which are grouped closely together on the right side of the keyboard.

Firefox 56 also added support for the channelCount constraint meaning I was able to switch from stereo recording – which seems to be the default when using webcams with dual mics – to mono recordings which should take up less bandwidth. Here’s how to request a mono stream:

{
    audio: {
        channelCount: 1
    },
    video: true
}

A value of 0, the default, tells getUserMedia() to capture all of the available, supported, channels.

Chrome 62 records just mono audio, and setting the channelCount constraint to 2 has no effect, even though MediaDevices.getSupportedConstraints() (https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getSupportedConstraints) lists it as supported. There’s an open issue requesting & tracking channelCount, sampleRate and sampleSize implementations in Chrome.

UPDATE: Chrome 63 has implemented stereo audio recording.

Echo Cancellation, Noise Suppression and Auto Gain

Echo cancellation is on by default in Chrome, Firefox (supported since Firefox 46) and Safari 11. It’s especially noticeable when recording videos through this unmuted MediaRecorderAPI demo. It’s very aggressive on Firefox, much more so than on Chrome.

Here’s the constraint one can use with a user agent that supports it but does not turn it on by default:

{
    audio: {
        echoCancellation: true
    },
    video: true
}

noiseSuppression works beautifully in Firefox, where it’s on by default. With it turned off I could clearly hear the noise picked up by the mic:

{
    audio: {
        noiseSuppression: false
    },
    video: true
}

Chrome definitely applies noise removal even if it does not support the constraint.

Safari 11 does not support noise suppression at this time.

autoGainControl is turned off by default on Firefox. Here’s how to turn it on:

{
    audio: {
        autoGainControl: true
    },
    video: true
}

After turning it on it clearly lowered the volume when (almost) screaming into the mic.

Chrome applies automatic gain control by default, but you can’t control it through the autoGainControl constraint. To turn it off you actually need to set echoCancellation to false or use the old

{
    mandatory: {
        googAutoGainControl: false
    }
}

constraint. When testing on macOS the input level notch moved to the left as I moved the microphone towards the sound source (but not when echoCancellation was set to false).

macOS Input Level Notch
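
Putting the three processing constraints together, here’s a sketch of how one might ask for audio with as little processing as possible. rawAudioConstraints is a hypothetical helper name. Going by the tests above, Firefox 56 honours all three values, while in Chrome 62 it’s echoCancellation: false that also switches off the automatic gain. I haven’t verified the combined object beyond those individual tests, so treat it as a starting point.

```javascript
// Hypothetical helper: constraints asking the browser to skip echo cancellation,
// noise suppression and automatic gain control. Passed as plain values they are
// preferences, so a browser that lacks one of them should not reject the request.
function rawAudioConstraints(withVideo) {
    return {
        audio: {
            echoCancellation: false,
            noiseSuppression: false,
            autoGainControl: false
        },
        video: Boolean(withVideo)
    };
}

// Browser-only usage:
// navigator.mediaDevices.getUserMedia(rawAudioConstraints(true)).then(function(stream) {
//     /* use the stream */
// });
```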

Keywords

The keywords (min, max, exact and ideal) should also work with these audio constraints. Per the spec, in the mono example from the Firefox section above (channelCount: 1), the channelCount property should be treated as an ideal property, and it should produce the same result as:

{
    audio: {
        channelCount: {
            ideal: 1
        }
    },
    video: true
}

A more complex constraint which requests at least 1 channel audio but preferably 2 would be:

{
    audio: {
        channelCount: {
            ideal: 2,
            min: 1
        }
    },
    video: true
}
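
One way to read these keywords: min, max and exact are requirements, while ideal (or a bare value) is only a preference. The sketch below checks a value against a constraint along those lines. satisfies is a hypothetical helper and a simplification; it is not how browsers actually pick a device, since they also rank candidates by how close they come to the ideal values.

```javascript
// Hypothetical helper: would `actual` be acceptable for this constraint?
// A bare value is treated like { ideal: value }, so it never rejects.
function satisfies(constraint, actual) {
    if (typeof constraint !== "object" || constraint === null) {
        return true; // bare value = ideal = preference only
    }
    if ("exact" in constraint && actual !== constraint.exact) return false;
    if ("min" in constraint && actual < constraint.min) return false;
    if ("max" in constraint && actual > constraint.max) return false;
    return true; // `ideal` on its own never causes a failure
}

console.log(satisfies({ ideal: 2, min: 1 }, 1)); // true: mono meets the minimum
console.log(satisfies({ exact: 2 }, 1));         // false: stereo was required
console.log(satisfies(1, 2));                    // true: a bare value is only a preference
```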

A constraint which requests exactly 2 channel audio would be:

{
    audio: {
        channelCount: {
            exact: 2
        }
    },
    video: true
}

The above constraint will make getUserMedia() fail with an OverconstrainedError if no available device can deliver 2 channels, for example when there’s no dual mic webcam. For more details, check out common getUserMedia errors.
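
If stereo is nice to have rather than essential, one option is to catch that error and retry with a relaxed constraint. A sketch under assumptions: relaxExact and getAudioWithFallback are hypothetical names, and the error is recognised by its name property.

```javascript
// Hypothetical helper: turn every { exact: x } in the audio constraints into
// { ideal: x }. Other keywords inside such an object are dropped for brevity.
function relaxExact(audio) {
    var relaxed = {};
    Object.keys(audio).forEach(function(name) {
        var value = audio[name];
        if (value !== null && typeof value === "object" && "exact" in value) {
            relaxed[name] = { ideal: value.exact };
        } else {
            relaxed[name] = value;
        }
    });
    return relaxed;
}

// Browser-only: try the strict constraints first and retry with relaxed ones
// when the promise rejects with an OverconstrainedError.
function getAudioWithFallback(audio) {
    return navigator.mediaDevices.getUserMedia({ audio: audio, video: true })
        .catch(function(err) {
            if (err.name !== "OverconstrainedError") {
                throw err; // e.g. permission denied: relaxing won't help
            }
            return navigator.mediaDevices.getUserMedia({
                audio: relaxExact(audio),
                video: true
            });
        });
}
```

Calling getAudioWithFallback({ channelCount: { exact: 2 } }) would then ask for stereo first and settle for whatever the device offers if that fails.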