Creating a Cross-Environment Push-to-Talk Application Using Wazo

12 October 2021


During our last hackathon in wonderful Quebec, we thought about fun projects we could build to improve our knowledge of WebRTC, ReactJS, and all the libraries around them.

Here are some pictures of the great place where our hackathon took place:

What a wonderful view

Our office · Can you spot François and Pascal? · Looks like an album cover

Finding an Idea

During our release phase at the end of each sprint, everyone involved in the process meets up using Mumble. Why use Mumble in a company that develops telecommunication software? The answer is simple: our software lacks a simple feature that technical people find useful: push-to-talk.

In day-to-day tasks, our customers don't need this kind of feature, which is why it was not at the top of our backlog; we focus our efforts on their needs. That's why we're currently developing a video conference application that allows external people to join our customers' conferences, simplifying the installation of a Wazo server through our administration application, improving our mobile application using native APIs, and so on.

So the idea is simple: create an application that connects us to a conference, and add a push-to-talk feature that can be triggered by a keyboard shortcut.

We're Wazo, meaning «bird» in French; our colleague Pascal came up with an excellent name for the project: «Woodpecker», always pecking at the tree the way we hit that push-to-talk button.

Bootstrapping the Project

After drawing some mockups, we considered various tools and libraries, with the idea of discovering and testing new ways to expand our skillset.

We're currently using ReactJS, Redux, and WebRTC to develop our web application. The desktop application uses Electron to wrap the web app and add some features, like an incoming call modal or better integration with headset commands (e.g. Jabra). We're also developing a mobile application using react-native, react-native-webrtc, and react-native-callkeep, a library we've open-sourced to help developers display native incoming and ongoing call UIs.

Though we use create-react-app and create-react-native-app to bootstrap our apps, we discovered that Expo now offers a new way to create web and mobile applications from a single code base. Using expo-electron-adapter, we can create a desktop application from that same code base as well.

Catching User Keystrokes, a Difficult Challenge

Our current application uses react-hotkeys to let users answer, hang up, mute/unmute, pause, or resume a call by pressing a single key. However, the library only works while the desktop application is focused or the browser tab is visible.
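For context, here's a minimal sketch of that kind of in-app binding; the key map, handlers, and components are illustrative, not our production code:

import React from 'react';
import { HotKeys } from 'react-hotkeys';

// Illustrative bindings: the action names, keys, and helpers are examples
const keyMap = {
  ANSWER: 'a',
  HANGUP: 'h',
  TOGGLE_MUTE: 'm',
};

const handlers = {
  ANSWER: () => answerCall(),
  HANGUP: () => hangupCall(),
  TOGGLE_MUTE: () => toggleMute(),
};

const App = () => (
  <HotKeys keyMap={keyMap} handlers={handlers}>
    {/* Hotkeys fire only while this subtree has focus */}
    <CallScreen />
  </HotKeys>
);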

The main goal of a push-to-talk feature is to be able to hit a key from anywhere and talk to people, so we had to find another way to catch user keystrokes outside of our application context.

For the Web Application

Allowing developers to listen to keyboard events in another tab, or when the browser is not focused, is potentially a huge security flaw, as it could be used to build a keylogger and capture credit card numbers, passwords, or other vital information that you type. That's why browsers don't let us listen to keystrokes outside of our application; we have to find another way.

One simple idea for the web application is to use the Media Session API. This API lets us catch the play/pause media keys found on recent keyboards, even when the tab is not focused. To make it work, we have to play a sound in an <audio /> tag:

// Check if we are in a browser environment
if (typeof document !== 'undefined' && 'mediaSession' in navigator) {
  const audio = document.createElement('audio');
  audio.src = require('../../assets/silence.mp3');
  audio.loop = true;

  const onPlay = () => { navigator.mediaSession.playbackState = 'playing'; };
  const onPause = () => { navigator.mediaSession.playbackState = 'paused'; };

  audio.addEventListener('play', onPlay);
  audio.addEventListener('pause', onPause);

  // Later, in the cleanup function of a `useEffect`:
  // audio.removeEventListener('play', onPlay);
  // audio.removeEventListener('pause', onPause);

  // When the user interacts with the page:
  audio.play();

  navigator.mediaSession.setActionHandler('play', async () => {
    // User pressed the play media key
    await audio.play();
  });

  navigator.mediaSession.setActionHandler('pause', () => {
    // User pressed the pause media key
    audio.pause();
  });
}
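Note that the play/pause media key is a toggle, not a press-and-hold, so on the web push-to-talk effectively becomes toggle-to-talk. As a sketch, the handlers above could drive the microphone through the onPush / onRelease helpers defined in the SDK section below:

navigator.mediaSession.setActionHandler('play', async () => {
  await audio.play(); // the 'play' listener keeps playbackState in sync
  onPush();           // start talking
});

navigator.mediaSession.setActionHandler('pause', () => {
  audio.pause();
  onRelease();        // back to muted
});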

For the Desktop Application

Electron provides a globalShortcut module that could be great for our project. As the doc says:

You can use the globalShortcut module to detect keyboard events even when the application does not have keyboard focus.

But the only event we receive is when the user presses the shortcut, not when they release it. With such a constraint, the user would have to press the shortcut once to start talking and again to stop; that's not what we want.
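To illustrate, a minimal sketch (the accelerator here is just an example):

const { app, globalShortcut } = require('electron');

app.whenReady().then(() => {
  globalShortcut.register('CommandOrControl+Shift+Space', () => {
    // Fires on key press only: there is no "release" counterpart,
    // so a hold-to-talk behavior can't be built on globalShortcut alone.
  });
});

app.on('will-quit', () => globalShortcut.unregisterAll());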

After digging a little, we found iohook, a Node.js library that lets us listen to keyboard events. As it runs in Node.js, we can add it to the Electron main process and send the events back to the renderer process over IPC:

const ioHook = require('iohook');

// `browserWindow` is the BrowserWindow instance created in the main process
ioHook.on('keyup', event => browserWindow.webContents.send('electron-keyup', event));
ioHook.on('keydown', event => browserWindow.webContents.send('electron-keydown', event));

ioHook.start();

In the React app context:

const isMobile = typeof navigator !== 'undefined' && navigator.product === 'ReactNative';

// Electron
if (!isMobile && window.require) {
  // Trick to avoid requiring electron in mobile
  System.import('electron').then((electron: any) => {
    electron.ipcRenderer.on('electron-keyup', (event: Object, message: Object) => {
      // We got a keyup event
    });
    electron.ipcRenderer.on('electron-keydown', (event: Object, message: Object) => {
      // We got a keydown event
    });
  });
}
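One wrinkle: while the key is held down, the OS keeps emitting auto-repeated keydown events, so the handlers need a guard. Here's a sketch of what those placeholders could become, inside the same then() callback; PTT_KEYCODE is a hypothetical constant holding whatever keycode iohook reports for the chosen key:

const PTT_KEYCODE = 57; // hypothetical: use the keycode iohook reports for your key
let held = false;

electron.ipcRenderer.on('electron-keydown', (event: Object, message: Object) => {
  // Ignore the auto-repeated keydown events fired while the key stays down
  if (message.keycode === PTT_KEYCODE && !held) {
    held = true;
    onPush(); // unmute, see the SDK section below
  }
});

electron.ipcRenderer.on('electron-keyup', (event: Object, message: Object) => {
  if (message.keycode === PTT_KEYCODE) {
    held = false;
    onRelease(); // mute again
  }
});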

Using Wazo's JS SDK

We use and improve our JavaScript SDK frequently, so using it to log the user in and join a conference was not the most challenging part:

// Setting the Wazo server
Wazo.Auth.setHost('wazo.my-company.com');

// Authenticating the user
const session = await Wazo.Auth.logIn(username, password);

// Joining a conference
const room = await Wazo.Room.connect({ extension: '8000' });

// When joining the room, mute the user:
room.on(room.ON_JOINED, () => {
  room.mute();
});

// When the push-to-talk button or shortcut is pressed, unmute the user:
const onPush = () => {
  room.unmute();
};

// When it's released, mute again:
const onRelease = () => {
  room.mute();
};

That's all! The SDK takes care of the rest: setting up WebRTC (HTML elements, events, etc.), connecting to the SIP signaling server (Asterisk), sending room events, and so on.

Going Further

The project is not finished yet; we've stumbled upon some drawbacks building iohook on certain architectures. That's an issue we should fix, or maybe we'll find another library to catch keystrokes outside an Electron app.

We had to eject our Expo application in order to add react-native-webrtc, so we lost some cool features provided by Expo in the process.

Another improvement would be to use the accelerometer to check when the mobile device is held vertically and trigger talking. When the device goes horizontal (e.g. put down on the table), the call could be muted.
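A rough sketch of that idea with Expo's accelerometer; the 0.8 threshold is a guess to tune, and room is the SDK object shown above:

import { Accelerometer } from 'expo-sensors';

Accelerometer.setUpdateInterval(250);

const subscription = Accelerometer.addListener(({ y }) => {
  // Readings are in g's: |y| close to 1 roughly means the phone is held upright
  if (Math.abs(y) > 0.8) {
    room.unmute(); // held vertically: talking
  } else {
    room.mute(); // lying flat (e.g. on the table): muted
  }
});

// Later: subscription.remove();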

Here's how the application looks on mobile: (screenshot)

Leaking Our Own Source Code

So this is it: our own little push-to-talk app. Granted, it's not the most advanced feature but, considering the amount of time dedicated to it, not too shabby. The result of our hackathon can be found in the woodpecker repository. Feedback is welcome!