As I write this it's April of 2020, so you know what that means... I'm stuck inside of my house. And my extended family is stuck inside of their houses too.
So to help pass the time and keep those family bonds strong ... we decided to have a weekly family game night over FaceTime. (Because evidently Zoom was too difficult to figure out.)
The game of choice is bingo ... it's always bingo.
Sounds great, right? Family time, iOS devices, a game of chance with balls numbered between 1 and 75.
It usually is! But sometimes ... sometimes family can get ... how should I say ... annoying. (Mom, if you're reading this - not you.. I 💗 you!)
So I wanted to figure out a way to still partake in the family game night (I mean, it's awfully hard to come up with excuses when you can't leave the house) but yet kinda not have to pay attention (cuz I'm a terrible person) - BUT - still know if I won the game!
And that's where the idea for this app and blog post came from! (Download all the code so you can follow along!)
So here's what I built. It's a simple Xamarin.Forms app that displays a bingo card.
I could tap on the numbers as they're being called (y'know, if I happened to be paying attention).
And it tells me if I've won the game or not.
The problem with the app is - I have to pay attention to the numbers as they're called.
Or do I?!?
Cognitive Services Speech to Text
To overcome the pesky fact that I have to pay attention to the numbers as they're called - I thought, why don't I try one of the lesser-known Cognitive Services: Speech to Text.
What Speech to Text does is pretty simple. It uses the microphone to listen to a stream of words, then translates those words into a string, then returns it to you.
(It also does a whole lot more - the SDK has functionality to "listen" for wake words - so you can build your own Alexa if you want!)
And there's a free tier so I won't have to pay anything!
So all I would need to do is to write a little bit of code to "listen" for numbers, check if the card is displaying that number, then voila - I don't have to pay attention during family game night any longer! 😈
When I told my family what I was going to do - they liked the idea too!
Let's play bingo!
There are four parts to getting Speech to Text working.
- Set up the Speech API Cognitive Service in the Azure portal
- Install the NuGet package
- Make some changes to the platform projects
- Write the code!
Go to the Azure portal - then start up the CLI. Then enter these commands:
Install the NuGet package
Head back into whatever Xamarin project you want to run the Speech to Text from.
Install the following NuGet: Microsoft.CognitiveServices.Speech
If you're doing Forms - you're gonna have to put that in the platform and the Forms projects.
Make the platform changes
There's a couple of things that you need to change on each platform project.
First you need to tell iOS what to display when it prompts the user for access to the microphone. Open the iOS project's Info.plist and add the following key to it:
<key>NSMicrophoneUsageDescription</key> <string>Transcribe Bingo Cards</string>
Over on the Droid side of things - pop open the AndroidManifest.xml file and add:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Listen and transcribe!
Now you'd think this would be the difficult part. You have to prompt for the microphone permissions. Then somehow get the audio stream from the mic. Buffer all that up and send it to Azure. Rely on some kind of weird callback to get the transcribed text back... ugh.
Turns out it's not too bad!
Prompting for the permissions is a breeze thanks to Xamarin.Essentials. It'll look something like this:
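Here's a minimal sketch of that check, not necessarily the exact code from the app. It assumes Xamarin.Essentials 1.5 or later (where the generic Permissions API lives), and the `MicrophonePermission` class and `EnsureGrantedAsync` method names are made up for illustration.

```csharp
using System.Threading.Tasks;
using Xamarin.Essentials;

// Hypothetical helper; the class and method names are illustrative only.
public static class MicrophonePermission
{
    // Returns true when the app is allowed to use the microphone.
    public static async Task<bool> EnsureGrantedAsync()
    {
        // Ask the OS what the current state is without prompting.
        var status = await Permissions.CheckStatusAsync<Permissions.Microphone>();

        if (status != PermissionStatus.Granted)
        {
            // Anything other than Granted: show the system prompt.
            status = await Permissions.RequestAsync<Permissions.Microphone>();
        }

        return status == PermissionStatus.Granted;
    }
}
```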
Call it before turning on the recording. If the PermissionStatus is anything but Granted then prompt the user to give the app microphone capabilities.
Now the listen and transcribe part. Here's the function I'm using to start up a continuous listen and transcribe:
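A sketch of what starting continuous recognition can look like with the Speech SDK, under a few assumptions: the `BingoListener` class, its method names, and the `subscriptionKey` / `region` parameters are placeholders, and the key and region come from whatever Speech resource you created in Azure.

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

// Hypothetical wrapper; names are illustrative, not taken from the app.
public class BingoListener
{
    SpeechRecognizer recognizer;

    public async Task StartListeningAsync(string subscriptionKey, string region)
    {
        // Key and region identify your Speech resource in Azure.
        var config = SpeechConfig.FromSubscription(subscriptionKey, region);

        // With no AudioConfig supplied, the SDK uses the default microphone.
        recognizer = new SpeechRecognizer(config);

        // Called each time a chunk of speech has been transcribed.
        recognizer.Recognized += OnRecognized;

        // Keeps listening until StopContinuousRecognitionAsync is called.
        await recognizer.StartContinuousRecognitionAsync();
    }

    public async Task StopListeningAsync()
    {
        if (recognizer == null) return;

        await recognizer.StopContinuousRecognitionAsync();
        recognizer.Recognized -= OnRecognized;
        recognizer.Dispose();
        recognizer = null;
    }

    void OnRecognized(object sender, SpeechRecognitionEventArgs e)
    {
        // The transcribed text is in e.Result.Text; the bingo logic goes here.
    }
}
```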
The key piece is the recognizer.Recognized event. That's what's gonna get called every time some text comes back.
But notice what's not there... nothing about handling the mic... or handling the bytes and bytes of audio input. It's all done for you! Don't worry about it!
Then finally that event where the transcribed text comes back:
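A hedged sketch of such a handler follows. The `TranscriptionHandler` class and its `TextHeard` event are hypothetical names used to keep the example self-contained; only `SpeechRecognitionEventArgs`, `Result.Reason`, `Result.Text` and `ResultReason.RecognizedSpeech` come from the SDK.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// Hypothetical class; only the SDK types are real.
public class TranscriptionHandler
{
    // Illustrative hook: the bingo card logic can subscribe to this.
    public event Action<string> TextHeard;

    // Wire up with: recognizer.Recognized += handler.OnRecognized;
    public void OnRecognized(object sender, SpeechRecognitionEventArgs e)
    {
        // Results with any other reason (such as NoMatch) carry no usable text.
        if (e.Result.Reason != ResultReason.RecognizedSpeech) return;

        string text = e.Result.Text;
        if (string.IsNullOrWhiteSpace(text)) return;

        TextHeard?.Invoke(text);
    }
}
```

If the subscriber touches the UI, it's probably safest to hop back to the main thread first (for example with `MainThread.BeginInvokeOnMainThread` from Xamarin.Essentials), since the SDK may raise the event on a background thread.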
The text is inside of the SpeechRecognitionEventArgs object that's passed in. It has a Result property, and the transcribed string is in Result.Text.
The text that comes back is in nice bite size chunks. In other words - it won't return tons and tons of sentences all at once. The SDK is smart enough to send audio up to Azure in small batches to help speed the processing.
There's also a little bit of bingo logic going on here. First I'm taking that string of text and then looping through each character in it - looking for numbers.
The text that comes back will be full sentences and if people are talking, the bingo numbers will be buried within.
And I found that any G bingo numbers come across as 3. So G56 is interpreted as 356. So I'm actually looking for 3 digits in a row - and throwing that first one out.
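The digit hunting described above can be sketched roughly like this. It's one possible take rather than the app's actual code: the `BingoText` class and `ExtractNumbers` method are invented names, and it assumes every three-digit run is a misheard letter plus a two-digit number, and that anything outside 1 to 75 is just chatter.

```csharp
using System.Collections.Generic;

// Hypothetical helper; names and rules are assumptions for illustration.
public static class BingoText
{
    // Pulls candidate bingo numbers (1 to 75) out of a transcribed sentence.
    public static List<int> ExtractNumbers(string text)
    {
        var numbers = new List<int>();
        int i = 0;

        while (i < text.Length)
        {
            if (text[i] < '0' || text[i] > '9') { i++; continue; }

            // Collect the whole run of consecutive digits.
            int start = i;
            while (i < text.Length && text[i] >= '0' && text[i] <= '9') i++;
            string run = text.Substring(start, i - start);

            // "G56" tends to arrive as "356": treat the first digit of a
            // three-digit run as the misheard letter and throw it out.
            if (run.Length == 3) run = run.Substring(1);

            // Longer runs can't be bingo numbers, so they're ignored.
            if (run.Length <= 2
                && int.TryParse(run, out int value)
                && value >= 1 && value <= 75)
            {
                numbers.Add(value);
            }
        }

        return numbers;
    }
}
```

With this sketch, `BingoText.ExtractNumbers("I think she said 356 and then B 7")` returns a list holding 56 and 7, which can then be checked against the card.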
That's all there is to it. A pretty simple project that adds some really neat functionality with not a lot of code.
And the best part - it really wows your family who can't even figure out how to use Zoom! 🤣
So check out the code - and expand upon it - I'd love to see what you come up with! AND DON'T JUDGE MY GAME PROGRAMMING SKILLS! I brute forced everything!
Multiple arrays for each column - check the winning scenarios manually, and it's easy to trick - it's all hard-coded and brute force!! 💪
And the user interface is U.G.L.Y - ugly!
I mean - if I don't have time to play bingo properly - I don't have time to code a game properly either!!
(Seriously though - I'm just joking - of course I played ~for real~ during family game nights. The app was just a fun little way of showing off how cool Azure is to my family!)