Welcome to Global Azure Bootcamp 2018! All around the world, user groups and communities want to learn about Azure and cloud computing! In April 2018, all communities will come together once again for the Global Azure Bootcamp 2018 event! Each user group will organize their own one-day deep-dive class on Azure the way they see fit, in a format that works for their members. The result is that thousands of people get to learn about Azure and join together online under the social hashtag #GlobalAzure! Join hundreds of other organizers to help out and be part of the experience!
The 2018 Louisville Global Azure Bootcamp is a free one-day global training event on Azure, from the community to the community. See our event home page for more details.
This year's format will be a blend of brief presentations, followed by hands-on, guided labs.
Our speakers include:
To get started you'll need the following pre-requisites. Please take a few moments to ensure everything is installed and configured.
Azure is big. Really big. Too big to talk about all things Azure in a single day.
We've assembled an exciting workshop to introduce you to several Azure services that cloud developers should know about:
In this year’s Global Azure Bootcamp, you’ll learn how to integrate Azure’s customizable speech recognition, text analytics, and intent analysis APIs into an Azure-hosted app. You’ll start by learning about the Custom Speech Service, a speech recognition API that can be trained to filter out background noise and recognize obscure words and phrases. After training the speech recognition model, you’ll integrate it into an Azure-hosted web app to recognize real-time speech. Finally, you’ll integrate and train the Language Understanding Intelligent Service (LUIS) to analyze the intent of speech phrases you generate. With the intent identified, your app will be able to respond in real time.
You can find additional lab materials and presentation content at the locations below:
There are several ways to get an Azure subscription: the free trial subscription; the pay-as-you-go subscription, which has no minimums or commitments and can be canceled at any time; an Enterprise Agreement subscription; or a subscription purchased from a Microsoft retailer. In this exercise, you'll create a trial subscription using the code you were given at the bootcamp.
Browse to https://aka.ms/gab18.
Use the Azure Code on the handout you were given to get started.
This concludes the exercise.
If you already have an Azure account
If you have an Azure account already, you can skip this section. If you have a Visual Studio subscription (formerly known as an MSDN account), you get free Azure dollars every month. Check out the next section for activating these benefits.
There are several ways to get an Azure subscription: the free trial subscription; the pay-as-you-go subscription, which has no minimums or commitments and can be canceled at any time; an Enterprise Agreement subscription; or a subscription purchased from a Microsoft retailer. In this exercise, you'll create a free trial subscription.
Browse to the following page http://azure.microsoft.com/en-us/pricing/free-trial/ to obtain a free trial account.
Click Start free.
Enter the credentials for the Microsoft account that you want to use. You will be redirected to the Sign up page.
Note
Some of the following steps may be skipped during the sign-up process if you have recently verified your Microsoft account.
Enter your personal information in the About you section. If you have previously loaded this info in your Microsoft Account, it will be automatically populated.
In the Verify by phone section, enter your mobile phone number, and click Send text message.
When you receive the verification code, enter it in the corresponding box, and click Verify code.
After a few seconds, the Verification by card section will refresh. Fill in the Payment information form.
A Note about your Credit Card
Your credit card will not be billed, unless you remove the spending limits. If you run out of credit, your services will be shut down unless you choose to be billed.
In the Agreement section, check the I agree to the subscription Agreement, offer details, and privacy statement option, and click Sign up.
Your free subscription will be set up, and after a while, you can start using it. Notice that you will be informed when the subscription expires.
Your free trial will expire 29 days after its creation.
If you happen to be a Visual Studio subscriber (formerly known as MSDN), you can activate your Azure Visual Studio subscription benefits. There is no charge, you can use your MSDN software in the cloud, and, most importantly, you get up to $150 in Azure credits every month. You also get a 33% discount on Virtual Machines, and much more.
To activate the Visual Studio subscription benefits, browse to the following URL: http://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits-details/
Scroll down to see the full list of benefits you get for being an MSDN member. There is even a FAQ section you can read.
Click Activate to activate the benefits.
You will need to enter your Microsoft account credentials to verify the subscription and complete the activation steps.
You might be wondering how you can participate in a cloud development workshop and not need Visual Studio installed. Am I right?
Thanks to the Azure Resource Manager and some nifty templates I put together, we're going to provision a virtual machine (VM) with Visual Studio installed in your Azure subscription. From that point forward, you can work from the VM.
It takes about 10 minutes to get the VM deployed to your subscription, so let's get started!
Start by clicking the Deploy to Azure button below.
This opens the Azure portal in a new tab of your browser. If you're prompted to sign in, do so.
When the page loads, you'll see this custom deployment page:
Resource Groups
Formally, resource groups provide a way to monitor, control access to, provision, and manage billing for collections of assets that are required to run an application, or that are used by a client or company department. Informally, think of a resource group like a file system folder, but instead of holding files and other folders, resource groups hold Azure resources like storage accounts, web apps, functions, etc.
WARNING
Do not forget your username and password. Write them down for today.
Scroll down to the bottom of the page and click two boxes:
Press the Purchase button.
After a few moments, the deployment of your VM will begin, and you'll see a status notification in the upper right:
...and a deployment tile on your dashboard:
Now, wait for about 10 minutes and your virtual machine will be deployed and ready to use.
That's it for the pre-requisites for today's workshop. Wait until your VM is created, and we'll be getting started soon!
Before we go any further, be sure you have all the pre-requisites downloaded and installed. You'll need the following:
NOTE
If you've been following along, you should have all of these above items.
One of the most important aspects of your Azure subscription and using the Azure portal is organization. You can create a lot of Azure resources very quickly in the portal, and it can become cluttered quickly. So, it's important to start your Azure subscription off right.
Our first stop will be to create a new Dashboard to organize our Azure resources we're building today.
We'll start by creating a dashboard.
Log in to the Azure portal, click + New Dashboard, give the dashboard a name, and click Done customizing.
That was easy! Dashboards are a quick way of organizing your Azure services. We like to create one for the workshop because it helps keep everything organized. You'll have a single place to go to find everything you build today.
Now that you have a new dashboard, let's put something on it. We'll be searching for the resource group you created in chapter 0 (the one that is holding your VM), and pinning it to this dashboard.
Resource Groups
You'll recall from the last chapter that resource groups provide a way to monitor, control access to, provision, and manage billing for collections of assets that are required to run an application, or that are used by a client or company department. Informally, think of a resource group like a file system folder, but instead of holding files and other folders, resource groups hold Azure resources like storage accounts, web apps, functions, etc.
Start by searching for the resource group you created in chapter 0. My resource group was called workshop-test7.
Click in the search bar at the top. If you're lucky, your resource group will be at the very top (like mine was). If not, type its name and click on it.
This opens the resource group. Next, click the pin icon at the upper-right to pin the resource group to your dashboard:
Finally, close the resource group, by clicking the X in the upper right corner (next to the pin icon). You should see the resource group pinned to your dashboard:
Now that you have the VM's resource group pinned to your dashboard, it will be easy to locate the VM in later exercises.
Our last step will be to create a new Resource Group to house the non-VM resources we'll create in this workshop.
Start by clicking the + Create a resource button on the left.
Search for resource group by using the search box, selecting Resource Group when it appears.
Select Resource Group from the search results window:
Click Create at the bottom:
Give the resource group a name, then select your Azure subscription and a location. Press Create when you're finished.
After it's created, you'll see a message in the notification area:
Pin it to your dashboard by clicking the Pin to dashboard button. Note that the resource group has been added to your dashboard.
That wraps up the basics of creating dashboards, creating resource groups, and pinning resources to a dashboard. We're not going to take a deep dive into Azure Resource Groups. If you're interested in learning more, check out this article.
Next, let's get logged into the VM that we created in chapter 0.
Start by navigating to your Azure portal dashboard.
Locate the VM resource group you pinned earlier in this chapter and click on your virtual machine:
Click the Connect button.
This downloads a file to your computer that will open in your Remote Desktop program.
Click the downloaded file to open a connection to your VM. Enter the username and password you created earlier.
Click OK to connect.
If you're prompted by a security message, respond Yes:
You're now connected to your VM.
Download additional software
If you're like me, you have a standard toolset you like to use. Go ahead and download software for your VM, and don't forget your browser of choice, Notepad++, Visual Studio Code, etc.
Download Chrome/Firefox/Edge
It's important that you download an evergreen browser on your virtual machine, because the version of Internet Explorer installed on the VM is not compatible with some of the JavaScript we have in this workshop.
Before you can download files through Internet Explorer, you need to enable downloads. Go to Tools -> Internet Options -> Security -> Internet -> Custom Level. Find Downloads -> File download, then select Enabled. Close Internet Explorer, then re-open.
Now, you can download your favorite browser. And don't forget to set it as your default. Don't use IE.
This concludes the exercise.
Now that you're connected to your VM, you can continue the workshop from inside the VM.
Running a VM in Azure
If you're worried about excessive charges to your Azure subscription because you're running a VM constantly, don't worry. This VM is programmed to shut itself down every morning at 1:00 AM.
Let's get started by getting the master branch.
Clone or download the master branch from https://github.com/mikebranstein/global-azure-bootcamp-2018.
Use this link to download a zip file of the master branch.
Unblock the .zip file!
Don't open the zip file yet. You may need to unblock it first!
If you're running Windows, right-click the zip file and select Properties. Check the Unblock option, press Apply, then press OK.
Now it's safe to unzip the file.
Open the solution in Visual Studio by double-clicking the Web.sln file in the web folder of the extracted files:
Logging into Visual Studio the first time
When you open Visual Studio the first time, it may take a few minutes. Be patient. You'll probably be prompted to sign in. Use your Microsoft account to sign in (the same one you used to sign up for the Azure trial).
The opened solution should look like this:
Build and debug the solution. You should see the Speech Recognition site load in your browser.
This concludes the exercise.
That's it! You're up and running and ready to move on! In the next section, you'll learn how to deploy your website to Azure.
In the last part of this chapter, you'll learn how to create an Azure Web App and deploy the Speech Service website to the cloud. In short, I like to think of Azure Web Apps like IIS in the cloud, but without the pomp and circumstance of setting up and configuring IIS.
Web Apps are also part of a larger Azure service called the App Service, which is focused on helping you to build highly-scalable cloud apps focused on the web (via Web Apps), mobile (via Mobile Apps), APIs (via API Apps), and automated business processes (via Logic Apps).
We don't have time to fully explore all of the components of the Azure App Service, so if you're interested, you can read more online.
As we've mentioned, Web Apps are like IIS in the cloud, but calling it that seems a bit unfair because there's quite a bit more to Web Apps:
Websites and Web Apps: Web Apps let developers rapidly build, deploy, and manage powerful websites and web apps. Build standards-based web apps and APIs using .NET, Node.js, PHP, Python, and Java. Deliver both web and mobile apps for employees or customers using a single back end. Securely deliver APIs that enable additional apps and devices.
Familiar and fast: Use your existing skills to code in your favorite language and IDE to build APIs and apps faster than ever. Access a rich gallery of pre-built APIs that make connecting to cloud services like Office 365 and Salesforce.com easy. Use templates to automate common workflows and accelerate your development. Experience unparalleled developer productivity with continuous integration using Visual Studio Team Services, GitHub, and live-site debugging.
Enterprise grade: App Service is designed for building and hosting secure mission-critical applications. Build Azure Active Directory-integrated business apps that connect securely to on-premises resources, and then host them on a secure cloud platform that's compliant with ISO information security standard, SOC2 accounting standards, and PCI security standards. Automatically back up and restore your apps, all while enjoying enterprise-level SLAs.
Build on Linux or bring your own Linux container image: Azure App Service provides default containers for versions of Node.js and PHP that make it easy to quickly get up and running on the service. With our new container support, developers can create a customized container based on the defaults. For example, developers could create a container with specific builds of Node.js and PHP that differ from the default versions provided by the service. This enables developers to use new or experimental framework versions that are not available in the default containers.
Global scale: App Service provides availability and automatic scale on a global datacenter infrastructure. Easily scale applications up or down on demand, and get high availability within and across different geographical regions. Replicating data and hosting services in multiple locations is quick and easy, making expansion into new regions and geographies as simple as a mouse click.
Optimized for DevOps: Focus on rapidly improving your apps without ever worrying about infrastructure. Deploy app updates with built-in staging, roll-back, testing-in-production, and performance testing capabilities. Achieve high availability with geo-distributed deployments. Monitor all aspects of your apps in real-time and historically with detailed operational logs. Never worry about maintaining or patching your infrastructure again.
Now that you understand the basics of web apps, let's create one and deploy our app to the cloud!
Earlier in this chapter, you created a resource group to house resources for this workshop. You did this via the Azure Portal. You can also create Web Apps via the Azure portal in the same manner. But, I'm going to show you another way of creating a Web App: from Visual Studio.
Visual Studio 2017 Warning
This exercise assumes you're running Visual Studio 2017. The UI and screens in Visual Studio 2015 aren't identical, but they're similar. We're not going to include screenshots for 2015, but we think you can figure it out.
From Visual Studio, right-click the Web project and select Publish. In the web publish window, select Microsoft Azure App Service, Create New, and press Publish. This short clip walks you through the process:
On the next page, give your Web App a name, select your Azure subscription, and select the Resource Group you created earlier (mine was named workshop).
Unique Web App Names
Because a web app's name is used as part of its URL in Azure, you need to ensure its name is unique. Luckily, Visual Studio will check that your web app name is unique before it attempts to create it. In other words, don't try to use the web app name you see below, because I already used it.
Click New... to create a new Web App plan.
Web App Plans
Web App plans describe the performance needs of a web app. Plans range from free (where multiple web apps run on shared hardware) to not-so-free, where you have dedicated hardware, lots of processing power, RAM, and SSDs. To learn more about the various plans, check out this article.
Create a new free plan.
After the plan is created, click Create to create the Web App in Azure.
When the Azure Web App is created in Azure, Visual Studio will publish the app to the Web App. After the publish has finished, your browser window will launch, showing you your deployed website.
Web App URLs
The deployed web app has a URL of Web App Name.azurewebsites.net. Remember this URL, because you'll be using it in later chapters.
One final note is to check the Azure Portal to see the App Service plan and Web App deployed to your resource group:
This concludes the exercise.
In this chapter you'll learn about the Custom Speech Service, how to provision one in the Azure Portal, and how to link your subscription to the Custom Speech Service portal.
Abbreviation
To save some time, you may see me refer to the Custom Speech Service as CSS. I know it can be confusing, especially if you're a web developer. But, let's pretend for a day that you're not, and use CSS in a different way. Thanks!
The Custom Speech Service enables you to create a customized speech-to-text platform that meets the needs of your business. With the service, you create customized language models and acoustic models tailored to your application and your users. By uploading your specific speech and/or text data to the Custom Speech Service, you can create custom models that can be used in conjunction with Microsoft’s existing state-of-the-art speech models. With these capabilities, you're able to filter out common background noise, adjust for localized dialects, and train the speech service to recognize non-standard/obscure words and phrases (like "Pokemon", scientific terms, and technical jargon).
For example, if you’re adding voice interaction to a mobile phone, tablet or PC app, you can create a custom language model that can be combined with Microsoft’s acoustic model to create a speech-to-text endpoint designed especially for your app. If your application is designed for use in a particular environment or by a particular user population, you can also create and deploy a custom acoustic model with this service.
Before you get started, it's important to understand how speech recognition systems work.
Speech recognition systems are composed of several components that work together. Two of the most important components are the acoustic model and the language model.
Acoustic Model
The acoustic model is a classifier that labels short fragments of audio into one of a number of phonemes, or sound units, in a given language. For example, the word “speech” is comprised of four phonemes “s p iy ch”. These classifications are made on the order of 100 times per second.
Phoneme
In short, a sound unit. Any of the perceptually distinct units of sound in a specified language that distinguish one word from another, for example p, b, d, and t in the English words pad, pat, bad, and bat.
Language Model
The language model is a probability distribution over sequences of words. The language model helps the system decide among sequences of words that sound similar, based on the likelihood of the word sequences themselves. For example, “recognize speech” and “wreck a nice beach” sound alike but the first hypothesis is far more likely to occur, and therefore will be assigned a higher score by the language model.
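To make that idea concrete, here's a toy sketch (it assumes nothing about how the CSS actually scores hypotheses internally) that prefers one candidate transcription over another based on word frequencies from a tiny sample of text:

```python
from collections import Counter

# A tiny stand-in for the text a language model is trained on (illustrative only).
corpus = "recognize speech with a speech recognition system that can recognize speech".split()
counts = Counter(corpus)
total = sum(counts.values())

def crude_score(sentence, smoothing=1):
    """Product of smoothed word frequencies -- a very crude stand-in for a language model score."""
    score = 1.0
    for word in sentence.lower().split():
        score *= (counts[word] + smoothing) / (total + smoothing * len(counts))
    return score

print(crude_score("recognize speech"))      # relatively high: both words appear in the sample text
print(crude_score("wreck a nice beach"))    # much lower: mostly unseen words
```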
Both the acoustic and language models are statistical models learned from training data. As a result, they perform best when the speech they encounter when used in applications is similar to the data observed during training. The acoustic and language models in the Microsoft Speech-To-Text engine have been trained on an enormous collection of speech and text and provide state-of-the-art performance for the most common usage scenarios, such as interacting with Cortana on your smart phone, tablet or PC, searching the web by voice or dictating text messages to a friend.
Credits
This section was borrowed from Microsoft's official documentation. Thank you!
Throughout the next several chapters, you'll be building acoustic and language models. Don't worry if you don't understand everything right now, because you'll be learning as you go.
Bing Speech API
Microsoft has another speech-to-text service in Azure called the Bing Speech API. This API is like the Custom Speech Service, but it cannot be customized. I like to think of the Bing Speech API as a v1 product, and the Custom Speech Service as a v2 product. Both are highly capable, but when I need to account for background noise, custom words, etc. I choose the Custom Speech Service.
Now that you know what the Custom Speech Service can do, let's start using it! You'll start by creating a Custom Speech Service instance in the Azure portal.
Start by jumping back to the Azure portal, and create a new resource by clicking the Create a resource button.
Search for Custom Speech Service:
Fill out the required parameters as you create an instance:
West US Location
Normally, I recommend you keep resources in the same region, but the Custom Speech Service is in preview right now, so it's only available in West US.
When the Custom Speech Service instance is provisioned, it will appear in your resource group:
The final step is to navigate to the Custom Speech Service instance by clicking on it.
Locate the Keys area and take note of KEY 1:
You'll need this key in the next step, so don't forget it.
This concludes the exercise.
There's not much you can do with the Custom Speech Service in the Azure portal because the service is still in preview. Instead, a separate portal exists to perform customizations and work with the service. In the next section, you'll be introduced to the Custom Speech Service portal.
Start by navigating to the CSS web portal at https://cris.ai.
Click the Sign In link in the upper right and sign in with your Azure portal subscription login.
After logging in, click on your name in the upper right, and select the Subscriptions option below it:
The Subscriptions page shows all of your connected CSS subscriptions.
Click the Connect existing subscription button. Add the CSS subscription you just created in the Azure portal. Give it a name and enter KEY 1 from the Azure portal.
You should see the subscription appear on the subscriptions page.
This concludes the exercise.
That's it. In the next chapter, you'll start to use the CSS by creating various data sets for training and testing.
In this chapter, you'll learn:
At the core of every artificial intelligence (or machine learning) problem is data. And that data is used in various capacities to train, build, and test the systems you develop. Because data is so critical to machine learning endeavors, you'll need to learn about the different ways data is used.
Thank you, StackExchange
This next section was adapted from a StackExchange post. Thank you to all that contributed, as you said it better than I could have.
In many machine learning processes, you need two types of data sets:
In one data set (your gold standard), you have the input data together with the correct/expected output. This data set is usually prepared either by humans or by collecting data in a semi-automated way. It's important that you have the expected output for every data row, because you need to feed the machine learning algorithm the expected (correct) results for it to learn. This data set is often referred to as the training data set.
In the other data set, you collect the data you are going to apply your model to. In many cases, this is the data you want your model to produce output for, so you don't have any "expected" output yet. This is often real-world data.
With these two data sets, the machine learning process adheres to a standard 3-phase process:
Training phase: you present the data from your "gold standard" (or training data set) and train your model by pairing the input with the expected output. Often you split your entire training data set into two pieces: approximately 70% of the training data is used for training, and 30% is reserved for validation/testing. The reserved 30% is often referred to as test data. The result of this phase is a trained model.
Validation/Test phase: to estimate how well your model has been trained, you pass in the reserved 30% of the data and evaluate the model's accuracy.
Application phase: now you apply your trained model to the real-world data and get the results. Since you normally don't have any reference values in this type of data, you can only speculate about the quality of your model's output using the results of your validation phase, so you may perform additional accuracy tests.
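As a minimal sketch of that 70/30 split (the CSS portal handles this kind of bookkeeping for you; this is only to illustrate the idea):

```python
import random

def split_gold_standard(rows, train_fraction=0.7, seed=42):
    """Shuffle the labeled rows and hold the remainder (~30%) back for validation/testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cutoff = int(len(rows) * train_fraction)
    return rows[:cutoff], rows[cutoff:]

# Example: 10 labeled question/answer pairs -> 7 used for training, 3 held back for testing.
labeled = [(f"question {i}", "true") for i in range(10)]
training, testing = split_gold_standard(labeled)
print(len(training), len(testing))  # 7 3
```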
Separation of Training, Test, and Real-World Data Sets
An easy mistake to make with your training, test, and real-world data sets is overlapping data (or reusing data from one set in another). Imagine that you're training a model to answer true/false questions using a series of 10 questions and answers. After the model is trained, you use the same 10 questions to evaluate how well the model performs. Ideally, it should score 100%, but you don't know how well it really performs because you tested with the training data. The only true test is to use other real-world questions, then re-evaluate its performance.
Now that you know about the different types of data, you'll be creating training data sets for acoustic, language, and pronunciation data, then testing acoustic data.
Acoustic, Language, and Pronunciation
Don't worry if you don't know the difference between these 3 types of data the CSS uses, you'll be learning about it next.
In a previous chapter, you learned about acoustic models.
Acoustic Model
The acoustic model is a classifier that labels short fragments of audio into one of a number of phonemes, or sound units, in a given language. For example, the word “speech” is comprised of four phonemes “s p iy ch”.
To build acoustic models, you need acoustic data sets. An acoustic data set consists of two parts:
To build testing acoustic audio data for the Custom Speech Service, you should adhere to the following guidelines:
Holy Audio Requirements, Batman!
Yeah. This is a lot to take in. Don't worry. I've already built the audio files for you. We'll take a look in a bit.
The second component of acoustic data is a text file containing transcripts of each audio file.
The transcriptions for all WAV files should be contained in a single plain-text file. Each line of the transcription file should have the name of one of the audio files, followed by the corresponding transcription. The file name and transcription should be separated by a tab (\t). Each line must end with a line feed and new line character (\r\n).
For example:
speech01.wav speech recognition is awesome
speech02.wav the quick brown fox jumped all over the place
speech03.wav the lazy dog was not amused
The transcriptions should be text-normalized so they can be processed by the system. However, there are some very important normalizations that must be done by the user prior to uploading the data to the Custom Speech Service. The normalization rules are too lengthy to cover here, so you should check them out on your own. They may seem like a lot at first, but I've found them fairly straightforward to learn and apply.
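Before uploading, it can save you a round trip to sanity-check the transcript file locally. Here's a minimal Python sketch; the file and folder names match the workshop files described below, but treat the script itself as illustrative:

```python
from pathlib import Path

def check_transcript(transcript_path, audio_folder):
    """Verify each line is '<wav file name><TAB><transcription>' and that the WAV file exists."""
    problems = []
    lines = Path(transcript_path).read_text(encoding="utf-8").splitlines()
    for line_number, line in enumerate(lines, start=1):
        parts = line.split("\t")
        if len(parts) != 2:
            problems.append(f"line {line_number}: expected exactly one tab separator")
            continue
        wav_name, transcription = parts
        if not (Path(audio_folder) / wav_name).exists():
            problems.append(f"line {line_number}: missing audio file {wav_name}")
        if not transcription.strip():
            problems.append(f"line {line_number}: empty transcription")
    return problems

# Adjust these paths to wherever you extracted the workshop files.
for problem in check_transcript("training-utterances.txt", "custom-speech-service-data/training"):
    print(problem)
```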
In the source code you downloaded from Github, you'll find the training audio files and an audio transcript of the files in the custom-speech-service-data/training folder:
Pokemon!
You may have noticed the file names of the acoustic data are Pokemon. My son and I have recently started to play Pokemon the Card Game together, so I thought this would be a fun way (and topic) to teach you about speech recognition. After all, Pokemon names are difficult to pronounce, and are a domain-specific language of their own. They're a perfect match for the capabilities of the Custom Speech Service.
Let's get started by uploading an acoustic data set to the CSS portal.
Start by locating the acoustic .wav audio files. Select the 17 audio files, zip them up, and name the zip file training-utterances.zip.
Next, navigate to the CSS web portal at https://cris.ai.
Click the Sign In link in the upper right and sign in with your Azure portal subscription login.
After logging in, click on the Custom Speech navigation option and navigate to Adaptation Data:
At the top of the Adaptation Data page, you'll find an area for Acoustic Datasets.
Click the Import button and complete the following fields:
Click Import to upload the acoustic data and build the data set.
When the data is uploaded, you'll navigate back to the Acoustic Datasets page and your data set will be displayed in the grid:
Note the Status of the acoustic dataset is NotStarted. In a few moments, it will change to Running:
When you upload acoustic data, the CSS will analyze the data, check it for errors, and ensure the transcription file matches the uploaded audio file names. A variety of other checks are also performed that aren't important to cover here, but it's good to know that some post-processing needs to occur before you can use the acoustic data set.
When the CSS finishes analyzing and validating the acoustic data, the Status will change to Succeeded:
Congratulations! You've created your first acoustic data set. We'll be using it later in this chapter.
Curious? ...and Challenge #1
If you're wondering what the audio files sound like, don't hesitate to download them to your computer and play them. Just remember that playing the audio files on the VM we've created for the workshop probably won't work, so you'll have to download the files to your actual computer.
If you're in the mood for a challenge, augment the training data by adding your own audio files. I've found the open-source software Audacity to be a great tool for recording, editing, and exporting audio files in the right format. I suggest creating a few sample audio utterances relating to your favorite Pokemon (or try Charizard).
If you do add to the acoustic data set, don't forget to transcribe your audio and add the transcription to the training-utterances.txt file!
This concludes the exercise.
Now that you've created an acoustic data set, let's build a language data set. As you'll recall from a previous chapter, language models and language data sets teach the CSS the likelihood of encountering certain words or phrases.
Language Model
The language model is a probability distribution over sequences of words. The language model helps the system decide among sequences of words that sound similar, based on the likelihood of the word sequences themselves. For example, “recognize speech” and “wreck a nice beach” sound alike but the first hypothesis is far more likely to occur, and therefore will be assigned a higher score by the language model.
To create a custom language data set for your application, you need to provide a list of example utterances to the system, for example:
The sentences do not need to be complete sentences or grammatically correct, but they should accurately reflect the spoken input you expect the system to encounter in deployment. These examples should reflect both the style and content of the task the users will perform with your application.
The language model data should be written in a plain-text file using either US-ASCII or UTF-8 encoding, depending on the locale. For en-US, both encodings are supported. The text file should contain one example (sentence, utterance, or query) per line.
If you want some sentences to have a higher weight (importance), you can add them to your data several times. A good number of repetitions is between 10 and 100. If you normalize the highest weight to 100, you can easily weight other sentences relative to it.
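If you ever need to build a weighted file like that yourself, a minimal sketch might look like this (the output file name and the weights are made up for the example; the utterances come from the workshop's language data):

```python
# Repeat each utterance according to a relative weight (10-100), as described above.
weighted_utterances = {
    "sit pikachu": 100,        # most important
    "have a seat meowth": 10,  # less important
}

with open("my-language-data.txt", "w", encoding="utf-8") as f:
    for utterance, weight in weighted_utterances.items():
        for _ in range(weight):
            f.write(utterance + "\n")
```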
More Rules!
Don't worry about these rules for now, because we've already assembled a collection of utterances appropriate for our needs today.
Before we get started, take a look at the utterances in the training-language-model-data.txt file. Here's a short excerpt:
ash's best friend should sit down
sit pikachu
sit on the floor pikachu
have a seat meowth
meowth please sit on the ground
i'd like to see ash's best friend act angry
get really mad pikachu
You'll notice that this is a collection of commands, which is significant. Later in the workshop, you'll be using the Language Understanding (LUIS) service to analyze the intent of spoken commands. So, it makes sense that the language model we'll be building contains commands.
Now that you know what is in a language data set, let's head over to the CSS portal and create one.
Start by navigating to the CSS web portal at https://cris.ai, then navigate back to the Adaptation Data page.
Scroll down past the Acoustic Datasets area, and you'll find the Language Datasets area:
Click the Import button and complete the following fields:
Click Import to upload the language data and build the data set.
When the data is uploaded, you'll navigate back to the Language Datasets page and your data set will be displayed in the grid:
Note the Status of the language data set is NotStarted. In a few moments, it will change to Running, then Succeeded, just like the acoustic data set did.
Congratulations! You've created your first language data set. We'll be using it later in this chapter.
Challenge #2
Just like you did for the acoustic data set, feel free to augment the utterances I built. I suggest continuing to create utterances related to the Pokemon you added in the last challenge.
This concludes the exercise.
Now that you've created acoustic and language data sets, you might think you're ready to move on to building the models for each. But there's another customization you can provide that helps train your model in a special way: pronunciation data.
Custom pronunciation enables users to define the phonetic form and display of a word or term. It is useful for handling customized terms, such as product names or acronyms. All you need is a pronunciation file (a simple .txt file).
Here's how it works. In a single .txt file, you can enter several custom pronunciation entries. The structure is as follows:
Display form <Tab>(\t) Spoken form <Newline>(\r\n)
The spoken form must be lowercase, which can be forced during the import. No tab in either the spoken form or the display form is permitted. There might, however, be more forbidden characters in the display form (for example, ~ and ^).
Each .txt file can have several entries. For example, an entry that maps the spoken form see three pee oh to the display form C3PO looks like this (display form, then a tab, then the spoken form):
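C3PO	see three pee oh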
The spoken form is the phonetic sequence of the display form. It is composed of letters, words, or syllables. Currently, there is no further guidance or set of standards to help you formulate the spoken form.
I've found it useful to use pronunciation in a variety of circumstances. In the above example, pronunciation helps transform see three pee oh to C3PO. I've also used it in the past to transform a t and t to AT&T, and microsoft dot com to Microsoft.com.
For your final data set, you'll create a pronunciation data set. Let's get to it!
Start by navigating to the CSS web portal at https://cris.ai, then navigate back to the Adaptation Data page.
Scroll down past the Language Datasets area, and you'll find the Pronunciation Datasets area:
Click the Import button and complete the following fields:
Click Import to upload the pronunciation data and build the data set.
When the data is uploaded, you'll navigate back to the Pronunciation Datasets page and your data set will be displayed in the grid:
Note the Status of the pronunciation data set is NotStarted. In a few moments, it will change to Running, then Succeeded, just like the acoustic data set did.
Congratulations! You've created your first pronunciation data set. We'll be using it in the next chapter.
Challenge #3
I bet you can't guess what this challenge is about... This one is probably more difficult, because you don't know much about your problem domain yet. Typically, you add pronunciation data once you know more about the problem domain you're training for. But, if you think you can add something to what we already have, go for it!
This concludes the exercise.
You'll recall earlier in this chapter that there are multiple types of data sets we'll need: training, testing, and real-world.
So far, you've created 3 training data sets: acoustic, language, and pronunciation. Next, you'll need to create a testing data set.
Here's a secret: testing data sets for the CSS are just acoustic data sets. And here's why: an acoustic data set provides audio files along with transcriptions of their content. That makes it ideal for testing, because it pairs audio files with their actual content.
Now, we have to be a bit careful, because it's easy to confuse your training and testing data sets: both are acoustic data sets. So, as we create a second acoustic data set, we'll be sure to name it properly, with testing in its name.
Start by locating the testing files we included in the workshop files. You'll find 6 .wav audio files in the custom-speech-service-data/testing folder:
Select the 6 audio files, zip them up, and name the zip file testing-utterances.zip.
Next, navigate to the CSS web portal at https://cris.ai, and navigate to Adaptation Data.
Click the Import button by Acoustic Datasets and complete the following fields:
Click Import to upload the acoustic data and build the data set.
When the data is uploaded, you'll navigate back to the Acoustic Datasets page and your data set will be displayed in the grid:
Note the Status of the acoustic dataset is NotStarted. In a few moments, it will change to Running, then Succeeded.
Congratulations! You've created your testing acoustic data set.
Challenge #4
Yes. Again. Feel free to augment the testing data set you just created. Remember: don't reuse training data in your testing set, but do keep the testing data within the same domain as the training data. For example, if you added Charizard to your training data sets, it would be a good idea to test for Charizard. Likewise, if you didn't add another Pokemon, like Chespin, you shouldn't expect the CSS to magically recognize it.
This concludes the exercise.
Phew! That was a long chapter! But, you learned quite a bit, like:
In this chapter, you'll learn how to:
After creating Custom Speech Service (CSS) data sets, you need to instruct CSS to train models based on these data sets.
Training acoustic and language models is easy to do in the CSS portal - point and click. But, before we do that, we'll take a pit stop and establish a baseline accuracy of the CSS capabilities using Microsoft's base models.
Base Models - What Are They?
The CSS comes with several pre-trained acoustic and language models. In fact, there are different models for conversation and search/dictation. If you stop to think about it, this makes a lot of sense. We tend to speak differently when we converse with others, as compared to dictating text or speaking search terms for a search engine. See below for the base models Microsoft provides.
To understand the effect our CSS customization will have, it's important to establish a baseline accuracy for the CSS service against our testing data set.
Let's get started and see how the CSS does against some of these Pokemon names ;-)
Start by navigating to the CSS web portal at https://cris.ai, and navigate to Accuracy Tests.
This page shows the status of the past and ongoing accuracy tests performed by the CSS.
Click the Create New button to begin a test against an acoustic and language model.
Complete the following fields:
Click Create to begin the test run.
When the test run is saved, you'll navigate back to the Accuracy Test Results page:
Note the Status of the test run is NotStarted. In a few moments, it will change to Running, then Succeeded.
The test run may take some time to execute (up to 10 minutes). So, it's a good time to take a short break. Check back in 5.
Hi, welcome back. I wish you had just won $1700. Will you settle for a lousy accuracy test?
So, let's check back in on the accuracy test.
Ugh! 45% word error rate - not good.
Word Error Rate (WER)
WER (Word Error Rate) and Word Accuracy are the best measurements to take when comparing two utterances. These are typically percentage values, derived by comparing a reference transcript with the speech-to-text generated transcript (or hypothesis) for the audio. In our case, the reference transcript is the transcript file we supplied with the testing data set, and the hypothesis is what the CSS generated when it processed the 6 audio files in the testing data set.
The algorithm used is the Levenshtein distance: it aligns the reference with the hypothesis and counts the words that are insertions, deletions, and substitutions.
In general, WER is fairly complex to calculate. We won't dive much deeper, but if you're interested in learning more, check out this website.
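If you'd like to see the arithmetic concretely, here's a minimal Python sketch of a word-level Levenshtein WER calculation (illustrative only; it's not the exact scoring code the CSS portal runs):

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # d[i][j] = edit distance between the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                   # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                   # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                      # deletion
                d[i][j - 1] + 1,                      # insertion
                d[i - 1][j - 1] + substitution,       # substitution (or exact match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# "recognize speech" vs. "wreck a nice beach": 4 edits over 2 reference words = 200% WER.
print(word_error_rate("recognize speech", "wreck a nice beach"))  # 2.0
```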
Well, 45% error rate is still pretty high. Let's explore the results of the accuracy test.
Click the Details link to learn more. At the bottom of the page, you'll find the detailed transcription (we provided that) and the hypothesis (decoder output).
You should notice several mis-interpretations, as the CSS had trouble with:
What's interesting is that aside from the Pokemon names, the CSS did a pretty good job. It got confused a bit about winking, but perhaps I didn't enunciate very well in the test files. We'll see later on.
Another thing to note is our testing data set is SMALL. Really small. In fact, there are only ~30 words in the entire data set. That's really too small, and for each word missed, we add ~3% word error rate. In a production system, we'd want hundreds of utterances, and thousands of words in a testing data set. So, keep that in mind for future endeavors.
This concludes the exercise.
I know we can do better than 45%, and we will as we build our own acoustic and language models.
Let's get started building an acoustic model based on our acoustic data set we uploaded earlier.
Start by navigating to the CSS web portal at https://cris.ai, and navigate to Acoustic Models.
This page shows the various acoustic models you've trained for the CSS.
Click the Create New button and complete the following fields:
Click Create to train the model.
When the model is saved, you'll navigate back to the Acoustic Models page:
Note the Status of the model is NotStarted. In a few moments, it will change to Running, then Succeeded.
Training may take some time (up to 10 minutes), so it's a good time to take a short break. Check back in another 10.
Hi, welcome back. My son loves these videos.
So, let's check back in on the model training:
Excellent, it's finished.
This concludes the exercise.
There's not much more to do with the acoustic model, so let's do the same with our language data set and train a language model.
Training language models is just like training acoustic models, so let's dive in.
Start by navigating to the CSS web portal at https://cris.ai, and navigate to Language Models.
This page shows the various language models you've trained for the CSS.
Click the Create New button and complete the following fields:
Click Create to train the model.
When the model is saved, you'll navigate back to the Language Models page:
Note the Status of the model is NotStarted. In a few moments, it will change to Running, then Succeeded.
The training process may take some time to execute (up to 10 minutes). So, it's a good time to take yet another short break. Check back in another 5.
Welcome back, again. This video was for me. I love Tesla. Hopefully, I'll get one someday. Someday soon.
So, let's check back in on the model training:
Excellent, it's finished.
This concludes the exercise.
Now that you've built an acoustic model and language model that customizes the base models, let's test them! The original WER was 45%, so I think we can do better.
Start by navigating to the CSS web portal at https://cris.ai, and navigate to Accuracy Tests.
Click the Create New button to begin a test against an acoustic and language model.
Complete the following fields:
Click Create to begin the test run.
When the test run is saved, you'll navigate back to the Accuracy Test Results page:
Note the Status of the test run is NotStarted. In a few moments, it will change to Running, then Succeeded.
The test run may take some time to execute (up to 10 minutes). So, it's a good time to take a short break. Check back in 2.
This one was for everyone. And it's amazing.
So, let's check back in on the accuracy test.
Sweet! Look at that - 6% WER. I'm ok with that (for now). Feel free to explore the details of the accuracy test to learn more.
Challenge #5
Try to get the accuracy test WER down to 0%. Enough said.
This concludes the exercise.
In this chapter, you learned:
In this chapter, you'll learn how to:
Previously, you learned about the various data sets you need to train, test, and operationalize machine learning systems. Over the past 2 chapters, you created training and testing data sets, built customized acoustic and language models, then tested the customization accuracy.
The next step is to deploy your customizations and test them with real-world data.
Let's get started!
You've already done the hard work of building the customized models, so let's use them to create a deployment.
PREREQUISITES
Before you proceed, you'll need customized acoustic and language models that have a Succeeded status. If your models are still training, wait a few more minutes, then check back when the models are ready.
Start by navigating to the CSS web portal at https://cris.ai, and navigate to Deployments.
Click the Create New button to create a new deployment.
Complete the following fields:
Click Create to deploy the models to production.
When the deployment is saved, you'll navigate back to the Deployments page:
Note the Status of the deployment is NotStarted. In a few moments, it will change to Running, then Succeeded.
The deployment will not take long (up to 1 minute). That's it! You've deployed your models.
This concludes the exercise.
Now that you've deployed a customized CSS endpoint, you can consume it in an application. But how?
Let's take a closer look at your deployment.
Start by navigating to the CSS web portal at https://cris.ai, and navigate to Deployments.
Click the Details link next to your Pokemon deployment:
The deployment details page shows you a variety of details about your deployment. Scroll down to the Endpoints area:
The endpoints area shows a variety of URIs that you can use to access your customized deployment. You can interact with the CSS via a:
You'll notice that for each option, you have endpoints for short and long-form audio. In some cases, endpoints support punctuation detection.
Depending on the needs of your application, you may choose a different endpoint. I've used each of these previously and think it's good to walk through each at a high-level.
Use this option when you have a .wav file that you want to upload and get a single response back. You won't get real-time speech results back, but it works well when you want to do quick, bulk processing of a large collection of audio files.
If you need real-time processing of audio using a microphone embedded in software, such as a .NET app, Android app, or iOS app, this is the right choice for you. An important distinction here is that you need to use this in conjunction with the SDKs/libraries provided by Microsoft. It's also important to note that these are intended for client-side applications, not a server-side process with a long life span.
When you need a long-running server-side process to interact in real-time with the CSS in a .NET app, use these endpoints. You'll also need to use the .NET SDK built for this purpose.
The last option is to interact with the CSS using a specific protocol called the Speech Protocol. This also has its own SDK and API you need to adhere to when using these endpoints.
The Speech Protocol
I consider the first three options a legacy way of interacting with the CSS. The 4th option (Speech Protocol) is the new and recommended way of interfacing with the CSS. Eventually the first 3 options will be deprecated and the Speech Protocol will be the way to interact.
Right now, support for the new Speech Protocol is limited to a JavaScript SDK, but if you need a C# version, you can roll your own. To learn more about the protocol, check out the official protocol documentation.
Rolling your Own Speech Protocol Client
Don't do this, and I speak from experience. I've done it. It's not easy, the documentation isn't great, and unless you have a lot of experience writing web socket protocol code in C#, this can be really time consuming and difficult. I lost a month of my life to this. The end result was pretty cool. I didn't have an option of waiting for Microsoft's team to implement the C# SDK, but you probably will.
Ok, sorry for the tangent. Let's get back to the deployment. You'll need to keep track of a few pieces of data:
First, take note of the Subscription Key at the top. Second, you'll need the WebSocket base URL listed under WebSocket with the Speech Protocol/JavaScript WebSocket API (wss://610e08d3ae4b4d4eb7d45dcf2e877698.api.cris.ai for my deployment).
Don't Use MY Endpoint Base URL
Please don't copy my endpoint base URL. If you do, you'll get errors later on. Please copy your own.
With these two values copied/saved, you're ready to move on to testing the endpoint.
This concludes the exercise.
Now, let's return to the web app you deployed to Azure earlier and test your Custom Speech Service deployment.
Don't use your VM for this exercise
It's important that you don't use your VM for this exercise, because you'll be using your computer's microphone. This just doesn't work well through a remote desktop connection.
Start by navigating to your deployed Azure web site. My URL was http://workshopwebapp.azurewebsites.net/.
After the page loads, paste your deployed CSS endpoint base URL into the Endpoint text box, and the subscription key into the Subscription Key text box:
Ignore the LUIS-related fields, change the Recognition Mode drop down to Dictation, and Format to Detailed Result:
Press the Start button and start speaking. The page may ask for access to your microphone; as you speak, the site submits your speech to the CSS endpoint you created and returns incremental speech results in real time.
Try speaking the phrase, "Pikachu is a cool pokemon.":
Now, that's cool! As you speak, you'll see incremental results returned to your browser and displayed in the Current hypothesis area. Then, when the CSS recognizes the end of your utterance, it returns a JSON-formatted result:
{
"RecognitionStatus": "Success",
"Offset": 0,
"Duration": 26300000,
"NBest": [
{
"Confidence": 0.923154,
"Lexical": "pikachu is a cool pokemon",
"ITN": "pikachu is a cool Pokémon",
"MaskedITN": "pikachu is a cool Pokémon",
"Display": "Pikachu is a cool Pokémon."
}
]
}
{
"RecognitionStatus": "EndOfDictation",
"Offset": 54210000,
"Duration": 0
}
The way this CSS endpoint works is that each time an utterance is detected, a JSON object is returned with "RecognitionStatus": "Success". Inside, the Offset and Duration fields track where the utterance sits in the audio stream, expressed in 100-nanosecond units: here the system detected an utterance beginning at offset 0 and lasting 26300000 units (roughly 2.6 seconds).
The CSS also returns the speech hypothesis in a variety of formats. The most meaningful is the "Display": "Pikachu is a cool Pokémon." result, which is the formatted transcription, returned here with a confidence of about 92.3%.
Pretty cool.
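If you wanted to consume one of these result messages outside of the sample site (say, in a quick script), pulling out the top hypothesis is straightforward. Here's a minimal sketch using the message shown above:

```python
import json

# One "Success" message from the WebSocket, copied from the output above.
raw_message = """
{
  "RecognitionStatus": "Success",
  "Offset": 0,
  "Duration": 26300000,
  "NBest": [
    { "Confidence": 0.923154,
      "Lexical": "pikachu is a cool pokemon",
      "Display": "Pikachu is a cool Pokémon." }
  ]
}
"""

result = json.loads(raw_message)
if result.get("RecognitionStatus") == "Success":
    best = result["NBest"][0]  # take the first entry of the N-best list
    print(f'{best["Display"]} ({best["Confidence"]:.1%} confidence)')
```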
Go ahead and try a few more phrases.
We're not going to dive into the JavaScript code that manages interacting with the Speech Protocol WebSocket endpoint. It's really complicated. You're welcome to dive into the details on your own, but it's out of scope for us today.
Challenge #6
Now that you have a full training to testing to real-world testing methodology for the CSS, try to stump your trained model. Then, return back to your data sets, models, and deployments. Update all of them and attempt to retrain the system to address the shortcomings you identified.
This concludes the exercise.
In this chapter, you learned:
In this chapter, you'll learn:
In the past chapters, you've been focused on building a customized speech recognition engine with the CSS. Now that you are equipped with the knowledge to build these on your own, we'll turn our attention to analyzing the results your CSS generates.
In most machine learning and speech-to-text projects, using a single product (like CSS) isn't common. Instead, you'll often chain the results of one service to the input of another. This process is referred to as building a machine learning pipeline.
In the final chapters of this workshop, you'll be expanding your speech-to-text pipeline by adding intent analysis with Language Understanding (LUIS).
Language Understanding (LUIS) allows your application to understand what a person wants in their own words. LUIS uses machine learning to allow developers to build applications that can receive user input in natural language and extract meaning from it. A client application that converses with the user can pass user input to a LUIS app and receive relevant, detailed information back.
A LUIS app is a domain-specific language model designed by you and tailored to your needs. You can start with a prebuilt domain model, build your own, or blend pieces of a prebuilt domain with your own custom information.
A model starts with a list of general user intentions such as "Book Flight" or "Contact Help Desk." Once the intentions are identified, you supply example phrases called utterances for the intents. Then you label the utterances with any specific details you want LUIS to pull out of the utterance.
Prebuilt domain models include all these pieces for you and are a great way to start using LUIS quickly.
After the model is designed, trained, and published, it is ready to receive and process utterances. The LUIS app receives the utterance as an HTTP request and responds with extracted user intentions. Your client application sends the utterance and receives LUIS's evaluation as a JSON object. Your client app can then take appropriate action.
Intents
An intent represents actions the user wants to perform. The intent is a purpose or goal expressed in a user's input, such as booking a flight, paying a bill, or finding a news article. You define and name intents that correspond to these actions. A travel app may define an intent named "BookFlight."
Utterances
An utterance is text input from the user that your app needs to understand. It may be a sentence, like "Book a ticket to Paris", or a fragment of a sentence, like "Booking" or "Paris flight." Utterances aren't always well-formed, and there can be many utterance variations for a particular intent.
Entities
An entity represents detailed information that is relevant in the utterance. For example, in the utterance "Book a ticket to Paris", "Paris" is a location. By recognizing and labeling the entities that are mentioned in the user’s utterance, LUIS helps you choose the specific action to take to answer a user's request.
Intent | Sample User Utterance | Entities
---|---|---
BookFlight | "Book a flight to Seattle?" | Seattle
StoreHoursAndLocation | "When does your store open?" | open
ScheduleMeeting | "Schedule a meeting at 1pm with Bob in Distribution" | 1pm, Bob
Now that you know a little bit about LUIS, let's see how it'll be used in conjunction with CSS.
When you're finished integrating LUIS into your solution, you'll be able to speak commands into the web site you published, asking various Pokemon (Pikachu, Jigglypuff, Meowth, etc.) to perform actions such as sit, jump, scratch, and sing.
LUIS will be used to take the output of the CSS endpoint you created and identify the intent (the action) and entity (the Pokemon). Then, the web site will parse the LUIS response and act accordingly by displaying the appropriate image at the bottom of the page.
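To picture that last step, here's a hypothetical JavaScript sketch of turning a LUIS result into an image on the page. The element ID and the image file-naming convention are invented for illustration; this isn't the workshop site's actual code.

```javascript
// Hypothetical sketch: turn a LUIS result into an image to display.
// The "reaction" element ID and the "pikachu-sit.png" naming convention
// are made up for illustration.
function showReaction(luisResult) {
    const intent = luisResult.topScoringIntent.intent;    // e.g. "Sit"
    const pokemonEntity = luisResult.entities
        .find(e => e.type === "Pokemon");                 // e.g. entity "pikachu"

    if (!pokemonEntity) {
        return; // nothing to show without a recognized Pokemon
    }

    const image = `${pokemonEntity.entity}-${intent.toLowerCase()}.png`;
    document.getElementById("reaction").src = `/images/${image}`;
}
```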
Before we can use LUIS, we'll need to provision a LUIS subscription in the Azure portal. Let's get started!
Start by jumping back to the Azure portal, and create a new resource by clicking the Create a resource button.
Search for Language Understanding:
Fill out the required parameters as you create an instance:
When the LUIS subscription is provisioned, it will appear in your resource group:
The final step is to navigate to the LUIS subscription by clicking on it.
Locate the Keys area and take note of KEY 1:
You'll need this key in the next chapter, so don't forget it.
This concludes the exercise.
There's not much you can do with LUIS in the Azure portal because, like the Custom Speech Service, it has its own separate portal.
The process for linking your LUIS subscription in the LUIS portal is a bit different than linking your CSS subscription. As a result, we'll revisit this in a later chapter.
For now, hold on to your subscription key.
In this chapter, you'll learn:
In the last chapter you learned about the basics of LUIS: intents, utterances, entities. You also learned how we'll be using LUIS and how it will integrate into our speech-to-text pipeline.
I don't want to delay us too much, so let's dive right in by creating your first LUIS app.
Log in to the LUIS portal at https://www.luis.ai, and navigate to the My Apps area:
This page lists the various LUIS apps you have created.
Click the Create new app button. Name it Pokemon:
After creating the app, you'll be redirected to the Intent page automatically:
There's not much more to creating the initial LUIS app, so let's continue on with defining your app intents.
This concludes the exercise.
You'll recall that LUIS intents represent actions the user wants to perform. The intent is a purpose or goal expressed in a user's input, such as booking a flight, paying a bill, or finding a news article. In our case, we'll be creating intents like sitting down, scratching, jumping, etc.
Log in to the LUIS portal at https://www.luis.ai, navigate to the My Apps area, drill down into the Pokemon app, and open the Intents area:
This page lists the various intents you have created. You'll notice that a None intent was automatically created for you. All LUIS apps start with this intent - you should not delete it. The None intent is used as a default fall-back intent for your LUIS apps.
You'll be creating a variety of intents in the Pokemon app. To help you, we've already defined the intents for you. In the code you downloaded from Github, locate the language-understanding-data folder. It contains a file named luis-utterances.md:
Inside the markdown file, you'll find an intent followed by a series of utterances. For example:
# Sit
- ash's best friend should sit down
- sit pikachu
- sit on the floor pikachu
- have a seat meowth
- meowth please sit on the ground
In the snippet above, the heading Sit is the name of an intent you'll create, followed by utterances that align to that intent.
LUIS is like the Custom Speech Service in that it needs to learn the types of phrases (or utterances) that map to an intent.
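As an aside, because luis-utterances.md follows such a regular format, you could script the import instead of typing everything into the portal. Below is a hypothetical JavaScript helper that parses the file into an intent-to-utterances map; you could then feed that into the LUIS authoring API, though in this workshop we'll enter the utterances by hand.

```javascript
// Hypothetical helper: parse the luis-utterances.md format shown above into
// a { intentName: [utterances] } map.
function parseUtterancesMarkdown(markdown) {
    const intents = {};
    let current = null;

    for (const line of markdown.split("\n")) {
        if (line.startsWith("# ")) {
            current = line.slice(2).trim();        // "# Sit" -> "Sit"
            intents[current] = [];
        } else if (line.startsWith("- ") && current) {
            intents[current].push(line.slice(2).trim());
        }
    }
    return intents;
}
```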
Referencing the luis-utterances.md file, create intents for each intent listed in the document.
Follow along below to create the first intent, then rinse and repeat for the remaining intents.
To add an intent, click the Create new intent button:
Name the intent Sit, as referenced in the luis-utterances.md file:
On the next screen, the Sit intent will be listed at the top. In the text box below the intent, enter the related utterances from the luis-utterances.md file, pressing Enter after each utterance to save it.
You'll notice that the LUIS portal associates each utterance with the intent as you press Enter.
When you've finished entering the associated utterances, you can scroll down to review and modify them:
You'll also notice each utterance has a drop-down next to it, allowing you to re-associate it with a different intent if you made a mistake.
When you're finished entering the utterances for this intent, navigate back to the list of intents by clicking the Intents link on the left:
Proceed to add the remaining intents listed in the luis-utterances.md file. When you're finished, your list of intents should look like this:
This concludes the exercise.
After adding intents to your LUIS app, it's time to add entities. As you'll recall, an entity represents detailed information that is relevant in an utterance. For example, in the utterance "Jigglypuff, stop singing", "Jigglypuff" is a Pokemon. By recognizing and labeling the entities that are mentioned in the user’s utterance, LUIS helps you choose the specific action to take to answer a user's request.
There are a few important things about entities that we haven't covered yet.
First, entities are optional but highly recommended.
While intents are required, entities are optional. You do not need to create entities for every concept in your app, but only for those required for the app to take action.
For example, as you begin to develop a machine learning pipeline that integrates LUIS, you may not have a need to identify details/entities to act upon. So, when starting off, don't add them. Then, as your app matures, you can slowly add entities.
Second, entities are shared across intents. They don't belong to any single intent. Intents and entities can be semantically associated but it is not an exclusive relationship. This allows you to have a detail/entity be applicable across various intents.
In the LUIS app you're building today, we'll be defining a Pokemon entity that can identify a Pokemon by name. Because each of our intents typically involves a particular Pokemon, the Pokemon entity will be shared across intents. This means that an intent/entity combination gives us a unique action to perform.
For example, the "Jigglypuff, sing." utterance yields the intent of Sing with an identified Pokemon entity of type Jigglypuff.
LUIS has a variety of entity types, and each has a specific use. There are too many to dive into here, but I encourage you to learn more by reading the LUIS documentation: https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-concept-entity-types#types-of-entities.
Now that you know about entities, let's add the Pokemon entity.
Log in to the LUIS portal at https://www.luis.ai, navigate to the My Apps area, drill down into the Pokemon app, and open the Entities area:
This page lists the entities you have defined.
Click the Create new entity button to create the Pokemon entity. Select List as the entity type:
List Entities
A list entity is a fixed list of values. Each value is itself a list of synonyms or other forms the value may take. For example, a list entity named PacificStates includes the values Washington, Oregon, and California. The Washington value then includes both "Washington" and the abbreviation "WA".
After clicking the Done button, you're redirected to the Pokemon entity detail page. Add the following values:
After adding these Pokemon, add synonyms for each, as shown below.
By adding these synonyms, you train LUIS to recognize a synonym as its entity list value. For example, "bubble pokemon" will be recognized as a Pokemon entity whose specific value is Jigglypuff.
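Conceptually, the list entity you just built boils down to canonical values plus their synonyms. The sketch below expresses that shape as a plain JavaScript object for illustration only; it isn't the exact format LUIS stores or exports, and the synonyms shown are examples.

```javascript
// Conceptual shape of the Pokemon list entity: each canonical value carries
// the synonyms LUIS should normalize to it. The synonym lists are examples.
const pokemonListEntity = {
    name: "Pokemon",
    values: [
        { canonical: "Pikachu",    synonyms: ["pika", "ash's best friend"] },
        { canonical: "Jigglypuff", synonyms: ["bubble pokemon"] },
        { canonical: "Meowth",     synonyms: ["team rocket's cat"] }
    ]
};
```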
When you've finished adding the entities and synonyms, refresh your browser.
To validate that the entities are being properly recognized, navigate back to the Intents page, then open the detail page for an intent.
You'll notice that text in each utterance is now replaced with a generic Pokemon entity block. You can use the Entities view toggle switch to see how the utterance text maps to an entity:
This concludes the exercise.
Now that you've added intents and entities, let's explore a quick way to improve the accuracy of your LUIS app with phrase lists.
A phrase list includes a group of values (words or phrases) that belong to the same class and must be treated similarly (for example, names of cities or products). What LUIS learns about one of them is automatically applied to the others as well. This is not a white list of matched words.
In other words, a phrase list can help you better identify an intent or entity in a more dynamic manner.
For example, in a travel agent app, you can create a phrase list named "Cities" that contains the values London, Paris, and Cairo. If you label one of these values as an entity, LUIS learns to recognize the others.
A phrase list may be interchangeable or non-interchangeable. An interchangeable phrase list is for values that are synonyms, and a non-interchangeable phrase list is intended for values that aren't synonyms but are similar in another way.
There are two rules of thumb to keep in mind when using phrase lists:
Use phrase lists for terms that LUIS has difficulty recognizing. Phrase lists are a good way to tune the performance of your LUIS app. If your app has trouble classifying some utterances as the correct intent, or recognizing some entities, think about whether the utterances contain unusual words, or words that might be ambiguous in meaning. These words are good candidates to include in a phrase list feature.
Use phrase lists for rare, proprietary, and foreign words. LUIS may be unable to recognize rare and proprietary words, as well as foreign words (outside of the culture of the app), and therefore they should be added to a phrase list feature. This phrase list should be marked non-interchangeable, to indicate that the set of rare words form a class that LUIS should learn to recognize, but they are not synonyms or interchangeable with each other.
A note on using phrase lists
A phrase list feature is not an instruction to LUIS to perform strict matching or always label all terms in the phrase list exactly the same. It is simply a hint. For example, you could have a phrase list that indicates that "Patti" and "Selma" are names, but LUIS can still use contextual information to recognize that they mean something different in "make a reservation for 2 at patti's diner for dinner" and "give me driving directions to selma, georgia".
After learning about phrase lists, you may be wondering whether to use a phrase list or a list entity.
With a phrase list, LUIS can still take context into account and generalize to identify items that are similar to, but not an exact match, as items in a list. If you need your LUIS app to be able to generalize and identify new items in a category, it's better to use a phrase list. When you want to be able to recognize new instances of an entity, like a meeting scheduler that should recognize the names of new contacts, or an inventory app that should recognize new products, use another type of machine learned entity such as a simple or hierarchical entity. Then create a phrase list of words and phrases. This list guides LUIS to recognize examples of the entity by adding additional significance to the value of those words.
A list entity explicitly defines every value an entity can take, and only identifies values that match exactly. A list entity may be appropriate for an app in which all instances of an entity are known and don't change often, like the food items on a restaurant menu that changes infrequently.
LUIS Best practice
After the model's first iteration, add a phrase list feature that has domain-specific words and phrases. This feature helps LUIS adapt to the domain-specific vocabulary, and learn it fast.
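If it helps to see the idea in data form, a phrase list is essentially a named bag of related words or phrases plus an interchangeable flag. The JavaScript object below is purely illustrative; the actual words you add to your own phrase lists are up to you.

```javascript
// Illustrative only: a phrase list is a named collection of related words or
// phrases, plus a flag for whether they're interchangeable (synonyms).
const sitPhraseList = {
    name: "Sit",
    interchangeable: true,
    phrases: ["sit", "sit down", "have a seat", "take a seat", "park yourself"]
};
```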
Log in to the LUIS portal at https://www.luis.ai, navigate to the My Apps area, and drill down into the Pokemon app.
Click on the Phrase lists navigation option:
Click the Create new phrase list button to create a phrase list.
You'll be using the phrase list feature to create a phrase list for each of our intents. This is a good idea because our intents like Sit and Jump are fairly generic, and you can describe each of these intents in a variety of ways. For example, "sit down" and "grab a chair" are different utterances that have the same intent: Sit.
Add a phrase list named Sit, and either type in a variety of synonyms for sit, or use the Recommend option to help identify a list of words that are related to sitting:
I added the following options to my phrase lists. Feel free to create your own, add more, or remove some, but be sure to create a phrase list for each intent:
When you're finished, your phrase lists should look like this:
This concludes the exercise.
That's it - you've created intents and entities to help train your LUIS app to recognize Pokemon and an action you want the Pokemon to perform. You've also added phrase lists to help train your app to respond and identify intents more efficiently.
In the next chapter, you'll train, test, and publish your LUIS app.
In this chapter, you'll learn how to:
The concept of training a LUIS app is similar to training Custom Speech Service apps. So far, you've created a training data set of intents and entities, so the next step is to train your app. Then you'll test the trained app and publish it for consumption.
In a real-world scenario, you'll incrementally advance the functionality of a LUIS app. For example, you'll start with a few intents and entities, then as your app matures (or when you identify a deficiency), you'll add new intents and entities, re-train, and publish a new version of the LUIS app.
Let's jump into the training process.
Training a LUIS app is easy. Seriously. I don't like to refer to machine learning concepts as easy, but all you do is click a button.
Log in to the LUIS portal at https://www.luis.ai, navigate to the My Apps area, and drill down into the Pokemon app.
In the upper-right corner, you'll see a Train button. The button appears with a red dot when there are changes in the LUIS app that haven't been trained yet; hovering over the button also indicates that there are untrained changes.
Click the Train button to train your LUIS app, integrating intents and entities into the trained app model:
When training is finished, the Train button has a green dot:
That's it. It was easy.
This concludes the exercise.
Now that your LUIS app is trained, you can test it. The LUIS portal has a testing utility built in, which makes testing your app quick.
Let's test a few phrases we expect our app to handle.
Log in to the LUIS portal at https://www.luis.ai, navigate to the My Apps area, and drill down into the Pokemon app.
In the upper-right corner, you'll see a Test button.
Click the Test button and a sidebar will pop out. In the textbox, enter several utterances and see how your trained LUIS app performs.
Try these utterances:
That's it for testing in the LUIS portal.
This concludes the exercise.
With a tested LUIS app that you're satisfied with, let's move on to publishing your LUIS app.
Let's jump right into publishing.
Log in to the LUIS portal at https://www.luis.ai, navigate to the My Apps area, and drill down into the Pokemon app.
In the upper-right corner, you'll see a PUBLISH link. Click it:
On the publishing page, there are a few things you can do:
Intent Analysis Results
As you're getting started with LUIS, I recommend having your LUIS apps return all of the identified intents, because it can aid in debugging complex intents and models. When you become more comfortable with LUIS and have greater confidence in its results, you'll want to disable returning all results. This is especially important for lower-bandwidth apps (like mobile).
Bing spell checking
In many cases, I like to enable the Bing spell checker, because I cannot guarantee that text coming into LUIS is spelled correctly. In your Pokemon app, however, you should not enable it: you're not guaranteed the CSS speech-to-text pipeline will return dictionary words, and you don't know whether "Pikachu" would be replaced with something else by the spell checker.
A good rule of thumb: if you're working in a speech domain that could contain unusual words, don't use the Bing spell checker.
We'll be publishing directly to production, so select the Production slot and the Eastern time zone.
Before you click the Publish to Production button, scroll down a bit further and click the Add Key button.
Select your Azure tenant, subscription, and the LUIS subscription you created in an earlier chapter and add the key.
This adds your LUIS subscription to the LUIS app you just created and allows you to deploy your app to that subscription.
After adding the LUIS subscription, you can click the Publish to Production button:
It will take a few minutes for LUIS to publish your app to the LUIS subscriptions you entered.
When it's finished, take note of two values: the LUIS endpoint base URL and the Key String. For my LUIS app deployment, they are:
You'll need these values in the final chapter to test it all out!
This concludes the exercise.
In the final chapter of the workshop, you'll learn:
With a published app, you're ready to get LUIS into the Speech Recognition website you deployed to Azure earlier in the workshop.
Before you jump into testing, let's learn how to access LUIS endpoints.
LUIS is exposed as a REST API endpoint and can be accessed right from your web browser, or through your favorite REST API testing software. I like to use Postman.
Let's test out your LUIS endpoint.
Start by downloading and installing Postman on the Azure VM you've been using today.
After Postman is installed, close out of all the windows/popups it shows on startup, and you should see a screen like this:
Navigate back to the LUIS portal and copy the URL next to your LUIS subscription on the Publish page of your Pokemon app:
Paste the link into the HTTP GET textbox in Postman, then click the Params button. Next to the q query string parameter, type in pikachu sit down on the ground, then click the Send button:
The HTTP response will be displayed in the Body area below. My response is included below. I'll let you examine the HTTP response from LUIS on your own.
{
"query": "pikachu sit down on the ground",
"topScoringIntent": {
"intent": "Sit",
"score": 0.956479847
},
"intents": [
{
"intent": "Sit",
"score": 0.956479847
},
{
"intent": "ActAngry",
"score": 0.0324947946
},
{
"intent": "Sing",
"score": 0.00468421541
},
{
"intent": "Wave",
"score": 0.00439580763
},
{
"intent": "None",
"score": 0.00238293572
},
{
"intent": "Scratch",
"score": 0.00228275615
},
{
"intent": "Jump",
"score": 0.00204163464
},
{
"intent": "Wink",
"score": 0.00131020416
}
],
"entities": [
{
"entity": "pikachu",
"type": "Pokemon",
"startIndex": 0,
"endIndex": 6,
"resolution": {
"values": [
"Pikachu"
]
}
}
]
}
As you can see, LUIS returns the query we passed (pikachu sit down on the ground) and all of the intents it believes are applicable, with associated confidence scores. It also selects the top-scoring intent and displays it separately. Finally, LUIS returns any entities it identifies and indicates where in the query each entity occurred.
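When a client consumes this response, the useful bits are the top-scoring intent and the entity's resolved list value. Here's a small JavaScript sketch based on the response above; it's a conceptual illustration, not the code in the MVC app.

```javascript
// Sketch based on the response above: grab the top intent and normalize any
// Pokemon entities to their canonical list values via resolution.values.
function interpret(result) {
    const intent = result.topScoringIntent.intent;      // e.g. "Sit"

    const pokemon = result.entities
        .filter(e => e.type === "Pokemon")
        .map(e => e.resolution.values[0]);              // e.g. ["Pikachu"]

    return { intent, pokemon };
}
```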
Pretty cool, and easy to use.
This concludes the exercise.
Now that you understand how to send data to LUIS via an HTTP GET and what the output of a LUIS request looks like, let's jump into the web site you published to Azure.
Return to your laptop, not the VM you've been using. Navigate to the Speech recognition web app you published to Azure earlier. My URL was https://workshopwebapp.azurewebsites.net/.
This time, enter your CSS endpoint URL and CSS subscription key, check the LUIS checkbox, and add your LUIS endpoint base URL and LUIS subscription key.
LUIS Endpoint base URL
The LUIS endpoint base URL is everything before the query string in the full endpoint URL you copied. For example, mine is https://eastus.api.cognitive.microsoft.com/luis/v2.0/apps/6f0e678b-212c-4d4e-b0cc-967cb57f0a3a. Do not include the query string.
With LUIS enabled on the web site, test it out. Using the Start and Stop buttons, speak a few phrases:
Check out the results!
Sweet!
Now, how does it all work? Well, I'll let you check out the JavaScript code on your own in the MVC app. Here's a hint: check out the UpdateRecognizedPhrase JavaScript function in the Index.cshtml view.
This concludes the exercise.
Wow. So, we have a full end-to-end solution for performing speech-to-text analysis, identifying the intent of the text, and reacting to the identified intent. Congratulations on sticking it out.
I have a few more things to share, so you're not finished yet.
Final Challenge
Starting from the beginning with the Custom Speech Service (acoustic & language data sets, models, and deployments), add some more Pokemon to the mix. Then update the LUIS intents, entities, and phrase lists. Finally, add more images to the web site and deploy it all to Azure.
Super Double Secret Probation Challenge
As you can see, I desperately need help styling the web site that's part of this workshop. Submit a PR to my Github repo! Please. Someone.
Earlier in the workshop, you learned how to use phrase lists to increase the accuracy of your LUIS app. But that's not the only way. LUIS can also learn from real usage: you can improve the trained model by reviewing and validating the intent and entity hypotheses it generated for queries it has processed.
We're not going to cover this in the workshop, but you should check out the Review endpoint utterances area of your Pokemon app.
That's officially the end of the workshop. Thank you for your time! I hope you enjoyed learning some of the advanced artificial intelligence offerings on Azure!
Go forth and build some modern apps!