Google Cloud Speech To Text Tutorial



1. Overview

The Speech-to-Text API enables developers to convert audio to text in over 125 languages and variants by applying powerful neural network models through an easy-to-use API.

In this tutorial, you will focus on using the Speech-to-Text API with Python.

What you’ll learn

  • How to use Cloud Shell
  • How to enable the Speech-to-Text API
  • How to authenticate API requests
  • How to install the client library for Python
  • How to transcribe audio files in English
  • How to transcribe audio files with word timestamps
  • How to transcribe audio files in different languages

What you’ll need

  • A Google Cloud project
  • A browser, such as Chrome or Firefox
  • Familiarity with Python 3


2. Setup and requirements

Self-paced environment setup

  1. Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don’t already have a Gmail or Google Workspace account, you must create one.

  • The Project name is the display name for this project’s participants. It is a character string not used by Google APIs, and you can update it at any time.
  • The Project ID must be unique across all Google Cloud projects and is immutable (it cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don’t care what it is. In most codelabs, you’ll need to reference the Project ID (typically identified as PROJECT_ID). If you don’t like the generated one, generate another random one, or try your own and see if it’s available. Either way, it is “frozen” once the project is created.
  • There is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.
  2. Next, you’ll need to enable billing in the Cloud Console in order to use Cloud resources/APIs. Running through this codelab shouldn’t cost much, if anything at all. To shut down resources so you don’t incur billing beyond this tutorial, follow the clean-up instructions at the end of the codelab. New users of Google Cloud are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this tutorial you will be using Cloud Shell, a command line environment running in the Cloud.

Activate Cloud Shell

  1. From the Cloud Console, click Activate Cloud Shell.

If you’ve never started Cloud Shell before, you’re presented with an intermediate screen (below the fold) describing what it is. If that’s the case, click Continue (and you won’t ever see it again). Here’s what that one-time screen looks like:

[Screenshot: one-time Cloud Shell introduction screen]

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools you need. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with just a browser or a Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.

  2. Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list
        

Command output

Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
        
  3. Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project
        

Command output

[core]
project = <PROJECT_ID>
        

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>
        

Command output

Updated property [core/project].
        


3. Enable the API

Before you can begin using the Speech-to-Text API, you must enable it. Using Cloud Shell, you can enable the API with the following command:

gcloud services enable speech.googleapis.com
        


4. Authenticate API requests

To make requests to the Speech-to-Text API, you need to use a Service Account. A Service Account belongs to your project, and the Python client library uses it to make Speech-to-Text API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account and then create the credentials you need to authenticate as that service account.
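Because a service account is identified by its email address, it can help to see how that address is put together. The helper below is a minimal sketch, assuming the standard NAME@PROJECT_ID.iam.gserviceaccount.com form of user-managed service account emails; the function name `service_account_email` and the project ID `my-project` are hypothetical.

```python
def service_account_email(name: str, project_id: str) -> str:
    """Build the email of a user-managed service account (hypothetical helper).

    Assumes the NAME@PROJECT_ID.iam.gserviceaccount.com naming form.
    """
    if not name or not project_id:
        raise ValueError("name and project_id must be non-empty")
    return f"{name}@{project_id}.iam.gserviceaccount.com"


# "my-project" is a placeholder project ID for illustration only.
print(service_account_email("my-stt-sa", "my-project"))
```

This is the same address you pass to `--iam-account` when creating a key for the my-stt-sa account.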

First, set a PROJECT_ID environment variable:

export PROJECT_ID=$(gcloud config get-value core/project)
        
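If you later want the same project ID from inside Python, one option is to read the PROJECT_ID variable you just exported and fall back to the same gcloud command. This is a sketch under that assumption; `get_project_id` is a hypothetical helper, not part of the client library.

```python
import os
import shutil
import subprocess
from typing import Optional


def get_project_id() -> Optional[str]:
    """Return $PROJECT_ID, else the gcloud core/project value, else None."""
    # Prefer the PROJECT_ID variable exported in the shell.
    project_id = os.environ.get("PROJECT_ID")
    if project_id:
        return project_id
    # Fall back to the gcloud command used above, if gcloud is installed.
    if shutil.which("gcloud") is None:
        return None
    result = subprocess.run(
        ["gcloud", "config", "get-value", "core/project"],
        capture_output=True,
        text=True,
        check=False,
    )
    value = result.stdout.strip()
    return value or None
```

In Cloud Shell, calling `get_project_id()` should return the same value that `echo $PROJECT_ID` prints.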

Next, create a new service account to access the Speech-to-Text API by using:

gcloud iam service-accounts create my-stt-sa \
  --display-name "my stt service account"
        

Next, create credentials that your Python code will use to log in as your new service account. Create and save these credentials as a ~/key.json JSON file by using the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-stt-sa@${PROJECT_ID}.iam.gserviceaccount.com
        

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable. The Speech-to-Text client library, covered in the next step, uses it to find your credentials. Set it to the full path of the credentials JSON file you created:

export GOOGLE_APPLICATION_CREDENTIALS=~/key.json
        
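Before starting IPython, you may want to confirm that the variable points at a readable key file. The sketch below assumes the key file is a JSON document with "type": "service_account" and a "client_email" field, as service account keys generated by gcloud are; `load_service_account_email` is a hypothetical helper name.

```python
import json
import os
from pathlib import Path


def load_service_account_email(env=None) -> str:
    """Return client_email from the key file named by GOOGLE_APPLICATION_CREDENTIALS."""
    env = os.environ if env is None else env
    raw_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw_path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    # expanduser() covers a literal "~" in case the shell did not expand it.
    key_path = Path(raw_path).expanduser()
    if not key_path.is_file():
        raise FileNotFoundError(f"No credentials file at {key_path}")
    with key_path.open() as f:
        key = json.load(f)
    # Assumption: service account key files declare "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key")
    return key["client_email"]
```

If everything is set up, `load_service_account_email()` should return the my-stt-sa address; a RuntimeError means the variable is not exported in the current shell.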


5. Install the client library

Install the client library:

pip3 install --user --upgrade google-cloud-speech
        

You should see something like this:

...
Installing collected packages: google-cloud-speech
Successfully installed google-cloud-speech-2.13.1
        

Now, you’re ready to use the Speech-to-Text API!
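To double-check which version pip installed, you can query the package metadata from Python with the standard library. This is a small sketch; `installed_speech_version` is a hypothetical helper name, and the version you see may differ from the sample output above.

```python
from importlib import metadata
from typing import Optional


def installed_speech_version() -> Optional[str]:
    """Return the installed google-cloud-speech version, or None if missing."""
    try:
        return metadata.version("google-cloud-speech")
    except metadata.PackageNotFoundError:
        return None


version = installed_speech_version()
print(version or "google-cloud-speech is not installed")
```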


6. Start Interactive Python

In this tutorial, you’ll use an interactive Python interpreter called IPython. Start a session by running ipython in Cloud Shell. This command runs the Python interpreter in an interactive session.

ipython
        

You should see something like this:

Python 3.9.2 (default, Feb 28 2022, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
        


7. Transcribe audio files

In this section, you will transcribe an English audio file.

Copy the following code into your IPython session:

from google.cloud import speech_v1 as speech

def speech_to_text(config, audio):
    # Send a synchronous recognition request and print the results.
    client = speech.SpeechClient()
    response = client.recognize(config=config, audio=audio)
    print_sentences(response)

def print_sentences(response):
    # Print the most likely alternative for each result.
    for result in response.results:
        best_alternative = result.alternatives[0]
        transcript = best_alternative.transcript
        confidence = best_alternative.confidence
        print("-" * 80)
        print(f"Transcript: {transcript}")
        print(f"Confidence: {confidence:.0%}")

config = dict(language_code="en-US")
audio = dict(uri="gs://cloud-samples-data/speech/brooklyn_bridge.flac")
          
        

Take a moment to study the code and see how it uses the recognize client library method to transcribe an audio file. The config parameter indicates how to process the request, and the audio parameter specifies the audio data to be recognized.
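The audio parameter in this tutorial always points to a Cloud Storage URI. The recognition audio message can also carry the raw bytes inline in a content field, which is handy for short local files. The helper below is a sketch under that assumption; `make_audio` is a hypothetical name, and local files may also need matching config fields depending on their encoding.

```python
from pathlib import Path


def make_audio(source: str) -> dict:
    """Build the `audio` argument from a gs:// URI or a local file path."""
    if source.startswith("gs://"):
        # Cloud Storage object: the service reads the file itself.
        return dict(uri=source)
    # Local file: send the raw bytes inline in the request.
    return dict(content=Path(source).read_bytes())


config = dict(language_code="en-US")
audio = make_audio("gs://cloud-samples-data/speech/brooklyn_bridge.flac")
```

For the sample file, this produces exactly the same audio dict as the code above.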

Call the function:

speech_to_text(config, audio)
          
        

You should see the following output:

--------------------------------------------------------------------------------
Transcript: how old is the Brooklyn Bridge
Confidence: 98%
        
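print_sentences() only reads results, alternatives[0], transcript, and confidence from the response, so its formatting can be exercised without calling the API. The sketch below returns the lines instead of printing them and feeds in a hand-built stand-in made with SimpleNamespace; `format_sentences` and `fake_response` are hypothetical, and the values are copied from the output above, not from a real API call.

```python
from types import SimpleNamespace


def format_sentences(response) -> list:
    """Return the lines print_sentences() would print, as a list of strings."""
    lines = []
    for result in response.results:
        best_alternative = result.alternatives[0]
        lines.append("-" * 80)
        lines.append(f"Transcript: {best_alternative.transcript}")
        lines.append(f"Confidence: {best_alternative.confidence:.0%}")
    return lines


# Stand-in for a recognize() response, mimicking only the fields used above.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[
                SimpleNamespace(transcript="how old is the Brooklyn Bridge", confidence=0.98)
            ]
        )
    ]
)
print("\n".join(format_sentences(fake_response)))
```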

Update the configuration to enable automatic punctuation and call the function again:

config.update(dict(enable_automatic_punctuation=True))
speech_to_text(config, audio)
          
        

You should see the following output:

--------------------------------------------------------------------------------
Transcript: How old is the Brooklyn Bridge?
Confidence: 98%
        

Summary

In this step, you were able to transcribe an audio file in English, using different parameters, and print out the result. You can read more about performing synchronous speech recognition.


8. Get word timestamps

Speech-to-Text can detect time offsets (timestamps) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.
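Since each offset is measured from the beginning of the audio, a word’s duration is simply its end offset minus its start offset. The sketch below does that arithmetic with datetime.timedelta and checks that every offset falls on the 100ms grid described above; the values are copied from the sample output shown later in this step, and `word_durations` and `on_100ms_grid` are hypothetical helpers.

```python
from datetime import timedelta

STEP = timedelta(milliseconds=100)


def word_durations(offsets):
    """Return (word, duration_in_seconds) pairs from (word, start, end) offsets."""
    return [(word, (end - start).total_seconds()) for word, start, end in offsets]


def on_100ms_grid(offset: timedelta) -> bool:
    """True if an offset is a whole multiple of 100 ms."""
    return offset % STEP == timedelta(0)


# Offsets in seconds, taken from this step's sample output.
sample = [
    ("How", 0.0, 0.3),
    ("old", 0.3, 0.6),
    ("is", 0.6, 0.8),
    ("the", 0.8, 0.9),
    ("Brooklyn", 0.9, 1.1),
    ("Bridge?", 1.1, 1.4),
]
offsets = [(w, timedelta(seconds=s), timedelta(seconds=e)) for w, s, e in sample]

for word, seconds in word_durations(offsets):
    print(f"{word:<10} {seconds:.1f} s")
```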

To transcribe an audio file with word timestamps, update your code by copying the following into your IPython session:

from google.cloud import speech_v1 as speech

def speech_to_text(config, audio):
    client = speech.SpeechClient()
    response = client.recognize(config=config, audio=audio)
    print_sentences(response)

def print_sentences(response):
    for result in response.results:
        best_alternative = result.alternatives[0]
        transcript = best_alternative.transcript
        confidence = best_alternative.confidence
        print("-" * 80)
        print(f"Transcript: {transcript}")
        print(f"Confidence: {confidence:.0%}")
        print_word_offsets(best_alternative)

def print_word_offsets(alternative):
    # start_time and end_time are offsets from the start of the audio.
    for word_info in alternative.words:
        start_s = word_info.start_time.total_seconds()
        end_s = word_info.end_time.total_seconds()
        print(f"{start_s:>7.3f} | {end_s:>7.3f} | {word_info.word}")

config = dict(
    language_code="en-US",
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,
)
audio = dict(uri="gs://cloud-samples-data/speech/brooklyn_bridge.flac")
          
        

Take a moment to study the code and see how it transcribes an audio file with word timestamps. The enable_word_time_offsets field tells the API to return the time offsets for each word (see the doc for more details).
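The column layout printed by print_word_offsets() comes from the `>7.3f` format spec: each offset is right-aligned in seven characters with three decimals. The sketch below isolates that formatting and tests it on a stand-in word built with SimpleNamespace. It assumes, as the total_seconds() calls above imply, that the client library exposes start_time and end_time as datetime.timedelta values; `format_word_offset` and `fake_word` are hypothetical.

```python
from datetime import timedelta
from types import SimpleNamespace


def format_word_offset(word_info) -> str:
    """Format one word the same way print_word_offsets() does."""
    start_s = word_info.start_time.total_seconds()
    end_s = word_info.end_time.total_seconds()
    return f"{start_s:>7.3f} | {end_s:>7.3f} | {word_info.word}"


# Stand-in for one word entry, with timedelta offsets as assumed above.
fake_word = SimpleNamespace(
    word="Brooklyn",
    start_time=timedelta(seconds=0.9),
    end_time=timedelta(seconds=1.1),
)
print(format_word_offset(fake_word))
```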

Call the function:

speech_to_text(config, audio)
          
        

You should see the following output:

--------------------------------------------------------------------------------
Transcript: How old is the Brooklyn Bridge?
Confidence: 98%
  0.000 |   0.300 | How
  0.300 |   0.600 | old
  0.600 |   0.800 | is
  0.800 |   0.900 | the
  0.900 |   1.100 | Brooklyn
  1.100 |   1.400 | Bridge?
        

Summary

In this step, you were able to transcribe an audio file in English with word timestamps and print out the result. Read more about getting word timestamps.


9. Transcribe different languages

The Speech-to-Text API recognizes more than 125 languages and variants! You can find the list of supported languages in the documentation.

In this section, you will transcribe a French audio file.

To transcribe the French audio file, update your code by copying the following into your IPython session:

config = dict(
    language_code="fr-FR",
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,
)
audio = dict(uri="gs://cloud-samples-data/speech/corbeau_renard.flac")

speech_to_text(config, audio)
          
        
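Switching languages only changes language_code and the audio URI; the other config fields stay the same. A small factory can make that explicit when you want to try several files. This is a sketch; `make_config` and the `samples` mapping are hypothetical, and the loop only prints the arguments you would pass to speech_to_text().

```python
def make_config(language_code: str, word_offsets: bool = True) -> dict:
    """Build the config dict used in this step for any language code."""
    return dict(
        language_code=language_code,
        enable_automatic_punctuation=True,
        enable_word_time_offsets=word_offsets,
    )


# Hypothetical mapping of the two sample files used in this tutorial.
samples = {
    "en-US": "gs://cloud-samples-data/speech/brooklyn_bridge.flac",
    "fr-FR": "gs://cloud-samples-data/speech/corbeau_renard.flac",
}

for language_code, uri in samples.items():
    lang_config = make_config(language_code)
    lang_audio = dict(uri=uri)
    # In the IPython session you would call: speech_to_text(lang_config, lang_audio)
    print(language_code, lang_config, lang_audio)
```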

You should see the following output:

--------------------------------------------------
Transcript: Maître corbeau sur un arbre perché Tenait dans son bec un fromage...
Confidence: 94%
  0.000 |   0.700 | Maître
  0.700 |   1.100 | corbeau
  1.100 |   1.300 | sur
  1.300 |   1.600 | un
  1.600 |   1.700 | arbre
  1.700 |   2.000 | perché
  2.000 |   3.000 | Tenait
  3.000 |   3.000 | dans
  3.000 |   3.200 | son
  3.200 |   3.500 | bec
  3.500 |   3.700 | un
  3.700 |   3.800 | fromage
...
 10.800 |  11.800 | monsieur
 11.800 |  11.900 | du
 11.900 |  12.100 | corbeau.
        

This is the beginning of a popular French fable by Jean de La Fontaine.

Summary

In this step, you were able to transcribe a French audio file and print out the result. You can read more about the supported languages.


10. Congratulations!

You learned how to use the Speech-to-Text API with Python to perform different kinds of transcription on audio files!

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial:

  • In the Cloud Console, go to the Manage resources page.
  • In the project list, select your project, then click Delete.
  • In the dialog, type the project ID and then click Shut down to delete the project.
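If you reused an existing project and only want to remove what this tutorial created, deleting the my-stt-sa service account is the narrower option; deleting the project from the command line matches the console steps above. The sketch below only builds and prints the gcloud commands rather than running them; `cleanup_commands` and the project ID `my-project` are hypothetical, and you should review each command before running it.

```python
import shlex


def cleanup_commands(project_id: str, sa_name: str = "my-stt-sa") -> list:
    """Return gcloud commands that remove what this tutorial created."""
    sa_email = f"{sa_name}@{project_id}.iam.gserviceaccount.com"
    return [
        # Removes only the service account (and with it, its keys).
        ["gcloud", "iam", "service-accounts", "delete", sa_email, "--quiet"],
        # Shuts down the whole project, like the console steps above.
        ["gcloud", "projects", "delete", project_id, "--quiet"],
    ]


for cmd in cleanup_commands("my-project"):
    print(shlex.join(cmd))
```

The local ~/key.json file is not affected by either command, so you may also want to delete it from Cloud Shell.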

Learn more

  • Test the demo in your browser: https://cloud.google.com/speech-to-text
  • Speech-to-Text documentation: https://cloud.google.com/speech-to-text/docs
  • Python on Google Cloud: https://cloud.google.com/python
  • Cloud Client Libraries for Python: https://googlecloudplatform.github.io/google-cloud-python

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.

Source: https://codelabs.developers.google.com/codelabs/cloud-speech-text-python3