How does Google use AI/ML to make our lives easier?
Alphabet Inc. CEO Sundar Pichai said in an interview at the World Economic Forum in Davos, Switzerland —
“AI is one of the most profound things we’re working on as humanity. It’s more profound than fire or electricity.”
Let me tell you about some of the mind-boggling things Google has developed using AI/ML:
🎵Song stuck in your head? Just hum to search
Now you can hum, whistle, or sing a melody to Google to solve your earworm. After you’re finished humming, Google’s machine learning algorithm helps identify potential song matches. And don’t worry, you don’t need perfect pitch to use this feature. Google shows you the most likely options based on the tune. You can then select the best match and explore information on the song and artist, view any accompanying music videos, listen to the song on your favorite music app, find the lyrics, read analysis, and even check out other recordings of the song when available.
How do machines learn melodies?
An easy way to explain it is that a song’s melody is like its fingerprint: each one has its own unique identity. Google has built machine learning models that can match your humming, whistling, or singing to the right “fingerprint.”
When you hum a melody into Search, the machine learning models transform the audio into a number-based sequence representing the song’s melody. The models are trained to identify songs from a variety of sources, including people singing, whistling, or humming, as well as studio recordings. The algorithms also strip away all the other details, like accompanying instruments and the voice’s timbre and tone. What they’re left with is the song’s number-based sequence, or fingerprint.
Search compares these sequences to thousands of songs from around the world and identifies potential matches in real time. For example, if you listen to Tones and I’s “Dance Monkey,” you’ll recognize the song whether it was sung, whistled, or hummed. Likewise, the machine learning models recognize the melody of the studio-recorded version of the song, which Google can then match against a person’s hummed audio.
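To make the “fingerprint” idea concrete, here is a toy sketch in Python. It is not Google’s actual system: the pitch values and song names are invented, and the interval-normalized contour plus dynamic time warping is just one simple way to get a key- and tempo-tolerant match.

```python
import numpy as np

def contour(pitches_hz):
    """Turn a pitch track (Hz) into a number-based melody fingerprint:
    pitch in semitones relative to the track's mean, so the result
    does not depend on the key the melody was hummed in."""
    semitones = 12 * np.log2(np.asarray(pitches_hz, dtype=float))
    return semitones - semitones.mean()

def dtw_distance(a, b):
    """Dynamic-time-warping cost between two fingerprints; the warping
    tolerates tempo differences between a hum and a reference."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def best_matches(query_hz, catalog, k=3):
    """Rank catalog melodies by distance to the hummed query."""
    q = contour(query_hz)
    scored = sorted((dtw_distance(q, contour(ref)), title)
                    for title, ref in catalog.items())
    return scored[:k]

# Hypothetical mini-catalog: pitch tracks in Hz (invented values).
catalog = {
    "Song A": [392, 440, 392, 330, 294, 330, 392],
    "Song B": [262, 262, 392, 392, 440, 440, 392],
}
# A slightly out-of-tune, slightly off-tempo hum of Song A still wins.
print(best_matches([390, 442, 390, 328, 296, 332, 390], catalog))
```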
This builds on the music recognition work of Google’s AI Research team. Google launched Now Playing on the Pixel 2 in 2017, using deep neural networks to bring low-power music recognition to mobile devices. In 2018, the same technology came to the SoundSearch feature in the Google app, expanding its reach to a catalog of millions of songs. The new experience takes it a step further, because now Search can recognize a song without the lyrics or the original recording. All it needs is a hum.
✉️Efficient, fast replies for email:
In 2015, Google launched Smart Reply, a feature for Inbox by Gmail that uses machine learning to suggest replies to email. Since the initial release, usage of Smart Reply has grown significantly, making up about 12% of replies in Inbox on mobile. Based on an examination of how Smart Reply is used in Inbox, and on ideas about how humans learn and use language, Google created a new version of Smart Reply for Gmail. This version increases the percentage of usable suggestions and is more algorithmically efficient.
How 🧐?
Novel thinking: hierarchy
Inspired by how humans understand languages and concepts, Google turned to hierarchical models of language: an approach that uses hierarchies of modules, each of which can learn, remember, and recognize a sequential pattern.
The content of language is deeply hierarchical, reflected in the structure of language itself, going from letters to words to phrases to sentences to paragraphs to sections to chapters to books to authors to libraries, etc. Consider the message, “That interesting person at the cafe we like gave me a glance.” The hierarchical chunks in this sentence are highly variable. The subject of the sentence is “That interesting person at the cafe we like.” The modifier “interesting” tells us something about the writer’s past experiences with the person. We are told that the location of an incident involving both the writer and the person is “at the cafe.” We are also told that “we,” meaning the writer and the person being written to, like the cafe. Additionally, each word is itself part of a hierarchy, sometimes more than one. A cafe is a type of restaurant which is a type of store which is a type of establishment, and so on.
In proposing an appropriate response to this message we might consider the meaning of the word “glance,” which is potentially ambiguous. Was it a positive gesture? In that case, we might respond, “Cool!” Or was it a negative gesture? If so, does the subject say anything about how the writer felt about the negative exchange? A lot of information about the world, and an ability to make reasoned judgments, are needed to make subtle distinctions.
Given enough examples of language, a machine learning approach can discover many of these subtle distinctions. Moreover, a hierarchical approach to learning is well suited to the hierarchical nature of language, and it turns out to work well for suggesting possible responses to emails. Google uses a hierarchy of modules, each of which considers features that correspond to sequences at different temporal scales, similar to how we understand speech and language.
Each module processes inputs and provides transformed representations of those inputs on its outputs (which are, in turn, available to the next level). In the Smart Reply system, the repeated structure has two layers of hierarchy: the first makes each feature useful as a predictor of the final result, and the second combines these features. By definition, the second works on a more abstract representation and considers a wider timescale.
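The sketch below shows what a two-level hierarchy can look like in code. It is a minimal illustration, not the production Smart Reply model: the GRU encoders, layer sizes, and dot-product reply scoring are stand-ins chosen for brevity. Level 1 turns each sentence’s words into a sentence vector; level 2 combines those vectors over the whole email, the wider timescale described above.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Two-level hierarchy: level 1 encodes words within each sentence;
    level 2 combines sentence vectors across the whole email."""

    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hidden, batch_first=True)  # level 1
        self.sent_rnn = nn.GRU(hidden, hidden, batch_first=True)   # level 2

    def forward(self, email_ids):
        # email_ids: (num_sentences, words_per_sentence) token ids
        _, sent_vecs = self.word_rnn(self.embed(email_ids))
        # sent_vecs: (1, num_sentences, hidden) -> one vector per sentence,
        # fed as a sequence to the higher, more abstract level
        _, msg_vec = self.sent_rnn(sent_vecs)
        return msg_vec.squeeze()  # (hidden,) representation of the email

# Toy usage: score five precomputed candidate-reply vectors. All values
# here are random stand-ins for a trained model's parameters.
enc = HierarchicalEncoder(vocab_size=10_000)
email = torch.randint(0, 10_000, (3, 12))  # 3 sentences, 12 tokens each
reply_bank = torch.randn(5, 128)           # candidate reply embeddings
scores = reply_bank @ enc(email)           # highest score = best suggestion
print(scores.argmax().item())
```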
YouTube search and recommendations:
With over 500 hours of content uploaded to YouTube every minute, finding what you need would be nearly impossible without some help sorting through all of the videos. YouTube’s search ranking system does just that by sorting through loads of videos to find the most relevant and useful results to your search query, and presenting them in a way that helps you find what you’re looking for.
It prioritizes three main elements to provide the best search results: relevance, engagement, and quality. To estimate relevance, YouTube looks at many factors, such as how well the title, tags, description, and video content match your search query. Engagement signals are a valuable way to determine relevance: the system incorporates aggregate engagement from users, e.g. the watch time of a particular video for a particular query, to determine whether other users consider the video relevant to that query. Finally, for quality, the systems are designed to identify signals that help determine which channels demonstrate expertise, authoritativeness, and trustworthiness on a given topic.
In addition to those three main factors, YouTube strives to make search results relevant for each user, so it may also consider your search and watch history. That’s why your search results might differ from another user’s results for the same query. For example, if you watch a lot of sports videos and search for “cricket,” you might get videos featuring the sport rather than nature videos about the insect.
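As a rough illustration of how such signals might combine, here is a toy ranking function. The weights, field names, and signal definitions are all invented for this example; YouTube’s real system is far more sophisticated and is learned from data rather than hand-tuned.

```python
# Toy scoring: weighted mix of relevance, engagement, quality, plus a
# small personalization boost. All weights and fields are invented.
def rank_videos(videos, query_terms, user_history):
    def score(v):
        relevance = len(query_terms & v["title_tags"]) / len(query_terms)
        engagement = v["avg_watch_time_for_query"] / v["duration"]
        quality = v["channel_authority"]          # 0..1 trust signal
        personal = 0.1 if v["topic"] in user_history else 0.0
        return 0.5 * relevance + 0.3 * engagement + 0.2 * quality + personal
    return sorted(videos, key=score, reverse=True)

videos = [
    {"title_tags": {"cricket", "highlights"}, "avg_watch_time_for_query": 240,
     "duration": 600, "channel_authority": 0.9, "topic": "sports"},
    {"title_tags": {"cricket", "insects"}, "avg_watch_time_for_query": 60,
     "duration": 600, "channel_authority": 0.7, "topic": "nature"},
]
# A sports watcher searching "cricket" sees the sports video first.
print(rank_videos(videos, {"cricket"}, user_history={"sports"})[0]["topic"])
```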
📸Visual ways to search and understand the world
Whether they’re students learning about photosynthesis or parents researching the best cars for a growing family, people turn to Google with all sorts of curiosities. And Google can help you understand in different ways: through text, your voice, or even your phone’s camera. At its Search On event, Google announced new ways you can use Google Lens and augmented reality (AR) while learning and shopping.
Visual tools to help you learn
For many families, adjusting to remote learning hasn’t been easy, but tools like Google Lens can help lighten the load. With Lens, you can search what you see using your camera. Lens can now recognize 15 billion things — up from 1 billion just two years ago — to help you identify plants, animals, landmarks and more. If you’re learning a new language, Lens can also translate more than 100 languages, such as Spanish and Arabic, and you can tap to hear words and sentences pronounced out loud.
If you’re a parent, your kids may ask you questions about things you never thought you’d need to remember, like quadratic equations. From the search bar in the Google app on Android and iOS, you can use Lens to get help on a homework problem. With step-by-step guides and videos, you can learn and understand the foundational concepts to solve math, chemistry, biology and physics problems.
Shop what you see with Google Lens
Another area where the camera can be helpful is shopping, especially when what you’re looking for is hard to describe in words. With Lens, you can already search for a product by taking a photo or screenshot. Now it’s even easier to discover new products as you browse on your phone: when you tap and hold an image in the Google app or Chrome on Android, Lens will find the exact item or similar ones and suggest ways to style it. This feature is coming soon to the Google app on iOS.
Lens uses Style Engine technology, which combines the world’s largest database of products with millions of style images. It then uses pattern matching to understand concepts like “ruffle sleeves” or “vintage denim” and how they pair with different apparel.
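Google hasn’t published how Style Engine works internally, but visual product matching is commonly built on image embeddings: a vision model maps each photo to a vector, and nearby vectors mean visually similar items. The sketch below assumes that general technique; the random vectors stand in for a trained model’s output.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: 1.0 means the embeddings point the same way."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def similar_products(query_vec, product_index, k=3):
    """Return the k catalog items whose embeddings are closest to the
    query image's embedding, i.e. the most visually similar products."""
    ranked = sorted(product_index,
                    key=lambda p: cosine(query_vec, p["vec"]),
                    reverse=True)
    return [p["name"] for p in ranked[:k]]

# Fake catalog: in a real system these vectors come from a vision model.
rng = np.random.default_rng(0)
index = [{"name": f"item-{i}", "vec": rng.normal(size=128)}
         for i in range(1000)]

# A query photo of (nearly) the same product as item-42.
query = index[42]["vec"] + rng.normal(scale=0.05, size=128)
print(similar_products(query, index))  # item-42 should rank first
```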
Bring the showroom to life with AR
When you can’t go into stores to check out a product up close, AR can bring the showroom to you. If you’re in the market for a new car, for example, you’ll soon be able to search for it on Google and see an AR model right in front of you. You can easily check out what the car looks like in different colors, zoom in to see intricate details like buttons on the dashboard, view it against beautiful backdrops, and even see it in your driveway. Google is experimenting with this feature in the U.S. and working with top auto brands, such as Volvo and Porsche, to bring these experiences to you soon.
🔗Don’t forget to check these amazing links:
Lastly, try out these amazing features Google has built using Artificial Intelligence and Machine Learning.
Thank you for reading! If you learned something new, don’t forget to show some appreciation 👏.