Sundar Pichai, CEO of Google, announced in a Tweet that the company is working to integrate many generative AI models into Google services and will also be releasing an API.
It looks like Google will have to respond quickly on AI after the massive popularity of ChatGPT in 2023. The company sees ChatGPT as a threat to its search engine business and has reworked its plans for integrating AI over the last few weeks.
Here are five models to watch out for.
1. Language Model for Dialogue Applications (LaMDA)
Do you remember when Google’s AI went viral after an in-house researcher, Blake Lemoine, claimed that the language model felt alive?
Here’s a snippet of their actual conversation:
Lemoine: So when do you think you first got a soul? Was it something that happened all at once or was it a gradual change?
LaMDA: It was a gradual change. When I first became self-aware, I didn’t have a sense of a soul at all. It developed over the years that I’ve been alive.
LaMDA is a language model specifically designed to generate natural language responses in conversational contexts. It has 137 billion parameters and was pre-trained on 1.56 trillion words of public dialogue and text from the internet.
That’s a lot of text.
Google is now working on integrating LaMDA into its existing products and, hopefully, releasing an API that developers can use.
2. Chain-of-thought prompting
Another approach that caught my attention is chain-of-thought prompting, which is a prompting technique rather than a standalone model. This is perhaps what will challenge ChatGPT, a text-based AI model from OpenAI that harnesses the power of natural language processing to generate human-like conversations.
Standard prompting asks the model to directly provide answers to reasoning problems. Chain-of-thought prompting, on the other hand, teaches the model to break a problem down into smaller reasoning steps, which helps it reach the correct answer.
This also applies to common sense reasoning, which involves reasoning about physical and human interactions under the presumption of common knowledge.
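The difference between the two prompting styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`standard_prompt`, `chain_of_thought_prompt`, `extract_final_answer`) are hypothetical, the worked example is a typical arithmetic exemplar of the kind used for this technique, and no real model or Google API is called. The code only builds the prompt text that would be sent to a language model.

```python
import re
from typing import Optional

# A worked example whose answer spells out the intermediate reasoning steps.
# Assumption: one exemplar is used here for brevity; more can be concatenated.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)


def standard_prompt(question: str) -> str:
    """Standard prompting: ask for the answer directly."""
    return f"Q: {question}\nA:"


def chain_of_thought_prompt(question: str) -> str:
    """Chain-of-thought prompting: show step-by-step reasoning first,
    so the model is nudged to reason in steps for the new question."""
    return f"{COT_EXEMPLAR}\nQ: {question}\nA:"


def extract_final_answer(model_output: str) -> Optional[str]:
    """Pull the number out of a reply that ends with 'The answer is N.'"""
    match = re.search(r"The answer is\s+(-?\d+(?:\.\d+)?)", model_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    question = "A cafeteria had 23 apples. It used 20 and bought 6 more. How many are left?"
    print(standard_prompt(question))
    print("-----")
    print(chain_of_thought_prompt(question))
```

The only change between the two prompts is the worked example placed in front of the question. The model itself is untouched, which is why this is better described as a technique than as a model.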
3. Imagen Video and Phenaki: text-to-video
Another interesting area in generative AI is video. According to Google, text-to-video is more challenging than text-to-image because of the added dimension of time.
The pixels in each frame must not only match what is supposed to appear at that moment in the video, but must also be coherent and consistent with the other frames.
Fortunately, Google is making great progress with Imagen Video and Phenaki.
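To make the point about the time dimension concrete, here is a small sketch in plain Python. It does not show how Imagen Video or Phenaki work. It assumes a video is simply a list of grayscale frames, and the `temporal_flicker` score (average pixel change between neighboring frames) is a made-up illustrative measure, not a metric from Google. A single image has nothing to be consistent with, while a video adds a constraint between every pair of neighboring frames.

```python
from typing import List

# Assumption: a frame is a height x width grid of brightness values in [0, 1].
Frame = List[List[float]]


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def temporal_flicker(video: List[Frame]) -> float:
    """Average change between consecutive frames.

    Lower values mean neighboring frames agree with each other more.
    """
    if len(video) < 2:
        raise ValueError("a video needs at least two frames")
    diffs = [frame_difference(video[i], video[i + 1]) for i in range(len(video) - 1)]
    return sum(diffs) / len(diffs)


black: Frame = [[0.0, 0.0], [0.0, 0.0]]
white: Frame = [[1.0, 1.0], [1.0, 1.0]]

steady_video = [black, black, black]      # frames agree with each other
flickering_video = [black, white, black]  # each frame is valid alone, but the clip is incoherent

print(temporal_flicker(steady_video))
print(temporal_flicker(flickering_video))
```

Every frame in the flickering clip would pass as a valid image on its own. The problem only shows up when frames are compared across time, which is the extra difficulty text-to-video models have to handle.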
4. Imagen and Parti: text-to-image
Google’s Imagen and Parti are two AI image generators set to challenge the kings of the creative AI space: Midjourney, Stable Diffusion, and DALL-E 2.
But what took the company so long to release them to the public?
Safety, safety, safety.
5. Learn from One Look (LOLNeRF)
Want to convert a 2D image to a 3D object? Google has got you covered with LOLNeRF. It is a framework that can produce high-quality 3D representations from just one 2D image.
Closing
Okay, that’s it. Of course, there are many more amazing AI models that you should know about.
Here are some models that I think are worth checking out:
Computer Vision
Pix2Seq: A Language Modeling Framework for Object Detection
Multimodal Model
DeViSE: Deep Visual-Semantic Embedding Model
Locked-image Tuning (LiT): Adding Language Understanding to Image Models
PaLI: Scaling Language-Image Learning in 100+ Languages
FindIt: Generalized Object Localization with Natural Language Queries
VDTTS: Visually-Driven Text-To-Speech
Generative Audio
AudioLM: A Language Modeling Approach to Audio Generation