The future of AI for smartphones is on-device, or at least making as many AI processes local as possible. Why? First, you don’t need an internet connection to get the job done, whether that’s asking a chatbot to proofread your writing and fix grammatical mistakes, doing some quick research, editing images, or explaining the world around you through the camera.
Second, none of your personal data has to leave the device to get processed on a remote server. And third, it’s going to be faster, because the smaller a model gets, the quicker it can produce results. It’s a bit of a give-and-take situation, though: a lighter AI model also means more limited capabilities.
A bigger AI model, like Gemini or ChatGPT, can understand text, images, and audio, and even generate video. These are large models, and they require a heck of a lot of processing power on custom server chips. In a nutshell, you need an internet connection to make that happen. But something pretty cool is brewing, and that something comes from Google.
What is this AI app all about?
A few months ago, the company introduced an app called Google AI Edge Gallery. After living on GitHub for a while, it finally made its way to the Play Store. Primarily, it’s an app for developers looking to build AI experiences into their own apps, but you can try it out without losing your sanity.
Think of it as a marketplace or store. But instead of finding apps, you can pick AI models to run on your phone. If you buy an Android phone today, like the Pixel 10 Pro, all the AI features are powered by Gemini. You can separately download apps such as ChatGPT or Claude, but they all require an internet connection and send your data to servers.
Google AI Edge Gallery is specifically made for running AI models offline. So, if you want to make sense of an image or summarize a long report, you can do it all offline. And here’s the best part. You can get it done using any AI model of your choice, without installing a dedicated app for it.
In a nutshell, this app is a one-stop shop for running AI experiences, totally free, and without any internet connection requirement. Now, why would you want to do that? Well, I can think of a few situations.
How is this app useful?
Let’s say you run into your cellular data limit, find yourself in a place with little to no internet connectivity, or simply don’t want to feed confidential reports to an online AI. Maybe you want a specialized AI that only does one specific task, such as turning a PDF file into a one-pager with bullet points. Or maybe you want to feed it images and have an AI write academic material based on them.
For all such scenarios, and more, you can simply turn to Google AI Edge Gallery, run the AI model of your choice, and get stuff done. At the moment, all the “compatible” models you need can be downloaded from the HuggingFace LiteRT Community library.
Here, you will find some fairly powerful AI models from Google’s Gemma series. These come with multimodal capabilities, which means they can handle text, images, and audio. You can, however, experiment with other AI models as well, such as DeepSeek, SmolVLM, Microsoft’s Phi-4 Mini, and Meta’s Llama.
Now, let me give a brief technical overview. All the AI models available for Google AI Edge Gallery are optimized for LiteRT, a high-performance runtime (formerly known as TensorFlow Lite) that is tailored specifically for on-device AI tasks. Just like the AI models mentioned above, LiteRT itself is open source.
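If you’re curious what that runtime looks like outside the app, here is a minimal sketch of loading and running a LiteRT model with Google’s ai-edge-litert Python package. The model file name is a stand-in, and the API mirrors the old TensorFlow Lite interpreter.

```python
# Minimal sketch: running a LiteRT (formerly TensorFlow Lite) model with the
# ai-edge-litert Python package. "some_model.tflite" is a hypothetical file.
import numpy as np
from ai_edge_litert.interpreter import Interpreter

interpreter = Interpreter(model_path="some_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input shaped the way the model expects, then run inference.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]).shape)
```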
If you are well-versed in tools such as TensorFlow or PyTorch, you can even import any fittingly compact AI model stored on your PC. But first, you must convert the files into the .litertlm or .task file format. Once converted, all you need to do is push the package into the Downloads folder of the phone and import it into Google AI Edge Gallery with a few taps.
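To give a rough idea of that conversion step, here is a hedged sketch using Google’s ai-edge-torch package on a toy PyTorch model. The class and file names are made up, and bundling a full LLM into a .task or .litertlm package involves extra packaging steps not shown here.

```python
# Hedged sketch: converting a toy PyTorch model into a LiteRT flatbuffer with
# the ai-edge-torch package. TinyClassifier and the file name are hypothetical.
import torch
import ai_edge_torch

class TinyClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)  # 16 input features, 4 classes

    def forward(self, x):
        return torch.softmax(self.fc(x), dim=-1)

model = TinyClassifier().eval()
sample_input = (torch.randn(1, 16),)         # example input used for tracing
edge_model = ai_edge_torch.convert(model, sample_input)
edge_model.export("tiny_classifier.tflite")  # ready to push to the phone
```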
How is the experience?
I mostly played around with the Gemma 3n model, since it’s the most versatile of the bunch. Aside from chat, it can also process images and audio. You can select whether a model runs on the CPU or GPU, and adjust sampling settings such as the temperature.
The latter, in the simplest terms, is a measure of how diverse an AI’s answers can be. A lower temperature produces outputs that are more predictable, definitive, and a bit repetitive. A higher temperature produces answers with more creative flair, but also a higher chance of errors.
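For the curious, the mechanic behind temperature is simple enough to show in a few lines of toy Python: the model’s raw token scores (logits) are divided by the temperature before being turned into probabilities, so lower values sharpen the distribution and higher values flatten it. The numbers below are made up purely for illustration.

```python
# Toy illustration of sampling temperature: logits are divided by the
# temperature before softmax, so low values make the top token dominate
# (predictable output) and high values even out the odds (more varied output).
import numpy as np

rng = np.random.default_rng(0)

def sample(logits, temperature=1.0):
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.2]                        # made-up token scores
print([sample(logits, 0.2) for _ in range(8)])  # almost always token 0
print([sample(logits, 1.5) for _ in range(8)])  # a more diverse mix
```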
Now, you don’t necessarily need to fiddle with these settings too much. Just test whether a model responds faster on the CPU or the GPU, and keep it that way. I experimented with roughly nine models, and the takeaway has been mixed.
Let’s start with the differences. I shared a picture of my cat and asked Gemini to identify the species. It did so in three seconds. When the same query was put to Gemma 3n, it took 11 seconds. The response was accurate, but a bit short. If you prefer on-point answers, you might even like this approach. On occasion, you might run into errors, especially with multimodal queries, so you will want to switch the accelerator (CPU or GPU) and see if that speeds things up.
Likewise, text processing can also be a tad slow. When I fed it an article of around 900 words and asked Alibaba’s Qwen 2.5 model to summarize it as bullet points, it took its own sweet time of around 20 seconds to get started. Microsoft’s Phi-4 Mini was noticeably faster at the job, but I liked the thoughtful formatting of Qwen 2.5 more.
The Gemma 3n E2B model was the fastest at the task, and also delivered the highest-quality response in less than eight seconds. The more powerful Gemma 3n E4B managed to reformat and formalize the tone of the same article in roughly seven seconds while running on the CPU.
Audio transcription, though limited to 30-second clips, is simply fantastic. Google’s Gemma 3n-E2B model did not make a single mistake and did a great job summarizing the transcribed audio clip. All that happened in less than 10 seconds.
Not all models work well with GPU acceleration, so you have to run them off the CPU. Gemma 3 1B was stuck on processing for minutes. Trying to change the accelerator mid-task crashed the app, especially with Qwen and Phi-4 Mini. On the positive side, Phi-4 Mini was nearly as fast as Gemma at certain article formatting tasks while running on the CPU.
A peek into the future
Now, this app won’t run on all phones. At the very least, it needs a processor with a powerful NPU or AI accelerator, and preferably 8GB of RAM or more. I ran my tests on the Google Pixel 10 Pro, and it didn’t get toasty. Additionally, you will need some technical knowledge if you want to run AI models that are not currently available in the LiteRT gallery.
Overall, Google’s AI Edge Gallery app is not quite a replacement for Gemini or any other internet-connected chatbot application on your phone. At least not yet. But it’s a sign of bright things to come. Look no further than the HuggingSnap app, which runs on an open-source model, fully offline, yet enables Visual Intelligence-style capabilities on an iPhone.
As mobile processors get more AI-friendly at the hardware level, and as more AI models get optimized for on-device tasks, apps like Google AI Edge Gallery might actually serve as a hub for useful AI chores. A more private hub, one that runs fully offline and doesn’t charge any fee while it’s at it.
