On-Device Machine Learning — MediaPipe

Pankaj Rai 🇮🇳
4 min read · May 25, 2023

Incorporating machine learning or AI capabilities is a straightforward way to make any application more intelligent: ML models excel at intricate tasks with remarkable efficiency, often appearing almost magical. The fundamental question is how difficult it is to integrate such capabilities into applications, or even into edge devices.

MediaPipe
MediaPipe is an open-source framework for building cross-platform, hardware-accelerated machine learning pipelines. It can be used to perform a variety of tasks, such as face detection, hand tracking, object detection, text classification, image segmentation, gesture recognition, interactive image segmentation and many more. MediaPipe is designed to be easy to use, and it provides a variety of pre-trained models that can be used to quickly get started.

Here are some of the key features of MediaPipe:

  • Cross-platform: MediaPipe can be used on a variety of platforms, including Android, Web, and Python.
  • Hardware-accelerated: MediaPipe uses hardware acceleration to improve performance.
  • Easy to use: MediaPipe is designed to be easy to use, even for beginners.
  • Pre-trained models: MediaPipe provides a variety of pre-trained models that can be used to quickly get started.

Three Main Components
MediaPipe has three main components:

  1. Tasks
  2. Model Maker
  3. Studio

Let’s have a look at each of them, starting with Tasks.

Tasks
Tasks is a set of libraries for deploying machine learning solutions onto devices. It supports multiple platforms, including Android, Web, and Python, with support for iOS coming soon.
Following are the features of Tasks —

  1. Ease of use: The APIs are simple enough that you can focus on adding ML capabilities to your app rather than on writing the inference logic yourself.
  2. High performance: MediaPipe Tasks provides optimized ML pipelines with end-to-end acceleration on CPU, GPU, and TPU to meet the needs of real-time on-device use cases, using techniques such as model optimization, hardware acceleration, and pipeline optimization.

In the case of Android, the tasks are divided into three main packages: text, vision, and audio.
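To give a feel for what the Tasks API looks like in practice, here is a sketch of the text-classification task in Python. The model path and input string are placeholders, and the sketch assumes the `mediapipe` package is installed along with a downloaded `.tflite` classifier model:

```python
# Sketch only: MediaPipe Tasks text classification in Python.
# Assumes `pip install mediapipe` and a downloaded .tflite classifier
# model (the path passed in below is a placeholder).

def classify_text(model_path: str, text_input: str):
    """Classify a string and return its scored categories."""
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import text as mp_text

    options = mp_text.TextClassifierOptions(
        base_options=mp_python.BaseOptions(model_asset_path=model_path)
    )
    # The `with` block releases the task's native resources when done.
    with mp_text.TextClassifier.create_from_options(options) as classifier:
        result = classifier.classify(text_input)
    # Each category carries a human-readable name and a confidence score.
    return result.classifications[0].categories

# Usage (requires the model file to exist):
#   for category in classify_text("classifier.tflite", "What a great day!"):
#       print(category.category_name, category.score)
```

Note how little code is involved: the library handles model loading, preprocessing, and inference, which is the "ease of use" point above in action.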

Model Maker
The Model Maker is the second element, enabling you to personalize the model. In certain situations, the available options may not meet your specific needs. For example, you might desire to differentiate between images of dogs and cats, or perhaps you wish to incorporate your own gesture into a gesture recognition model. Fortunately, the Model Maker allows for this type of customization.

Model Maker employs a machine learning training technique known as transfer learning, which retrains existing models using new data. If you’re unfamiliar with transfer learning, it essentially means reusing most of an existing model’s learned logic. As a result, training requires less time than training a completely new model, and it can be accomplished with less data. In practice, transfer learning removes the final layer of an existing model and retrains it on the new data.
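To make the idea concrete, here is a hedged Python sketch of retraining an image classifier with MediaPipe Model Maker. The folder layout and backbone choice are illustrative, and it assumes `pip install mediapipe-model-maker` plus a directory of labeled images with one sub-folder per class:

```python
# Sketch only: transfer learning with MediaPipe Model Maker.
# Assumes `pip install mediapipe-model-maker` and a folder of labeled
# images, e.g. data/bus, data/car, data/bike (names are illustrative).

def retrain(data_dir: str) -> None:
    from mediapipe_model_maker import image_classifier

    # Load the labeled images and carve out held-back splits.
    data = image_classifier.Dataset.from_folder(data_dir)
    train_data, rest = data.split(0.8)
    validation_data, test_data = rest.split(0.5)

    # Transfer learning: start from a pre-trained backbone and retrain
    # the final classification layer on the new classes.
    options = image_classifier.ImageClassifierOptions(
        supported_model=image_classifier.SupportedModels.MOBILENET_V2
    )
    model = image_classifier.ImageClassifier.create(
        train_data=train_data,
        validation_data=validation_data,
        options=options,
    )
    print("evaluation:", model.evaluate(test_data))
    model.export_model()  # writes a .tflite usable by MediaPipe Tasks
```

Because the backbone's weights are already trained, only the new final layer has to learn from scratch, which is why far less data and time are needed.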

What is the basic requirement while customizing the model?
With the Model Maker, it is possible to retrain models using significantly less data compared to training a brand new model. When retraining a model with new data, it is recommended to have approximately 100 data samples for each class that the model is trained on. For instance, if you intend to retrain an image classification model to recognize bus, car, and bike, you should ideally have around 100 images of each. It’s important to note that depending on your specific application, it may be feasible to retrain a useful model with even fewer data samples per category, although a larger dataset generally improves the model’s accuracy.
When creating your training dataset, it’s crucial to consider that the data will be divided during the retraining process. Typically, around 80% of the data is allocated for training, 10% for testing, and the remaining 10% for validation.
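The split itself is straightforward. As a plain-Python illustration of the 80/10/10 proportions (the file names are made up):

```python
# A minimal sketch of the 80/10/10 train/test/validation split
# described above, using plain Python (file names are made up).
import random

def split_dataset(samples, train_frac=0.8, test_frac=0.1, seed=42):
    """Shuffle the samples and split them into train/test/validation lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation

# e.g. 100 images per class for bus, car, and bike → 300 samples total
samples = [f"{cls}_{i}.jpg" for cls in ("bus", "car", "bike") for i in range(100)]
train, test, validation = split_dataset(samples)
print(len(train), len(test), len(validation))  # 240 30 30
```

Shuffling before splitting matters: without it, whole classes could end up concentrated in one split.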

Studio
MediaPipe Studio is an online tool designed for assessing and personalizing on-device machine learning (ML) models and pipelines for your applications. This web-based application enables you to conveniently evaluate MediaPipe solutions directly in your browser, using your own data and customized ML models. Additionally, within each solution demo, you have the flexibility to experiment with various model settings, such as adjusting the total number of results or setting a minimum confidence threshold for reporting results. With MediaPipe Studio, you can swiftly test and refine your ML-based solutions to meet your specific requirements.

Try out various capabilities from the studio today!
https://mediapipe-studio.webapps.google.com/home

To begin your journey with MediaPipe, you can visit the following website:
https://developers.google.com/mediapipe/solutions/guide

By accessing the official MediaPipe website, you can explore the available resources, documentation, tutorials, and other relevant information to get started with using MediaPipe for your projects.


Pankaj Rai 🇮🇳

Software Engineer | GDE Android & Firebase | YouTuber — All Techies