MnasNet: Revolutionizing Mobile AI with Efficiency and Accuracy

Allan Kouidri
-
3/12/2024

Google's MnasNet represents a pivotal advancement in mobile artificial intelligence, targeting the creation of models that are both powerful and efficient for mobile use. Traditional convolutional neural networks (CNNs), while effective for a range of applications like image classification and face recognition, often struggle to balance size, speed, and accuracy when adapted for mobile devices. 

Previous efforts, including MobileNet and MobileNetV2, have made strides towards optimizing mobile models, yet manually achieving efficiency remains a complex challenge.

MnasNet introduces an AutoML-based approach, leveraging reinforcement learning in neural architecture search to craft models specifically designed for mobile constraints. By incorporating mobile speed requirements directly into its reward function, MnasNet efficiently navigates the trade-off between accuracy and speed. 

This post explores the key features of MnasNet, showcasing its potential to revolutionize mobile computing.

What is MnasNet?

MnasNet, short for Mobile Neural Architecture Search Network, is a framework designed to create neural network models that optimize both accuracy and efficiency.

The development of MnasNet by Google’s AI team was motivated by the need for deep learning models that can operate within the constraints of mobile devices, such as limited processing power, memory, and energy. Traditional neural networks, while powerful, often require substantial computational resources, making them impractical for mobile applications. 

MnasNet addresses this challenge through an innovative approach called Neural Architecture Search (NAS), which automates the design of models that are both accurate and lightweight.

How Does MnasNet Work?

The MnasNet architecture is specifically designed to optimize for both high accuracy and efficiency, particularly suited for mobile devices. It adopts a multi-objective approach that not only seeks to improve performance metrics such as accuracy but also considers the real-world constraints of mobile deployment, such as latency and computational resources. 

MnasNet employs a reinforcement learning-based NAS framework that searches for the optimal architecture by balancing a dual-objective function: maximizing prediction accuracy while minimizing computational cost. The framework evaluates potential architectures based on their performance on a specific task (e.g., image recognition) and their efficiency (measured in terms of latency or power consumption on mobile devices).

Overview of Platform-Aware Neural Architecture Search (NAS) for Mobile. [1]
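This dual objective can be made concrete. The MnasNet paper formulates it as a weighted product, R(m) = ACC(m) × (LAT(m)/T)^w, where T is the target latency and w is a negative exponent (the paper uses w = -0.07 for its soft constraint). The sketch below implements that formula; the accuracy and latency numbers are invented for illustration.

```python
import math

def mnasnet_reward(accuracy: float, latency_ms: float,
                   target_ms: float, w: float = -0.07) -> float:
    """Weighted-product reward: accuracy * (latency / target)^w.

    With a negative exponent w, models slower than the target are
    penalized and models faster than the target are mildly rewarded,
    letting the search trade accuracy against latency smoothly.
    """
    return accuracy * math.pow(latency_ms / target_ms, w)

# Hypothetical candidates with equal accuracy but different measured latencies.
slow = mnasnet_reward(accuracy=0.75, latency_ms=90.0, target_ms=75.0)
fast = mnasnet_reward(accuracy=0.75, latency_ms=60.0, target_ms=75.0)
print(f"slow model reward: {slow:.4f}, fast model reward: {fast:.4f}")
```

With equal accuracy, the faster model earns the higher reward, which is exactly the pressure that steers the search toward mobile-friendly architectures.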

MnasNet Architecture 

  • Baseline Structure: The MnasNet model partitions a Convolutional Neural Network (CNN) into a sequence of predefined blocks. These blocks reduce input resolutions gradually while increasing the filter sizes, a common strategy in many CNN models. Each block consists of layers that are identical in operations and connections, determined by a per-block sub search space.
  • Sub Search Space: For each block, the sub search space includes choices for convolutional operations (regular convolution, depthwise convolution, and mobile inverted bottleneck convolution), kernel sizes (3x3, 5x5), squeeze-and-excitation ratios (0, 0.25), skip operations (pooling, identity, residual, or no skip), output filter size, and the number of layers per block.
  • Layer and Block Configuration: The architecture of a layer is defined by the convolution operation, kernel size, squeeze-and-excitation ratio, skip operation, and output filter size, while the number of repetitions for the layer within a block is also specified. For instance, a layer in block 4 might have an inverted bottleneck with a 5x5 convolution and an identity residual skip path, repeated a certain number of times.
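As a rough sketch, the per-block sub search space can be modeled as a set of discrete choice lists, with one candidate block being a pick from each. The option lists below mirror the bullet points above (the output filter sizes are illustrative placeholders), and the random sampler is only a toy stand-in for the actual search controller.

```python
import random

# Discrete choices per block, mirroring the options listed above.
SUB_SEARCH_SPACE = {
    "conv_op": ["regular_conv", "depthwise_conv", "mobile_inverted_bottleneck"],
    "kernel_size": [3, 5],
    "se_ratio": [0.0, 0.25],
    "skip_op": ["pooling", "identity_residual", "no_skip"],
    "output_filters": [16, 24, 40, 80, 112, 160, 320],  # illustrative sizes
    "num_layers": [1, 2, 3, 4],
}

def sample_block(rng: random.Random) -> dict:
    """Sample one block configuration: one choice per dimension."""
    return {name: rng.choice(options) for name, options in SUB_SEARCH_SPACE.items()}

rng = random.Random(0)
block = sample_block(rng)
print(block)
```

Because each block draws from its own sub space, the total number of combinations grows per block rather than per layer, which is what keeps the factorized search tractable.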

Key Features

  • Factorized Hierarchical Search Space: This innovative search space balances layer diversity and the overall size of the search space. It divides the network into blocks, each with its sub search space, significantly reducing the total search space size compared to a flat per-layer search space approach.
MnasNet factorized hierarchical search space. [1]

  • Reinforcement Learning-Based Search Algorithm: MnasNet uses a reinforcement learning approach to find optimal solutions that balance accuracy and efficiency. It employs a recurrent neural network (RNN) based controller, a trainer for model accuracy evaluation, and a mobile phone-based inference engine for latency measurement. The architecture search process is guided by a reward function that accounts for both accuracy and latency, aiming to find models that perform well under mobile constraints.
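To show how these pieces fit together, here is a deliberately simplified search loop. The real system uses a trained RNN controller, full model training for accuracy, and on-device latency measurement; each is replaced below with a toy stand-in (random sampling and synthetic scores), so this is a sketch of the control flow, not the algorithm itself.

```python
import math
import random

def sample_architecture(rng):
    """Toy stand-in for the RNN controller: pick a kernel size and width."""
    return {"kernel": rng.choice([3, 5]), "filters": rng.choice([16, 32, 64])}

def evaluate_accuracy(arch):
    """Toy stand-in for training/evaluation: bigger models score a bit higher."""
    return 0.60 + 0.002 * arch["filters"] + 0.01 * (arch["kernel"] == 5)

def measure_latency_ms(arch):
    """Toy stand-in for on-device measurement: bigger models run slower."""
    return 0.8 * arch["filters"] + 4.0 * arch["kernel"]

def reward(acc, lat_ms, target_ms=75.0, w=-0.07):
    """The paper's weighted-product reward balancing accuracy and latency."""
    return acc * math.pow(lat_ms / target_ms, w)

rng = random.Random(42)
best = max(
    (sample_architecture(rng) for _ in range(50)),
    key=lambda a: reward(evaluate_accuracy(a), measure_latency_ms(a)),
)
print("best candidate:", best)
```

The structure is the same as in the figure above: sample a candidate, score its accuracy and latency, fold both into a single reward, and keep searching toward higher-reward architectures.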

Architectural Specifics

  • MnasNet-A1 Model: A representative model selected from the search, showcasing a variety of layer architectures. It includes both 3x3 and 5x5 convolutions, indicating a departure from previous mobile models that predominantly used only 3x3 convolutions. The model demonstrates the importance of layer diversity, with performance comparisons showing better accuracy-latency trade-offs than variants with a single type of layer repeated throughout the network.

MnasNet-A1 model. [1]

Features and Benefits of MnasNet

- Efficiency and Accuracy: MnasNet architectures achieve a remarkable balance between efficiency and accuracy. They can run faster on mobile devices without significant compromises in performance. This is particularly beneficial for applications requiring real-time processing, such as augmented reality (AR) and voice assistants.

Accuracy vs. Latency Comparison. [1]

MnasNet models not only match the ImageNet top-1 accuracy of leading designs but do so with significantly enhanced speed, running up to 1.5x faster than MobileNetV2 and 2.4x faster than NASNet.

The real-world impact of MnasNet is significant. For instance, in image classification tasks on the ImageNet dataset, MnasNet models have demonstrated superior performance to traditional models with a fraction of the computational cost. This efficiency enables more advanced AI features on mobile devices, enhancing user experiences across a variety of applications.

- Flexibility: The MnasNet framework is adaptable to various tasks beyond image recognition, such as natural language processing and video analysis. Its flexible design allows it to be tailored to a wide range of applications and devices.

- Scalability: MnasNet models are scalable, meaning they can be adjusted to fit the computational budget of different devices. This makes MnasNet suitable for a spectrum of mobile devices, from high-end smartphones to more modest hardware.
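Concretely, this scaling is usually driven by a depth (width) multiplier that shrinks or grows every layer's channel count, rounded so channels stay divisible by 8, which tends to map better onto mobile hardware. The helper below sketches the rounding rule found in common mobile-model implementations (e.g., torchvision's MnasNet); the base channel list is an illustrative example, not the exact MnasNet configuration.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round `value` to the nearest multiple of `divisor`,
    never dropping more than 10% below the original value."""
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if new_value < 0.9 * value:
        new_value += divisor
    return new_value

base_channels = [32, 16, 24, 40, 80, 96, 192, 320]  # example per-stage widths
for multiplier in (0.5, 1.0, 1.3):  # smaller and larger model variants
    scaled = [make_divisible(c * multiplier) for c in base_channels]
    print(f"{multiplier}x -> {scaled}")
```

A single multiplier thus produces a whole family of models, from lightweight variants for modest hardware to wider ones for high-end phones.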

Applications of MnasNet 

MnasNet, with its efficient and powerful architecture, has found a wide range of applications across various fields of technology. Its ability to balance high accuracy with low computational costs makes it particularly suitable for deployment in mobile and embedded systems. Here, we explore some of the key applications of MnasNet in modern technology:

Mobile Vision Applications

MnasNet's architecture is optimized for mobile devices, making it an ideal choice for mobile vision applications such as real-time image and video analysis. Its efficiency enables applications like augmented reality (AR), facial recognition, and live video content analysis to run smoothly on smartphones and tablets without draining the battery or requiring high-end hardware.

Edge Computing

In edge computing, data is processed close to the source rather than being sent to distant cloud servers. MnasNet's low latency and high efficiency make it perfect for edge devices, which often have limited computational resources. It enables smarter IoT devices capable of complex tasks like surveillance with real-time object detection, smart home devices with visual recognition capabilities, and autonomous drones or vehicles that require instant decision-making based on visual inputs.

Wearable Devices

Wearable technology, such as smart glasses and health monitoring devices, benefits significantly from MnasNet's efficiency. It allows these devices to perform tasks like activity recognition, health monitoring through image-based diagnostics, and providing visual assistance without compromising battery life.

Environmental Monitoring

MnasNet can also be applied in environmental monitoring systems, where drones or stationary cameras capture real-time data on wildlife, vegetation, or pollution. Its ability to quickly process images on-device helps in tracking changes in the environment, identifying illegal activities like poaching or logging, and monitoring endangered species.

Manufacturing and Quality Control

In manufacturing, MnasNet can be employed for quality control by analyzing images of products on the assembly line to detect defects or irregularities. Its fast processing speeds allow for real-time feedback and minimal disruption to the production process.

Healthcare

MnasNet's application in healthcare includes portable diagnostic devices, where it can help in analyzing medical images such as X-rays or ultrasound images on-the-go, providing support in remote areas or emergency situations where access to sophisticated medical imaging equipment might be limited.

Easily run MnasNet for image classification 

With the Ikomia API, you can effortlessly classify your images with MnasNet in just a few lines of code.

Setup

To get started, you need to install the API in a virtual environment [2].


pip install ikomia

Run MnasNet image classification with a few lines of code

You can also directly load the notebook we have prepared.


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display


# Init your workflow
wf = Workflow()

# Add algorithm
algo = wf.add_task(name="infer_torchvision_mnasnet", auto_connect=True)
algo.set_parameters({"input_size": "224"})

# Run directly on your image
wf.run_on(url="https://images.pexels.com/photos/17040817/pexels-photo-17040817.jpeg?cs=srgb&dl=pexels-alex-agrico-17040817.jpg&fm=jpg&w=1280&h=853")

# Inspect your result
display(algo.get_image_with_graphics())


MnasNet classification output. Results: 'cab, hack, taxi, taxicab': 0.97

List of parameters:

  • input_size (int) - default '224': Size of the input image.

If you are using a custom MnasNet model:

  • model_weight_file (str, optional): Path to model weights file.
  • class_file (str, optional): Path to text file (.txt) containing class names.

Train your own MnasNet model

This article has explored MnasNet, a powerful deep learning model, and demonstrated the Ikomia API's role in facilitating the application of MnasNet algorithms.

To dive deeper, explore how to train a classification model like MnasNet on your custom dataset →

Resources

  • Ikomia HUB displays a range of algorithms, featuring accessible code snippets for straightforward experimentation and evaluation.
  • Detailed guidance on leveraging the API to its fullest is available through Ikomia documentation.
  • For a more intuitive, visual approach to image processing, Ikomia STUDIO enhances the ecosystem with a user-friendly interface, mirroring the API's functionalities.

References

[1] MnasNet: Platform-Aware Neural Architecture Search for Mobile

[2] How to create a virtual environment in Python
