Tensors in Machine Learning

In machine learning, especially in deep learning, tensors are the fundamental data structures used to store and manipulate data. Whether you are training a neural network, passing inputs through layers, or computing gradients, you are working with tensors.

1. What is a Tensor?

A tensor is a multi-dimensional array: a container that holds numbers along any number of dimensions (axes), including zero. It generalizes scalars, vectors, and matrices.

Mathematically, a tensor is an object whose components, written as an array in a given coordinate system, change according to specific transformation rules when the coordinates change.

In programming and machine learning, we simplify this to:

A tensor is a structured collection of numbers arranged across one or more dimensions.

2. Scalars, Vectors, Matrices, and Tensors

To understand tensors, it helps to first understand lower-dimensional data structures:

Object	Description	Tensor Rank	Example Value
Scalar	Single number	0	5 or π
Vector	1D array of numbers	1	[1, 2, 3]
Matrix	2D array of numbers	2	[[1, 2], [3, 4]]
Tensor	3D or higher-dimensional array	3 or more	[[[1, 2], [3, 4]]]

So:

A scalar is a rank-0 tensor
A vector is a rank-1 tensor
A matrix is a rank-2 tensor
Anything with 3 or more dimensions is considered a higher-rank tensor

3. Tensor Rank and Dimensions

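The rank (also called order or degree) of a tensor is its number of dimensions (axes). In NumPy this is the ndim attribute. It is not the same thing as the rank of a matrix in linear algebra.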

For example:

import numpy as np

a = np.array(5)                 # Scalar: rank 0
b = np.array([1, 2, 3])         # Vector: rank 1
c = np.array([[1, 2], [3, 4]])  # Matrix: rank 2
d = np.array([[[1], [2]], [[3], [4]]])  # Tensor: rank 3

print(a.shape)  # ()
print(b.shape)  # (3,)
print(c.shape)  # (2, 2)
print(d.shape)  # (2, 2, 1)

The number of elements along each axis defines the shape of the tensor.
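
The relationship between rank, shape, and element count can be checked directly in NumPy. The following is a minimal sketch. The array t is a made-up example, not data from this article.

```python
import numpy as np

# Hypothetical rank-3 tensor with 2 x 3 x 4 elements.
t = np.zeros((2, 3, 4))

rank = t.ndim         # number of axes
shape = t.shape       # number of elements along each axis
n_elements = t.size   # total number of elements

# The rank equals the length of the shape tuple, and the total
# element count equals the product of the axis lengths.
assert rank == len(shape)
assert n_elements == int(np.prod(shape))

print(rank, shape, n_elements)  # 3 (2, 3, 4) 24
```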

4. Why Tensors Matter in Machine Learning

Tensors are essential in machine learning because:

All inputs, outputs, weights, and activations in ML/DL models are stored as tensors
They can represent structured data of any shape: sequences, images, audio, etc.
ML frameworks are optimized for tensor computations on GPUs/TPUs
Gradient computation (backpropagation) applies the chain rule to tensor operations and produces, for each parameter, a gradient tensor of the same shape
The efficiency of tensor operations affects the training time and scalability of models.
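
To make these points concrete, here is a rough sketch of a single fully connected layer written with plain NumPy. The layer sizes, the random data, and the choice of a squared-error loss are all assumptions made for illustration. Real frameworks compute the gradients automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a batch of 4 inputs with 5 features, mapped to 3 outputs.
x = rng.normal(size=(4, 5))        # inputs
W = rng.normal(size=(5, 3))        # weights
b = np.zeros(3)                    # bias
target = rng.normal(size=(4, 3))   # made-up targets

# Forward pass: one matrix multiplication handles the whole batch.
y = x @ W + b                      # activations, shape (4, 3)

# Half the squared error, averaged over the batch.
diff = y - target
loss = 0.5 * np.sum(diff ** 2) / x.shape[0]

# Backward pass via the chain rule; each gradient has the
# same shape as the parameter it belongs to.
grad_W = x.T @ diff / x.shape[0]   # shape (5, 3)
grad_b = diff.mean(axis=0)         # shape (3,)

print(y.shape, grad_W.shape, grad_b.shape)  # (4, 3) (5, 3) (3,)
```

Inputs, weights, activations, and gradients are all arrays here, and the whole batch is processed by a single tensor operation rather than a Python loop.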

5. Real-World Example: Image Classification using Deep Learning

Suppose you’re building a machine learning model to classify animal images (e.g., cats, dogs, horses). You’ve collected thousands of images, and you want to train a neural network using them.

Step 1: A Single Image as a Tensor

Let’s say each image is:

Width: 64 pixels
Height: 64 pixels
Color channels: 3 (Red, Green, Blue – RGB)

This means every image is stored as a 3-dimensional tensor of shape:

(64, 64, 3)

Axis 0: 64 rows (height)
Axis 1: 64 columns (width)
Axis 2: 3 values for RGB channels

Sample Pixel Tensor for One Image:

image = [
  [ [255, 0, 0], [254, 1, 0], ..., [0, 0, 255] ],  # row 1 (64 pixels)
  ...
  [ [34, 67, 89], [10, 20, 30], ..., [200, 200, 200] ]  # row 64
]

Each innermost list like [255, 0, 0] represents the color values for a pixel.
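
The same structure can be built and indexed in NumPy. This is a small sketch that assumes 8-bit pixel values (dtype uint8) and starts from a blank image rather than a real photo.

```python
import numpy as np

# Blank 64 x 64 RGB image; uint8 holds the usual 0-255 pixel range.
image = np.zeros((64, 64, 3), dtype=np.uint8)

# Set the top-left pixel to pure red and the top-right pixel to pure blue,
# mirroring the first row of the sample above.
image[0, 0] = [255, 0, 0]
image[0, 63] = [0, 0, 255]

pixel = image[0, 0]            # one pixel: a rank-1 tensor of 3 values
red_channel = image[:, :, 0]   # one channel: a rank-2 tensor

print(image.shape)        # (64, 64, 3)
print(pixel.shape)        # (3,)
print(red_channel.shape)  # (64, 64)
```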

Step 2: A Batch of Images

You rarely feed just one image to a neural network. Typically, you use a batch of images.

Suppose you use a batch size of 32. Now, the input becomes a 4-dimensional tensor:

(32, 64, 64, 3)

32 images
Each with height 64
Each with width 64
Each pixel with 3 RGB channels

This is the input tensor your model will receive in one forward pass.
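
One way to assemble such a batch is to stack the individual image tensors along a new leading axis. The sketch below uses randomly generated pixels as stand-ins for real images. The scaling to [0, 1] is a common preprocessing choice, not something this example requires.

```python
import numpy as np

rng = np.random.default_rng(0)

# 32 made-up images standing in for real data.
images = [
    rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    for _ in range(32)
]

# Stack along a new leading axis to form the batch.
batch = np.stack(images, axis=0)

# Scale to floats in [0, 1], a common (but optional) preprocessing step.
batch_float = batch.astype(np.float32) / 255.0

print(batch.shape)        # (32, 64, 64, 3)
print(batch_float.dtype)  # float32
```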

Step 3: Corresponding Labels

Let’s say your dataset has 3 classes: Cat, Dog, and Horse.

You might represent the labels as one-hot encoded vectors:

Cat → [1, 0, 0]
Dog → [0, 1, 0]
Horse → [0, 0, 1]

For a batch of 32 images, your label tensor will be:

(32, 3)
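
A label tensor of this shape can be produced from integer class indices. The sketch below assumes the mapping 0 = Cat, 1 = Dog, 2 = Horse and uses random labels in place of a real dataset.

```python
import numpy as np

class_names = ["Cat", "Dog", "Horse"]

# Hypothetical integer labels for a batch of 32 images (0=Cat, 1=Dog, 2=Horse).
rng = np.random.default_rng(0)
class_ids = rng.integers(0, len(class_names), size=32)

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with class_ids builds the whole label tensor at once.
labels = np.eye(len(class_names), dtype=np.float32)[class_ids]

print(labels.shape)             # (32, 3)
print(np.eye(3, dtype=int)[1])  # [0 1 0]  (Dog)
```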

Step 4: Tensor Summary in the ML Model

Data Component	Description	Tensor Shape	Rank
Single image	RGB image	(64, 64, 3)	3
Batch of images	32 RGB images	(32, 64, 64, 3)	4
Label for one image	One-hot vector	(3,)	1
Labels for batch	One-hot for 32 images	(32, 3)	2
Output of model	Predicted probabilities	(32, 3)	2
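
The last row can be illustrated with a softmax, which turns raw scores into one probability per class. This sketch assumes the model's final layer produces unnormalized scores (logits) that are passed through a softmax. The article does not specify the model, and the logits here are random placeholders.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Made-up raw model outputs (logits) for 32 images and 3 classes.
logits = rng.normal(size=(32, 3))

probs = softmax(logits)              # shape (32, 3), each row sums to 1
predicted = probs.argmax(axis=-1)    # shape (32,), predicted class index

print(probs.shape, predicted.shape)  # (32, 3) (32,)
```

The output tensor has the same shape as the one-hot label tensor, which is what lets a loss function compare predictions and labels element by element.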

Visual Representation (Shape Only)

Input Tensor:
┌────────────────────────────┐
│ 32 Images                  │
│ ┌────────────────────────┐ │
│ │64 x 64 x 3 (each image)│ │
│ └────────────────────────┘ │
└────────────────────────────┘
Shape: (32, 64, 64, 3)