In machine learning, especially in deep learning, tensors are the fundamental data structures used to store and manipulate data. Whether you are training a neural network, passing inputs through layers, or computing gradients, you are working with tensors.
1. What is a Tensor?
A tensor is a multi-dimensional array: a container that holds numbers arranged along any number of dimensions. It generalizes scalars, vectors, and matrices.
Mathematically, a tensor is an object that can be represented as an array of components that are functions of coordinates and obey transformation rules under coordinate changes.
In programming and machine learning, we simplify this to:
A tensor is a structured collection of numbers arranged across one or more dimensions.
2. Scalars, Vectors, Matrices, and Tensors
To understand tensors, it helps to first understand lower-dimensional data structures:
Object | Description | Tensor Rank | Example Value |
Scalar | Single number | 0 | 5 or π |
Vector | 1D array of numbers | 1 | [1, 2, 3] |
Matrix | 2D array of numbers | 2 | [[1, 2], [3, 4]] |
Tensor | 3D or higher-dimensional array | 3 or more | [[[1, 2], [3, 4]]] |
So:
- A scalar is a rank-0 tensor
- A vector is a rank-1 tensor
- A matrix is a rank-2 tensor
- Anything with 3 or more dimensions is considered a higher-rank tensor
3. Tensor Rank and Dimensions
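The rank (also called order or degree) of a tensor is its number of dimensions (axes). This is not the same as the rank of a matrix in linear algebra, which counts linearly independent rows or columns.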
For example:
import numpy as np

a = np.array(5)                          # Scalar: rank 0
b = np.array([1, 2, 3])                  # Vector: rank 1
c = np.array([[1, 2], [3, 4]])           # Matrix: rank 2
d = np.array([[[1], [2]], [[3], [4]]])   # Tensor: rank 3

a.shape → ()
b.shape → (3,)
c.shape → (2, 2)
d.shape → (2, 2, 1)
The number of elements along each axis defines the shape of the tensor.
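To make the link between rank, shape, and element count concrete, here is a small sketch using NumPy's `ndim`, `shape`, and `size` attributes. The array `t` and its dimensions are arbitrary values chosen for illustration.

```python
import numpy as np

# A rank-3 tensor with 2 blocks, 3 rows, and 4 columns (values are arbitrary).
t = np.arange(24).reshape(2, 3, 4)

rank = t.ndim     # number of axes
shape = t.shape   # number of elements along each axis
total = t.size    # total number of elements

# The total element count is the product of the axis lengths.
product_of_axes = int(np.prod(shape))

print(rank, shape, total, product_of_axes)
```

The rank is simply the length of the shape tuple, so `len(t.shape)` and `t.ndim` always agree.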
4. Why Tensors Matter in Machine Learning
Tensors are essential in machine learning because:
- All inputs, outputs, weights, and activations in ML/DL models are stored as tensors
- They can represent structured data of any shape: sequences, images, audio, etc.
- ML frameworks are optimized for tensor computations on GPUs/TPUs
- Gradient computation (backpropagation) applies the chain rule to tensor operations, so gradients are themselves tensors
- The efficiency of tensor operations affects the training time and scalability of models
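As a rough illustration of the points above, the sketch below runs one fully connected layer over a small batch using only tensor operations. The sizes and the names `inputs`, `weights`, and `bias` are hypothetical, and the ReLU activation is just one common choice; this is not taken from any particular framework.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical sizes chosen for illustration only.
batch_size, n_features, n_units = 4, 5, 3

inputs = rng.normal(size=(batch_size, n_features))   # rank-2 input tensor
weights = rng.normal(size=(n_features, n_units))     # rank-2 weight tensor
bias = np.zeros(n_units)                             # rank-1 bias tensor

# One matrix multiplication handles the whole batch; the bias is broadcast
# across the batch axis.
pre_activations = inputs @ weights + bias
activations = np.maximum(pre_activations, 0.0)       # ReLU, applied elementwise

print(activations.shape)  # (4, 3)
```

Because the whole batch is processed by a single matrix multiplication rather than a Python loop, this is the kind of computation that GPU and TPU libraries can accelerate well.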
5. Real-World Example: Image Classification using Deep Learning
Suppose you’re building a machine learning model to classify animal images (e.g., cats, dogs, horses). You’ve collected thousands of images, and you want to train a neural network using them.
Step 1: A Single Image as a Tensor
Let’s say each image is:
- Width: 64 pixels
- Height: 64 pixels
- Color channels: 3 (Red, Green, Blue – RGB)
This means every image is stored as a 3-dimensional tensor of shape:
(64, 64, 3)
- Axis 0: 64 rows (height)
- Axis 1: 64 columns (width)
- Axis 2: 3 values for RGB channels
Sample Pixel Tensor for One Image:
image = [
    [ [255, 0, 0], [254, 1, 0], ..., [0, 0, 255] ],       # row 1 (64 pixels)
    ...
    [ [34, 67, 89], [10, 20, 30], ..., [200, 200, 200] ]  # row 64
]
Each innermost list, such as [255, 0, 0], represents the color values for one pixel.
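A minimal NumPy sketch of the same idea follows. It assumes 8-bit color values (0 to 255) and starts from a blank image instead of real pixel data; the variable names are illustrative.

```python
import numpy as np

# A blank 64x64 RGB image: (height, width, channels), 8-bit values per channel.
image = np.zeros((64, 64, 3), dtype=np.uint8)

# Set the top-left pixel to pure red, like the first pixel in the sample above.
image[0, 0] = [255, 0, 0]

top_left = image[0, 0]        # shape (3,): the R, G, B values of one pixel
red_channel = image[:, :, 0]  # shape (64, 64): the red value of every pixel

print(image.shape, top_left.tolist(), red_channel.shape)
```

Indexing with two numbers selects a pixel, while slicing the last axis selects a whole color channel.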
Step 2: A Batch of Images
You rarely feed just one image to a neural network. Typically, you use a batch of images.
Suppose you use a batch size of 32. Now, the input becomes a 4-dimensional tensor:
(32, 64, 64, 3)
- 32 images
- Each with height 64
- Each with width 64
- Each pixel with 3 RGB channels
This is the input tensor your model will receive in one forward pass.
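One way to build such a batch, sketched here with NumPy, is to stack the individual image tensors along a new leading axis. The images below are random placeholders, not real data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 32 placeholder images with random pixel values, each of shape (64, 64, 3).
images = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(32)]

# np.stack adds a new leading axis, giving (batch, height, width, channels).
batch = np.stack(images, axis=0)

print(batch.shape)     # (32, 64, 64, 3)
print(batch[0].shape)  # (64, 64, 3): indexing the batch axis recovers one image
```

The channels-last layout used here is a convention, not a requirement. Some frameworks expect (batch, channels, height, width) instead, so it is worth checking what your framework assumes.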
Step 3: Corresponding Labels
Let’s say your dataset has 3 classes: Cat, Dog, and Horse.
You might represent the labels as one-hot encoded vectors:
Cat → [1, 0, 0]
Dog → [0, 1, 0]
Horse → [0, 0, 1]
For a batch of 32 images, your label tensor will be:
(32, 3)
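The one-hot encoding can be sketched in NumPy as follows. The integer labels are randomly generated placeholders, and the mapping 0=Cat, 1=Dog, 2=Horse is an assumption made for this example.

```python
import numpy as np

class_names = ["Cat", "Dog", "Horse"]

# Hypothetical integer labels for a batch of 32 images (0=Cat, 1=Dog, 2=Horse).
rng = np.random.default_rng(seed=0)
labels = rng.integers(0, len(class_names), size=32)

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing it with the label array encodes the whole batch at once.
one_hot = np.eye(len(class_names), dtype=np.int64)[labels]

print(one_hot.shape)  # (32, 3)
```

Taking `one_hot.argmax(axis=1)` recovers the original integer labels, which is a quick way to check the encoding.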
Step 4: Tensor Summary in the ML Model
Data Component | Description | Tensor Shape | Rank |
Single image | RGB image | (64, 64, 3) | 3 |
Batch of images | 32 RGB images | (32, 64, 64, 3) | 4 |
Label for one image | One-hot vector | (3,) | 1 |
Labels for batch | One-hot for 32 images | (32, 3) | 2 |
Output of model | Predicted probabilities | (32, 3) | 2 |
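To see how the shapes in this table fit together, the sketch below pushes a placeholder batch through a stand-in "model" consisting of a single linear layer followed by softmax. This is an assumption made to keep the example short: a real image classifier would typically use convolutional layers, and the weights here are random, not trained.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Placeholder input batch scaled to [0, 1): (batch, height, width, channels).
batch = rng.random((32, 64, 64, 3))

# Stand-in "model": flatten each image and apply one linear layer.
flat = batch.reshape(32, -1)                               # (32, 12288)
weights = rng.normal(scale=0.01, size=(flat.shape[1], 3))  # (12288, 3)
logits = flat @ weights                                    # (32, 3)

# Softmax turns each row of logits into probabilities that sum to 1.
shifted = logits - logits.max(axis=1, keepdims=True)       # numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

predicted_class = probs.argmax(axis=1)                     # (32,)
print(probs.shape, predicted_class.shape)
```

The output has the same shape as the one-hot label tensor, (32, 3), which is what allows a loss function to compare predictions and labels row by row.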
Visual Representation (Shape Only)
Input Tensor:
┌────────────────────────────┐
│ 32 Images                  │
│ ┌────────────────────────┐ │
│ │64 x 64 x 3 (each image)│ │
│ └────────────────────────┘ │
└────────────────────────────┘
Shape: (32, 64, 64, 3)