Introduction
Linear Algebra forms the mathematical backbone of data science, machine learning, artificial intelligence, and many modern computational systems.
Whenever we work with datasets, train models, or visualize information, we are implicitly working with mathematical structures such as scalars, vectors, and matrices.
In this tutorial, we will focus on two foundational concepts: scalars and vectors, which together form the basis for almost everything that follows in data science.
Before moving toward advanced topics like matrices, transformations, and optimization, it is essential to develop a deep and intuitive understanding of these building blocks.
Why Scalars and Vectors Are Important in Data Science
Every dataset that we use in data science is ultimately represented using numbers, and those numbers are organized using scalars and vectors.
A single value such as age or salary is a scalar, while a collection of values describing a person or an observation forms a vector.
Machine learning models operate on vectors rather than raw data, and understanding this fact allows you to better understand how predictions, classifications, and optimizations actually work.
1. Scalars
Definition of a Scalar
A scalar is a single numerical value that represents magnitude only and does not include any notion of direction. In simpler terms, a scalar tells us how much of something exists, but it does not tell us where or in which direction it exists.
Everyday Examples of Scalars
A speed of 60 km/h is a scalar because it only tells us how fast something is moving, not where it is moving.
A temperature of 25°C is also a scalar because it only expresses intensity, not direction.
These examples help us understand that scalars are extremely common in daily life, even if we do not consciously think about them.
Scalars in Data Science
In data science, scalars appear constantly in the form of statistical values and summary metrics. Some common examples include:
- The total number of records in a dataset
- The average value of a feature
- The minimum or maximum value in a column
Each of these represents a single numeric quantity, making them perfect examples of scalars.
Scalars in Machine Learning Models
Consider the simple linear regression equation:y=mx+c
In this equation, the value c is a scalar because it represents a constant numerical offset applied uniformly to all predictions. It does not depend on direction or dimensionality; it simply shifts the output up or down.
2. Vectors
Definition of a Vector (Conceptual View)
A vector is a quantity that has both magnitude and direction. This definition originates from physics but is also extremely important in data science.
For example:
- Speed with direction (e.g., 60 km/h towards the east) is a vector.
- Wind velocity is a vector.
- Force applied to an object is a vector.
Definition of a Vector in Data Science
In data science, a vector is best understood as an ordered list of numbers representing a data point in a multi-dimensional space.
Each value in the vector corresponds to one feature.
Example: Representing Data as Vectors
Suppose we have a dataset with the following features:
- IQ score
- Number of study hours
A single data point might look like:
This means the person has an IQ of 90 and studies for 3 hours, and together these values form a vector.
Each row in a dataset is therefore a vector describing one observation.
Understanding Vectors Using Coordinate Geometry
Two-Dimensional Space
If we represent the vector:
This means:
- Move 1 unit along the X-axis
- Move 2 units along the Y-axis
The resulting point represents a vector starting from the origin and ending at the coordinate (1, 2).
Magnitude of a Vector
The length or magnitude of a vector can be calculated using the Pythagorean theorem.
For example:
This value represents how far the vector extends from the origin.
Negative Coordinates and Direction
If a vector is:
This means the vector moves three units left and two units upward. Even though the direction is different, it is still a valid vector with a measurable magnitude.
High-Dimensional Vectors
In real-world data science applications, vectors often have hundreds or even thousands of dimensions.
For example:
Even though we cannot visualize such vectors, mathematical operations on them remain consistent and reliable.
Vectors in Machine Learning
Every data point used in machine learning is represented internally as a vector.
For example:
- Input features → input vector
- Model weights → weight vector
- Predictions → result of vector operations
Models learn by finding mathematical relationships between these vectors.
Classification Example
Imagine a dataset where:
- X-axis represents IQ
- Y-axis represents study hours
Each point in this space represents a student.
A machine learning model learns a boundary (usually a line or curve) that separates different classes such as “pass” and “fail.”
This boundary is often expressed using an equation like:
If a new data point lies:
- Above the line → Pass
- Below the line → Fail
Unit Vectors
A unit vector is a vector with a magnitude of exactly one.
Unit vectors are important because they represent direction without scaling.
Common unit vectors include:
- î, representing direction along the X-axis
- ĵ, representing direction along the Y-axis
Vector Representation Using Unit Vectors
A vector such as:
Can be written as:
This means the vector moves three units along the X-axis and three units along the Y-axis.
Vector Addition and Movement
When multiple vectors act together, their combined effect is calculated using vector addition.
For example:
- Two people pushing a box in different directions
- Forces acting on a moving vehicle
- Motion in video games
Even if the starting position changes, the direction and magnitude of the vector remain consistent, which is a fundamental property of vectors.
Vectors in Games and Simulations
Modern games rely heavily on vectors to calculate motion, collisions, gravity, and physics interactions.
Examples include:
- Vehicle movement and collisions
- Projectile motion
- Character movement in 3D environments
Every realistic animation is powered by vector mathematics.
