Category: Machine learning
-
Bag of Words (BoW) intuition
1. Introduction to Bag of Words In Natural Language Processing, we often need to convert text data into numerical form so that machine learning models can understand it. The Bag of Words (BoW) model is one of the simplest and most intuitive techniques for this purpose. BoW represents a text (like a sentence, paragraph, or document)…
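As a rough illustration of the idea in this excerpt, the sketch below turns two short texts into word-count vectors over a shared vocabulary. The function name `bag_of_words`, the lowercase-and-split tokenization, and the sample sentences are assumptions made for this example, not taken from the post.

```python
from collections import Counter


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Vocabulary: every distinct token across all documents, in sorted order.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


docs = ["the cat sat", "the cat saw the dog"]  # illustrative sample texts
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Word order is discarded: each vector records only how often each vocabulary word appears.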
-
One-Hot Encoding
1. Introduction to One-Hot Encoding One-Hot Encoding (OHE) is a data preprocessing technique used to convert categorical data into a numerical format that can be fed into machine learning models. Machine learning algorithms like Linear Regression, Logistic Regression, SVM, or Neural Networks cannot directly understand text labels such as “Red” or “Blue”. They require numerical input.…
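A minimal sketch of the conversion described in this excerpt, using plain Python so it runs without extra libraries. The helper name `one_hot_encode` and the alphabetical column order are assumptions for illustration; the “Red” and “Blue” labels come from the excerpt.

```python
def one_hot_encode(labels):
    """Map each label to a 0/1 vector with a single 1.

    Hypothetical helper: columns follow the sorted order of distinct labels.
    """
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # mark the column that belongs to this label
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["Red", "Blue", "Red"])
print(categories)  # ['Blue', 'Red']
print(encoded)     # [[0, 1], [1, 0], [0, 1]]
```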
-
Named Entity Recognition
1. Introduction to Named Entity Recognition (NER) Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP) that identifies specific types of entities in text and classifies them into predefined categories such as: For example: 2. Why NER Is Important Most real-world data—news, reports, and emails—is unstructured. NER transforms this text into structured…
-
Text Preprocessing: Part-of-Speech (POS) Tagging using NLTK
1. Introduction In Natural Language Processing (NLP), Part-of-Speech (POS) Tagging is the process of assigning grammatical labels (like noun, verb, adjective, etc.) to each word in a sentence. For example, in the sentence: “Arjun is learning NLP.” the words can be tagged as: POS tagging provides important syntactic information that helps NLP systems understand how words relate…
-
Text Preprocessing: Stopwords Removal using NLTK
1. Introduction When working with text data, not all words contribute equally to understanding the meaning of a sentence. Words like “is”, “the”, “in”, “at”, “of”, “an”, and “and” occur very frequently in text but usually do not carry important semantic meaning for NLP tasks such as classification, sentiment analysis, or topic modeling. These words are…
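To make the filtering idea concrete, here is a small sketch that drops the frequent words listed in this excerpt. The stop list is hand-written from those examples rather than loaded from NLTK, and the function name and sample sentence are assumptions for illustration.

```python
# Hand-written stop list taken from the words named in the excerpt (not NLTK's list).
STOPWORDS = {"is", "the", "in", "at", "of", "an", "and"}


def remove_stopwords(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


filtered = remove_stopwords("The cat is in the garden")
print(filtered)  # ['cat', 'garden']
```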
-
Text Preprocessing: Lemmatization using NLTK
1. Introduction When working with textual data in Natural Language Processing (NLP), it’s crucial to reduce words to their base or root form. This process ensures that words with similar meanings are treated alike by algorithms. While stemming reduces words by chopping off prefixes or suffixes using rules, it often produces non-dictionary stems (like “studying”…
-
Text Preprocessing: Stemming using NLTK
1. Introduction In Natural Language Processing (NLP), text preprocessing is a crucial step before feeding data into machine learning or deep learning models. One important part of preprocessing is stemming, which is the process of reducing words to their root or base form. For example: Notice that the stemmed form may not always be a…
-
Using Tokenization in NLTK
NLTK provides multiple tokenizers that behave differently when handling punctuation and special characters. Let’s look at three alternatives: 1. WordPunctTokenizer The WordPunctTokenizer splits text more aggressively based on punctuation. Output: Notice that “Arjun’s” has been split into ["Arjun", "'", "s"], unlike word_tokenize(), which kept "'s" together. 2. TreebankWordTokenizer TreebankWordTokenizer follows conventions used in the Penn…
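The WordPunctTokenizer splitting described in this excerpt can be approximated with a regular expression. NLTK documents that tokenizer as using the pattern `\w+|[^\w\s]+`; the sketch below applies the same pattern through Python's `re` module so it runs without NLTK installed. The function name and the sample sentence are assumptions for illustration.

```python
import re

# Pattern NLTK documents for WordPunctTokenizer:
# runs of word characters, or runs of non-space punctuation.
WORD_PUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Approximate WordPunctTokenizer with a plain regex (hypothetical helper)."""
    return WORD_PUNCT_PATTERN.findall(text)


tokens = word_punct_tokenize("Arjun's book is new.")
print(tokens)  # ['Arjun', "'", 's', 'book', 'is', 'new', '.']
```

The apostrophe becomes its own token, which matches the split of “Arjun’s” noted in the excerpt.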
-
Understanding Basic NLP Terminologies and Tokenization
Natural Language Processing (NLP) is a fundamental area of Artificial Intelligence that focuses on enabling computers to understand, interpret, and generate human language. Before diving into complex NLP concepts, it’s essential to understand some foundational terms and one of the most important preprocessing steps — tokenization. This tutorial introduces basic terminologies used throughout NLP and…
-
Streamlit Tutorial – A Complete Guide from Setup to Running Your First App
1. Introduction to Streamlit Streamlit is an open-source Python library that allows you to build interactive web applications for data science and machine learning projects—without needing any web development experience. It transforms Python scripts into beautiful, shareable web apps in just a few lines of code. Key features of Streamlit: 2. Why Use Streamlit? Streamlit…
