Natural Language Processing using Python

Project-based Learning

Traditional machine learning projects use numeric and textual data stored in conventional databases. Developing intelligent applications based purely on text data is far more challenging. Why? First, the volume of text available in the world far exceeds the numeric data held in conventional databases, and a text corpus can run into several terabytes or even petabytes. The question is whether we can extract useful information from a corpus of that size. At this scale, the whole perspective of machine learning changes.

In a traditional database, the number of columns is low, so the number of features for machine learning is small too: typically tens, and at most a few hundred. An NLP application has no columns as a structured database does, so every distinct word in the corpus becomes a candidate feature for model training. Training a model on millions of features is impractical. The first major requirement in developing an ML application on text is therefore to cut the feature count by reducing the vocabulary. The second is to convert the text into a numeric representation, because machine learning algorithms operate only on numbers.

This is where NLP differs from model development on structured databases. Once the text has been pre-processed into a minimal set of features that represents the entire corpus, the rest of the model development process is the same as the traditional one, which this course calls Good Old Fashioned AI.
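The two requirements described above, a smaller vocabulary and a numeric encoding, can be sketched in a few lines of plain Python. This is only an illustration and not taken from the course material: the stop-word list, the sample sentences, and the helper names (tokenize, build_vocabulary, encode) are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "is", "was", "and", "of", "in", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, stop_words=frozenset()):
    """Map each distinct token that is not a stop word to an integer index."""
    tokens = {tok for doc in docs for tok in tokenize(doc)}
    kept = sorted(tokens - set(stop_words))
    return {tok: idx for idx, tok in enumerate(kept)}


def encode(doc, vocab):
    """Replace each known token with its integer index, dropping the rest."""
    return [vocab[tok] for tok in tokenize(doc) if tok in vocab]


# Made-up sample documents.
docs = [
    "The room was clean and the staff was friendly.",
    "The Staff is friendly, the location is great!",
]

# Naive vocabulary: split on whitespace, keep case and punctuation.
raw_vocab = {word for doc in docs for word in doc.split()}
# Reduced vocabulary: normalized tokens with stop words removed.
reduced_vocab = build_vocabulary(docs, STOP_WORDS)

print(len(raw_vocab), "raw tokens ->", len(reduced_vocab), "features")
print(encode(docs[0], reduced_vocab))
```

Even on two sentences, normalizing case, stripping punctuation, and dropping stop words cuts the candidate features by more than half. On a real corpus the same idea is usually combined with stemming or lemmatization.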

What you’ll learn

  • Text pre-processing techniques for very large datasets.
  • Real-life, project-based NLP development using Good Old Fashioned AI.

Course Content

  • Introduction: 16 lectures, 4hr 33min.

Requirements

  • Python.
  • Some knowledge of classical ML algorithms.

In this course, you will learn text pre-processing techniques that make huge text datasets ready for machine learning: stemming, lemmatization, stop-word removal, part-of-speech (POS) tagging, bag-of-words, and tf-idf.
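To make bag-of-words and tf-idf concrete, here is a minimal sketch in plain Python. It assumes the textbook definitions (tf = term count / document length, idf = ln(N / document frequency)); libraries typically use smoothed variants, so their numbers will differ. The sample documents and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def tf_idf(count_vectors):
    """Weight each count by tf * idf, using the unsmoothed textbook formula."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    weighted = []
    for vec in count_vectors:
        total = sum(vec) or 1  # avoid dividing by zero for an empty document
        weighted.append(
            [(vec[j] / total) * math.log(n_docs / df[j]) for j in range(n_terms)]
        )
    return weighted


# Made-up sample documents.
docs = ["good food good service", "bad food slow service", "good location"]

vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)

print(vocab)
print(counts[0])
print([round(w, 3) for w in weights[0]])
```

With this formula, a term that occurs in every document gets an idf of zero, which is how tf-idf downweights words that do not help tell documents apart.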

You will then apply traditional statistics-based algorithms to train models, and develop five industry-standard, real-life NLP applications that cover a wide span of the NLP domain. Along the way you will build binary and multi-class classifiers and use both supervised and unsupervised learning. You will apply LDA (Latent Dirichlet Allocation) to group unlabeled text by topic, and use support vector machines to classify text.
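As a rough sketch of what the supervised and unsupervised steps can look like, the example below uses scikit-learn. That library choice is an assumption; the course description does not say which implementation the lessons use. The toy reviews and labels are made up, and a real project would train on far more data.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy reviews with sentiment labels (illustration only).
reviews = [
    "great hotel excellent service",
    "loved the room great location",
    "excellent breakfast friendly staff",
    "terrible hotel awful service",
    "hated the room terrible location",
    "awful breakfast rude staff",
]
labels = ["positive"] * 3 + ["negative"] * 3

# Supervised: tf-idf features fed into a linear support vector machine.
classifier = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
classifier.fit(reviews, labels)
prediction = classifier.predict(["great location excellent staff"])[0]
print(prediction)

# Unsupervised: LDA works on raw term counts and needs no labels.
counts = CountVectorizer(stop_words="english").fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one row of topic proportions per review
print(doc_topics.shape)
```

The classifier needs labeled examples, while LDA only needs the documents and the number of topics to look for. With six short reviews the topics are not meaningful; the point is only the shape of the workflow.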

On the business side, the projects cover sentiment analysis, classification of research articles, ranking hotels based on customer reviews, news summarization, and topic modeling, along with a quick start to Natural Language Understanding (NLU).

This course gives you a quick start on NLP and helps you master several NLP techniques through a very practical approach. Each lesson comes with code to practice, which makes learning quicker and easier.
