# Getting Started with Natural Language Processing, for Developers (Part 1)

Not long ago, NLP was confined to mathematical equations, research papers and very large organizations. Today, however, an abundance of open source software, libraries, corpora and learning material is available for anyone to use. I had a hard time getting started with natural language processing, and I got lost in the piles of information I found online.

So I decided to put together a series of articles on NLP for developers. I have skipped the mathematics, calculus, analytical geometry and rocket science unless absolutely necessary. The series will focus on applications rather than theory. However, I do encourage readers to get a technical understanding of how and why things work the way they do. Let’s get started.

### The Setup (for now)

First of all, we will get our feet wet with holy NLP water through our friend [TextBlob](https://web.archive.org/web/20161109121649/https://textblob.readthedocs.io/en/dev/). It is a Python library for NLP. Why Python? Because Python is very popular for NLP, AI, ML and scientific computing in general. TextBlob has a simple, expressive API that is easy to grasp and use.

> *You have probably heard of NLTK and how it saved the world from impending doom. So why are we not using NLTK? We are. TextBlob is a wrapper over NLTK, abstracting away many fine-tuned details. We are not here to get intimidated by NLTK, we are here to experience NLP in action.*

So, we will be needing:

* A computer running Mac OS X, BSD or a variant of Linux (I don’t know how things work on Windows)
    
* Working knowledge of Python 2.x or 3.x
    
* Gallons, buckets, containers or an entire ship-shipping ship full of patience
    

Start off by installing TextBlob.

```bash
pip install textblob
```
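If the examples later on complain about missing corpora, TextBlob also needs its NLTK data, which its documentation says to fetch with `python -m textblob.download_corpora`. To confirm the install itself worked, here is a small sketch that assumes Python 3; `is_installed` is a helper name made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current interpreter can find the package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("textblob"):
    print("textblob is installed")
else:
    print("textblob not found; run: pip install textblob")
```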

We are done (for now).

### Hello, NLP!

Let’s get into processing some sentences.

```python
from textblob import TextBlob

text = "Aniruddha did not attend class because he was sick. The lecturer marked him absent."
blob = TextBlob(text)
```

Okay, let’s break things down. The first line imports a class, nothing fancy. The second declares a string, nothing fancy either. Ah, the third line: we are creating a TextBlob object from our text string. Our blob of text is now ready for some cool NLP! Let’s break the whole text down into sentences.

#### Breaking Down (aka Tokenization)

Let’s try this. (BTW, we are in the Python shell.)

```python
blob.sentences
```

Output:

```python
[Sentence("Aniruddha did not attend class because he was sick."), Sentence("The lecturer marked him absent.")]
```

There are two sentences, as we can see, broken down or **tokenized** just as we wanted. Let’s try something more granular and break the `Sentences` down into `Words`.

```python
blob.sentences[0].words
```

![](https://web.archive.org/web/20161109121649im_/http://adhikary.net/forestryio/images/Screen%20Shot%202016-09-09%20at%207.15.05%20AM.png align="left")
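To get a feel for what tokenization involves, here is a deliberately naive sketch using only regular expressions. It is not how TextBlob or NLTK tokenize (their tokenizers handle abbreviations, quotes and many other edge cases), and the function names `naive_sentences` and `naive_words` are made up for this example.

```python
import re


def naive_sentences(text):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def naive_words(sentence):
    """Keep runs of letters, digits and apostrophes; drop punctuation."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


text = "Aniruddha did not attend class because he was sick. The lecturer marked him absent."
sentences = naive_sentences(text)

print(sentences)
print(naive_words(sentences[0]))
```

A sentence like "Dr. Smith arrived." would be split in the wrong place by this sketch, which is exactly why we lean on a library instead.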

That worked out quite well! Let’s get back to some grammar lessons from elementary school.

#### Parts of Speech Tagging

Parts of Speech (PoS) are the categories that the words of a sentence fall into, such as nouns, pronouns, adjectives, adverbs and prepositions. Identifying the PoS of each word is an essential task in NLP. Let’s see how it all works.

```python
blob.sentences[0].pos_tags
```

Output:

```python
[('Aniruddha', u'NNP'),
 ('did', u'VBD'),
 ('not', u'RB'),
 ('attend', u'VB'),
 ('class', u'NN'),
 ('because', u'IN'),
 ('he', u'PRP'),
 ('was', u'VBD'),
 ('sick', u'JJ')]
```

Weird acronyms? Well, `('Aniruddha', u'NNP')` tells us that `Aniruddha` is an `NNP`, AKA a singular proper noun. Similarly, `JJ` indicates an adjective. How do I know?

```python
import nltk

nltk.help.upenn_tagset('NNP')
```
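Depending on your NLTK setup, `nltk.help.upenn_tagset` may first ask you to download its tagset data. As a lightweight alternative, here is a small lookup table covering only the Penn Treebank tags that appear in the output above. It is a sketch rather than a full tagset, and `PENN_TAGS` and `describe` are names made up for this example.

```python
# Descriptions for the Penn Treebank tags seen in the tagged sentence.
PENN_TAGS = {
    "NNP": "proper noun, singular",
    "VBD": "verb, past tense",
    "RB": "adverb",
    "VB": "verb, base form",
    "NN": "noun, singular or mass",
    "IN": "preposition or subordinating conjunction",
    "PRP": "personal pronoun",
    "JJ": "adjective",
}


def describe(tag):
    """Return a readable description, or a fallback for tags not in the table."""
    return PENN_TAGS.get(tag, "unknown tag")


# The (word, tag) pairs returned by blob.sentences[0].pos_tags
tagged = [
    ("Aniruddha", "NNP"), ("did", "VBD"), ("not", "RB"),
    ("attend", "VB"), ("class", "NN"), ("because", "IN"),
    ("he", "PRP"), ("was", "VBD"), ("sick", "JJ"),
]

for word, tag in tagged:
    print("{:<10} {:<4} {}".format(word, tag, describe(tag)))
```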

Now you can figure out the mysterious acronyms as well. Let’s try some transformations.

#### Transformers!

Let’s try something even cooler. Remember `blob.sentences[1]`, `Sentence("The lecturer marked him absent.")`? Let’s try to make `lecturer` plural.

```python
lecturer_word = blob.sentences[1].words[1]
lecturer_word.pluralize()
```

![](https://web.archive.org/web/20161109121649im_/http://adhikary.net/forestryio/images/Screen%20Shot%202016-09-09%20at%207.27.16%20AM.png align="left")

There are some similar operations in TextBlob, try them! Read the documentation.

#### Synonyms and related words

With a Python interpreter at hand, you could build a very simple dictionary of synonyms. Let’s take a look at the synsets (sets of synonyms from WordNet) of `marked`.

```python
marked_word = blob.sentences[1].words[2]
marked_word.get_synsets()
```

```python
[Synset('tag.v.01'),
Synset('mark.v.02'),
Synset('distinguish.v.03'),
Synset('commemorate.v.01'),
Synset('mark.v.05'),
Synset('stigmatize.v.01'),
Synset('notice.v.02'),
...]
```
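The synsets above are WordNet objects rather than plain words. To get closer to a dictionary of synonyms, one option is to collect the lemma names each synset carries. The helper below is a minimal sketch: `synonyms_from_synsets` is a name made up for this example, while `lemma_names()` is the NLTK method it relies on. It assumes TextBlob and the WordNet corpus are installed, and prints a hint otherwise.

```python
def synonyms_from_synsets(synsets, exclude=()):
    """Collect unique lemma names from WordNet synsets, in first-seen order."""
    skip = {word.lower() for word in exclude}
    found = []
    for synset in synsets:
        for name in synset.lemma_names():
            word = name.replace("_", " ")  # WordNet joins multi-word lemmas with "_"
            if word.lower() not in skip and word not in found:
                found.append(word)
    return found


try:
    from textblob import Word

    print(synonyms_from_synsets(Word("marked").get_synsets(), exclude=["marked", "mark"]))
except Exception as error:
    # TextBlob or the WordNet corpus may be missing on this machine.
    print("Could not look up synsets:", error)
```

Keep in mind that the synsets cover every sense of the word, so not every name in the result will fit a given sentence.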

That will be enough for this part. Let me know in the comments if you liked TextBlob, and/or whether you want more NLP stuff. Thanks for dropping by.
