💻 Activity 3.2: Profanity & Word Count Detector

Task 1: Detect & Replace Profanity

The growth of public forums has created a need for automated filters that remove profanity and other inappropriate content from the web. We have provided you with two emails from a newsgroup dataset. Your task is to find and replace the profanity using string methods.

Since the selected articles do not contain profanity, we will assume the word "philosopher" is profane.

from sklearn.datasets import fetch_20newsgroups
newsgroups_train = fetch_20newsgroups(subset='train', categories=['sci.med'])
Article_1 = newsgroups_train.data[0]
Article_2 = newsgroups_train.data[1]
print(Article_1)
From: nyeda@cnsvax.uwec.edu (David Nye)
Subject: Re: Post Polio Syndrome Information Needed Please !!!
Organization: University of Wisconsin Eau Claire
Lines: 21

[reply to keith@actrix.gen.nz (Keith Stewart)]
 
>My wife has become interested through an acquaintance in Post-Polio
>Syndrome This apparently is not recognised in New Zealand and different
>symptons ( eg chest complaints) are treated separately. Does anone have
>any information on it
 
It would help if you (and anyone else asking for medical information on
some subject) could ask specific questions, as no one is likely to type
in a textbook chapter covering all aspects of the subject.  If you are
looking for a comprehensive review, ask your local hospital librarian.
Most are happy to help with a request of this sort.
 
Briefly, this is a condition in which patients who have significant
residual weakness from childhood polio notice progression of the
weakness as they get older.  One theory is that the remaining motor
neurons have to work harder and so die sooner.
 
David Nye (nyeda@cnsvax.uwec.edu).  Midelfort Clinic, Eau Claire WI
This is patently absurd; but whoever wishes to become a philosopher
must learn not to be frightened by absurdities. -- Bertrand Russell
print(Article_2)
From: koreth@spud.Hyperion.COM (Steven Grimm)
Subject: Re: Opinions on Allergy (Hay Fever) shots?
Organization: Hyperion, Mountain View, CA, USA
Lines: 7
NNTP-Posting-Host: spud.hyperion.com

I had allergy shots for about four years starting as a sophomore in high
school.  Before that, I used to get bloody noses, nighttime asthma attacks,
and eyes so itchy I couldn't get to sleep.  After about 6 months on the
shots, most of those symptoms were gone, and they haven't come back.  I
stopped getting the shots (due more to laziness than planning) in college.
My allergies got a little worse after that, but are still nowhere near as
bad as they used to be.  So yes, the shots do work.
  1. Determine whether each article contains the profane word.

# Article 1
...
"philosopher" in Article_1
True
# Article 2
...
"philosopher" in Article_2
False
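
Note that the `in` operator performs a case-sensitive substring test: it would miss "Philosopher" and would also flag a longer word such as "philosophers". The following is a minimal sketch of a stricter check, not part of the activity's solution. The helper name `contains_word` and the sample string are assumptions made up for illustration.

```python
import re

def contains_word(text, word):
    """Return True if `word` appears in `text` as a whole word, ignoring case.

    Hypothetical helper for illustration; not part of the activity.
    """
    # \b marks a word boundary; re.escape guards against special characters in `word`.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Made-up sample text (not taken from the dataset)
sample = "Every Philosopher must learn not to be frightened by absurdities."

print("philosopher" in sample)               # False: substring test is case-sensitive
print(contains_word(sample, "philosopher"))  # True: whole-word, case-insensitive
```

Whether a filter should match whole words only is a design choice: substring matching catches more variants but also produces more false positives.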
  2. Replace the profane word with ****

# Replace 
...
Article_1 = Article_1.replace("philosopher", '****')
# check both articles visually
print(Article_1)
print(Article_2)
From: nyeda@cnsvax.uwec.edu (David Nye)
Subject: Re: Post Polio Syndrome Information Needed Please !!!
Organization: University of Wisconsin Eau Claire
Lines: 21

[reply to keith@actrix.gen.nz (Keith Stewart)]
 
>My wife has become interested through an acquaintance in Post-Polio
>Syndrome This apparently is not recognised in New Zealand and different
>symptons ( eg chest complaints) are treated separately. Does anone have
>any information on it
 
It would help if you (and anyone else asking for medical information on
some subject) could ask specific questions, as no one is likely to type
in a textbook chapter covering all aspects of the subject.  If you are
looking for a comprehensive review, ask your local hospital librarian.
Most are happy to help with a request of this sort.
 
Briefly, this is a condition in which patients who have significant
residual weakness from childhood polio notice progression of the
weakness as they get older.  One theory is that the remaining motor
neurons have to work harder and so die sooner.
 
David Nye (nyeda@cnsvax.uwec.edu).  Midelfort Clinic, Eau Claire WI
This is patently absurd; but whoever wishes to become a ****
must learn not to be frightened by absurdities. -- Bertrand Russell

From: koreth@spud.Hyperion.COM (Steven Grimm)
Subject: Re: Opinions on Allergy (Hay Fever) shots?
Organization: Hyperion, Mountain View, CA, USA
Lines: 7
NNTP-Posting-Host: spud.hyperion.com

I had allergy shots for about four years starting as a sophomore in high
school.  Before that, I used to get bloody noses, nighttime asthma attacks,
and eyes so itchy I couldn't get to sleep.  After about 6 months on the
shots, most of those symptoms were gone, and they haven't come back.  I
stopped getting the shots (due more to laziness than planning) in college.
My allergies got a little worse after that, but are still nowhere near as
bad as they used to be.  So yes, the shots do work.
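
Because strings are immutable, `str.replace` returns a new string, which is why the result is assigned back to `Article_1`. It replaces every exact, case-sensitive occurrence. As a hedged sketch of one possible extension, the code below masks several banned words regardless of case using `re.sub`. The helper name `censor`, its parameters, and the sample string are assumptions for illustration only.

```python
import re

def censor(text, banned_words, mask="****"):
    """Replace every whole-word, case-insensitive match of a banned word with `mask`.

    Hypothetical helper for illustration; not part of the activity.
    """
    for word in banned_words:
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, mask, text, flags=re.IGNORECASE)
    return text

# Made-up sample text (not taken from the dataset)
sample = "A Philosopher and another philosopher walked in."
print(censor(sample, ["philosopher"]))
# A **** and another **** walked in.
```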

Task 2: Evaluate Word Limit

Some forums may like to impose a word limit on posts.

Use what you have learned about string methods to:

  1. count the number of words, and

  2. determine if the number of words in each article is greater than the word limit of 200.

...
print(f"Article 1 has {len(Article_1.split(' '))} words")
print(f"Article 2 has {len(Article_2.split(' '))} words")
Article 1 has 180 words
Article 2 has 107 words
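
With the counts reported above (180 and 107), neither article exceeds the limit of 200. One caveat: `split(' ')` splits only on single spaces, so consecutive spaces produce empty strings and words separated only by a newline are counted as one. Calling `split()` with no argument splits on any run of whitespace, so it may give different counts for the same article. The sketch below shows the difference and adds the limit comparison. The names `WORD_LIMIT`, `word_count`, and `exceeds_limit`, along with the sample string, are assumptions introduced for illustration.

```python
WORD_LIMIT = 200  # the limit stated in the task

def word_count(text):
    """Count whitespace-separated tokens; split() with no argument collapses runs of whitespace."""
    return len(text.split())

def exceeds_limit(text, limit=WORD_LIMIT):
    """Return True when the text has more words than `limit`."""
    return word_count(text) > limit

# Made-up sample with double spaces and a trailing newline
sample = "So yes,  the shots  do work.\n"

print(len(sample.split(' ')))  # 8: includes two empty strings from the double spaces
print(word_count(sample))      # 6
print(exceeds_limit(sample))   # False
```

The same helpers could be applied to the articles, for example `exceeds_limit(Article_1)`.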