💻 Activity 4.3: Analyzing Text

A common task for a computer program is to analyze text for structural characteristics, which can be used to estimate readability. We have provided you with a text document from a news article. Build a set of functions, and use the map function to compute:

  1. The number of words

  2. The average word length

  3. The average number of words per sentence

  4. As a bonus, plot a histogram of the word lengths using the matplotlib function hist.

import requests

response = requests.get('https://raw.githubusercontent.com/DrexelEngineering/ENGR131_W2023/main/jupyterbook/week4/lecture4/assets/article.txt')
article = response.text

Print the article to read it

print(article)
PHILADELPHIA - Drexel held Delaware without a field goal in overtime as the Dragons held on to defeat their long-time rival, 77-74, on Homecoming in front of a raucous crowd at the Daskalakis Athletic Center on Saturday. The Dragons (12-8) stayed unbeaten at home in Colonial Athletic Association play and moved to 6-2 in conference. Freshman Justin Moore scored a career-high 21 points, while Luke House added 19 points. Delaware fell to 11-10 (3-5).

The Blue Hens scored the first point of the overtime period just 17 seconds in, but that would be the last point they would score. Drexel took the lead for good with 3:15 to play when Amari Williams scored on a short jumper and was fouled, making it 75-74. Williams would miss the free throw, but Drexel came up with stops on the defensive end the rest of the way. In the final minute, Delaware had possession and with the ball in the hands of Jameer Nelson, Jr. He made a move in the paint, but his driving attempt was blocked by Williams. House rebounded the ball and was immediately fouled. House calmly sank both free throws, giving Drexel a three-point cushion. On the Hens final possession, Christian Ray drove towards the basket, but missed. LJ Owens came up with the rebound and fed Nelson. His long off balance 3-point attempt from the right corner came up short and Drexel escaped with the win.
 
Both teams had their chances to win in regulation. The Dragons held a seven-point lead with just under five minutes to play, but the Hens scored eight straight points to take the lead at 68-67 with 2:13 to play on a Cavan Reilly 3-pointer. The lead went back and forth as Nelson made consecutive fade away's and Moore countered with four free throws. Jyare Davis went to the free throw line with just 25 ticks left on the clock and made one of two to tie it at 73-73. The Dragons had what looked like the last possession. Moore made a nice move to get to the basket, but stepped on the baseline with 2.9 seconds left and the teams would go to the extra frame.
 
Moore had perhaps the best game of his young career. The freshman from Philadelphia was 8-for-8 from the line and was able to find his way to the basket throughout the game. House showed a different part of his game. He was 7-for-11 from the floor and did not miss on six attempts from inside the arc. He repeatedly found his way past defenders to score in the lane.
 
Williams had another outstanding all around game. He had five rejections, including two late when the game was in the balance. Williams scored 17 points, was 7-for-8 from the floor and grabbed six rebounds. Drexel shot 52 percent from the floor and made 19 of its 23 free throw attempts.
 
Davis and Nelson, Jr. combined for 49 of the Blue Hens 74 points. Davis scored from inside and out, finishing with 28 points. Nelson Jr. added 21 points. Each player added seven rebounds.
 
The Dragons will head to North Carolina for a pair of games next week. They face North Carolina A&T for the first time ever on Thursday night. Drexel will then visit Elon before the nationally-ranked College of Charleston comes to the DAC on February 2.   

Write a function that counts the number of pieces produced when a string is split on a substring.

Note: map passes each item to the function as its only argument, so the separator must have a default value for map to work.

def counter(article, split=" "):
    return len(article.split(split))
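
The note about the default separator can be illustrated with a small sketch. The sample strings and the name `count_hyphen_parts` are made up for illustration, and `functools.partial` is just one possible way to reuse `counter` with a different separator inside map.

```python
from functools import partial


def counter(text, split=" "):
    """Count the pieces produced by splitting text on the separator."""
    return len(text.split(split))


# Hypothetical sample data, not taken from the activity
sentences = ["Drexel won", "House added 19 points"]

# map passes one item at a time, so the default separator " " is used
words_per_sentence = list(map(counter, sentences))
print(words_per_sentence)  # [2, 4]

# partial fixes a different separator while keeping a one-argument call
count_hyphen_parts = partial(counter, split="-")
hyphen_parts = list(map(count_hyphen_parts, ["7-for-8", "long-time"]))
print(hyphen_parts)  # [3, 2]
```

Without the default value, `map(counter, sentences)` would raise a TypeError because the second argument would be missing.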

Write a function that returns the length of a string.

def string_len(word):
    return len(word)
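
As a brief aside, `string_len` is a thin wrapper around the built-in `len`, so passing `len` to map directly should give the same result. The sketch below checks this on a made-up sample sentence.

```python
def string_len(word):
    return len(word)


# Hypothetical sample sentence, used only for this comparison
words = "Drexel escaped with the win".split(" ")

lengths_wrapper = list(map(string_len, words))  # using the wrapper
lengths_builtin = list(map(len, words))         # using the built-in directly

print(lengths_wrapper)                     # [6, 7, 4, 3, 3]
print(lengths_wrapper == lengths_builtin)  # True
```

Writing the wrapper is still a useful exercise in defining functions for map, which is the point of the activity.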

Write the code that does all of the computation and then prints the results.

import numpy as np

# Count words (split on single spaces) and sentences (split on ". ")
word_counts = counter(article, " ")
sentence_counts = counter(article, ". ")

# Length of every word, computed with map
word_lengths = np.array(list(map(string_len, article.split(" "))))
average_word_length = word_lengths.sum() / word_counts

# Words in each sentence; map calls counter with its default separator " "
sentence_list = article.split(". ")
word_in_sentence = np.array(list(map(counter, sentence_list)))
average_sentence_length = word_in_sentence.sum() / sentence_counts

print(f"Word count = {word_counts}")
print(f"Average Word Length = {average_word_length:0.2f}")
print(f"Average Sentence Length = {average_sentence_length:0.2f}")
Word count = 563
Average Word Length = 4.58
Average Sentence Length = 17.06
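
One caveat about these numbers: splitting on a single space treats repeated spaces as empty "words" and does not break on newlines, so the counts above may be slightly off for text with blank lines or trailing spaces. The sketch below uses a made-up sample string to compare `split(" ")` with `split()`, which splits on any run of whitespace. It is an alternative to consider, not the approach the activity prescribes.

```python
# Hypothetical sample with a paragraph break, a double space, and a trailing space
sample = "Drexel won.\n\nThe Blue  Hens lost. "

by_space = sample.split(" ")   # splits on each single space only
by_whitespace = sample.split()  # splits on any run of whitespace

print(by_space)
# ['Drexel', 'won.\n\nThe', 'Blue', '', 'Hens', 'lost.', '']
print(by_whitespace)
# ['Drexel', 'won.', 'The', 'Blue', 'Hens', 'lost.']
print(len(by_space), len(by_whitespace))  # 7 6
```

A similar caution applies to splitting sentences on ". ": abbreviations such as "Jr." are counted as sentence endings, and sentences separated only by a newline are not split.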

Make a histogram of the word lengths.

import matplotlib.pyplot as plt

plt.hist(word_lengths, 10)
(array([ 14., 205., 183.,  48.,  75.,  20.,  14.,   3.,   0.,   1.]),
 array([ 0. ,  1.7,  3.4,  5.1,  6.8,  8.5, 10.2, 11.9, 13.6, 15.3, 17. ]),
 <BarContainer object of 10 artists>)
[Figure: histogram of word lengths]
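
Because word lengths are whole numbers and the 10 bins above are each 1.7 wide, some bars cover two lengths (for example 4 and 5) while others cover only one (for example 6), which can make the histogram look uneven. One way to cross-check the plot is to tabulate the exact count for each length. The sketch below does this with `collections.Counter` on a single sentence from the article. The text-based bar display is only an illustration, not part of the activity.

```python
from collections import Counter

# One sentence fragment from the article, used as a small sample
words = "The Dragons held a seven-point lead with just under five minutes to play".split()

# Count how many words have each exact length
length_counts = Counter(map(len, words))

# Print a simple text histogram, one row per word length
for length in sorted(length_counts):
    print(f"{length:2d} {'#' * length_counts[length]}")
```

With matplotlib, a similar effect could be obtained by passing explicit integer-aligned bin edges to `plt.hist` instead of a bin count.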