September 14, 2009 · homework

Overall, most students did quite well. Comments are in your svn directories.

mean 54.4
standard deviation 6.24
  1. Print out the entries (orthography only) from the celex.txt file which were taken from GOOGLE. Hint: You will need to use a pipe. (6 points)

    grep GOOGLE celex.txt | cut -f2 -d '\'
  2. Print out the 50 most frequent words from the celex.txt file which were taken from GOOGLE. Hint: You will need to combine the answers from the last 2 questions. (9 points)

    grep GOOGLE celex.txt | cut -f2,3 -d '\' | sort -t '\' -k 2,2rn | head -n 50
    OR
    grep GOOGLE celex.txt | sort -t '\' -k 3,3rn | head -n 50 | cut -f 2,3 -d '\'
  3. Use unix commands to count the number of entries (not definitions) in the Devil’s Dictionary that begin with a vowel. Your output should be a single number. (7 points)
    grep -Ec '^[AEIOU][A-Z-]*,' devilsDictionary.txt
    227
  4. Use unix commands to calculate the average number of letters per word for each entry (not the definitions) in the Devil’s Dictionary. The output should simply be a number. HINT: You will need to use subshells, and bc (10 points)
    entries=`grep -E '^[A-Z]+,' devilsDictionary.txt |cut -f1 -d ','|wc -l`
    # wc -c also counts the newline ending each line, so delete newlines before counting letters
    letters=`grep -E '^[A-Z]+,' devilsDictionary.txt |cut -f1 -d ','|tr -d '\n'|wc -c`
    echo "$letters/$entries"|bc -l

    OR, in one fell swoop

    echo "`grep -E '^[A-Z]+,' devilsDictionary.txt |cut -f1 -d ','|tr -d '\n'|wc -c`/`grep -E '^[A-Z]+,' devilsDictionary.txt |cut -f1 -d ','|wc -l`"|bc -l
  5. Count the number of adjectives, nouns, and verbs in the Devil’s Dictionary. (10 points)
    noun=`grep -cE '^[A-Z]+, n\.' devilsDictionary.txt`
    verb=`grep -cE '^[A-Z]+, v\.' devilsDictionary.txt`
    adj=`grep -cE '^[A-Z]+, adj\.' devilsDictionary.txt`
    echo "nouns: $noun verbs: $verb adjectives: $adj"
  6. Print out all the entries (not the definitions) that are not adjectives, nouns, or verbs. HINT: use grep more than once. (10 points)
    grep -E '^[A-Z]+, ' devilsDictionary.txt |grep -vE '^[A-Z]+, (v|n|adj)\.' | cut -f1 -d '.'
  7. Write a unix pipeline which will print the number of words in the celex.txt file that contain a q not followed by a u (look only at the orthography of each entry). (8 points)
    cut -f2 -d '\' celex.txt |grep -Eic 'q[^u]'
    EVEN BETTER (also counts words that end in q, because $ matches the end of the line)
    cut -f2 -d '\' celex.txt |grep -Eic 'q([^u]|$)'
  8. Extra credit

    Write a unix pipeline which will print the total number of points in this assignment. Don’t include the points for the extra credit (3 extra points) (Hint: use dc)

    echo "`grep -oE '[0-9]+ points' hmwk2.solution |cut -d ' ' -f1` ++++++ p"|dc
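To try the answers to questions 1 and 2 without celex.txt, here is a sketch that runs them on four made-up lines. The layout (id, orthography, frequency, source, separated by backslashes) is an assumption inferred from the fields the answers cut; the real file may differ. It also shows why the sort key needs the n flag: compared as text, a frequency of 9 outranks 100.

```shell
#!/bin/sh
# Made-up sample in an ASSUMED layout: id\orthography\frequency\source
LC_ALL=C; export LC_ALL   # fixed collation so the text sort below is predictable
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
1\abacus\9\GOOGLE
2\abbey\100\GOOGLE
3\abbot\10\GOOGLE
4\abduct\55\OTHER
EOF

# Question 1: keep GOOGLE lines, print field 2. Quoting '\' hands cut a literal backslash.
google_words=$(grep GOOGLE "$tmp" | cut -f2 -d '\')

# Question 2: after cut, the frequency is field 2; r = descending, n = numeric.
numeric_top=$(grep GOOGLE "$tmp" | cut -f2,3 -d '\' | sort -t '\' -k 2,2rn | head -n 2)

# Same pipeline without n: as strings, "9" > "100" > "10".
text_top=$(grep GOOGLE "$tmp" | cut -f2,3 -d '\' | sort -t '\' -k 2,2r | head -n 2)

printf 'GOOGLE words:\n%s\n\nnumeric sort:\n%s\n\ntext sort:\n%s\n' \
  "$google_words" "$numeric_top" "$text_top"
```

With n the top two are abbey (100) and abbot (10); without it, abacus (9) comes first.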
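For question 3, this sketch runs the posted regular expression over four made-up lines. It assumes, as the answer does, that each headword is written in capitals (possibly with hyphens) and followed directly by a comma, so a definition line that merely starts with a capital vowel is not counted, while a hyphenated headword is.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # make [A-Z] mean exactly the ASCII capitals
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Made-up lines imitating the ASSUMED layout: HEADWORD, part-of-speech. definition
cat > "$tmp" <<'EOF'
ABSTAINER, n. A made-up definition.
A definition line that happens to start with a vowel.
BANQUET, n. Another made-up definition.
ICE-CREAM, n. A hyphenated headword.
EOF

# ^[AEIOU]  line starts with a capital vowel
# [A-Z-]*   rest of the headword: capitals or hyphens only
# ,         the comma that ends the headword
vowel_entries=$(grep -Ec '^[AEIOU][A-Z-]*,' "$tmp")
echo "$vowel_entries"
```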
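Question 4 hinges on what wc -c counts. This sketch uses two made-up headwords to show that wc -c includes the newline ending each line, which is why newlines should be removed (for example with tr -d '\n') before counting letters.

```shell
#!/bin/sh
# Two made-up headwords, one per line, as `cut -f1 -d ','` would print them.
headwords='AB
CDE'

# $(( ... )) strips the leading spaces some wc implementations print.
entries=$(( $(printf '%s\n' "$headwords" | wc -l) ))
bytes=$(( $(printf '%s\n' "$headwords" | wc -c) ))                  # letters + one newline per line
letters=$(( $(printf '%s\n' "$headwords" | tr -d '\n' | wc -c) ))   # letters only

echo "entries=$entries bytes=$bytes letters=$letters"
```

Here the true average is 5/2 = 2.5 letters, while dividing the raw wc -c count gives 7/2 = 3.5. Because every line adds exactly one newline, the raw count is always one letter per entry too high.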
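An alternative to question 5 (not the posted answer) counts all three tags in one pass: grep -o keeps only the matched headword and tag, cut keeps the tag, and sort | uniq -c tallies each one. The sample lines are made up, and writing verbs as v.t. or v.i. is an assumption about the file; the pattern v\. matches the start of either.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Made-up entries; the v.t./v.i. verb markers are an ASSUMPTION about the file.
cat > "$tmp" <<'EOF'
ABACUS, n. Sample definition.
ABANDON, v.t. Sample definition.
ABIDE, v.i. Sample definition.
ABLE, adj. Sample definition.
ABOUT, prep. Sample definition.
EOF

# -o prints only the matched text, e.g. "ABIDE, v." for the v.i. line.
# cut keeps the second space-separated field (the tag); uniq -c counts each tag.
tally=$(grep -oE '^[A-Z]+, (n|v|adj)\.' "$tmp" | cut -d ' ' -f2 | sort | uniq -c)
printf '%s\n' "$tally"
```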
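In question 6, cut -f1 -d '.' prints everything up to the first period, so the part of speech comes along with the headword. This sketch runs the posted filters on made-up lines and compares that with cut -f1 -d ',', which would print the headword alone, in case only the entry word is wanted.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Made-up entries in the same ASSUMED layout as the dictionary answers.
cat > "$tmp" <<'EOF'
ABACUS, n. Sample definition.
ABOUT, prep. Sample definition.
ABIDE, v.i. Sample definition.
ALAS, interj. Sample definition.
ABLE, adj. Sample definition.
EOF

# Keep headword lines, then drop the noun, verb and adjective entries.
others=$(grep -E '^[A-Z]+, ' "$tmp" | grep -vE '^[A-Z]+, (v|n|adj)\.')

with_tag=$(printf '%s\n' "$others" | cut -f1 -d '.')   # headword plus tag
headword=$(printf '%s\n' "$others" | cut -f1 -d ',')   # headword only
printf '%s\n---\n%s\n' "$with_tag" "$headword"
```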
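Why the second answer to question 7 is better: grep matches each line without its trailing newline, so q[^u] needs some real character after the q and misses a word ending in q. Here both patterns run over four sample words standing in for the orthography column.

```shell
#!/bin/sh
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Sample words standing in for the output of `cut -f2 -d '\' celex.txt`.
cat > "$tmp" <<'EOF'
quiet
qat
Iraq
equal
EOF

first=$(grep -Eic 'q[^u]' "$tmp")        # needs a non-u character after q
better=$(grep -Eic 'q([^u]|$)' "$tmp")   # ...or the end of the line
echo "q[^u]: $first  q([^u]|\$): $better"
```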
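For the extra credit, dc is a stack calculator: the point values are pushed, each + pops two numbers and pushes their sum, so seven values need six + signs, and p prints the result. This sketch uses the point markers from this page as a stand-in for hmwk2.solution (that file is not reproduced here). It shows that "(3 extra points)" does not match [0-9]+ points and builds the dc program from the number of values instead of hardcoding the plus signs. The total is checked with shell arithmetic, so the sketch runs even where dc is not installed.

```shell
#!/bin/sh
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Stand-in for hmwk2.solution: only the point markers of each question on this page.
cat > "$tmp" <<'EOF'
1. (6 points)
2. (9 points)
3. (7 points)
4. (10 points)
5. (10 points)
6. (10 points)
7. (8 points)
8. Extra credit (3 extra points) (Hint: use dc)
EOF

# "3 extra points" has a word between the number and "points", so it is not matched.
points=$(grep -oE '[0-9]+ points' "$tmp" | cut -d ' ' -f1)
count=$(( $(printf '%s\n' "$points" | wc -l) ))

# Build the dc program: the values, then (count - 1) plus signs, then p.
prog=$(echo $points)   # unquoted on purpose: joins the values with single spaces
i=1
while [ "$i" -lt "$count" ]; do
  prog="$prog +"
  i=$((i + 1))
done
prog="$prog p"
echo "$prog"           # pipe this into dc to print the total

# Cross-check with shell arithmetic on "6+9+7+10+10+10+8".
total=$(( $(echo $points | tr ' ' '+') ))
echo "total=$total"
```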
Written by Robert Felty

