Comments for Linguistics 5200 Fall 2009 http://robfelty.com/teaching/ling5200Fall2009 Introduction to computational corpus linguistics Mon, 16 Nov 2009 18:31:36 -0500

Comment on language identification by Robert Felty http://robfelty.com/teaching/ling5200Fall2009/2009/11/language-identification/comment-page-1/#comment-33 Robert Felty Mon, 16 Nov 2009 18:31:36 +0000 http://robfelty.com/teaching/ling5200Fall2009/?p=161#comment-33 I fixed the link.

Comment on language identification by pohawpat http://robfelty.com/teaching/ling5200Fall2009/2009/11/language-identification/comment-page-1/#comment-32 pohawpat Fri, 13 Nov 2009 15:41:21 +0000 http://robfelty.com/teaching/ling5200Fall2009/?p=161#comment-32 I tried the link, but received an error message:
The requested URL /crubadan/ index.html was not found on this server.
Calvin

Comment on Homework 10 – Advanced function usage by Robert Felty http://robfelty.com/teaching/ling5200Fall2009/2009/11/homework-10-advanced-function-usage/comment-page-1/#comment-31 Robert Felty Mon, 09 Nov 2009 03:31:00 +0000 http://robfelty.com/teaching/ling5200Fall2009/2009/11/homework-10-advanced-function-usage/#comment-31 I wouldn’t call it a contradiction. However, you are correct that if ignore_stop is False, then the value of use_set is meaningless.
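
To make the point about ignore_stop and use_set concrete, here is a minimal sketch. Only the two parameter names come from the homework; the function name `count_words`, the `stopwords` argument, and the tiny stop list are assumptions made up for illustration, not the homework's actual function.

```python
def count_words(words, ignore_stop=False, use_set=True,
                stopwords=("the", "a", "of")):
    """Count words, optionally skipping stop words.

    Hypothetical signature: only ignore_stop and use_set are taken from
    the homework discussion; everything else is an assumption.
    """
    if not ignore_stop:
        # Stop words are kept, so use_set is never consulted on this path.
        return len(words)
    # use_set only chooses the container used for the membership test.
    stop = set(stopwords) if use_set else list(stopwords)
    return sum(1 for word in words if word.lower() not in stop)
```

With ignore_stop=False, both values of use_set give the same count, so the combination is harmless rather than contradictory.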

Comment on Homework 10 – Advanced function usage by ash_v http://robfelty.com/teaching/ling5200Fall2009/2009/11/homework-10-advanced-function-usage/comment-page-1/#comment-30 ash_v Sun, 08 Nov 2009 23:03:37 +0000 http://robfelty.com/teaching/ling5200Fall2009/2009/11/homework-10-advanced-function-usage/#comment-30 In Q2, if ignore_stop has the value False, and use_set is True, then we get a contradiction.

Comment on Homework 9 – Finding novel words by Robert Felty http://robfelty.com/teaching/ling5200Fall2009/2009/10/homework-9-finding-novel-words/comment-page-1/#comment-29 Robert Felty Thu, 29 Oct 2009 15:22:12 +0000 http://robfelty.com/teaching/ling5200Fall2009/2009/10/homework-9-finding-novel-words/#comment-29 Some people had a problem with the get_wiki function. I have updated the function above so that it should work for everyone now. Sorry about that.

A couple of other questions people had:

What do I do with all these HTML tags?
Strip them out. See chapter 2 of the NLTK book for an example (or look in the notes).
In question 5, everything is of unicode type. What do you want here?
Type here means lemma, as in type-token ratio.
Won’t removing proper names based on capitalization also remove the first word of sentences?
True, but we are trying to find novel words. Any words that are in our dictionary that are the first word of a sentence will have already been taken out. If there are novel words which happen to start a sentence, they will be grouped in with the proper names. This is another example of how we have to use a combination of automated techniques and manual checking.
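
The three answers above can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's solution: the regular-expression tag stripper is a crude stand-in for the approach in the NLTK book, types are approximated by lowercased word forms (no lemmatization), and the function names are hypothetical.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")      # crude: anything between < and >
WORD_RE = re.compile(r"[A-Za-z]+")   # alphabetic tokens only


def strip_tags(html):
    """Replace every HTML tag with a space (rough regex-based sketch)."""
    return TAG_RE.sub(" ", html)


def type_token_ratio(tokens):
    """Distinct lowercased word forms divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(token.lower() for token in tokens)) / len(tokens)


def split_novel_candidates(tokens, dictionary):
    """Drop known words, then separate capitalized from lowercase leftovers.

    Capitalized leftovers mix proper names with novel words that happen to
    start a sentence, so that group still needs manual checking.
    """
    unknown = [t for t in tokens if t.lower() not in dictionary]
    capitalized = [t for t in unknown if t[0].isupper()]
    lowercase = [t for t in unknown if not t[0].isupper()]
    return capitalized, lowercase


html = "<p>Blorf ran. The dog saw <b>Alice</b> and the blorf.</p>"
tokens = WORD_RE.findall(strip_tags(html))
known = {"ran", "the", "dog", "saw", "and"}  # toy dictionary, made up here
print(type_token_ratio(tokens))
print(split_novel_candidates(tokens, known))
```

In the toy sentence, the sentence-initial "The" is removed by the dictionary lookup before capitalization is considered, while the made-up word "Blorf" lands in the capitalized group next to the proper name "Alice".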
Comment on Homework 9 – Finding novel words by Robert Felty http://robfelty.com/teaching/ling5200Fall2009/2009/10/homework-9-finding-novel-words/comment-page-1/#comment-28 Robert Felty Tue, 27 Oct 2009 02:42:43 +0000 http://robfelty.com/teaching/ling5200Fall2009/2009/10/homework-9-finding-novel-words/#comment-28 Well, it doesn’t really matter, since you want to strip out all HTML tags anyway.
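
As a rough sketch of cutting the page at “see also”: the helper name `cut_at_see_also` and the case-insensitive match on the first occurrence are assumptions, not the wording of the homework.

```python
import re

SEE_ALSO_RE = re.compile(r"see also", re.IGNORECASE)


def cut_at_see_also(text):
    """Return everything before the first 'see also' (case-insensitive)."""
    match = SEE_ALSO_RE.search(text)
    return text if match is None else text[:match.start()]
```

If the cut is made on raw HTML, a dangling opening tag may be left at the end, but it disappears once the tags are stripped, which is why the order of the two steps does not really matter.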

Comment on Homework 9 – Finding novel words by ash_v http://robfelty.com/teaching/ling5200Fall2009/2009/10/homework-9-finding-novel-words/comment-page-1/#comment-27 ash_v Mon, 26 Oct 2009 21:45:44 +0000 http://robfelty.com/teaching/ling5200Fall2009/2009/10/homework-9-finding-novel-words/#comment-27 In Q2, what is meant by remove all text after “see also”? Does it mean just text or text+html tags?

Comment on A note on homework 7 by robfelty http://robfelty.com/teaching/ling5200Fall2009/2009/10/a-note-on-homework-7/comment-page-1/#comment-26 robfelty Thu, 15 Oct 2009 22:16:29 +0000 http://robfelty.com/teaching/ling5200Fall2009/?p=129#comment-26 Also note that this homework does not involve the use of stdin, even though we did talk about that some in class today.

Comment on Homework 5 – Using the NLTK to investigate corpora and word frequency by robfelty http://robfelty.com/teaching/ling5200Fall2009/2009/09/homework-5-using-the-nltk-to-investigate-corpora-and-word-frequency/comment-page-1/#comment-25 robfelty Thu, 01 Oct 2009 18:09:56 +0000 http://robfelty.com/teaching/ling5200Fall2009/2009/09/homework-5-using-the-nltk-to-investigate-corpora-and-word-frequency/#comment-25 Sam asks:

I was wondering on Question 10, are we supposed to perform that operation in Python, and if so, how do we access svn with Python?

Yes. You should do it in Python. However, you don’t have to interact with svn at all. All you have to do is update your working copy of the class repository. Then the devilsDictionary.txt file is simply a regular file on your computer.
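
A minimal sketch of what “a regular file on your computer” means in practice: once the working copy is updated, Python needs only a path. The helper name `read_words`, the whitespace tokenization, and the UTF-8 encoding are assumptions for illustration; the actual operation asked for in Question 10 is not shown here.

```python
def read_words(path):
    """Read a plain-text file and split it on whitespace."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().split()
```

Calling `read_words` with the path to devilsDictionary.txt inside your own working copy returns its words as a list, with no svn involved on the Python side.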

Comment on Homework 4 – More regular expressions, Python lists and word frequency by kelleya http://robfelty.com/teaching/ling5200Fall2009/2009/09/homework-4-more-regular-expressions-python-lists-and-word-frequency/comment-page-1/#comment-23 kelleya Fri, 25 Sep 2009 17:28:11 +0000 http://robfelty.com/teaching/ling5200Fall2009/2009/09/homework-4-more-regular-expressions-python-lists-and-word-frequency/#comment-23 Hi Rob,

For questions 2 and 3, do you want the first 10 words or tokens?

Thanks,
Arrick
