Your goal in this project is to extend the work you did in the first part to support conjunctive (AND) queries. Given the string "storm gulf winds coast", query() should return the list of IDs of the documents that contain every one of those words. The list should be sorted from highest to lowest document score, where a document's score is the sum of the TF-IDF weights of the individual query terms in that document. For this query, score(d) = tfidf(storm, d) + tfidf(gulf, d) + tfidf(winds, d) + tfidf(coast, d).
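Here is a minimal Java sketch of the conjunctive query and scoring described above. It is not a reference solution. It assumes a hypothetical index structure left over from the first part: a map from each term to a map of document ID to that term's TF-IDF weight in the document. Several details are also assumptions that you should adapt to your own Index class:

- the class name ConjunctiveQuerySketch and its constructor
- the List<Integer> return type
- lowercasing the query words
- breaking score ties by ascending document ID

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: conjunctive (AND) query ranked by summed TF-IDF weights. */
public class ConjunctiveQuerySketch {
    // Assumed structure from part one: term -> (docID -> TF-IDF weight of term in doc).
    private final Map<String, Map<Integer, Double>> weights;

    public ConjunctiveQuerySketch(Map<String, Map<Integer, Double>> weights) {
        this.weights = weights;
    }

    /** Returns IDs of documents containing every query word, highest score first. */
    public List<Integer> query(String q) {
        // Assumption: terms were lowercased when the index was built.
        String[] terms = q.trim().toLowerCase().split("\\s+");
        Map<Integer, Double> scores = null; // docID -> running sum of TF-IDF weights
        for (String t : terms) {
            Map<Integer, Double> postings = weights.get(t);
            if (postings == null) {
                return new ArrayList<>(); // a term in no document means no document matches
            }
            if (scores == null) {
                scores = new HashMap<>(postings); // start from the first term's postings
            } else {
                Map<Integer, Double> next = new HashMap<>();
                for (Map.Entry<Integer, Double> e : scores.entrySet()) {
                    Double w = postings.get(e.getKey());
                    if (w != null) {
                        // keep only documents that contain every term seen so far
                        next.put(e.getKey(), e.getValue() + w);
                    }
                }
                scores = next;
            }
        }
        if (scores == null) {
            return new ArrayList<>();
        }
        final Map<Integer, Double> s = scores;
        List<Integer> ids = new ArrayList<>(s.keySet());
        ids.sort((a, b) -> {
            int c = Double.compare(s.get(b), s.get(a)); // higher score first
            return c != 0 ? c : Integer.compare(a, b);  // tie-break on doc ID (assumption)
        });
        return ids;
    }
}
```

The sketch starts from the first term's postings and drops any document that lacks a later term, so the candidate set only shrinks. A classic alternative is to keep each postings list sorted by document ID and intersect the lists with a linear merge.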
For example, here's a simple unit test:
My solution gets the following document scores and ordering:
When I perform query("storm gulf winds coast"), I get the following documents back.
Do not generate any output, debugging or otherwise. Your query() method should simply return the list of document IDs in the right order.
So that you can check your TF-IDF scores, here is my scoring for document 7:
You will create a jar file called index2.jar containing your *.class files and place it in a directory called index2/dist under your cs680 directory:
Put your Java source code in index/src:
To jar your stuff up, you will "cd" to the directory containing your source code and create the jar in the index2 dir:
All classes must be in the default package!
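A class in the default package simply has no package declaration at the top of its .java file. The following is a quick, hypothetical sanity check and is not part of the submission. The class name DefaultPackageCheck is made up. It assumes Java 9 or later for Class.getPackageName(), which returns an empty string for a top-level class in the unnamed package.

```java
// DefaultPackageCheck.java: note that there is NO "package ..." line at the top,
// so this class lives in the default (unnamed) package.
public class DefaultPackageCheck {
    /** Returns this class's package name; "" means the default package. */
    static String packageName() {
        return DefaultPackageCheck.class.getPackageName();
    }

    public static void main(String[] args) {
        System.out.println("package name: '" + packageName() + "'");
    }
}
```

Running jar tf on your jar should therefore list the .class files at the top level rather than inside subdirectories.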
To learn more about submitting your project with svn, see Resources.
You must submit your source code for credit.
I will again run unit tests that extend BabyTestIndex.java, but I will also test your code on the articles directory.
You may discuss this project in general terms with anybody you want, and you may look at any code on the internet except a classmate's code. You should physically write all of the code for this project yourself. You can use any help you find other than cutting-n-pasting, or looking at code from a classmate or any other human being.
I will deduct 10% if your program cannot be run exactly as described in the project; that is, the class name, methods, lack of a package, and jar must be exactly right. For you PC folks, note that case is significant for class names and file names on Unix! All projects must run properly under Linux at Amazon.