labHashing: Sequence Database IV

Due Date

See the calendar for due date.

Objectives:

Use a hash table and a hash function to store and retrieve items

Description:

You guessed it, you get to extend your previous labs, but this time use a data structure that has more efficient methods to insert, delete and retrieve the items stored by using a hash table. In addition to the necessary classes from previous labs, you'll create a HashTable class for this lab. A description of the relevant classes follows:

class Sequence:
Use your previous Sequence class, and add a getKey() method that returns a copy of the label.

class SequenceDatabase:
Use your previous SequenceDatabase class, but have it store a HashTable object.
Additionally, importEntries() needs to:

Parse the first line of the commands file for the total number of buckets in the hash table. This value can range from just a single entry to millions.
Process the S to print out the total number of elements in the hash table.

class HashTable:
This templated class should store pointers to objects that have a getKey() method in a hash table, using separate chaining to resolve collisions and adding new items onto the front of the chain.
This class should contain:

Appropriate constructor(s)
A hash function (which takes a string object and returns an unsigned int. You will be graded on your choice of hash functions and if it meets the requirements for a "good hash function".
Methods to insert, obliterate, and retrieve a pointer to an object based on the value returned by getKey(). For each insertion and each successful obliterate, print out the load factor and the hash table size.
Always use a double when working with the load factor.
The ability to rehash when the load factor is greater than 1.0. Make the new hash table size the prime number just larger than twice the previous hash table size.
Data members to store:
1. A hash table

Requirements:

Use the following labHashing-main.cpp file, which has the main() function in it. Do not modify it.
Name the project for this assignment labHashing-yourlastname.
If attempting to obliterate or print an item that doesn't exist, print out an error statement and continue processing commands.
Your program must compile and run in both Visual Studio 2012 and on ranger. On ranger, it must compile with the following command:

g++ -std=c++0x *.cpp -o labHashing
Remove any irrelevant (extra) .cpp files.
You can assume that information in the commands files is syntactically correct and that they each have a newline at the end of the file.
You can use labHashing-inputA.tab and labHashing-inputB.tab to assist your development.
Please produce your own input file(s) for testing.

Output needs to match the format below:

Entries: 0

Printing CISY_EMENI_DNA ...
Can not find item (CISY_EMENI_DNA), NOT found!

Obliterating CISY_EMENI_DNA ...
Can not delete item (CISY_EMENI_DNA), NOT found!

Adding CISY_EMENI_DNA ...
Load factor: 0.333333
Hash table size: 3

Adding CISY_EMENI_RNA ...
Load factor: 0.666667
Hash table size: 3

Adding CISY_EMENI_AA ...
Load factor: 1
Hash table size: 3

Adding O0197_HUMAN_DNA ...
Load factor: 0.571429
Hash table size: 7

Printing CISY_EMENI_DNA ...
DNA:	Label: CISY_EMENI_DNA	ID: 17	Sequence: cttggacagcatggagttctcgagccaaatttgtcggtacactaccttaaacagccctagcgaatctcggcaacatgtgacggttggtatcgtgcgcgcttgtctgggccgggccgatgcgctatag	Length: 127	cDNAStartIndex: 10

Printing CISY_EMENI_RNA ...
RNA:	Label: CISY_EMENI_RNA	ID: 117	Sequence: auggaguucucgagccaaauuugucgguacacuaccuuaaacagcccuagcgaaucucggcaacaugugacgguugguaucgugcgcgcuugucugggccgggccgaugcgcua	Length: 114	Type: mRNA

Printing CISY_EMENI_AA ...
AA:	Label: CISY_EMENI_AA	ID: 217	Sequence: MEFSSQICRYTTLNSPSESRQHVTVGIVRACLGRADAL	Length: 38	ORF: 1

Obliterating CISY_EMENI_DNA ...
Load factor: 0.428571
Hash table size: 7

Entries: 3

Obliterating CISY_EMENI_RNA ...
Load factor: 0.285714
Hash table size: 7

Printing DOES_NOT_EXIST ...
Can not find item (DOES_NOT_EXIST), NOT found!

Obliterating DOES_NOT_EXIST ...
Can not delete item (DOES_NOT_EXIST), NOT found!

Entries: 2

Make a file named rubric-yourlastname.txt in your project directory with a completed rubric. Specify estimated points for each entry including the number of hours spent.

Submission

Before submitting your project, remove the debug and ipch directories (if present) and any .sdf files. Zip the whole project folder (e.g., by right-clicking on the folder and choosing send to::Compressed (zipped folder) in Windows Explorer). Name your zip file as labHashing.zip. Make sure that all of the source code files (especially labHashing-main.cpp) are in your zipped file. You can verify this by coping your zip file to a temporary location and uncompressing it. Submit your zip file at https://3110.cs.mtsu.edu/. For further instructions, please see the Miscellaneous page.

Rubric:

Points      Item
----------  --------------------------------------------------------------
_____ / 10  Documentation:
            + All source code files (.h & .cpp) include file name, author, description etc. 
            + Functions and variables
            + All non-trival variables are commented 
            + All functions preceded by brief comments 
            + Comments included before major portions of code 
            + Functions should be no longer than 1 page
_____ /     Compiles and runs in Visual Studio and on ranger
_____ / 45  Requirements met:
_____ /     + Templated HashTable class (25 points)
_____ /     + Hash function (20 points)
_____ / 43  Correct output (matches the format of the example & demonstrates correct execution)
_____ /     + Load factor (25 points)
_____ /  2  Completed rubric (estimates for each line including hours spent)
	    
_____ /100  Total

_____  Approximate number of hours spent

Notes

Remember that abstract base classes can not have instantiations. This may come up when instantiating the HashTable.
If you're trying to declare an iterator that based off of a templated type, then that type is a dependent type and therefore must be proceeded by typename to tell the compiler to interpret it as a type. For example:
```
typename list< T >::iterator iter;	  
```
One test of a good hash function is to calculate the largest index that it can produce for a large table size (e.g., 10,009) if the maximum length of the name is 20 characters.
Part of the grading process is automated by using diff. You should verify that your output matches the above output directly with the following commands on ranger:
```
g++  -std=c++0x  *.cpp  -o labHashing-yourlastname
cp  /nfshome/hcarroll/public_html/3110/private/labs/labHashing-inputA.tab  .
(echo ""; echo "") | ./labHashing-yourlastname
diff  /nfshome/hcarroll/public_html/3110/private/labs/labHashing-answerKeyA.txt  labHashing-outputA.txt
```
If the two files match exactly (which is what you want) then there should be NO output from diff. If diff shows one or more differences, fix them and run it again. To get side-by-side output (with the answer key on the left and your output on the right), replace the last line with:
```
diff  --side-by-side  /nfshome/hcarroll/public_html/3110/private/labs/labHashing-answerKeyA.txt  labHashing-outputA.txt
```
For details about interpreting the output of diff, see the Using diff section on the Misc. webpage.

To help you calculate prime numbers, feel free to use the following code:

bool isPrime( unsigned int num) const {
    int i = 2;
    while( i < num ){
        if( num % i == 0){
            return false;
        }
        i++;
    }
    return true;
}

CSCI 3110

Algorithms and Data Structures