labParser: Sequence Database Parser


Note: To submit this lab, you need to use your personal CS account. The username for your personal CS account is the same as your pipeline ID (two to three letters, a single digit followed by a letter). If you don't have a CS account that matches your pipeline ID, request one at https://mgt.cs.mtsu.edu/aru/. It may take a day or two to create your account!

Due Date

See the calendar for due date.

Objectives

Description

Recently, a research team collected genetic samples from around the Great Smoky Mountains in search of new species. You've enlisted to help the team by making a database of the genetic sequences. For this lab, you get to parse the input and store it in DNA objects (in preparation for inserting the information into a linked list). The parsing will be done in a class called SequenceDatabase. A description of these classes follows:
class DNA:
This class represents a database entry for a single DNA sequence and relevant information. The class should contain:
  1. Appropriate constructor(s)
  2. Data members to store:
    1. Label
    2. Accession ID (which is unique)
    3. Sequence
    4. Length of the sequence
    5. Index of the coding region (or -1 if not applicable)
  3. A print() method that prints the above information (used in labLinkedLists)
  4. Appropriate "get" and "set" methods
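One way the DNA class described above could be laid out is sketched below. This is a minimal sketch, not the required design: the member names, the choice of std::string for the accession ID, the getter/setter names, and the layout printed by print() are all assumptions, since the lab leaves those details to you. In the actual project the declaration would go in DNA.h and the definitions in DNA.cpp.

```cpp
#include <iostream>
#include <string>

// Hypothetical sketch of a DNA entry; all names below are assumptions.
class DNA {
public:
    // Default constructor: empty entry with no coding region.
    DNA() : length(0), cdnaStartIndex(-1) {}

    // Constructor that fills in every field of the entry.
    DNA(const std::string& newLabel, const std::string& newAccessionId,
        const std::string& newSequence, int newLength, int newCdnaStartIndex)
        : label(newLabel), accessionId(newAccessionId), sequence(newSequence),
          length(newLength), cdnaStartIndex(newCdnaStartIndex) {}

    // "get" methods
    const std::string& getLabel() const { return label; }
    const std::string& getAccessionId() const { return accessionId; }
    const std::string& getSequence() const { return sequence; }
    int getLength() const { return length; }
    int getCdnaStartIndex() const { return cdnaStartIndex; }

    // "set" methods
    void setLabel(const std::string& newLabel) { label = newLabel; }
    void setAccessionId(const std::string& newId) { accessionId = newId; }
    void setSequence(const std::string& newSequence) { sequence = newSequence; }
    void setLength(int newLength) { length = newLength; }
    void setCdnaStartIndex(int newIndex) { cdnaStartIndex = newIndex; }

    // Prints the entry; the output layout here is an assumed placeholder.
    void print(std::ostream& out = std::cout) const {
        out << "Label: " << label << "\n"
            << "ID: " << accessionId << "\n"
            << "Sequence: " << sequence << "\n"
            << "Length: " << length << "\n"
            << "cDNA start index: " << cdnaStartIndex << "\n";
    }

private:
    std::string label;        // descriptive label of the sample
    std::string accessionId;  // unique accession ID
    std::string sequence;     // the DNA sequence itself
    int length;               // length of the sequence
    int cdnaStartIndex;       // index of the coding region, or -1 if none
};
```

Storing the accession ID as a string avoids any concern about integer overflow for long IDs; an integer type would also work if the IDs are known to fit.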
class SequenceDatabase:
This class should contain:
  1. Appropriate constructor(s)
  2. Appropriate destructor
  3. Data member to store:
    1. An ofstream object for the output file
  4. Method to process commands from a specified file. Commands are as follows (fields are separated by tabs):
    D <label> <ID> <sequence> <length> <cDNA start index> (allocates memory for a new DNA object, which in labLinkedLists will be added to a linked list; for now, allocate memory and print out "Adding <ID> ...", where <ID> is the ID number (see the example output below))
    P <ID> (in labLinkedLists, prints the specified DNA entry; for now, print out "Printing <ID> ...")
    O <ID> (in labLinkedLists, obliterates the specified DNA entry; for now, print out "Obliterating <ID> ...")
    S (in labLinkedLists, displays the number of DNA entries; for now, print out "Entries: NYI")
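The command format above can be handled by splitting each line on tabs and dispatching on the first field. The following is a rough sketch under several assumptions: splitTabs and processLine are hypothetical helper names, they are written as free functions for brevity (the lab asks for a method of SequenceDatabase), the allocation of the DNA object for the D command is omitted, and each message is followed by a single newline. Check labParser-answerKeyA.txt for the exact blank-line layout the output must have.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Splits one input line into its tab-separated fields.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}

// Writes the placeholder message for one command line to out.
// Lines that do not match a known command are silently skipped.
void processLine(const std::string& line, std::ostream& out) {
    std::vector<std::string> fields = splitTabs(line);
    if (fields.empty() || fields[0].empty()) {
        return;
    }
    switch (fields[0][0]) {
    case 'D':  // D <label> <ID> <sequence> <length> <cDNA start index>
        if (fields.size() >= 6) {
            // A real solution would also allocate a DNA object here.
            out << "Adding " << fields[2] << " ...\n";
        }
        break;
    case 'P':  // P <ID>
        if (fields.size() >= 2) {
            out << "Printing " << fields[1] << " ...\n";
        }
        break;
    case 'O':  // O <ID>
        if (fields.size() >= 2) {
            out << "Obliterating " << fields[1] << " ...\n";
        }
        break;
    case 'S':  // S
        out << "Entries: NYI\n";
        break;
    default:
        break;
    }
}
```

Because processLine takes a std::ostream&, the same code can write to the ofstream data member in the real class or to std::cout while testing.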

Requirements

  1. Name the project for this assignment labParser-yourlastname.
  2. Use the following labParser-main.cpp file, which has the main() function in it. Do not modify it.
  3. You must have a separate header and implementation file for each of the above classes (i.e., sequenceDatabase.h, sequenceDatabase.cpp, DNA.h, and DNA.cpp). Notice that the SequenceDatabase class is stored in the files named sequenceDatabase.
  4. Your program must compile and run in both Visual Studio 2012 or newer and on ranger. On ranger, it must compile with the following command:
    g++ -std=c++0x *.cpp -o labParser
  5. Do not include spaces in directory names or file names.
  6. You can assume that the information in labParser-inputA.tab and labParser-inputB.tab is syntactically correct and that each file has a newline at the end of the file.
    Please produce your own input file(s) for testing.
  7. Output needs to be stored in a file and match the format given in labParser-answerKeyA.txt:
    Importing labParser-inputA.tab ...
    Printing 9999 ...
    
    Obliterating 9999 ...
    
    Entries: NYI
    
    Adding 1234567890 ...
    
    Entries: NYI
    
    Printing 1234567890 ...
    
    Obliterating 1234567890 ...
    
    Entries: NYI
    
    Printing 1234 ...
    
    Obliterating 1234 ...
    
    
  8. Make a file named rubric-yourlastname.txt in your project directory with a completed rubric. Specify estimated points for each entry, and include the number of hours spent.

Submission

Before submitting your project, remove the debug and ipch directories (if present) and any .sdf files. Zip the whole project folder (e.g., by right-clicking on the folder and choosing send to::Compressed (zipped folder) in Windows Explorer). Name your zip file labParser.zip. Make sure that all of the source code files (especially labParser-main.cpp) are in your zipped file. You can verify this by copying your zip file to a temporary location and uncompressing it. Submit your zip file at https://3110.cs.mtsu.edu/. For further instructions, please see the Miscellaneous page.

Rubric

Points       Item
----------   --------------------------------------------------------------
_____ / 13   Documentation:
             All source code files (.h & .cpp) include file name, author, description etc.
             Functions and variables
             + All non-trivial variables are commented
             + All functions preceded by brief comments
             + Comments included before major portions of code
             + Functions should be no longer than 1 page
_____ / 10   Appropriate methods
_____ / 25   Correct output (matches the format of the example & demonstrates correct execution)
_____ /  0   Compiles and runs in Visual Studio AND on ranger
_____ /  2   Completed rubric (estimates for each line, including hours spent)
             
_____ / 50   Total


_____  Approximate number of hours spent

Notes

  1. Yes, this lab creates memory leaks. That's ok for this lab. In labLinkedLists, you'll keep track of the allocated memory, and deallocate it with an obliterate command or at the end of the program.
  2. Start small. Plan out the role of each object and then the outline of each function. Add some code, compile it, and test it. Consider getting labParser-main.cpp to compile with a minimal SequenceDatabase object (meaning just enough code in sequenceDatabase.h and sequenceDatabase.cpp to compile, but no more). Then, work on the method to parse the commands. For example, have it open the file and test that it opened correctly. Make sure it compiles and works appropriately (verify that it at least matches the appropriate parts of the required output). Then, add some more code, for example to parse what type of command it is.
  3. Always download files (especially the input files) instead of copying and pasting them. This will preserve the newline(s) at the end of the files.
  4. Lost on how all the classes fit together? Can you draw the relationship between the three classes and the struct?
  5. Visual Studio is available on shemp. You can use your CS account (with CSD\ before it) to log in to shemp.cs.mtsu.edu.
  6. Part of the grading process is automated by using diff. You should verify that your output matches the above output exactly with the following commands on ranger:
    g++  -std=c++0x *.cpp  -o labParser-yourlastname
    cp /nfshome/hcarroll/public_html/3110/private/labs/labParser-inputA.tab  labParser-inputA.tab
    (echo ""; echo "") | ./labParser-yourlastname
    diff  /nfshome/hcarroll/public_html/3110/private/labs/labParser-answerKeyA.txt  labParser-outputA.txt
    
    If the two files match exactly (which is what you want) then there should be NO output from diff. If diff shows one or more differences, fix them and run it again. To get side-by-side output (with the answer key on the left and your output on the right), replace the last line with:
    diff --side-by-side /nfshome/hcarroll/public_html/3110/private/labs/labParser-answerKeyA.txt labParser-outputA.txt
    For details about interpreting the output of diff, see the Using diff section on the Misc. webpage.
  7. Put variables at the scope that they're used. For example, temporary variables used in a function should NOT be class data members in the .h file.
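As an illustration of the "start small" advice in the notes above, a first step might only open the command file, confirm that it opened, and read it line by line. The sketch below assumes a hypothetical function name, countCommandLines, and an assumed error message; it is a stepping stone for testing, not part of the required interface. Note that line and count are local variables, in keeping with the advice about scope.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Opens the named file and counts its lines.
// Returns the number of lines read, or -1 if the file could not be opened.
int countCommandLines(const std::string& filename) {
    std::ifstream in(filename.c_str());
    if (!in) {
        std::cerr << "Error: unable to open " << filename << std::endl;
        return -1;
    }

    int count = 0;      // number of lines read so far
    std::string line;   // holds the current line of the file
    while (std::getline(in, line)) {
        ++count;
    }
    return count;
}
```

Once this compiles and reports a sensible count for your own test input, the body of the loop is the natural place to add the code that decides which command each line holds.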

Questions and Answers

  1. Q: When I run the program, the output flashes up and disappears before it can be read. What can I do to prevent this?
    A: Two options: 1) Run the program without debugging. You can do this by selecting Start Without Debugging from the Debug menu (or by pressing Ctrl+F5). 2) Add code to read from stdin (e.g., cin) in main(), at the end of the program.
    If the first option does not work for you, try the following advice from a stackoverflow.com post: "If you have a C++ app and Run Without Debugging and the console window still closes, you need to remember to explicitly set the Subsystem to Console under Configuration Properties / Linker / System. This can happen if you start with an Empty Project, which leaves Subsystem unset."
