Project 5 - Autograder Feedback

This project is OPTIONAL and will only help your grade in the project category

Objectives:

Apply dynamic programming concepts
Contribute to a real-world project

Background

Autograders allows students to submit their code and receive immediate feedback of how it was evaluated using test cases. Autograders, such as codePost.io, can evaluate submissions based on exact matching (every character needs to match an answer key), flexible matching (every character needs to match an answer key, but spaces and/or capitalization are ignored), with regular expressions or with a (shell) script. When autograders use regular expressions (also know as regex) for evaluation, they are looking for patterns in the user's submission. This can allow for more flexible evaluation, but computer science students (especially introductory students) are often confused and/or overwhelmed by the regular expressions.

Over the last two years, I have, along with some students, developed a (greedy) algorithm to provide feedback that is intended to help students identify what is not right in the output. This project is available on github.com at https://github.com/hyrumcarroll/RefinedFeedback/blob/main/ with both a Java and a Python implementation. The current algorithm for identifying matches is a greedy algorithm that, "Finds the indices of the first non-overlapping match in the supplied text for each regex in" a supplied list of regular expressions. This algorithm is implemented in the getMatchingIndices method in the project. Currently, given the following regular expressions:

"account manager", "Checking", "balance", "checking123", "690\\.68", "(check|number)", "\\b2124\\b", "Savings", "balance", "savings124", "\\b1,?122.00\\b", "APR", "\\b1\\.0\\b", "Thank"

and the following submission text:

Welcome to my manager of accounts!
Checking Account:
Balance for account checking123: $0.0
Last processed check number:2124
Savings Account: 
Balance for account savings124: $0.0
APR: 0.01%
Balance for account checking123: $801.02
Exiting the Account Manager
Thank-you

the getMatchingIndices method will greedily find the first regex near the end of the submission and getAnnotatedView will produce the following output:

Annotated Matches View (2 of 14 matches found)
(Matches are uppercased and indicated with *** before and after the match)
(Terms that are missing are identified with '<<< Missing: [the missing item] >>>')
==========================================================================
Welcome to my manager of accounts!¶
Checking Account:¶
Balance for account checking123: $0.0¶
Last processed check number:2124¶
Savings Account: ¶
Balance for account savings124: $0.0¶
APR: 0.01%¶
Balance for account checking123: $801.02¶
Exiting the ***ACCOUNT MANAGER***

<<< Missing: Checking >>>

<<< Missing: Balance >>>

<<< Missing: checking123 >>>

<<< Missing: 690.68 >>>

<<< Missing: check >>>

<<< Missing: 2124 >>>

<<< Missing: Savings >>>

<<< Missing: Balance >>>

<<< Missing: savings124 >>>

<<< Missing: 1,122.00 >>>

<<< Missing: APR >>>

<<< Missing: 1.0 >>>

***THANK***-you

Description

For this project, you get to write an improved algorithm for finding matches in the output of a student submission in the Refined Feedback project. A simple definition of "improvement" here would be increasing (or better yet maximizing) the number of matches. Your algorithm needs to be implemented in the getMatchingIndices method in either the Java or Python version and use one of the major algorithm approaches discussed in this class. For example, a dynamic programming approach could determine the most number of a regexes that match a student's submission (allowing for some of the regexes to not be matched in favor of more being matched later). This could be considered a variant of the Longest Common Substring problem (see Section 14.4 in Introduction to Algorithms, 4th Edition) or the Smith-Waterman algorithm, where instead of genetic characters, regular expressions are being aligned with the student's output. Furthermore, there would be no penalty for not matching a portion of the student's output.

After completing your solution, write a short report (not more than 1-page) about your approach. Include in your report:

A description of your algorithm (including what type of algorithm it is and why it is that type of algorithm)
The hardest part of this project for you personally
Advantages and disadvantages of your algorithm (compared with the existing solution)
A discussion about the time complexity of your algorithm

You may find it helpful to look at and use RefinedFeedbackTest.java, as it sets up for and calls the getMatchingIndices method (performing the first part of unit testing). I recorded a video executing part of RefinedFeedbackTest.java and walking through the current execution of getMatchingIndices: https://youtube.com/embed/foFpc-ZzFGE. Additionally, RefinedFeedback.sh is a bash script that executes RefinedFeedback (performing the first part of black box testing). I will evaluate your submission manually, utilizing RefinedFeedback.sh and potentially other resources.

Example

The following is an example of the output that getAnnotatedView could produce (using the inputs above) if an improved algorithm was implemented in the getMatchingIndices method:

Annotated Matches View (10 of 14 matches found)
(Matches are uppercased and indicated with *** before and after the match)
(Terms that are missing are identified with '<<< Missing: [the missing item] >>>')
==========================================================================
<<< Missing: account manager>>>

***CHECKING*** Account:¶
***BALANCE*** for account ***CHECKING123***: $0.0¶
Last processed 

<<< Missing: 690.68>>>

***CHECK*** number:***2124***¶
***SAVINGS*** Account: ¶
***BALANCE*** for account ***SAVINGS124***: $0.0¶


<<< Missing: 1,122.00>>>

***APR***: 0.01%
Balance for account checking123: $801.02
Exiting the Account Manager


<<< Missing: 1.0>>>

***THANK***-you

Rubric:

Points      Item
----------  --------------------------------------------------------------
_____ / 49  getMatchingIndices Implementation
            Comments before major sections of code and non-trival variables

_____ / 29  Report
            Description
            Hardest part
            Advantages and disadvantages
            Time complexity
            Appropriate length (<= 1 page)
		       
_____ /  2  Completed rubric

_____ / 80  Total


_____  Approximate number of hours spent

I, (REPLACE WITH YOUR FULL NAME), affirm that the code that I submitted is my own work and that I did not receive help that was not authorized by the instructor.

Copy and paste this rubric into a rubric-project5AutograderFeedback.txt file (not a .docx, .doc nor .rtf file). Think of this as a checklist to ensure that you receive the maximum points possible. For each grade item, fill in your estimate for the grade you deserve. Additionally, include your estimate of how many hours your spent. Lastly, replace, "(REPLACE WITH YOUR FULL NAME)" with your full name to indicate that what you are submitting is entirely your own work.

Submission

Submit your RefinedFeedback.java or RefinedFeedback.py file, report and rubric to the appropriate Assignment tab/folder in CougarVIEW.

Academic Integrity

Academic integrity means that you submit for a grade work that is entirely yours. The following figure indicates where you can get help:

Algorithms Analysis and Design

CPSC 6109