Project 2: Shortest-path word-melt solver

Due: Sunday, 9/18/2022, by 12:00pm noon

You will receive 100% for turning this project in and passing the functionality test and style check before the deadline. After the deadline is 25% off.

Total points: 50

Before you start this project, you should have read the project submission guidelines and the coding style rules.

After that, get started on the project immediately! You will submit your solution to Upload Site.

Project Description

The goals of this project are to:

Practice the use of data abstraction,
Practice the invocation of methods and passing of parameters,
Use input output functions, which are used a lot in this class
Test your program in the command line environment with input/output redirection and diff utilities, and
Get used to how this course uses Upload Site for submissions.

This project will be similar in purpose to Project 1, but will take a different approach and have a different application. Project 1 solved mazes by searching for any path that led from the starting location to the ending location, but that path may not have been the shortest path (that is, the one with the least number of locations). As you recall, this search strategy is called depth-first search, or DFS.

In this project, your program will use a different search algorithm, Breadth-First Search, to find shortest paths. However, instead of solving a maze, your program will solve word melt puzzles.

This project makes use of the STL map and set data structures, which may be new to you. This page describes some of the details of what you will need of these data structures for this project; you may also want to look at this STL cheatsheet about maps and sets. You should also take a look at the STL string class, in particular the methods insert(size_t, size_t, char) and erase(size_t, size_t). Note that if you try to call erase with only one parameter (e.g. s.erase(3)), it will erase from that character to the end of the string. Probably not what you want.

Word melt puzzles

A word melt puzzle has three components: a start word, a goal word, and a dictionary of valid words. Starting with the start word, the goal is to repeatedly change the current word by one character (by changing a letter, inserting a letter, or deleting a letter) so that it is still valid, until you reach the goal word. No valid word in a word melt puzzle will be less than two characters long (otherwise, deleting one letter from a 1-letter word would result in an empty string).

Let's consider an example. If the dictionary contains (only) the words "but, bat, bar, barn, yarn, yearn." If the start word is "but," and the end word is "yearn," then the following sequence would be a solution: "but, bat (change u to a), bar (change t to r), barn (insert n at the end), yarn (change b to y), yearn (insert e before the second letter)". Here the letters that changed have been emphasized.

We are going to adapt our Maze solver to solve word melt puzzles. The two problems might seem unrelated, but you'll find that the only substantial changes are to the Location class (since it now represents words instead of row/column positions), and to the driver (since it must use a different type of search). We don't have to change much because the Location class is a good example of information hiding in an ADT: the implementation has changed, but the interface remains the same.

By the way, you might be interested that this project is related to something called the Levenshtein distance, which measures the smallest sequence of changes that can turn one string into another. While the Levenshtein distance is interesting in its own right, with an efficient solution, for this project we don't use it because we only consider changes that result in valid words (Levenshtein distance considers all changes to be valid).

Breadth-first search

In order to find the shortest path (i.e. sequence of words) that solves the puzzle, we will breadth-first search. In the last project, one stack kept track of the current path. In this project, we will instead use a queue to keep track of the "frontier" of unknown territory, and an STL map to keep track of all the shortest paths at the same time.

Breadth-first search (BFS) is a common algorithm in computer science, and it is guaranteed to find a shortest path in our puzzle. This is different from the depth-first search (DFS) algorithm we used for project 1. BFS usually uses a FIFO structure (like a queue) to explore broadly, and it never backs up or reconsiders previous decisions. DFS usually uses a LIFO structure (like a stack) to explore down one path as far as possible, then it backs up when necessary and makes other choices. Both search methods have advantages; for example DFS is not guaranteed to find the shortest path, but it can often be implemented to use less memory than BFS. The basic idea for BFS is this:

* start with the starting location on the queue
* while the queue is not empty and the ending location has not been found, do the following:
- - - pull one location off the queue
- - - look at all valid neighbor locations, and if they haven't been visited, then put them on the queue

Note that if we implement this algorithm with the stack instead of a queue, we get DFS instead of BFS. That's pretty amazing!

BFS always finds a shortest path in this puzzle, since we assume that each location change "costs" the same. We'll discuss this more later this semester when we learn about graph algorithms. Note that there might be more than one shortest path, so the order of exploration is important.

Why is it guaranteed to find a shortest path? Because of the queue ordering -- FIFO. The BFS algorithm ensures that first time we pull a location off the queue we must have found the shortest path to that location. Think about this inductively: the first location pulled off the queue is the starting location (since it's the only one initially on the queue), so we have found the shortest path to the starting location. Then we put on the back of the queue the neighbor locations of the one we just pulled off. These will be the next things we pull off the front of the queue, and we have found the shortest paths to those locations, and so on. The key idea is: each time we find a new unvisited location (call it L), the location that led us to L must be on the shortest path from the starting location to L.

Keeping track of all the shortest paths

For this project, we will not keep a single path from the start location to a current location. Instead, we will keep a STL map which, given any location L that has been visited, will store which was the location immediately before L on the shortest path from the start location to L. Then if your program finds the end location, it can reconstruct the path from the start to the end based on this map. This map is also useful for knowing if we have already visited some location, so that we don't visit it twice. This eliminates the exponential-time search we saw in project 1.

An STL map represents any function — for any input (of a type called the Key type), it can produce an output (of a type called the Value type, possibly different than the Key type). For example, we can make a map of string (Key) to int (Value) that is useful for looking up student IDs based on name, so that if we look up "Bobby Baylor", we get back 123456789. The STL map has several methods that are useful for this project:

— operator[](const Key &) — to look up a Key type in the map, and useful for assigning a Value to associate with a Key, like this: myMap["Bobby Baylor"] = 123456789;. You can also look up a Value (if you are sure it's in the map!): cout << myMap["Bobby Baylor"]; would print 123456789. If you're not sure the Key is in the map, then you should first use find() — otherwise, the operator[] will automatially create an entry in the map with the given key, which is probably not what you want!
— find(const Key &) — to find if a given Key is in the map. If so, find returns an iterator pointing at the Key/Value pair. If not, find returns the end iterator.
— end() — returns the end iterator, which is useful for comparing with the result from find() to know if it actually found the given Key.
— One more point about map iterators — they are actually STL pairs. An STL pair is a useful structure for putting together any two types; you can read more about STL pairs here.

Conceptually, the map will contain a tree of back-links from each location L to the previous location on the shortest path from the start location to L. For example, suppose the dictionary contains the following words: "red, rad, rod, bed, bad, bod, cod", and the start word is red (it doesn't matter for this example what the end word is).

Here's a picture of the neighbors, connected by lines:
graph of connections between words in the dictionary
Note that all of these words have the same length, but that doesn't matter. The neighbor relation applies to any pair of words that are one letter different (whether that difference comes from changing, inserting, or deleting a letter).

Then after the puzzle has been completely explored, the map (let's call it m) should have this information, logically speaking:

m.get("bed").equals("red")
m.get("rad").equals("red")
m.get("bad").equals("bed")
m.get("cod").equals("rod")
...

We could express this information visually as:
visual illustration of the STL map constructed by BFS
Here the arrows indicate the back-links, which is the information kept by the map. The number in parentheses indicates the order in which each word will be explored in the BFS algorithm. The dotted lines are just there to show neighbor relationships.

By doing this, we can find the shortest path between the start and the end by starting at the end, finding its previous location, and keep on going until we reach the start. Of course, we want to print it from start to end, rather than end to start — there are several ways to handle this.

How could we denote the previous location for the start? As itself! That way every location in the previous map has an entry (which is important for the next point...).

If we keep the back-links in the STL map up to date as we explore the puzzle's dictionary (building it as we discover new locations to explore), then we can tell if we have visited a location before by seeing if it has an entry in the map (using the map's containsKey(Object key) method, which is similar to the set's contains(Object item) method).

We will discuss this map of previous locations in detail in class. You should refer to the STL map documentation for more information on the relevant features of the STL map.

Changes to the Maze and Location classes

The Maze class also now uses a STL set to keep track of validLocations, which is a faster lookup than searching through an array, since a STL set is usually implemented as a Red-Black tree, which is a type of fast binary search tree. You will use the following set methods:

— insert(const T &) — to place a Location object in the validLocations set,
— find(const T &) — returns an iterator that points at the Key you were searching for (which will be the end iterator if the Key isn't found), and
— end() — which returns the special iterator that points to one past the last valid object in the set.

See the STL set documentation for more information.

The STL set and map are ordered data structures — that means that they need to be able to place the objects they store in an order that is computable with a less-than operation. Therefore, we must implement this operation for the Location class so that we can use them in the STL map and set containers.

Sample inputs

Several sample inputs are available here. You can get the correct output by capturing the output of the sample executable on these inputs.

Provided code

You must use the C++ files provided here. I have provided a lot of code in some of these files. You are not allowed to change the existing code in some of these files. Instead, write your code in a cpp implementation.

Remember that when using templates, all of the code goes in the .h file. So you will turn in 4 files for this project: arrayqueue-student-proj2.h, location-proj2.cpp, maze-proj2.cpp, and your driver (a .cpp file). Where appropriate, you should #include the 'student' file, and the student file should #include the corresponding 'prof' file (which it already does).

Sample executables

Here are sample executables for you. When you design test cases, you can judge your output against the output from my correct solution. Here are my compiled solutions:

If you give a command-line argument to these executables, they will print extra information about how they are exploring. For example, this will execute the program like normal, redirecting input from a file called my_input.txt:

% project2_solution_dos.exe < my_input.txt

But here is the mode of operation that will cause the program to print out what it is doing in more detail:

% project2_solution_dos.exe printMore < my_input.txt

The command-line argument doesn't have to be the word "printMore", it can be anything.

Milestones

Since this is a large project, it helps to have a plan of attack. The following milestones will not be turned in nor graded, however, I strongly recommend that you do them for your own good and at the suggested pace.

Step	Finish by	Milestone
1.	Thursday a.p.	WRITE and TEST the modifications for the Location and Maze classes.
2.	Monday w.a.p.	WRITE and TEST the methods for the ArrayQueue.
3.	Wednesday w.a.p.	WRITE and TEST all the methods for the driver. Make your own test cases. Aim to finish early, so you can be sure you will finish before the due date.

Final notes

Remember when writing this program to adhere to the coding style guidelines. This is an individual project, so you must work alone. No credit will be given for a solution which does not pass all the hidden tests I create, or does not pass in the allowed time. For more detailed instructions, read the project submission guidelines.