In this lab you will be implementing functions on hash tables with two different collision resolution strategies—separate chaining and linear probing. These hash tables serve an implementation of the dictionary abstract data type. In the second part of the lab you will use the dictionary to solve some problems.
Note that there are a LOT of files in this lab—don't get too intimidated as you won't be modifying the vast majority of them.
list IteratorsWhen you are working with the Separate Chaining Hash Table, you will need to iterate over the linked list of a given bucket. Since the hash tables are templatized, however, this causes us a slight headache syntactically in C++. To define a list iterator on a given bucket, you will need to declare it as follows:
typename list< pair<K,V\> \>::iterator it \= table\[i\].begin();
Alternatively you can just use auto and have the compiler worry about the type:
auto it \= table\[i\].begin();
Also note that if you do not insert your data at the head of the bucket, you may end up with your results being in a different order than ours.
If you use the list::erase() function, be advised that if you erase the element pointed to by an iterator that the parameter iterator is no longer valid. For instance:
auto it \= table\[i\].begin();
table\[i\].erase(it);
it++;
is invalid because it is invalidated after the call to erase(). So, if you are looping with an iterator, remember a break statement after you call erase()!
You can download to code here: lab_has.zip. As usual, please upload the files onto the school's Linux servers and complete the lab there.
Note that there are a lot of files, but you only will need to modify four of them, and the spec guides you through which functions you have to implement. Don't panic!
For reference, here are the only files you need to modify (and submit for credit):
lphashtable.cppschashtable.cppword_counter.cppanagram_finder.cpp.h filesIt goes without saying that .h files are a great way to understand how the code works. In particular, we are implementing hash tables with two different collision resolution methods—separate chaining, and linear probing. Both types of hash tables are subclasses of the HashTable class. Take a look at the hashtable.h file before proceeding.
remove And resizeTable on a Separate Chaining Hash TableOpen your schashtable.cpp file. In this file, the above two functions have not been implemented—your job is to implement them.
removeresizeTable>= 0.7. The function shouldResize checks this and is already implemented for you.findPrime with a parameter of double the current size to set the size. See other calls to resize for reference.findPrime and the declaration for 'size' in hashtable.hschashtable.h to see how the SCHashTable is actually implemented.insert On a Linear Probing Hash TableOpen your lphashtable.cpp. In this file, you will be implementing the insert function.
insert, given a key and a value, should insert the (key, value) pair into the hash table. Remember to check if the array needs to be resized prior to insertion.lphashtable.h file to make sure the appropriate fields have been updated. For example, the should_probe array flags whether we continue to probe forwards in the array if the key is not found at the current index.You MUST handle collisions in your insert function, or your hash table will be broken!
charcountThe CharFreq class has already been implemented for you. Type the following into the terminal:
make charcount
This will make the charcount executable. This file takes two arguments: the first is the file to analyse, and the second is the frequency for which to return characters. For example, a run on SherlockHolmes.txt for characters with frequency greater or equal to 10000 might look like:
$ ./charcount SherlockHolmes.txt 10000 schash Finding chars in SherlockHolmes.txt with frequency \>\= 10000 using SCHashTable... 54944 e 40511 t 36142 a 34869 o 31248 i 29731 n 29579 h 27965 s 25684 r 19100 d 17636 l 13636 u 12155 m 11534 w 11103 c
Note You should verify that your solution outputs the right values for both kinds of hash tables!! You can test each one by changing the third command line argument to "schash" or "lphash", respectively.
WordFreq ClassNow we will write our own application for a hash table! Open word_counter.cpp. You will be writing the getWords function. This function, given an integer threshold, should return a vector of (string, int) pairs. Each of these pairs is a string and its frequency in the file. Your vector should only contain those pairs whose frequencies are greater than or equal to threshold. Look at char_counter.cpp if you are stuck. You may find the TextFile class defined in textfile.h helpful.
wordcount ExecutableWhen you're done with the above, type the following into your terminal:
make wordcount
This will build the wordcount executable, which uses the WordFreq class to determine word frequencies in a file. It, like charcount, takes two arguments: the first is the path to the file to be analyzed, and the second is the threshold for which to grab words. If a word occurs >= threshold times, it will be printed to the console.
For example, a run of wordcount on metamorphoses.txt with a frequency 1000 might look like:
$ ./wordcount metamorphoses.txt 1000 lphash Finding words in metamorphoses.txt with frequency \>\= 1000 using LPHashTable... 10473 the 5753 of 4739 and 3078 to 2246 a 1989 in 1706 was 1572 his 1548 that 1500 her 1456 with 1306 is 1241 he 1185 by
You should verify your solution works with both kinds of hash tables.
AnagramFinder ClassFinding anagrams is another clever use of hash tables. Open anagram_finder.cpp. You will be writing the checkWord function, which simply returns a boolean value. It should return true if the string test is an anagram of word, and false otherwise.
An anagram of a given word is a word which has the same letters. For example, "retinas" and "stainer" are anagrams of each other. Think carefully about what it means for a word to be an anagram of another. Can we use a hash table to keep track of something? Perhaps counting letter frequencies will help you?
Note: the proper procedure to update a value in our hash tables is to remove the key from the hash table, and re-insert the key with the new value.
anagramtest ExecutableWhen you're done with the above, type the following into your terminal:
make anagramtest
This will make the anagramtest executable, which you can use to test your AnagramFinder class. It can be run with or without arguments. Without arguments will test a very simplistic example—with arguments is more interesting.
anagramtest can also take two arguments: the first is the path to the file to be analyzed, and the second is the word to find anagrams for. For example, a run of anagramtest on words.txt looking for anagrams of retinas might look like:
$ ./anagramtest words.txt retinas schash Checking file words.txt for anagrams of retinas using SCHashTable... anestri is an anagram of retinas asterin is an anagram of retinas eranist is an anagram of retinas nastier is an anagram of retinas ratines is an anagram of retinas resiant is an anagram of retinas restain is an anagram of retinas retains is an anagram of retinas retinas is an anagram of retinas retsina is an anagram of retinas sainter is an anagram of retinas stainer is an anagram of retinas starnie is an anagram of retinas stearin is an anagram of retinas
You should verify your solution works with both kinds of hash tables to verify their correctness as well.
anagram_finder.cpp
/**
* @file anagram_finder.cpp
* Implementation of the AnagramFinder class.
*
* @author Chase Geigle
* @date Spring 2011
* @date Summer 2012
*/
#include "anagram_finder.h"
using std::string;
using std::vector;
using std::ofstream;
using std::endl;
/**
* Constructs an AnagramFinder based on a filename to read potential
* anagrams from.
*
* @param ifilename The name of the file to read in.
*/
template <template <class K, class V> class Dict>
AnagramFinder<Dict>::AnagramFinder(const string& ifilename)
: file(true), filename(ifilename)
{
/* nothing */
}
/**
* Constructs an AnagramFinder based on a set of strings instead of a
* filename.
*
* @param istrings The set of strings to use for this finder.
*/
template <template <class K, class V> class Dict>
AnagramFinder<Dict>::AnagramFinder(const vector<string>& istrings)
: file(false), strings(istrings)
{
/* nothing */
}
/**
* Determines if the given word is an anagram of the test word.
*
* @param word Word that is possibly an anagram.
* @param test Word to check against.
* @return A boolean value indicating whether word is an anagram of test.
*/
template <template <class K, class V> class Dict>
bool AnagramFinder<Dict>::checkWord(const string& word, const string& test)
{
/**
* @todo Implement this function! You should use the provided
* templated hashtable class Dict.
*/
if (word.length() != test.length()) return false;
Dict<char, int> charCount(256);
for (size_t i = 0; i < word.length(); i++) {
charCount[word[i]]++;
}
for (size_t i = 0; i < test.length(); i++) {
charCount[test[i]]--;
}
for (auto kv = charCount.begin(); kv != charCount.end(); ++kv) {
if (kv->second != 0) return false;
}
return true;
}
/**
* Retrieves a set of words that are anagrams of a given word.
*
* @param word The word we wish to find anagrams of inside the finder.
*/
template <template <class K, class V> class Dict>
vector<string> AnagramFinder<Dict>::getAnagrams(const string& word)
{
// set up the return vector
vector<string> ret;
if (file) {
TextFile infile(filename);
string line;
vector<string> tests;
while (infile.good()) {
string test = infile.getNextWord();
if (checkWord(word, test))
ret.push_back(test);
}
} else {
for (size_t i = 0; i < strings.size(); i++) {
if (checkWord(word, strings[i]))
ret.push_back(strings[i]);
}
}
return ret;
}
/**
* Retrieves a set of anagrams in the finder of a given words, but writes
* them out to a file instead of returning a vector.
*
* @param word The word we wish to find anagrams of inside the finder.
* @param output_file The name of the file we want to write to.
*/
template <template <class K, class V> class Dict>
void AnagramFinder<Dict>::writeAnagrams(const string& word,
const string& output_file)
{
vector<string> anagrams = getAnagrams(word);
ofstream outfile(output_file.c_str());
if (outfile.is_open()) {
for (size_t i = 0; i < anagrams.size(); i++)
outfile << anagrams[i] << endl;
}
outfile.close();
}
lphashtable.cpp
/**
* @file lphashtable.cpp
* Implementation of the LPHashTable class.
*
* @author Chase Geigle
* @date Spring 2011
* @date Summer 2012
*/
#include "lphashtable.h"
using hashes::hash;
using std::pair;
template <class K, class V>
LPHashTable<K, V>::LPHashTable(size_t tsize)
{
if (tsize <= 0)
tsize = 17;
size = findPrime(tsize);
table = new pair<K, V>*[size];
should_probe = new bool[size];
for (size_t i = 0; i < size; i++) {
table[i] = NULL;
should_probe[i] = false;
}
elems = 0;
}
template <class K, class V>
LPHashTable<K, V>::~LPHashTable()
{
for (size_t i = 0; i < size; i++)
delete table[i];
delete[] table;
delete[] should_probe;
}
template <class K, class V>
LPHashTable<K, V> const& LPHashTable<K, V>::operator=(LPHashTable const& rhs)
{
if (this != &rhs) {
for (size_t i = 0; i < size; i++)
delete table[i];
delete[] table;
delete[] should_probe;
table = new pair<K, V>*[rhs.size];
should_probe = new bool[rhs.size];
for (size_t i = 0; i < rhs.size; i++) {
should_probe[i] = rhs.should_probe[i];
if (rhs.table[i] == NULL)
table[i] = NULL;
else
table[i] = new pair<K, V>(*(rhs.table[i]));
}
size = rhs.size;
elems = rhs.elems;
}
return *this;
}
template <class K, class V>
LPHashTable<K, V>::LPHashTable(LPHashTable<K, V> const& other)
{
table = new pair<K, V>*[other.size];
should_probe = new bool[other.size];
for (size_t i = 0; i < other.size; i++) {
should_probe[i] = other.should_probe[i];
if (other.table[i] == NULL)
table[i] = NULL;
else
table[i] = new pair<K, V>(*(other.table[i]));
}
size = other.size;
elems = other.elems;
}
template <class K, class V>
void LPHashTable<K, V>::insert(K const& key, V const& value)
{
/**
* @todo Implement this function.
*
* @note Remember to resize the table when necessary (load factor >=
* 0.7). **Do this check *after* increasing elems!!** Also, don't
* forget to mark the cell for probing with should_probe!
*/
(void)key; // prevent warnings... When you implement this function, remove this line.
(void)value; // prevent warnings... When you implement this function, remove this line.
++elems;
if (shouldResize()) resizeTable();
size_t idx = hash(key, size);
while (table[idx] != NULL) {
idx = (idx + 1) % size;
}
table[idx] = new pair<K, V>(key, value);
should_probe[idx] = true;
}
template <class K, class V>
void LPHashTable<K, V>::remove(K const& key)
{
int idx = findIndex(key);
if (idx != -1) {
delete table[idx];
table[idx] = NULL;
--elems;
}
}
template <class K, class V>
int LPHashTable<K, V>::findIndex(const K& key) const
{
size_t idx = hash(key, size);
size_t start = idx;
while (should_probe[idx]) {
if (table[idx] != NULL && table[idx]->first == key)
return idx;
idx = (idx + 1) % size;
// if we've looped all the way around, the key has not been found
if (idx == start)
break;
}
return -1;
}
template <class K, class V>
V LPHashTable<K, V>::find(K const& key) const
{
int idx = findIndex(key);
if (idx != -1)
return table[idx]->second;
return V();
}
template <class K, class V>
V& LPHashTable<K, V>::operator[](K const& key)
{
// First, attempt to find the key and return its value by reference
int idx = findIndex(key);
if (idx == -1) {
// otherwise, insert the default value and return it
insert(key, V());
idx = findIndex(key);
}
return table[idx]->second;
}
template <class K, class V>
bool LPHashTable<K, V>::keyExists(K const& key) const
{
return findIndex(key) != -1;
}
template <class K, class V>
void LPHashTable<K, V>::clear()
{
for (size_t i = 0; i < size; i++)
delete table[i];
delete[] table;
delete[] should_probe;
table = new pair<K, V>*[17];
should_probe = new bool[17];
for (size_t i = 0; i < 17; i++)
should_probe[i] = false;
size = 17;
elems = 0;
}
template <class K, class V>
void LPHashTable<K, V>::resizeTable()
{
size_t newSize = findPrime(size * 2);
pair<K, V> **temp = new pair<K, V> *[newSize];
delete[] should_probe;
should_probe = new bool[newSize];
for (size_t i = 0; i < newSize; i++)
{
temp[i] = NULL;
should_probe[i] = false;
}
for (size_t i = 0; i < size; i++)
{
if (table[i] != NULL)
{
size_t idx = hash(table[i]->first, newSize);
while (temp[idx] != NULL)
idx = (idx + 1) % newSize;
temp[idx] = table[i];
should_probe[idx] = true;
}
}
delete[] table;
// don't delete elements since we just moved their pointers around
table = temp;
size = newSize;
}
schashtable.cpp
/**
* @file schashtable.cpp
* Implementation of the SCHashTable class.
*
* @author Chase Geigle
* @date Spring 2011
* @date Summer 2012
*/
#include "schashtable.h"
using hashes::hash;
using std::list;
using std::pair;
template <class K, class V>
SCHashTable<K, V>::SCHashTable(size_t tsize)
{
if (tsize <= 0)
tsize = 17;
size = findPrime(tsize);
table = new list<pair<K, V>>[size];
elems = 0;
}
template <class K, class V>
SCHashTable<K, V>::~SCHashTable()
{
delete[] table;
}
template <class K, class V>
SCHashTable<K, V> const& SCHashTable<K, V>::
operator=(SCHashTable<K, V> const& rhs)
{
if (this != &rhs) {
delete[] table;
table = new list<pair<K, V>>[rhs.size];
for (size_t i = 0; i < rhs.size; i++)
table[i] = rhs.table[i];
size = rhs.size;
elems = rhs.elems;
}
return *this;
}
template <class K, class V>
SCHashTable<K, V>::SCHashTable(SCHashTable<K, V> const& other)
{
table = new list<pair<K, V>>[other.size];
for (size_t i = 0; i < other.size; i++)
table[i] = other.table[i];
size = other.size;
elems = other.elems;
}
template <class K, class V>
void SCHashTable<K, V>::insert(K const& key, V const& value)
{
++elems;
if (shouldResize())
resizeTable();
pair<K, V> p(key, value);
size_t idx = hash(key, size);
table[idx].push_front(p);
}
template <class K, class V>
void SCHashTable<K, V>::remove(K const& key)
{
typename list<pair<K, V>>::iterator it;
/**
* @todo Implement this function.
*
* Please read the note in the lab spec about list iterators and the
* erase() function on std::list!
*/
size_t idx = hash(key, size);
for (it = table[idx].begin(); it != table[idx].end(); it++) {
if (it->first != key) continue;
table[idx].erase(it);
--elems;
return;
}
}
template <class K, class V>
V SCHashTable<K, V>::find(K const& key) const
{
size_t idx = hash(key, size);
typename list<pair<K, V>>::iterator it;
for (it = table[idx].begin(); it != table[idx].end(); it++) {
if (it->first == key)
return it->second;
}
return V();
}
template <class K, class V>
V& SCHashTable<K, V>::operator[](K const& key)
{
size_t idx = hash(key, size);
typename list<pair<K, V>>::iterator it;
for (it = table[idx].begin(); it != table[idx].end(); it++) {
if (it->first == key)
return it->second;
}
++elems;
if (shouldResize())
resizeTable();
idx = hash(key, size);
pair<K, V> p(key, V());
table[idx].push_front(p);
return table[idx].front().second;
}
template <class K, class V>
bool SCHashTable<K, V>::keyExists(K const& key) const
{
size_t idx = hash(key, size);
typename list<pair<K, V>>::iterator it;
for (it = table[idx].begin(); it != table[idx].end(); it++) {
if (it->first == key)
return true;
}
return false;
}
template <class K, class V>
void SCHashTable<K, V>::clear()
{
delete[] table;
table = new list<pair<K, V>>[17];
size = 17;
elems = 0;
}
template <class K, class V>
void SCHashTable<K, V>::resizeTable()
{
typename list<pair<K, V>>::iterator it;
size_t oldSize = size;
size = findPrime(size * 2);
list<pair<K, V>>* newTable = new list<pair<K, V>>[size];
for (size_t i = 0; i < oldSize; i++) {
for (it = table[i].begin(); it != table[i].end(); it++) {
size_t idx = hash(it->first, size);
newTable[idx].push_front(*it);
}
}
delete[] table;
table = newTable;
}
word_counter.cpp
/**
* @file word_counter.cpp
* Implementation of the WordFreq class.
*
* @author Chase Geigle
* @date Spring 2011
* @date Spring 2012
*/
using std::cout;
using std::endl;
using std::ifstream;
using std::istringstream;
using std::pair;
using std::string;
using std::vector;
template <template <class K, class V> class Dict>
WordFreq<Dict>::WordFreq(const string& filename) : dict(256) {
TextFile infile(filename);
while (infile.good()) {
string str = infile.getNextWord();
dict[str]++;
}
}
template <template <class K, class V> class Dict>
vector<pair<string, int>> WordFreq<Dict>::getWords(int threshold) {
vector<pair<string, int>> ret;
/**
* @todo Implement this function.
* @see char_counter.cpp if you're having trouble.
*/
for (auto kv = dict.begin(); kv != dict.end(); ++kv) {
if (kv->second < threshold) continue;
ret.push_back(*kv);
}
return ret;
}