Basic Datastructure Toolbox

Hash Table

Collision resolution:

1. Chaining wtih LinkedList

  • $O(N)$ worst case
  • most common approach

2. Chaining with BST

  • $O(logN)$ worst case
  • take more space than linked list
  • hence, rarely used unless we had extremely nonuniform distribution

3. Open Addressing with Linear Probing

  • idea: move to next cloest closed open spot
  • no bucket chaining, total entries limited by array size
  • issue of clustering because of linear probing
  • worst case query time get() is $O(N)$

4. Quadratic Probing and Double Hashing

  • instead of probing linear as in approach 3, we prob quadratically
  • or use second hash function to random determine prob distance (reduce chance of clustering)

Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
package ood;

import java.util.Iterator;
import java.util.LinkedList;

public class HashMapOOD {
private final static int DEFAULT_CAP = 100;
private final static int EMPTY_SIZE = 0;
private final static int LOAD_FACTOR = 70;
private int mapSize = DEFAULT_CAP;
private int currentSize;
private KeyValue[] map;

public HashMapOOD() {
currentSize = 0;
map = new KeyValue[DEFAULT_CAP];
}

public void put(int key, int val) {
// TOOD: load factor, expand
int index = hash(key);
if (map[index] == null) {
// create new node
map[index] = new KeyValue(key, val);
} else {
// add/update new node
KeyValue node = new KeyValue(key, val);
addOrUpdateNode(map[index], node);
}
}

public int get(int key) {
int index = hash(key);
KeyValue node = map[index];

while (node != null) {
if (node.key == key) {
return node.val;
}
node = node.next;
}
return -1;
}

public void remove(int key) {
int index = hash(key);
KeyValue head = map[index];
KeyValue dummy = new KeyValue(0, 0);
dummy.next = head;
KeyValue pre = dummy;

while (head != null) {
if (head.key == key) {
pre.next = head.next;
break;
}
pre = head;
head = head.next;
}
map[index] = dummy.next;
}

public boolean containsKey(int key) {
int index = hash(key);
KeyValue node = map[index];

while (node != null) {
if (node.key == key) {
return true;
}
node = node.next;
}
return false;
}

public boolean isEmpty() {
return currentSize == EMPTY_SIZE;
}

public int size() {
return currentSize;
}

/*
* expand() is NOT accessible to instance directly
* hence, use static key word
*
*/
private static boolean expand() {

return false;
}

// private limits to only current class, not accessible for user side
private int hash(int key) { // IMPORT: static method cannot access non-static variable.
return key % mapSize;
}

private void addOrUpdateNode(KeyValue head, KeyValue node) {
KeyValue prev = head;
while (head != null) {
if (head.key == node.key) {
head.val = node.val;
return;
}
prev = head;
head = head.next;
}
prev.next = node;
}
}

ref: https://discuss.leetcode.com/topic/102606/design-a-hashmap

Trie

Why Trie?

  • hashing search, insert, delete takes $O(L)$ (for hash() opeartion) in average, but need to handle collision
  • Trie: insert and serach string in $O(L)$, where $L$ is the length of string, no need to handle collision
  • prefix search & autocomplete

https://www.geeksforgeeks.org/advantages-trie-data-structure/

Implementation1: Array

  • fixed sized array, limited to lower case letters
  • use c - 'a' to index array for character c
  • waste of unused space
1
2
3
4
class TrieNode { // nested class
public boolean isWord;
public TrieNode[] childrenMap = new TrieNode[26]; // for letters only
}

example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
private class Trie {
private Trie[] children = new Trie[26];
private int count = 0;
private boolean ban = false;
}

// tire approach
public String mostCommonWord(String paragraph, String[] banned) {
Trie root = new Trie();
Trie curr = root;

// add ban words
for (String ban : banned) {
curr = root; // import! dont forget!
for (int i = 0; i < ban.length(); i++) {
int idx = ban.charAt(i) - 'a';
if (curr.children[idx] == null) {
curr.children[idx] = new Trie();
}
curr = curr.children[idx];
}
curr.ban = true;
}

// add paragraph words and count max
int maxCount = 0;
String maxWord = "";
String[] words = paragraph.toLowerCase().split("\\W+");// faster if process manually by char
for (String word : words) {
curr = root; // import! dont forget!
for (int i = 0; i < word.length(); i++) {
int idx = word.charAt(i) - 'a';
if (curr.children[idx] == null) {
curr.children[idx] = new Trie();
}
curr = curr.children[idx];
}
curr.count += 1;
if (curr.count > maxCount && !curr.ban) {
maxCount = curr.count;
maxWord = word;
}
}
return maxWord;
}

ref: https://leetcode.com/problems/most-common-word/

Implementation2: HashMap

  • slower query/search than array (time for hash(), etc)
  • saves spaces, only store the children nodes we need
  • more flexible in functionality (suppoert upper-cases letter, or TrieNode could store words in hiericy, Quantcast OA JSON storage & query)
1
2
3
4
class TrieNode { // nested class
public boolean isWord;
public Map<Character, TrieNode> childrenMap = new HashMap<>();
}

Trie w/ implementation2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
class Trie {
// nested class TrieNode can also access private varaible within Trie class
class TrieNode { // nested class, used as "helper classes", only serving class Trie,
public boolean isWord;
public Map<Character, TrieNode> children = new HashMap<>();
}
/** Initialize your data structure here. */
private TrieNode root; // why private? limitd within Trie class only

public Trie() {
this.root = new TrieNode();
}

/** Inserts a word into the trie. */
public void insert(String word) {
TrieNode curr = this.root; // dont forget!
for (int i = 0; i < word.length(); i++) {
char c = word.charAt(i);
if (!curr.children.containsKey(c)) {
curr.children.put(c, new TrieNode());
}
curr = curr.children.get(c);
}
curr.isWord = true;
}

/** Returns if the word is in the trie. */
public boolean search(String word) {
TrieNode curr = searchPrefix(word);
return curr != null && curr.isWord;
}

/** Returns if there is any word in the trie that starts with the given prefix. */
public boolean startsWith(String prefix) {
TrieNode curr = searchPrefix(prefix);
return curr != null;
}

// static method only when:
// 1. call method not require an instance of the class
// 2. only if it makes sense to call this method even no object/instance has been created

// searchPrefix violates second rule, hence no static
private TrieNode searchPrefix(String prefix) {
TrieNode curr = root;
for (int i = 0; i < prefix.length(); i++) {
char c = prefix.charAt(i);
if (!curr.children.containsKey(c)) {
return null;
}
curr = curr.children.get(c);
}
return curr;
}
}

/**
* Your Trie object will be instantiated and called as such:
* Trie obj = new Trie();
* obj.insert(word);
* boolean param_2 = obj.search(word);
* boolean param_3 = obj.startsWith(prefix);
*/

ref: https://leetcode.com/explore/learn/card/trie/

Binary Search Tree (BST)

ref: http://pages.cs.wisc.edu/~paton/readings/Binary-Search-Trees/

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
package ood;

public class TreeNode<K> {
private K key;
private TreeNode<K> left, right;

TreeNode () {};
TreeNode (K key) {
this.key = key;
}

TreeNode (TreeNode<K> left, TreeNode<K> right, K key) {
this.key = key;
this.left = left;
this.right = right;
}

// accessors
public K getKey() {
return key;
}

public TreeNode<K> getLeft() {
return left;
}

public TreeNode<K> getRight() {
return right;
}

public boolean isLeaf() {
return this.getLeft() == null && this.getRight() == null;
}

// mutators
public void setLeft(TreeNode<K> node) {
left = node;
}

public void setRight(TreeNode<K> node) {
right = node;
}

public void setKey(K newKey) {
key = newKey;
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
package ood;

public class BST<K extends Comparable<K>> {
private TreeNode<K> root;
private int size;

BST() {
root = null;
size = 0;
}

public void insert(K key) {
TreeNode<K> curr = root;
if (curr == null) {
// if root == null, new node
root = new TreeNode<K>(key);
size++;
return;
}

curr = travereTree(curr, key);
if (curr == null) return; // duplicate key found, exit
if (key.compareTo(curr.getKey()) < 0) {
curr.setLeft(new TreeNode<K>(key));
} else {
curr.setRight(new TreeNode<K>(key));
}
size++;
}

public boolean lookup(K key) {
TreeNode<K> node = root;
while (node != null) {
if (key.compareTo(node.getKey()) == 0) {
// if duplicate node, return null
return true;
} else if (key.compareTo(node.getKey()) < 0) {
node = node.getLeft();
} else {
node = node.getRight();
}
}
return false;
}

private TreeNode<K> travereTree(TreeNode<K> node, K key) {
while (node != null) {
if (key.compareTo(node.getKey()) == 0) {
// if duplicate node, return null
return null;
} else if (key.compareTo(node.getKey()) < 0) {
if (node.getLeft() == null) {
return node;
}
node = node.getLeft();
} else {
if (node.getRight() == null) {
return node;
}
node = node.getRight();
}
}
return node;
}

// TODO:
public void delete() {

}

public int size() {
return size;
}
}

Test program:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
package ood;

public class BSTTest {
public static void main(String[] args) {
BST<Integer> tree = new BST<Integer>();
tree.insert(1);
tree.insert(2);
tree.insert(2);
tree.insert(3);
tree.insert(3);

System.out.println(tree.lookup(3));
System.out.println(tree.lookup(2));
System.out.println(tree.lookup(1));
System.out.println(tree.lookup(4));
System.out.println(tree.size());
}
}

ref:
数据结构:二叉搜索树(BST)的基本操作
http://blog.csdn.net/lemon_tree12138/article/details/50366993
用java源代码学数据结构<七>: BST
http://blog.csdn.net/wanghao109/article/details/13508357
二叉查找树 (C#)
http://www.cnblogs.com/gaochundong/p/binary_search_tree.html
你曾实现过二叉树吗 (C#)
http://www.cnblogs.com/gaochundong/archive/2013/04/24/csharp_binary_tree.html

Stack

Using Array

  • works, however, array has fixed size, space used NOT proportional to number of items in list.
  • require expansion $O(n)$ extra time complexity for each expandsion, but $O(1)$ amortized/average time
  • OR better always $O(1)$ for push() w/ a shadow array.

Using LinkedList (Preferred)

  • insert to head, remove from head (difference to Queue)
  • dynamically size, space used proportional to number of items in list, no expansion required.
  • extra space to store next pointer for each of the value (Array no such extra pointer)

LinkedList implementation

ref: http://pages.cs.wisc.edu/~paton/readings/Stacks-and-Queues/

1
2
3
4
5
6
7
8
9
10
// ListNode class
public class ListNode {
int val;
ListNode next;

ListNode() {}
ListNode(int val) {
this.val = val;
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// Stack class
import java.util.EmptyStackException;

public class StackOOD {
private ListNode head;
private int size;
StackOOD () {
head = null;
size = 0;
}

public void push(int val) {
ListNode newNode = new ListNode(val);
newNode.next = head.next;
head.next = newNode;
size++;
}

public int pop() {
if (stack.isEmpty()) throw new EmptyStackException();
ListNode node = head.next;
int val = node.val;
head.next = node.next;

// gc clean up
node.next = null;

size--;
return val;
}

public int size() {
return size;
}

public boolean isEmpty() {
return size == 0;
}

public int peek() {
if (stack.isEmpty()) throw new EmptyStackException();
return head.next.val;
}
}

Test Program:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
public class StackOODTest {
public static void main(String[] args) {
StackOOD stack = new StackOOD();
stack.push(1);
stack.push(2);
stack.push(2);
stack.push(2);
stack.push(1);
stack.push(2);
stack.push(2);
stack.push(2);
System.out.println("Is stack empty: " + stack.isEmpty());
System.out.println(stack.pop());
System.out.println("current size: " + stack.size());
System.out.println(stack.pop());
System.out.println("current size: " + stack.size());
}
}

Queue

  • trade of between Array and LinkedList is similar to Stack
  • i used LinkedList for implementation
  • insert to tail, remove from head (difference to Stack)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
package ood;
import java.util.EmptyStackException;

public class QueueOOD {
private ListNode head;
private ListNode tail;
private int size;

QueueOOD() {
size = 0;
}

public void offer(int val) {
ListNode newNode = new ListNode(val);
size++;

if (head == null) {
head = newNode;
tail = newNode;
return;
}

tail.next = newNode;
tail = newNode;
}

public int poll() {
if (head == null) throw new EmptyStackException();
int value = head.val;
head = head.next;
size--;
return value;
}

public int size() {
return size;
}

public boolean isEmpty() {
return size == 0;
}
}

Test Program:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
package ood;

public class QueueOODTest {
public static void main(String[] args) {
QueueOOD queue = new QueueOOD();
queue.offer(1);
queue.offer(2);
queue.offer(3);
queue.offer(4);

System.out.println("size: " + queue.size());
System.out.println(queue.poll());
System.out.println(queue.poll());
System.out.println(queue.poll());
System.out.println(queue.poll());
System.out.println("size: " + queue.size());
System.out.println(queue.poll());
queue.offer(100);
System.out.println("size: " + queue.size());

}
}

Heap

Heap Property

  • Heap is a tree-based datastructure with two special properties:
    • ORDER property
    • SHAPE property
  • A common implementation of a heap is the binary heap, in which the tree is a binary tree (complete binary tree)
  • Hence we could use array as representation of a heap
  • All examples given in this section are max-heap, min-heap is similiar

The ORDER property:

For every node n, the value in n is greater than or equal to the values in its children (and thus is also greater than or equal to all of the values in its subtrees).

The SHAPE property:

  • All leaves are either at depth d or d-1 (for some value d).
  • All of the leaves at depth d-1 are to the right of the leaves at depth d.
  • There is at most 1 node with just 1 child, that child is the left child of its parent, and it is the rightmost leaf at depth.

The SHAPE property Examples:

Image from uwmadion cs367 webpage

The ORDER property Examples:

Image from uwmadion cs367 webpage

Implementing priority queues using heaps

  • Array or ArrayList
  • Starts at position 1 (instead of 0)
  • The root of the heap is always in array[1].
  • In general, if a node is in array[k], then its left child is in array[k*2], and its right child is in array[k*2 + 1].
  • If a node is in array[k], then its parent is in array[k/2] (using integer division, so that if k is odd, then the result is truncated; e.g., 3/2 = 1).

Image from uwmadion cs367 webpage

Insert() operation

  • insert to the end of the array (the right most leave to tree)
  • swap with its parent if it is larger than its parent value
  • recursively perform step two until hit root OR step 2 fails

removeMax() operation

  • swap root with last element in array, and pop the last element
  • swap new root with its larger childern
  • recursively perform step two util its fails

source: http://pages.cs.wisc.edu/~vernon/cs367/notes/11.PRIORITY-Q.html

Graph

Graph consisits of:

  • finite set of vertices called nodes
  • finite set of vertice pair $(u, v)$ called edges
  • two types of graph: directed graph and undirected graph
  • two types of graph representation: adjacency matrix and adjacency list
  • undirected graph has symmetric adjacency matrix

[TODO] X-code PPT
https://goo.gl/pd5XDK
https://goo.gl/hciJzA

updated: 05/30/2018

133. Clone Graph

BFS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
public UndirectedGraphNode cloneGraph(UndirectedGraphNode node) {
if (node == null) {
return null;
}

// Key is the node in original graph and value is the corresponding
// cloned node
Map<UndirectedGraphNode, UndirectedGraphNode> map = new HashMap<>();
// Queue to iterate through all nodes in original graph
Deque<UndirectedGraphNode> q = new ArrayDeque<>();
q.add(node);

UndirectedGraphNode graphCopy = new UndirectedGraphNode(node.label);
map.put(node, graphCopy);

while (!q.isEmpty()) {
UndirectedGraphNode n = q.poll();
UndirectedGraphNode clone = map.get(n);
for (UndirectedGraphNode neighbor : n.neighbors) {
// create if clone copy does not exist
UndirectedGraphNode copy = map.computeIfAbsent(neighbor, k -> {
q.add(k); // every node has a chance to be processed
return new UndirectedGraphNode(k.label);
});
clone.neighbors.add(copy);
}
}

return graphCopy;
}

DFS (bottom up)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
public UndirectedGraphNode cloneGraph2(UndirectedGraphNode node) {
return cloneHelper(node, new HashMap<>());
}

// recursive bottom up
// top bottom might not work in this case, since some of the node (clone) might has not been created, hence require bottom up
public static cloneHelper(UndirectedGraphNode node, Map<UndirectedGraphNode, UndirectedGraphNode> clones) {
if (node == null) return null;

UndirectedGraphNode clone = clones.get(node);
if (clone != null) return clone;

clone = new UndirectedGraphNode(node.label);
clones.put(node, clone);
for (UndirectedGraphNode neighbor : node.neighbors) {
clone.neighbors.add(cloneHelper(neighbor, clones));
}
return clone;
}

Logs

  • 03/15/2018 - add Trie
  • 03/15/2018 - add Hash table
  • 03/16/2018 - add BST
  • 03/16/2018 - add Stack & Queue
  • 05/07/2018 - add heap
  • 05/30/2018 - add Graph
  • 05/04/2019 - add example for trie array[26] implementation
  • 05/11/2019 - refactor StackOOD code
  • 05/11/2019 - add DFS and BFS for clone graph