Data structure: Trie
Published:
notes about trie (aka. prefix tree)
1. Concept
- Motivation: reduce unnecessary comparisons between strings, making it faster than Hash Table
- By saying “faster than Hash Table”, simply looking for a key takes O(1) for both of them. But (1) finding all keys with common prefix, and (2) enumerating dataset in lexicographical order will be faster using Trie.
- Solution: create a tree-like structure, root is ‘0’ as the initial start index, each edge is a candidate of appended character. So each child node will be a possible extended word from the prefix.
- To indicate the end of word, there are two ways: (1) using a “termination leaf node” with special char such as ‘#’. (2) using a boolean attribute on each node to indicate if this is the end of word.
- The trie will have a predictable size: (1) depth: about 10 as most word will have no more than 10 character. (2) width: 26x2 if for alphabets. There will be more if special characters are included. But no more than 256 (ALPHABETIC_SIZE of all ASCII chars).
2. Core code block
There are two ways to write a binary search: (1)while loop, (2)recursion
2(1). Implement a Trie
# using python
# To implement tree like structure, first need to implement node class
class TrieNode:
def __init__(self):
"""
1. A boolean attribute indicating if this is the end of word
"""
self.isWord = False
"""
2. A dictionary to record the child nodes
"""
# To avoid key error using empty dict(), we use collections.defaultdict() instead. If the passing item is not found, instead of throwing error, it will just create a new entry
self.child = collections.defaultdict(TrieNode)
# In python we don't need to explicitly declare constructor
class Trie:
def __init__(self):
"""
Initialize your data structure here.
"""
self.root = TrieNode()
def insert(self, word: str) -> None:
"""
Inserts a word into the trie.
"""
# 1. iterate top-down and append
curr = self.root
for w in word:
curr = curr.child[w]
# 2. check is end of word
curr.isWord = True
def search(self, word: str) -> bool:
"""
Returns if the word is in the trie.
"""
# 1. iterate top-down and append
curr = self.root
for w in word:
if w not in curr.child:
return False
else:
curr = curr.child[w]
# 2. check is end of word
return curr.isWord
def startsWith(self, prefix: str) -> bool:
"""
Returns if there is any word in the trie that starts with the given prefix.
"""
# 1. iterate top-down and append
curr = self.root
for p in prefix:
if p not in curr.child:
return False
else:
curr = curr.child[p]
# 2. Warning: 'word' is prefix of 'word' itself, no need to check isWord
return True
3. Leetcode questions (sorted by frequency, then difficulty)
There will be 3 different levels of mastery:
- Unvisited: Not done
- Forgotten: Not remember
- Familiar: Remember the overall structure
- Master: Bug-free and remember all the pitfalls
4(1). High Frequency
Question | Mastery | Core coding skills | Pitfalls | Last date of practice |
---|---|---|---|---|
208. Implement Trie (Prefix Tree) | Familiar | 实现字典树的创建、插入、查找 | 1. always implement node class before the tree class. And there are only 2 attributes for Trie nodes:(1) boolean isWord (2) dictionary children. 2. dictionary is usually a handy way to represent nested trees. 3. for insert, since we already use defaultdict, we can iterate top-down without checking key exist in dict. 4. for prefix, ‘word’ is prefix of ‘word’ itself, no need to check the boolean isWord | 08/18/2021 |
139. Word Break (I) | Familiar | 判断一个字符串能否被分割成一些其它的字符串 | 1. notice that some of the candidate can be used zero time. 2. gist of this problem: wb(‘word’) = ‘w’ in candidates and wb(‘ord’) and wb(‘’) is always true as base case. So it becomes a recursion problem with single moving pointer. 3. This brute force will takes O(2^N) time where N is the length of word, as each gap has options of to split or not too split. Using memoization will reduce redundant subproblems, reducing down to O(N). However, the total runtime will still be O(N^3) considering also the for loops and using substring function. 4. PITFALL 1: both the forloop and substring have EXCLUSIVE end index. So using len+1 instead of len for forloop range. 5. PITFALL 2: remember where to update the memo, it should be done after the normal cases assertion where we are sure that wb(start_idx=i) is True or False. Which means if we truncate the word, we are sure that target_str[start:] can be wb’ed. | 08/18/2021 |
98 Validate Binary Search Tree | Unvisited | ??? | ??? | 06/15/2021 |
103 Binary Tree Zigzag Level Order Traversal | Unvisited | ??? | ??? | 06/15/2021 |
226 Invert Binary Tree | Unvisited | ??? | ??? | 06/15/2021 |
426 Convert Binary Search Tree to Sorted Doubly Linked List | Unvisited | ??? | ??? | 06/15/2021 |
173 Binary Search Tree Iterator | Unvisited | ??? | ??? | 06/15/2021 |
1650 Lowest Common Ancestor of a Binary Tree III | Unvisited | ??? | ??? | 06/15/2021 |
545 Boundary of Binary Tree | Unvisited | ??? | ??? | 06/15/2021 |