Dictionary Tree trie Tree

Posted by wolfraider on Sat, 05 Feb 2022 18:45:34 +0100

Dictionary Tree trie Tree - Class Panton's Graphical Notes

Trie tree, also known as dictionary tree, Prefix Tree, and word search tree, is a multifork tree structure.

The role of trie trees

The core idea of Trie Tree is to change time by space and reduce unnecessary string comparisons by using common prefixes of strings to improve query efficiency.

Core Applications

  • String Retrieval
  • word frequency count
  • String Sort
  • Prefix matching (search associations, don't know if autocomplete in element-ui is made from a trie tree, it's cool to use anyway)

trie tree node

Each letter takes up a trie tree node. There is no essential difference between the root node and the child node. There are several main attributes.

  • Root:Is it a root node
  • dict:hash table with child nodes
  • flag: Is it the end of a word
  • value: The complete content of the word (only if flag is True)
  • count: The number of times the word appears (only if flag is True)
  • list: is a dictionary used to traverse stored things (only root nodes)
class Trie(object):

    def __init__(self, root=None,flag=False,count=None,value=None):
        self.root = root  # root node is used to load content
        self.count = count  # Count counts are used for counting and can count word frequencies
        self.dict = {}  # dict is used to load child nodes, child nodes or trie trees
        self.flag = flag  # flag is used to determine whether or not to end
        self.value = value # value means what the word is
        self.list = {}

General operation

First, implement the four most basic operations:

  • insert: increase, notice the last letter and handle it
  • search:
  • Delete:delete
  • startsWith: Determines if a prefix exists

increase

def insert(self, word):
        """
        :type word: str
        :rtype: None
        """
        trie = self
        for j,i in enumerate(word):
            # Traverse (or add) if it is not the last letter
            if j != len(word) - 1:
                if i not in trie.dict:
                    trie.dict[i] = Trie(root=i)
                trie = trie.dict[i]
            else:
                # Is the last letter and flag is true counts
                if i in trie.dict and trie.dict[i].flag:
                    trie = trie.dict[i]
                    trie.count += 1
                # Is the last letter but flag is false modifies flag to change count to 1
                elif i in trie.dict and not trie.dict[i].flag:
                    trie = trie.dict[i]
                    trie.flag = True
                    trie.count = 1
                    trie.value = word
                # If trie. Add if not in Dict
                else:
                    trie.dict[i] = Trie(root=i,flag=True,count=1,value=word)

check

def search(self, word):
        """
        :type word: str
        :rtype: bool
        """
        trie = self
        for i in word:
            if i in trie.dict:
                trie = trie.dict[i]
            else:
                return False
        return trie.flag

Delete

def delete(self, word):
        '''
        :type word: str
        :rtype: None
        '''
        if self.search(word):
            trie = self
            for i in word:
                if len(trie.dict) == 1:
                    trie.dict.pop(i)
                    break
                else:
                    trie = trie.dict[i]
        else:
            print('The word is not present trie In a tree')

Determine if a prefix exists

def startsWith(self, prefix):
        """
        :type prefix: str
        :rtype: bool
        """
        trie = self
        for i in prefix:
            if i in trie.dict:
                trie = trie.dict[i]
            else:
                return False
        return True

Traversal operation

Traversal operation prefers DFS depth-first search, and prefix is added to the parameters for the purpose of subsequent association operations, which can be omitted if only traversal is done; It is worth mentioning that my storage of child nodes is in the form of a dictionary, and the sorting of the results just happens to be the dictionary order, which also implements the string dictionary order mentioned earlier (in fact, to sort the dictionary order directly, there is no need to use a trie tree).

def dfs(self,path,trie,prefix=''):
        # If you come to the end of a word for the truth
        if trie.flag:
            self.list[prefix + ''.join(path)] = trie.count
            if trie.dict == {}:
                return
        
        for i in trie.dict:
            path.append(i)
            self.dfs(path,trie.dict[i],prefix)
            path.pop()

Create iterative methods for trie trees

def __iter__(self):
        self.list = {}  # Reset self.list
        self.dfs([],self)
        return iter(self.list)

Association Operation

Lenovo business should be like this: users enter the first few letters, and then match out the complete words, the more frequently used words should be placed first;

So the overall idea is as follows: find the trie tree node where the last letter of the prefix is located according to the prefix, start DFS traversal from that node, add a prefix before the traversal result, sort the result by count and output the result as a list;

Implement the sorting algorithm first, using quick sorting here, note that the value of comparison is count, and the word is left behind.

def quick_sorted(self,dict):
        '''
        :type dict: dict
        :rtype: list
        '''
        if len(dict) >= 2:
            nums = list(dict.keys())
            mid = nums[len(nums) // 2]
            left, right = {}, {}
            nums.remove(mid)
            for d in nums:
                if dict[d] < dict[mid]:
                    left[d] = dict[d]
                else:
                    right[d] = dict[d]
            return self.quick_sorted(right) + [mid] + self.quick_sorted(left)
        return list(dict.keys())

Re-implement Association Operation

def dream(self,prefix):
        '''
        :type prefix: str
        :rtype: list
        '''
        trie = self
        for i in prefix:
            if i in trie.dict:
                trie = trie.dict[i]
            elif dict[d] == dict[mid]:
                if d > mid:
                    right[d] = dict[d]
                else:
                    left[d] = dict[d]
            else:
                return []
        self.list = {}  # Reset self.list
        self.dfs([],trie,prefix)
        return self.quick_sorted(self.list)

Some simple test cases

if __name__ == '__main__':
    trie = Trie()
    for i in ["ogz","eyj","e","ey","ey","hmn","v","hm","ogznkb","ogzn","hmnm","eyjuo","vuq","ogznk","og","eyjuoi","d"]:
        trie.insert(i)
    a = trie.dream('e')
    print(a)
    for j in trie:
        print(j)

Actual Warfare - LeetCode

Question 208: Simply add, check and determine if a prefix exists, it's very simple...

Question 720: The longest word in a dictionary, which requires returning the longest word formed by gradually adding a letter to other words in the dictionary;

Idea 1: Make a judgment about the prefix of each word, and then choose the longest word to pass the judgment (no additional action is required)

import Trie
class Solution(object):
    def longestWord(self, words):
        """
        :type words: List[str]
        :rtype: str
        """
        trie = Trie()
        words = sorted(list(set(words)))
        for i in words:
            trie.insert(i)
        result = ''
        for i in words:
            flag = True
            # Determine if all prefixes for this word are in the trie tree
            for j in range(len(i)):
                if not trie.search(i[:j+1]):
                    flag = False
                    break
            # If all prefixes of this word are in this trie tree
            if flag and len(i) > len(result):
                result = i
        return result

if __name__ == '__main__':
    words = ["a", "banana", "app", "appl", "ap", "apply", "apple"]
    print(Solution().longestWord(words))

Idea 2: Direct use of DFS to find the longest one, much faster but still not ideal

#%% trie + depth first search
class Solution(object):
    def __init__(self):
        self.result = ''
    def longestWord(self, words):
        """
        :type words: List[str]
        :rtype: str
        """
        trie = Trie(root=True)
        words = sorted(list(set(words)))
        for i in words:
            trie.insert(i)
        self.dfs([],trie)
        return self.result
    
    def dfs(self, path, trie):
            # When it is not the root node and the end of the word (because you have already decided whether it is the end of the word when you come in, you do not need to repeat your decision)
            if trie.root != True:
                temp = ''.join(path)
                if len(temp) > len(self.result):
                    self.result = temp
            if trie.dict == {}:
                return
            
            for i in trie.dict:
                # Pruning: skip directly if it is not the end of a word
                if not trie.dict[i].flag:
                    continue
                path.append(i)
                self.dfs(path,trie.dict[i])
                path.pop()
        

if __name__ == '__main__':
    words = ["a", "banana", "app", "appl", "ap", "apply", "apple"]
    print(Solution().longestWord(words))

Question 692: The first K high frequency words. The Title is: Give a list of non-empty words and return the first k words that appear the most. Answers returned should be sorted by word frequency from high to low. If different words have the same frequency, sort them alphabetically.

That's not very similar to the associative function we wrote...

import Trie
class Solution(object):
    def topKFrequent(self, words, k):
        """
        :type words: List[str]
        :type k: int
        :rtype: List[str]
        """
        trie = Trie()
        for i in words:
            trie.insert(i)
        result = trie.dream('')
        return result[:k]


if __name__ == '__main__':
    words = ["i","love","leetcode","i","love","coding"]
    k = 2
    print(Solution().topKFrequent(words.k))

Topics: Graph Theory