[C++ & Rust] LeetCode No. 692: Top K Frequent Words (one problem per day)

Posted by don_s on Wed, 09 Feb 2022 02:28:29 +0100

Original address: http://blog.leanote.com/post/dawnmagnet/lc692

Problem

Given a non-empty list of words, return the k words that occur most often.

The answer should be sorted by frequency from highest to lowest. If different words have the same frequency, sort them alphabetically.

Example 1:

Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words, each appearing twice.
    Note that "i" comes before "love" in alphabetical order.

Example 2:

Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    appearing 4, 3, 2 and 1 times respectively.

Note:

You may assume k is always valid: 1 ≤ k ≤ number of distinct words.
All input words consist of lowercase letters.

Source: LeetCode
Link: https://leetcode-cn.com/problems/top-k-frequent-words
Copyright belongs to Lingkou Network. For commercial reprints, please contact the official site for authorization; for non-commercial reprints, please cite the source.

Analysis

The idea behind this problem is simple. First, we traverse the word list to obtain the basic ranking criterion: the number of occurrences of each word. A hash table is the natural tool for this.
Once that is done, we have each word together with its count, and we can sort these key-value pairs by count. The ordering has to be defined by us: even when a language provides a default comparison for key-value pairs, it does not match our requirement (count descending, then word ascending), so we need a custom comparison function.
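The two steps described so far, counting and then sorting with a custom comparison, can be sketched as below. This is only a baseline illustration that sorts every pair; the function name `top_k_by_full_sort` is hypothetical and not part of the original solution.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): count, sort everything, keep k.
std::vector<std::string> top_k_by_full_sort(const std::vector<std::string>& words, int k) {
    // Step 1: count occurrences with a hash table.
    std::unordered_map<std::string, int> freq;
    for (const auto& w : words) ++freq[w];

    // Step 2: copy the (word, count) pairs into a vector so they can be sorted.
    std::vector<std::pair<std::string, int>> items(freq.begin(), freq.end());

    // Step 3: custom comparison: higher count first, ties broken alphabetically.
    std::sort(items.begin(), items.end(),
              [](const auto& a, const auto& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });

    // Step 4: keep the first k words.
    std::vector<std::string> res;
    for (int i = 0; i < k && i < static_cast<int>(items.size()); ++i)
        res.push_back(items[i].first);
    return res;
}
```

With m distinct words the sort costs O(m log m), even though only k of the sorted pairs are used.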

But let's not rush into sorting. Sorting all n key-value pairs does more work than necessary, because we only need k of them. A data structure solves this: the priority queue, also called a max-heap or min-heap. These names refer to essentially the same thing, which is implemented as a heap under the hood; different languages just name it differently. Almost every language provides one (C is an exception, since its standard library has none).
For example, priority_queue and set in C++, heapq in Python, BinaryHeap in Rust and PriorityQueue in Java are all much alike. The methods they provide are push / pop / top (or peek). We maintain a priority queue that holds at most k elements: anything that ranks below the k-th element can never appear in the answer, so we can discard it right away. This lowers the complexity from O(n log n) to O(n log k), because each insertion works on a heap of about k elements rather than n.
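To isolate the bounded-heap idea from the word-counting details, here is a minimal sketch on plain integers. The helper name `k_largest` is an assumption made for illustration; it keeps the k largest values by evicting the smallest one whenever the heap grows past k.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: return the k largest values in descending order.
std::vector<int> k_largest(const std::vector<int>& xs, std::size_t k) {
    // std::greater turns priority_queue into a min-heap, so top() is the
    // smallest value currently kept, i.e. the first candidate for eviction.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : xs) {
        heap.push(x);                      // the heap never exceeds k + 1 elements
        if (heap.size() > k) heap.pop();   // evict the smallest of the k + 1
    }
    std::vector<int> res;
    while (!heap.empty()) {                // drains smallest-first
        res.push_back(heap.top());
        heap.pop();
    }
    std::reverse(res.begin(), res.end());  // largest first
    return res;
}
```

For instance, `k_largest({5, 1, 9, 3, 7}, 3)` keeps 9, 7 and 5. The heap holds the worst kept element on top, which is why the result has to be reversed at the end.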
So the difficulty shifts to telling the priority queue which order we want. This takes a lambda function, or even a custom type, and requires a deeper grasp of the language you use. We won't go into the lambda syntax of any specific language here; if you are interested, you can look it up on CSDN. It mostly depends on the language you are used to, and you need to master it thoroughly to get the most out of it.
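As one possible illustration of the "custom type" route in C++, the ordering can be written as a small function object instead of a lambda. The names `Entry`, `BetterFirst`, `WorstOnTop` and `first_evicted` are hypothetical and exist only for this sketch.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // (word, count)

// Hypothetical comparator type. priority_queue keeps at top() the element
// that the comparator ranks last, so "a ranks before b" here means
// "a is the better answer".
struct BetterFirst {
    bool operator()(const Entry& a, const Entry& b) const {
        if (a.second != b.second) return a.second > b.second;  // higher count is better
        return a.first < b.first;                               // then alphabetical
    }
};

// With a function object the ordering is part of the type, so no comparator
// has to be passed to the constructor.
using WorstOnTop = std::priority_queue<Entry, std::vector<Entry>, BetterFirst>;

// Hypothetical demo helper: which word would be evicted first?
std::string first_evicted(const std::vector<Entry>& entries) {
    WorstOnTop heap;
    for (const auto& e : entries) heap.push(e);
    return heap.empty() ? std::string() : heap.top().first;
}
```

Given ("i", 2) and ("love", 2), the word on top is "love": on a tie the alphabetically later word is the worse candidate, so it is the one that would be popped.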

C++ code

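#include <algorithm>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// Returns true when a should rank before b: higher frequency first,
// ties broken alphabetically. priority_queue keeps the element that ranks
// last at top(), so top() is always the worst candidate.
auto cmp = [](const pair<string, int>& a, const pair<string, int>& b) {
    return a.second == b.second ? a.first < b.first : a.second > b.second;
};

class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        // Count the occurrences of each word.
        unordered_map<string, int> m;
        for (auto& word : words) m[word]++;
        // Keep at most k candidates in the heap.
        priority_queue<pair<string, int>, vector<pair<string, int>>, decltype(cmp)> que(cmp);
        for (auto& it : m) {
            que.emplace(it);
            if (que.size() > static_cast<size_t>(k)) {
                que.pop();  // evict the worst candidate
            }
        }
        // The heap yields the worst candidate first, so reverse at the end.
        vector<string> res;
        while (!que.empty()) {
            res.push_back(que.top().first);
            que.pop();
        }
        reverse(res.begin(), res.end());
        return res;
    }
};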

Rust code

use std::collections::{BinaryHeap, HashMap};
use std::cmp::Ordering;
#[derive(Eq, Debug)]
struct Pair {
    pub word: String,
    freq: i32,
}
impl Pair {
    fn new(w: &str, f: i32) -> Self {
        Pair {
            word: w.to_string(),
            freq: f,
        }
    }
}
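impl Ord for Pair {
    // BinaryHeap is a max-heap, so the "greatest" Pair is popped first.
    // We make the worst candidate the greatest: lower frequency, or the
    // alphabetically later word on a tie.
    fn cmp(&self, other: &Self) -> Ordering {
        if self.freq == other.freq {
            self.word.cmp(&other.word)
        } else {
            other.freq.cmp(&self.freq)
        }
    }
}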
impl PartialOrd for Pair {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

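impl PartialEq for Pair {
    // Must agree with `Ord::cmp`: two pairs are equal only when both the
    // frequency and the word match.
    fn eq(&self, other: &Self) -> bool {
        self.freq == other.freq && self.word == other.word
    }
}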
// LeetCode supplies `struct Solution;`; declare it yourself when compiling locally.
impl Solution {
    pub fn top_k_frequent(words: Vec<String>, k: i32) -> Vec<String> {
        let mut map = HashMap::new();
        for word in words {
            *map.entry(word).or_insert(0) += 1;
        }
        let mut heap = BinaryHeap::new();
        for (key, val) in map.iter() { 
            heap.push(Pair::new(key, *val));
            if heap.len() > k as usize {
                heap.pop();
            }
        }
        // Drain the heap: elements come out worst-first, so reverse afterwards.
        let mut res = vec![];
        while let Some(pair) = heap.pop() {
            res.push(pair.word);
        }
        res.reverse();
        res
    }
}
