【设计难题】力扣642-设计搜索自动补全系统

字数统计: 1.3k字 | 阅读时长: 6分

2020-11-01

摘要: 自动补全系统设计，基于 Trie

【对算法，数学，计算机感兴趣的同学，欢迎关注我哈，阅读更多原创文章】
我的网站：潮汐朝夕的生活实验室
我的公众号：算法题刷刷
我的知乎：潮汐朝夕
我的github：FennelDumplings
我的leetcode：FennelDumplings

$1 题目

642. 设计搜索自动补全系统

为搜索引擎设计一个搜索自动补全系统。用户会输入一条语句（最少包含一个字母，以特殊字符 ‘#’ 结尾）。

给定一个字符串数组 sentences 和一个整数数组 times ，长度都为 n ，其中 sentences[i] 是之前输入的句子， times[i] 是该句子输入的相应次数。对于除 ‘#’ 以外的每个输入字符，返回前 3 个历史热门句子，这些句子的前缀与已经输入的句子的部分相同。

下面是详细规则：

一条句子的热度定义为历史上用户输入这个句子的总次数。
返回前 3 的句子需要按照热度从高到低排序（第一个是最热门的）。如果有多条热度相同的句子，请按照 ASCII 码的顺序输出（ASCII 码越小排名越前）。
如果满足条件的句子个数少于 3 ，将它们全部输出。
如果输入了特殊字符，意味着句子结束了，请返回一个空集合。

实现 AutocompleteSystem 类：

AutocompleteSystem(String[] sentences, int[] times): 使用数组sentences 和 times 对对象进行初始化。
List
input(char c) 表示用户输入了字符 c 。
- 如果 c == ‘#’ ，则返回空数组 [] ，并将输入的语句存储在系统中。
- 返回前 3 个历史热门句子，这些句子的前缀与已经输入的句子的部分相同。如果少于 3 个匹配项，则全部返回。

提示:

n == sentences.length
n == times.length
1 <= n <= 100
1 <= sentences[i].length <= 100
1 <= times[i] <= 50
c 是小写英文字母， '#', 或空格 ' '
每个被测试的句子将是一个以字符 '#' 结尾的字符 c 序列。
每个被测试的句子的长度范围为 [1,200] 
每个输入句子中的单词用单个空格隔开。
input 最多被调用 5000 次

示例 1：
输入
[“AutocompleteSystem”, “input”, “input”, “input”, “input”]
[[[“i love you”, “island”, “iroman”, “i love leetcode”], [5, 3, 2, 2]], [“i”], [“ “], [“a”], [“#”]]
输出
[null, [“i love you”, “island”, “i love leetcode”], [“i love you”, “i love leetcode”], [], []]

解释
AutocompleteSystem obj = new AutocompleteSystem([“i love you”, “island”, “iroman”, “i love leetcode”], [5, 3, 2, 2]);
obj.input(“i”); // return [“i love you”, “island”, “i love leetcode”]. There are four sentences that have prefix “i”. Among them, “ironman” and “i love leetcode” have same hot degree. Since ‘ ‘ has ASCII code 32 and ‘r’ has ASCII code 114, “i love leetcode” should be in front of “ironman”. Also we only need to output top 3 hot sentences, so “ironman” will be ignored.
obj.input(“ “); // return [“i love you”, “i love leetcode”]. There are only two sentences that have prefix “i “.
obj.input(“a”); // return []. There are no sentences that have prefix “i a”.
obj.input(“#”); // return []. The user finished the input, the sentence “i a” should be saved as a historical sentence in system. And the following input will be counted as a new search.

$2 题解

算法: Trie

Trie 维护句子及其前缀，如果是叶子节点，则有一个热度字段表示该句子的热度。

每次查询，先找到前缀对应的节点，然后 DFS 该子树的所有叶子节点，得到热度前三的句子。

代码 (C++)

const int ALPHABET = 27;

struct TrieNode
{
    int cnt;
    vector<TrieNode*> children;
    TrieNode()
    {
        cnt = 0;
        children = vector<TrieNode*>(ALPHABET, nullptr);
    }
};

class Trie
{
public:
    Trie()
    {
        root = new TrieNode();
        input = root;
        cur = "";
    }

    ~Trie()
    {
        if(root)
        {
            _delete_sub_tree(root);
            input = nullptr;
        }
    }

    void insert(const string& sentence, int cnt)
    {
        TrieNode *iter = root;
        for(const char ch: sentence)
        {
            TrieNode *&nxt = (iter -> children)[_char2int(ch)];
            if(!nxt)
                nxt = new TrieNode();
            iter = nxt;
        }
        iter -> cnt = cnt;
    }

    vector<string> auto_complete(char ch)
    {
        if(ch == '#')
        {
            ++(input -> cnt);
            cur = "";
            input = root;
            return vector<string>();
        }
        cur += ch;
        TrieNode *&nxt = (input -> children)[_char2int(ch)];
        if(!nxt)
        {
            nxt = new TrieNode();
            input = nxt;
            return vector<string>();
        }
        input = nxt;
        priority_queue<PSI, vector<PSI>, Cmp> pq; // 容量为 3 的最小堆
        dfs(input, cur, pq);
        vector<string> result;
        stack<string> st;
        while(!pq.empty())
        {
            st.push(pq.top().first);
            pq.pop();
        }
        while(!st.empty())
        {
            result.push_back(st.top());
            st.pop();
        }
        return result;
    }

private:
    TrieNode *root;
    string cur;
    TrieNode *input;

    using PSI = pair<string, int>; // sentence, cnt

    void _delete_sub_tree(TrieNode* node)
    {
        for(TrieNode *child: node -> children)
            if(child)
                _delete_sub_tree(child);
        delete node;
        node = nullptr;
    }

    struct Cmp
    {
        bool operator()(const PSI& psi1, const PSI& psi2)
        {
            // psi1 比 psi2 小：
            // psi1.second < psi2.second
            // psi1.first > psi2.first
            if(psi1.second == psi2.second)
                return psi1.first < psi2.first;
            return psi1.second > psi2.second;
        }
    };

    void dfs(TrieNode* root, string& cur, priority_queue<PSI, vector<PSI>, Cmp>& pq)
    {
        if(root -> cnt > 0)
        {
            if(pq.size() < 3)
                pq.push(PSI(cur, root -> cnt));
            else
            {
                Cmp cmp;
                if(!cmp(pq.top(), PSI(cur, root -> cnt)))
                {
                    pq.push(PSI(cur, root -> cnt));
                    pq.pop();
                }
            }
        }
        for(int i = 0; i < 27; ++i)
        {
            TrieNode *child = (root -> children)[i];
            if(i < 26) cur += 'a' + i;
            else cur += ' ';
            if(child)
                dfs(child, cur, pq);
            cur.pop_back();
        }
    }

    int _char2int(const char& ch)
    {
        if(ch == ' ') return 26;
        return ch - 'a';
    }
};

class AutocompleteSystem {
public:
    AutocompleteSystem(vector<string>& sentences, vector<int>& times) {
        trie = new Trie();
        int n = sentences.size();
        for(int i = 0; i < n; ++i)
            trie -> insert(sentences[i], times[i]);
    }

    vector<string> input(char c) {
        return trie -> auto_complete(c);
    }

private:
    Trie *trie;
};

潮汐朝夕的生活实验室 \Doge 陪伴一个算法工程师的职业生涯

算法leetcode

【设计难题】力扣642-设计搜索自动补全系统

$1 题目

$2 题解

算法: Trie

代码 (C++)