Trie单串多模式匹配，模式开头为统一特殊字符

字数统计: 1.2k字 | 阅读时长: 5分

2020-11-28

摘要: 一类字符串匹配需求。在单个串中匹配多个模式串

【对算法，数学，计算机感兴趣的同学，欢迎关注我哈，阅读更多原创文章】
我的网站：潮汐朝夕的生活实验室
我的公众号：算法题刷刷
我的知乎：潮汐朝夕
我的github：FennelDumplings
我的leetcode：FennelDumplings

背景

给定单串 s，n 个模式串 p1, p2, ..., pn。

其中 n 个模式串中，任意两个模式串都没有子串的关系，且都以一个特殊字符开头，模式的其它字符不会出现该特殊字符。

在 s 中找到所有匹配到这 n 个模式串的所有起始位置。

Trie

用 n 个模式串建立一个 Trie。推进 i 的字符时，如果 s[i] 为特殊字符，则记录 j = i，并初始化 iter = root，之后尝试同时推进 iter 和 i，最终 iter 如果走到了叶子，则记录答案 j；如果 iter 到达叶子之前就走不动了，则停止推进 iter，并等待 s 遇到下一个特殊字符。

因为所有模式都以特殊字符开头，因此当 s[i] 为特殊字符时才进入 trie 中推进 i；因为模式中不会出现特殊字符，trie 中推进的 i 在匹配失败时就不用回溯。

题目1

1410. HTML 实体解析器

「HTML 实体解析器」是一种特殊的解析器，它将 HTML 代码作为输入，并用字符本身替换掉所有这些特殊的字符实体。

HTML 里这些特殊字符和它们对应的字符实体包括：

双引号：字符实体为 " ，对应的字符是 “ 。
单引号：字符实体为 ' ，对应的字符是 ‘ 。
与符号：字符实体为 & ，对应对的字符是 & 。
大于号：字符实体为 > ，对应的字符是 > 。
小于号：字符实体为 < ，对应的字符是 < 。
斜线号：字符实体为 ⁄ ，对应的字符是 / 。

给你输入字符串 text ，请你实现一个 HTML 实体解析器，返回解析器解析后的结果。

提示：

1 2	1 <= text.length <= 10^5 字符串可能包含 256 个ASCII 字符中的任意字符。

示例 1：
输入：text = “& is an HTML entity but &ambassador; is not.”
输出：”& is an HTML entity but &ambassador; is not.”
解释：解析器把字符实体 & 用 & 替换

示例 2：
输入：text = “and I quote: "…"”
输出：”and I quote: \”…\””

示例 3：
输入：text = “Stay home! Practice on Leetcode :)”
输出：”Stay home! Practice on Leetcode :)”

示例 4：
输入：text = “x > y && x < y is always false”
输出：”x > y && x < y is always false”

示例 5：
输入：text = “leetcode.com⁄problemset⁄all”
输出：”leetcode.com/problemset/all”

算法：Trie

六个实体均以特殊字符 & 开头，且实体其它字符不出现 &，将六个实体预处理成一个 trie，在叶子节点标记对应的字符。

然后在 text 上推进 i，当 text[i] 为 & 时进入 trie 中推进 i，若匹配则替换，若不匹配则跳出 trie 继续推进 i。

注意，起始的特殊字符 & 在模式串其它字符中不会出现，但在单串 text 中却可能出现，因此当匹配失败返回的时候，如果匹配失败的字符是 &，需要回退一步继续以该 & 为起始点匹配。

代码 (C++)

struct TrieNode
{
    unordered_map<char, TrieNode*> children;
    string val;
    TrieNode(){}
};

class Trie
{
public:
    Trie()
    {
        root = new TrieNode();
    }

    ~Trie()
    {
        if(root)
            _delete_sub_tree(root);
    }

    void insert(const string& key, const string& val)
    {
        TrieNode *iter = root;
        for(const char &ch: key)
        {
            if(!(iter -> children)[ch])
                (iter -> children)[ch] = new TrieNode();
            iter = (iter -> children)[ch];
        }
        iter -> val = val;
    }

    string search(const string& s, int& i)
    {
        // s[i] == '&', 从 s[i] 开始推进，若匹配到，返回匹配的字符，未匹配到返回 ""
        // i 的推进通过引用直接返回给调用方
        int n = s.size();
        TrieNode *iter = root;
        int left = i;
        while(i < n && (iter -> children)[s[i]])
        {
            iter = (iter -> children)[s[i]];
            if(!(iter -> val).empty())
                return iter -> val;
            ++i;
        }
        if(s[i] == '&')
            --i;
        return "";
    }

private:
    TrieNode *root;

    void _delete_sub_tree(TrieNode* node)
    {
        for(auto child: node -> children)
            if(child.second)
                _delete_sub_tree(child.second);
        delete node;
        node = nullptr;
    }
};

class Solution {
public:
    string entityParser(string text) {
        Trie *trie = new Trie();
        trie -> insert("&quot;", "\"");
        trie -> insert("&apos;", "'");
        trie -> insert("&amp;", "&");
        trie -> insert("&gt;", ">");
        trie -> insert("&lt;", "<");
        trie -> insert("&frasl;", "/");
        int n = text.size();
        int i = 0;
        string result;
        while(i < n)
        {
            if(text[i] != '&')
                result += text[i++];
            else
            {
                int j = i;
                string alter = trie -> search(text, i);
                if(alter.empty())
                    result += text.substr(j, i - j + 1);
                else
                    result += alter;
                ++i;
            }
        }
        return result;
    }
};

潮汐朝夕的生活实验室 \Doge 陪伴一个算法工程师的职业生涯

算法字符串

Trie单串多模式匹配，模式开头为统一特殊字符

背景

Trie

题目1

算法：Trie

代码 (C++)