剑指52-正则表达式匹配

题目描述

请实现一个函数用来匹配包括.和*的正则表达式。模式中的字符.表示任意一个字符，而*表示它前面的字符可以出现任意次（包含0次）。在本题中，匹配是指字符串的所有字符匹配整个模式。例如，字符串aaa与模式a.a和ab*ac*a匹配，但是与aa.a和ab*a均不匹配

关于模式匹配：

一条包括.和*的正则表达式，即模式，可以用非确定有限状态机来阐述。可知比较麻烦的是处理*。
- ba*ab.的状态机来举例，状态s1是初始状态，收到b进入s2，s2收到a有两个选择，可能回到s2，也可能进入新状态s3，s3收到b进入s4，s4收到任意一个字符进入s5。
用递归。

关于如下代码：

根据是否处理*分为前后两部分，后者是第二个字符是*这一种情况，前者包括s为空、无.和*的参与和.后面没有*这三种情况。而前者根据是否处理.又分为.后面没有*这种情况，以及s为空、无.和*的参与这种情况。四种类别互斥且完备，如下树状图所示。

                         p为空
                        /
                       /
           无 ---- p[0]有无`.`  --- 有`.`其后无`*`
          /            \
         /              \
p[2]是否有`*`            无`.`           无
         \                             /
          \                           /
           有 ---- s[0]是否为对应元素
                                      \
                                       \
                                         有

处理部分只主动返回True，为确保返回的是True而非False，在返回之前有确认逻辑。坚持不返回False，因为不知为不知，以确保四分支处理线互不干扰。最后一行有个返回False，可以认为是被动返回。
返回True的确认逻辑的match函数，其输入参数s和p的变更要有依据。如无.和*的参与部分里参数变更为s[1:], p[1:]，是因为有s[0] == p[0]；而第二个字符是*部分里参数变更为s, p[2:]，是因为有p[1] == '*'。

class Solution:
    def match(self, s, p):
        # s为空
        if len(s) == 0 and len(p) == 0:
                return True
        # 无.和*的参与，且s和p的第一个字符匹配
        if len(s) > 0 and len(p) > 0 and s[0] == p[0] and p[0] not in '.*':
            if len(p) >=2 and p[1] != '*' or len(p) == 1:
                if self.match(s[1:], p[1:]):
                    return True
        # 第一个字符是`.`，后面没有`*`
        if len(p) > 0 and p[0] == '.' and (len(p) == 1 or len(p) >= 2 and  p[1] != '*' ) and len(s) > 0:
            if self.match(s[1:], p[1:]):
                return True

        # 第二个字符是`*`
        if len(p) >= 2 and p[1] == '*':
            if self.match(s, p[2:]):
                return True
            if len(s) > 0 and (s[0] == p[0] or p[0] == '.'):
                if self.match(s[1:], p) or self.match(s[1:], p[2:]):
                    return True

        return False

动态规划。

长度为x的字符串\(s_1s_2\cdots s_x\)，模式长度为y的模式序列\(p_1\cdots p_y\)。如字符串aaa的长度是3，而模式序列ab*ac*a有5个子模式a,b*,a,c*,a。令g(x,y)表示\(s_1s_2\cdots s_x\)与\(p_1\cdots p_y\)的匹配状态，如g(x,y)=0表示匹配不成功，g(x,y)=1表示匹配成功。则g(x,y)有如下递推关系：

如果\(p_y\)不是带*子模式，有g(x,y) = g(s_x, p_y) and g(x-1,y-1)。其中\(g(s_x, p_y)\)表示字符\(s_x\)与模式\(p_y\)的匹配状态，很容易计算。特殊的，有g(0,y)=g(0,p_y)=1 if y>=1，以及g(x,0)=g(s_x,0) if x>=1。
如果\(p_y\)是带*子模式，有g(x,y)=g(x,y-1) or g(s_x,s_y) and g(x-1,y)。其中，g(x,y-1)=1则说明去掉\(p_y\)仍匹配，即\(p_y\)匹配0个字符。而g(s_x,s_y)& g(x-1,y)=1则说明\(p_y\)匹配一个或多个字符。特殊的，有g(0,y)=g(0,y-1) if y>=1
g(0,?)和g(?,0)分别表示空模式和空字符串。
边界值g(0,0)=1
g(x,y)组成一个维度为(x+1)*(y+1)的二维矩阵。

class Solution:
    def match(self, s, p):
        def limit(sc,pc): # 计算sc是单字符或者空字符，以及pc是单模式和空模式的情形
            if len(sc) == 0 and len(pc) == 0: # 空字串和空子模式
                return 1
            if len(pc)==2 and pc[1] == '*':
                if pc[0] == '.':
                    return 1
                elif len(sc) == 0:
                    return 1
                elif sc == pc[0]:
                    return 1
            if len(sc) == 1 and len(pc) == 1 and sc == pc:
                return 1
            if len(sc) == 1 and len(pc) == 1 and pc == '.':
                return 1
            return 0
        def getPatentsNumber(p): # 获得模式序列中的所有子模式，返回子模式列表
            ps = []
            for i in range(len(p)):
                if i + 1 <= len(p)-1:
                    if p[i+1] == '*':
                        continue
                if p[i] == '*' and i-1 >= 0:
                    ps.append(p[i-1:i+1])
                    continue
                ps.append(p[i])
            return ps          
        def computeMatch(s, ps, x, y, G): #以递归和动态规划的思想获得矩阵G
            if x + y == 0:
                G[x][y] = 1
                return
            if x == 0 and len(ps[y-1]) > 1 and ps[y-1][1] != '*':
                G[x][y] = limit('', ps[y-1])
                return
            if y == 0:
                G[x][y] = limit(s[x-1],'')
                return
            if x > 0 and y > 0 and (len(ps[y-1]) > 1 and ps[y-1][1] != '*' or len(ps[y-1]) == 1):
                if G[x-1][y-1] == -1:
                    computeMatch(s, ps, x-1, y-1, G)
                G[x][y] = limit(s[x-1],ps[y-1]) and G[x-1][y-1]
                return
            if len(ps[y-1]) > 1 and ps[y-1][1] == '*':
                if G[x][y-1] == -1:
                    computeMatch(s, ps, x, y-1, G)
                if x == 0:
                    G[x][y] = G[x][y-1]
                    return
                else:
                    if G[x-1][y] == -1:
                        computeMatch(s, ps, x-1, y, G)
                    G[x][y] = G[x][y-1] or limit(s[x-1],ps[y-1]) and G[x-1][y]
                        
        ps = getPatentsNumber(p)
        x, y = len(s), len(ps)
        G = [([-1]*(y+1))[::] for _ in range(x+1)]
        computeMatch(s, ps, x, y, G)
        
        return True if G[x][y] == 1 else False