Is there a regular expression to detect a valid regular expression?

Posted by allydm on Tue, 17 Dec 2019 04:52:20 +0100

Is it possible to detect a valid regular expression with another regular expression? If so, please give an example below.

#1st floor

Unlikely.

Evaluate it in a try..catch, or whatever equivalent your language provides.
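
As a minimal sketch of that suggestion in Python (the function name is_valid_regex is made up for illustration), you can let the engine's own compiler decide. Note that this only tells you whether Python's re flavor accepts the pattern, not whether PCRE or .NET would:

```python
import re

def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's built-in re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # re.error is raised for syntactically invalid patterns
        return False

print(is_valid_regex(r"^abc(def)*$"))  # True
print(is_valid_regex(r"(unclosed"))    # False
```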

#2nd floor

/
^                                             # start of string
(                                             # first group start
  (?:
    (?:[^?+*{}()[\]\\|]+                      # literals and ^, $
     | \\.                                    # escaped characters
     | \[ (?: \^?\\. | \^[^\\] | [^\\^] )     # character classes
          (?: [^\]\\]+ | \\. )* \]
     | \( (?:\?[:=!]|\?<[=!]|\?>)? (?1)?? \)  # parenthesis, with recursive content
     | \(\? (?:R|[+-]?\d+) \)                 # recursive matching
     )
    (?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )?   # quantifiers
  | \|                                        # alternative
  )*                                          # repeat content
)                                             # end first group
$                                             # end of string
/

This is a recursive regex, which many regex engines do not support. PCRE-based engines should support it.

Without whitespace and comments:
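
One rough way to find out whether a particular engine accepts the recursion syntax is to try compiling a tiny recursive pattern. The sketch below does this for Python's built-in re module, which is one of the engines that does not understand (?R); the helper name is hypothetical:

```python
import re

def accepts_recursion_syntax() -> bool:
    """Report whether Python's built-in re module compiles a (?R) pattern."""
    try:
        # \((?R)?\) would match nested parentheses in a PCRE-style engine
        re.compile(r"\((?R)?\)")
        return True
    except re.error:
        return False

print(accepts_recursion_syntax())
```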

/^((?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>)?(?1)??\)|\(\?(?:R|[+-]?\d+)\))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*)$/

.NET does not support recursion directly (the (?1) and (?R) constructs). The recursion has to be converted into counting with balancing groups:

^                                         # start of string
(?:
  (?: [^?+*{}()[\]\\|]+                   # literals and ^, $
   | \\.                                  # escaped characters
   | \[ (?: \^?\\. | \^[^\\] | [^\\^] )   # character classes
        (?: [^\]\\]+ | \\. )* \]
   | \( (?:\?[:=!]
         | \?<[=!]
         | \?>
         | \?<[^\W\d]\w*>
         | \?'[^\W\d]\w*'
         )?                               # opening of group
     (?<N>)                               #   increment counter
   | \)                                   # closing of group
     (?<-N>)                              #   decrement counter
   )
  (?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )? # quantifiers
| \|                                      # alternative
)*                                        # repeat content
$                                         # end of string
(?(N)(?!))                                # fail if counter is non-zero.

Compacted:

^(?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>|\?<[^\W\d]\w*>|\?'[^\W\d]\w*')?(?<N>)|\)(?<-N>))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*$(?(N)(?!))
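
The counting idea behind (?<N>) and (?<-N>) can be sketched outside of any regex engine. The following Python function is only an illustration of the counter, not a port of the full pattern: it ignores quantifiers and group prefixes, and its handling of a leading ] inside a character class is an assumption that differs between flavors.

```python
def groups_balanced(pattern: str) -> bool:
    """Check that unescaped parentheses outside character classes balance."""
    depth = 0          # plays the role of the N counter
    in_class = False   # True while inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= len(pattern):
                return False   # trailing backslash escapes nothing
            i += 2             # skip the escaped character
            continue
        elif in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            if pattern[i + 1 : i + 2] == "^":
                i += 1         # skip the negation marker
            if pattern[i + 1 : i + 2] == "]":
                i += 1         # treat a leading ] as a literal member
        elif ch == "(":
            depth += 1         # like (?<N>)
        elif ch == ")":
            depth -= 1         # like (?<-N>)
            if depth < 0:
                return False   # closing group without an opening one
        i += 1
    # like (?(N)(?!)): fail if any group is still open
    return depth == 0 and not in_class

print(groups_balanced("(a(b)c)"))
print(groups_balanced("(a(b)c"))
```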

From the comments:

Will this validate substitutions and translations?

It will validate only the regex part of substitutions and translations: s/<this part>/.../

It is theoretically impossible to match all valid regex grammars with a regex.

It is possible if the regex engine supports recursion (as PCRE does), but then it can no longer really be called a regular expression.

Indeed, a "recursive regular expression" is not a regular expression. But it is a very commonly accepted extension to regex engines... Ironically, this extended regex does not match extended regexes.

"In theory, theory and practice are the same. In practice, they are not." Almost everyone who knows regular expressions knows that they do not support recursion. However, PCRE and most other implementations support much more than basic regular expressions.

I used this in a shell script with the grep command and it shows an error: grep: Invalid content of {}. Can you help? I am writing a script that greps a code base to find all the files that contain regular expressions.

This pattern relies on an extension called recursive regular expressions, which the POSIX flavor of regex does not support. You could try the -P switch to enable the PCRE regex flavor.

Regex itself "is not a regular language and hence cannot be parsed by a regular expression..."

This is true for classical regular expressions. Some modern implementations allow recursion, which makes them a context-free language, although the result is somewhat verbose for this task.

I see you are matching []()/\. and other special regex characters. Where are you allowing non-special characters? It looks like this will match ^(?:[\.]+)$, but not ^abcdefg$, which is a valid regex.

[^?+*{}()[\]\\|] will match any single character that is not part of any of the other constructs. This includes both literals (a - z) and certain special characters (^, $, .).

#3rd floor

Good question. True regular languages cannot decide arbitrarily deeply nested, well-formed parentheses. That is, if your alphabet contains '(' and ')', the goal is to decide whether a string of these has well-formed matching parentheses. Since this is a necessary requirement for regular expressions, the answer is no.

However, if you loosen the requirement and add recursion, you can probably do it. The reason is that the recursion can act as a "stack", letting you "count" the current nesting depth by pushing onto it.

Russ Cox wrote a great treatise on regex engine implementation: Regular Expression Matching Can Be Simple And Fast

#4th floor

No, if you are speaking strictly about regular expressions and not including regex implementations that are actually context-free grammars.

Regular expressions have one limitation that makes it impossible to write a regex that matches all and only regexes: you cannot match constructs that are paired, such as braces. Regexes use many such constructs; take [] as an example. Whenever there is a [ there must be a matching ], which is simple enough for a regex: "\[.*\]".

What makes it impossible for regexes is that these constructs can be nested. How can you write a regex that matches nested parentheses? The answer is that you can't without an infinitely long regex. You can match any fixed number of nesting levels by brute force, but you can never match an arbitrarily deep set of nested parentheses.
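
The brute-force approach can be sketched in Python as follows. The helper nested_parens_regex is a made-up name; it builds a classical, non-recursive pattern that handles balanced parentheses only up to a chosen depth, and the pattern has to grow every time you want one more level:

```python
import re

def nested_parens_regex(max_depth: int) -> str:
    """Build a non-recursive regex for balanced parentheses up to max_depth."""
    inner = ""  # depth 0: matches only the empty string
    for _ in range(max_depth):
        # wrap the previous level in one more pair of literal parentheses
        inner = r"(?:\(" + inner + r"\))*"
    return inner

depth3 = re.compile(nested_parens_regex(3))
print(bool(depth3.fullmatch("((()))")))    # True: three levels deep
print(bool(depth3.fullmatch("(((())))")))  # False: four levels is too deep
```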

This capability is often referred to as counting, because you are counting the depth of the nesting. By definition, a regular expression does not have the ability to count.

Edit: I ended up writing a blog post about this: Regular Expression Limitations

#5th floor

While it is entirely possible to use a recursive regex as MizardX posted, a parser is much more useful for this kind of thing. Regexes were originally intended for regular languages; recursion and balancing groups are just patches.

The language that defines valid regexes is actually a context-free grammar, and you should use an appropriate parser to handle it. Here is an example from a university project that parses simple regexes (without most constructs). It uses JavaCC. And yes, the comments are in Spanish, though the method names are fairly self-explanatory.

SKIP :
{
    " "
|   "\r"
|   "\t"
|   "\n"
}
TOKEN : 
{
    < DIGITO: ["0" - "9"] >
|   < MAYUSCULA: ["A" - "Z"] >
|   < MINUSCULA: ["a" - "z"] >
|   < LAMBDA: "LAMBDA" >
|   < VACIO: "VACIO" >
}

IRegularExpression Expression() :
{
    IRegularExpression r; 
}
{
    r=Alternation() { return r; }
}

// Matchea disyunciones: ER | ER
IRegularExpression Alternation() :
{
    IRegularExpression r1 = null, r2 = null; 
}
{
    r1=Concatenation() ( "|" r2=Alternation() )?
    { 
        if (r2 == null) {
            return r1;
        } else {
            return createAlternation(r1,r2);
        } 
    }
}

// Matchea concatenaciones: ER.ER
IRegularExpression Concatenation() :
{
    IRegularExpression r1 = null, r2 = null; 
}
{
    r1=Repetition() ( "." r2=Repetition() { r1 = createConcatenation(r1,r2); } )*
    { return r1; }
}

// Matchea repeticiones: ER*
IRegularExpression Repetition() :
{
    IRegularExpression r; 
}
{
    r=Atom() ( "*" { r = createRepetition(r); } )*
    { return r; }
}

// Matchea regex atomicas: (ER), Terminal, Vacio, Lambda
IRegularExpression Atom() :
{
    String t;
    IRegularExpression r;
}
{
    ( "(" r=Expression() ")" {return r;}) 
    | t=Terminal() { return createTerminal(t); }
    | <LAMBDA> { return createLambda(); }
    | <VACIO> { return createEmpty(); }
}

// Matchea un terminal (digito o minuscula) y devuelve su valor
String Terminal() :
{
    Token t;
}
{
    ( t=<DIGITO> | t=<MINUSCULA> ) { return t.image; }
}
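
For readers without JavaCC, the same toy grammar can be approximated with a hand-written recursive-descent parser. This Python sketch is an assumption-laden rewrite, not the project's code: the tuple-based tree and all function names are invented here, and it covers only the constructs in the grammar above (explicit "." for concatenation, "|", "*", parentheses, LAMBDA and VACIO).

```python
import re

TOKEN_RE = re.compile(r"\s*(LAMBDA|VACIO|[0-9a-z]|[()|.*])")

def tokenize(text: str) -> list[str]:
    """Split the input into tokens, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character near position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def alternation(self):      # Concatenation ( "|" Alternation )?
        left = self.concatenation()
        if self.peek() == "|":
            self.pos += 1
            return ("alt", left, self.alternation())
        return left

    def concatenation(self):    # Repetition ( "." Repetition )*
        left = self.repetition()
        while self.peek() == ".":
            self.pos += 1
            left = ("cat", left, self.repetition())
        return left

    def repetition(self):       # Atom ( "*" )*
        node = self.atom()
        while self.peek() == "*":
            self.pos += 1
            node = ("star", node)
        return node

    def atom(self):             # "(" Expression ")" | Terminal | LAMBDA | VACIO
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        self.pos += 1
        if tok == "(":
            node = self.alternation()
            if self.peek() != ")":
                raise ValueError("expected ')'")
            self.pos += 1
            return node
        if tok == "LAMBDA":
            return ("lambda",)
        if tok == "VACIO":
            return ("empty",)
        if len(tok) == 1 and (tok.isdigit() or tok.islower()):
            return ("sym", tok)
        raise ValueError(f"unexpected token {tok!r}")

def parse(text: str):
    """Parse the whole input or raise ValueError if it is not well formed."""
    parser = Parser(tokenize(text))
    tree = parser.alternation()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("a.b*|(c)"))
```

Because the parser recurses on "(", nesting depth is limited only by the host language's call stack, which is exactly what a classical regex cannot offer.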
