Morilib Rei is a regular expression library for JavaScript. It has the following features.

  • Building regular expressions from JavaScript objects

  • A Java-like regular expression matcher

How to use

Browser

To use Morilib Rei in a browser, load rei.js with a script tag.

<script src="rei.js"></script>

node.js

To use Morilib Rei in node.js, install the package from npm.

$ npm install morilib-rei

Then require the package.

var Re = require("morilib-rei");

Building regular expressions from JavaScript objects

Regular expressions can be built from JavaScript objects.
This makes the meaning of a regular expression clearer.

Re.i or Re.build

To build a regular expression, use either the Re.i function or the Re.build function.

var regex = Re.i([ "www.", "morilib.", "net/" ]);
var regex = Re.build([ "www.", "morilib.", "net/" ]);

Simple strings

A string literal matches exactly that sequence of characters. Metacharacters in the string are escaped, so "." matches only a period.

var regex = Re.i("www.morilib.net/");
console.log(regex.test("www.morilib.net/"));   // true
console.log(regex.test("wwwwmorilib_net/"));   // false
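Escaping means every metacharacter in the string is treated literally. The effect can be reproduced with a native RegExp. The escapeRegExp function below is a hypothetical helper for illustration only; it is not part of Morilib Rei, and Morilib Rei's own escaping may differ in detail.

```javascript
// Hypothetical helper, not part of Morilib Rei: prefix every regular
// expression metacharacter with a backslash so it matches literally.
function escapeRegExp(aString) {
  return aString.replace(/[.*+?^${}()|[\]\\\/]/g, function(ch) {
    return "\\" + ch;
  });
}

var literal = new RegExp(escapeRegExp("www.morilib.net/"));
console.log(literal.test("www.morilib.net/"));  // true
console.log(literal.test("wwwwmorilib_net/"));  // false

// Without escaping, each "." matches any character.
var unescaped = new RegExp("www.morilib.net/");
console.log(unescaped.test("wwwwmorilib_net/"));  // true
```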

Concatenation of patterns

An array matches the concatenation of the patterns it contains, in order.

var regex = Re.i([ "www.", "morilib.", "net/" ]);
console.log(regex.test("www.morilib.net/"));   // true
console.log(regex.test("wwwwmorilib_net/"));   // false
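Concatenation amounts to joining the regular expression source of each element in order. The sketch below assumes only strings and (possibly nested) arrays; the names escapeRegExp and toSource are hypothetical and do not describe how Morilib Rei is implemented.

```javascript
// Hypothetical sketch, not Morilib Rei's implementation: translate a
// string or a (possibly nested) array of strings into regex source.
function escapeRegExp(aString) {
  return aString.replace(/[.*+?^${}()|[\]\\\/]/g, function(ch) {
    return "\\" + ch;
  });
}

function toSource(pattern) {
  if (typeof pattern === "string") {
    return escapeRegExp(pattern);           // literal sequence
  }
  if (Array.isArray(pattern)) {
    return pattern.map(toSource).join("");  // concatenation
  }
  throw new Error("unsupported pattern: " + JSON.stringify(pattern));
}

var regex = new RegExp(toSource([ "www.", "morilib.", "net/" ]));
console.log(regex.test("www.morilib.net/"));  // true
console.log(regex.test("wwwwmorilib_net/"));  // false
```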

Repetition

To match a repeated pattern, use an object with one of the following properties.

Property name          Description

oneOrMore              repetition one or more times
zeroOrMore             repetition zero or more times
maybe                  zero or one occurrence
repeat                 repetition from n to m times
oneOrMoreNonGreedy     repetition one or more times (non-greedy)
zeroOrMoreNonGreedy    repetition zero or more times (non-greedy)
maybeNonGreedy         zero or one occurrence (non-greedy)
repeatNonGreedy        repetition from n to m times (non-greedy)

Repetition one or more times is written as follows.

var regex = Re.i({ "oneOrMore": "a" });
console.log(regex.test("aaaaa"));  // true

Repetition from n to m times is written as follows.

var regex = Re.i({
  "repeat": {
    "from": 2,
    "to": 5,
    "pattern": "a"
  }
});
console.log(regex.test("aaa"));  // true
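The chapter "Notation" maps these forms to the quantifiers + and {n,m}. Assuming Re.i produces an equivalent pattern, the two examples behave like the native expressions below. Note that without anchors, a{2,5} also matches inside a longer run of a.

```javascript
// Native equivalents of { "oneOrMore": "a" } and of the repeat example.
var oneOrMore = /a+/;
var repeat = /a{2,5}/;

console.log(oneOrMore.test("aaaaa"));     // true
console.log(repeat.test("aaa"));          // true
console.log(repeat.test("a"));            // false

// Unanchored, a{2,5} still finds "aaaaa" inside "aaaaaaa".
console.log(repeat.test("aaaaaaa"));      // true
// Anchored at both ends, the whole input must be 2 to 5 a's.
console.log(/^a{2,5}$/.test("aaaaaaa"));  // false
```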

Many property names have aliases.
For example, you can write repeatOneOrMore instead of oneOrMore.
Property names are case-insensitive.
See the chapter "Notation" for details.

Alternation

To match one of several alternative patterns, use the or property. Its value must be an array of patterns.

var regex = Re.i({ "or": [ "765", "876", "346", "283" ] });
console.log(regex.test("765"));  // true
console.log(regex.test("961"));  // false
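Because | has the lowest precedence in a regular expression, an alternation that is concatenated with other patterns must be grouped. Whether Morilib Rei emits a non-capturing group exactly like this is an assumption; the hypothetical alternation helper below only shows why some grouping is needed.

```javascript
// Hypothetical sketch, not Morilib Rei's implementation: wrap an
// alternation in a non-capturing group so it stays self-contained.
function escapeRegExp(aString) {
  return aString.replace(/[.*+?^${}()|[\]\\\/]/g, function(ch) {
    return "\\" + ch;
  });
}

function alternation(patterns) {
  return "(?:" + patterns.map(escapeRegExp).join("|") + ")";
}

var grouped = new RegExp("^id:" + alternation([ "765", "876" ]) + "$");
console.log(grouped.test("id:876"));  // true
console.log(grouped.test("x876"));    // false

// Without the group, ^id:765|876$ means "^id:765" or "876$".
var ungrouped = new RegExp("^id:765|876$");
console.log(ungrouped.test("x876"));  // true
```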

Character set

To match a character in a character set, use the charset property.
Its value is the name of a predefined character set or a range object.
To match a character not in a character set, use the complementCharset property (alias: complementaryCharset).
See the chapter "Predefined character sets" for details.

var regex = Re.i({ "charset": "digit" });
console.log(regex.test("2"));  // true
var regex = Re.i({
  "charset": {
    "range": {
      "from": "a",
      "to": "z"
    }
  }
});
console.log(regex.test("a"));  // true
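For comparison, the chapter "Notation" gives [^\d] as the regular expression for the complement of digit. Assuming digit corresponds to \d and a range to a bracketed class, the examples behave like these native expressions.

```javascript
// Native counterparts of { "charset": "digit" }, the a-z range,
// and { "complementCharset": "digit" }.
var digit = /\d/;
var lowercase = /[a-z]/;
var nonDigit = /[^\d]/;

console.log(digit.test("2"));      // true
console.log(lowercase.test("a"));  // true
console.log(lowercase.test("A"));  // false
console.log(nonDigit.test("2"));   // false
console.log(nonDigit.test("x"));   // true
```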

Capture

To capture a matched sequence, use the capture property.
To refer back to a captured sequence, use the refer property. The value of refer is the number of the capture.

var regex = Re.i([
  {
    "capture": {
      "charset": "all"
    }
  },
  { "refer": 1 }
]);
console.log(regex.test("aa"));  // true
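Since the chapter "Notation" maps the charset all to [\s\S], the example above corresponds to a capture followed by a numbered backreference. The native form is shown here only for comparison; it assumes Re.i generates an equivalent expression.

```javascript
// Native counterpart of a capture followed by { "refer": 1 }.
var doubled = /([\s\S])\1/;

console.log(doubled.test("aa"));       // true
console.log(doubled.test("ab"));       // false
console.log(doubled.exec("xyyz")[1]);  // y
```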

Named capture

Any capture can be named.
To name a capture, give the capture property an object with a name property and a pattern property.

var regex = Re.i([
  {
    "capture": {
      "name": "char",
      "pattern": {
        "charset": "all"
      }
    }
  },
  { "refer": "char" }
]);
console.log(regex.test("aa"));  // true
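For comparison, JavaScript engines supporting ES2018 also provide native named groups and named backreferences. This is not how Morilib Rei exposes named captures (it uses its own API such as execWithName), so the snippet only shows the equivalent matching behaviour.

```javascript
// Native named group and named backreference (ES2018 syntax).
var doubled = /(?<char>[\s\S])\k<char>/;
var result = doubled.exec("xyyz");

console.log(result[0]);           // yy
console.log(result.groups.char);  // y
console.log(doubled.test("ab"));  // false
```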

Raw regular expression

To embed a raw regular expression, use the raw property.
Capturing groups inside the raw regular expression are taken into account when captures are numbered.

// the group in the first raw regex is capture 1, so "body" is capture 2
var regex = Re.i([
  { "raw": "<([^>]+)>" },
  {
    "capture": {
      "name": "body",
      "pattern": {
        "oneOrMoreNonGreedy": {
          "charset": "all"
        }
      }
    }
  },
  { "raw": "</[^>]+>" }
]);
// { 0: "<a>index.html</a>", 1: "a", 2: "index.html", body: "index.html" }
console.log(regex.execWithName("<a>index.html</a>"));
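To number captures correctly, a builder has to know how many groups a raw expression contains. One common trick is sketched below; countCaptures is a hypothetical helper, and whether Morilib Rei counts groups this way is not stated in this document.

```javascript
// Hypothetical helper, not Morilib Rei's code: count the capturing
// groups in a raw regex. Appending "|" lets the pattern match "", and
// exec() returns one slot per group plus one for the whole match.
function countCaptures(rawSource) {
  return new RegExp(rawSource + "|").exec("").length - 1;
}

console.log(countCaptures("<([^>]+)>"));    // 1
console.log(countCaptures("</[^>]+>"));     // 0
console.log(countCaptures("(a)(?:b)(c)"));  // 2

// The whole example as one native regex: the raw group is group 1,
// so the named capture "body" ends up as group 2.
var m = /<([^>]+)>([\s\S]+?)<\/[^>]+>/.exec("<a>index.html</a>");
console.log(m[1], m[2]);  // a index.html
```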

Unicode properties

To match a Unicode property, use the unicode property; its value is the name of a Unicode property. You can add Is before any Unicode property name.
The headers (prefixes) of Unicode property names are shown as follows.

Type                       Header                       Example
Categories                 (none)                       Math Symbol
Bidirectional categories   Bidi_Class:                  Bidi_Class:Left to Right
Blocks                     Block: or Blk= or In         Block: Basic Latin
Scripts                    Script= or sc= or Script:    Script=Greek

Whitespace, underscores (_), and hyphens can be used as word delimiters within a Unicode property name.

// expands Unicode Property Letter to a character set
var regex = Re.i([
  { "anchor": "begin" },
  {
    "oneOrMore": {
      "unicode": "Letter"
    }
  },
  { "anchor": "end" }
]);
console.log(regex.test("Reiは正規表現のライブラリです"));  // true
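For comparison, modern JavaScript (ES2018 and later) has native Unicode property escapes that require the u flag. They cover general categories and scripts, but as far as the ECMAScript specification goes they do not cover blocks or bidirectional classes, which Morilib Rei lists above.

```javascript
// Native Unicode property escapes need the "u" flag (ES2018).
var letters = /^\p{L}+$/u;
console.log(letters.test("Reiは正規表現のライブラリです"));  // true
console.log(letters.test("Rei 1"));                       // false

console.log(/\p{Script=Greek}/u.test("α"));  // true
console.log(/\p{Math_Symbol}/u.test("+"));   // true
```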

Predefined sequences

Morilib Rei has some predefined sequences.
To use a predefined sequence, use the sequence property.
See the chapter "Predefined sequences" for details.

var regex = Re.i([
  { "anchor": "begin" },
  { "sequence": "real" },
  { "anchor": "end" }
]);
console.log(regex.test("3.46e+27"));  // true
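This document does not spell out the pattern behind the real sequence. The expression below is a hypothetical approximation of a signed real number with an optional exponent, useful for seeing what such a sequence has to handle; Morilib Rei's actual definition may accept different forms.

```javascript
// Hypothetical approximation of a signed real-number pattern; the
// actual "real" sequence in Morilib Rei may differ.
var real = /^[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?$/;

console.log(real.test("3.46e+27"));  // true
console.log(real.test("-0.5"));      // true
console.log(real.test(".5"));        // true
console.log(real.test("1e"));        // false
console.log(real.test("abc"));       // false
```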

Java-like Matcher Class

Morilib Rei has a Java-like matcher class for matching successively over the same input.
To create a matcher, call the matcher method of the object returned by Re.i.

var matcher = Re.i({ "unicode": "Letter" }).matcher("Rei");

The matcher class has the following methods.
Each method starts matching at the position where the last successful match ended.
If a match fails, the position is unchanged.

Method        Description
find          find the next occurrence of the pattern
lookingAt     match the pattern starting at the index of the last match
matches       match the pattern from the index of the last match to the end of the input
usePattern    change the matching pattern; the index of the last match is not changed
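The position rules described above can be imitated with the native g and y (sticky) flags and lastIndex. The MiniMatcher class below is hypothetical: it only sketches the described behaviour and is not how Morilib Rei is implemented.

```javascript
// Hypothetical sketch, not Morilib Rei's implementation: the matcher
// position rules above, rebuilt on native RegExp flags.
function MiniMatcher(pattern, input) {
  this.source = pattern.source;  // pattern text, flags dropped
  this.input = input;
  this.position = 0;             // end of the last successful match
}

// Runs `source` with `flags` starting at the current position and
// advances the position only when the match succeeds.
MiniMatcher.prototype.run = function(source, flags) {
  var re = new RegExp(source, flags);
  re.lastIndex = this.position;
  var result = re.exec(this.input);
  if (result !== null) {
    this.position = re.lastIndex;
  }
  return result;
};

// find: search anywhere at or after the position ("g" flag)
MiniMatcher.prototype.find = function() {
  return this.run(this.source, "g");
};

// lookingAt: the match must start exactly at the position ("y" flag)
MiniMatcher.prototype.lookingAt = function() {
  return this.run(this.source, "y");
};

// matches: start at the position and reach the end of the input
MiniMatcher.prototype.matches = function() {
  return this.run("(?:" + this.source + ")$", "y");
};

// usePattern: swap the pattern but keep the position; returns this
MiniMatcher.prototype.usePattern = function(pattern) {
  this.source = pattern.source;
  return this;
};

var m = new MiniMatcher(/\d+/, "ab12cd345");
console.log(m.find()[0]);                            // 12
console.log(m.lookingAt());                          // null
console.log(m.usePattern(/[a-z]+/).lookingAt()[0]);  // cd
console.log(m.usePattern(/\d+/).matches()[0]);       // 345
```

Because a failed attempt leaves the position where it was, several patterns can be tried in turn at the same spot.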

The usePattern method lets you switch patterns while matching the same input.
The following program uses usePattern to parse S-expressions as used in Lisp.

// assumes Re is available (see "How to use")
function parseS(aString) {
    var undef = void 0,
        isDot = false,
        stack = [],
        matching,
        result,
        matcher;

    // appends anObject to the list on the top of the stack
    function pushStack(anObject) {
        var stackTop = stack[stack.length - 1];
        if(isDot) {
            // the object after a dot becomes the cdr of the last cell
            stackTop.now.cdr = anObject;
            isDot = false;
        } else {
            if(!stackTop.now) {
                stackTop.now = stackTop.start;
            } else {
                stackTop.now = stackTop.now.cdr;
            }
            stackTop.now.car = anObject;
            stackTop.now.cdr = {};
        }
    }

    matcher = Re.i(/$/g).matcher(aString);
    while(!matcher.usePattern(Re.i({ "anchor": "end" }, "global")).lookingAt()) {
        matching = undef;
        if(matcher.usePattern(Re.i("(", "global")).lookingAt()) {
            // open a new list
            stack.push({
                start: {},
                now: null
            });
        } else if(matcher.usePattern(Re.i(")", "global")).lookingAt()) {
            if(stack.length === 0) {
                throw new Error("unbalanced parenthesis");
            }
            matching = stack.pop().start;
        } else if(matcher.usePattern(Re.i([ ".", { "lookahead": { "charset": "space" } } ], "global")).lookingAt()) {
            if(isDot || stack.length === 0 || !stack[stack.length - 1].now) {
                throw new Error("invalid dot");
            }
            isDot = true;
        } else if((result = matcher.usePattern(Re.i({ "sequence": "real" }, "global")).lookingAt())) {
            matching = parseFloat(result[0]);
        } else if((result = matcher.usePattern(Re.i({ "raw": "[^\\s()]+" }, "global")).lookingAt())) {
            // symbol: stops before whitespace and parentheses
            matching = result[0];
        }

        if(matching === undef) {
            // nothing to add after "(" or "."
        } else if(stack.length === 0) {
            return matching;
        } else {
            pushStack(matching);
        }
        // skip whitespace between tokens
        matcher.usePattern(Re.i({ "zeroOrMore": { "charset": "space" } }, "global")).lookingAt();
    }
    return undef;
}

Notation

zeroOrMore

alias: repeatZeroOrMore

Format

{
  "zeroOrMore": <pattern>
}

Example

{
  "zeroOrMore": "a"
}

corresponding regular expression: a*
matches aaaaa for input aaaaab
matches <empty> for input b

Description

repeats the given pattern zero or more times.

oneOrMore

alias: repeatOneOrMore

Format

{
  "oneOrMore": <pattern>
}

Example

{
  "oneOrMore": "a"
}

corresponding regular expression: a+
matches aaaaa for input aaaaab
no match for input b

Description

repeats the given pattern one or more times.

maybe

alias: option, optional

Format

{
  "maybe": <pattern>
}

Example

{
  "maybe": "a"
}

corresponding regular expression: a?
matches a for input aaaaab
matches <empty> for input b

Description

matches the given pattern zero times or once.

repeat

Format

{
  "repeat": {
    "from": <number>,
    "to": <number>,
    "pattern": <pattern>
  }
}

Example

{
  "repeat": {
    "from": 1,
    "to": 3,
    "pattern": "a"
  }
}

corresponding regular expression: a{1,3}
matches aaa for input aaaaab
matches a for input ab

Description

repeats the given pattern at least "from" times and at most "to" times.
If "from" is omitted, it defaults to 0.
If "to" is omitted, there is no upper limit.
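The defaults for "from" and "to" translate directly into a {min,max} quantifier. The quantifier function below is a hypothetical sketch of that translation under the documented defaults, not Morilib Rei's code.

```javascript
// Hypothetical sketch of the documented defaults: a missing "from"
// becomes 0 and a missing "to" means no upper limit.
function quantifier(spec) {
  var from = spec.from === undefined ? 0 : spec.from;
  var to = spec.to === undefined ? "" : String(spec.to);  // "" = unbounded
  return "{" + from + "," + to + "}";
}

console.log(quantifier({ from: 1, to: 3 }));  // {1,3}
console.log(quantifier({ to: 3 }));           // {0,3}
console.log(quantifier({ from: 2 }));         // {2,}

var atLeastTwo = new RegExp("^a" + quantifier({ from: 2 }) + "$");
console.log(atLeastTwo.test("aaaaaaa"));  // true
console.log(atLeastTwo.test("a"));        // false
```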

zeroOrMoreNonGreedy

alias: repeatZeroOrMoreNonGreedy, repeatZeroOrMoreNotGreedy, zeroOrMoreNotGreedy

Format

{
  "zeroOrMoreNonGreedy": <pattern>
}

Example

[
  {
    "zeroOrMoreNonGreedy": {
      "charset": "exceptNewline"
    }
  },
  "-"
]

corresponding regular expression: .*?-
matches aaaaa- for input aaaaa-a-

Description

repeats the given pattern zero or more times, matching as few times as possible.
For the input aaaaa-a- above, the repetition matches aaaaa, so the whole match is aaaaa-.
With zeroOrMore instead, the repetition would match aaaaa-a and the whole match would be aaaaa-a-.
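The difference can be checked with plain RegExp by wrapping the repetition in a capture group, so the repeated part and the whole match are printed separately. This uses native JavaScript only, not Morilib Rei.

```javascript
// Native comparison of non-greedy and greedy repetition on "aaaaa-a-".
var input = "aaaaa-a-";
var lazy = /(.*?)-/.exec(input);
var greedy = /(.*)-/.exec(input);

console.log(lazy[0], lazy[1]);      // aaaaa- aaaaa
console.log(greedy[0], greedy[1]);  // aaaaa-a- aaaaa-a
```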

oneOrMoreNonGreedy

alias: repeatOneOrMoreNonGreedy, repeatOneOrMoreNotGreedy, oneOrMoreNotGreedy

Format

{
  "oneOrMoreNonGreedy": <pattern>
}

Example

[
  {
    "oneOrMoreNonGreedy": {
      "charset": "exceptNewline"
    }
  },
  "-"
]

corresponding regular expression: .+?-
matches aaaaa- for input aaaaa-a-

Description

repeats the given pattern one or more times, matching as few times as possible.
For the input aaaaa-a- above, the repetition matches aaaaa, so the whole match is aaaaa-.
With oneOrMore instead, the repetition would match aaaaa-a and the whole match would be aaaaa-a-.

maybeNonGreedy

alias: optionalNonGreedy, optionalNotGreedy, optionNonGreedy, optionNotGreedy, maybeNotGreedy

Format

{
  "maybeNonGreedy": <pattern>
}

Example

[
  {
    "maybeNonGreedy": {
      "charset": "exceptNewline"
    }
  },
  "-"
]

corresponding regular expression: .??-
matches - for input -- (the optional part matches the empty string)

Description

matches the given pattern zero times or once, preferring zero times.
For the input -- above, the optional part matches the empty string, so the whole match is -.
With maybe instead, the optional part would match - and the whole match would be --.

repeatNonGreedy

alias: repeatNotGreedy

Format

{
  "repeatNonGreedy": {
    "from": <number>,
    "to": <number>,
    "pattern": <pattern>
  }
}

Example

[
  {
    "repeatNonGreedy": {
      "from": 1,
      "to": 10,
      "pattern": {
        "charset": "exceptNewline"
      }
    }
  },
  "-"
]

corresponding regular expression: .{1,10}?-
matches aaaaa- for input aaaaa-a-

Description

repeats the given pattern at least "from" times and at most "to" times, matching as few times as possible.
If "from" is omitted, it defaults to 0.
If "to" is omitted, there is no upper limit. For the input aaaaa-a- above, the repetition matches aaaaa.
With repeat instead, it would match aaaaa-a.

or

alias: alter, alternate, alternation, alternative

Format

{
  "or": [ list of alternation ]
}

Example

{
  "or": [ "765", "346", "283" ]
}

corresponding regular expression: 765|346|283
matches 765 for input 765pro

Description

matches one of the alternatives in the list.

capture

Format

{
  "capture": <pattern>
}
{
  "capture": {
    "name": <name>,
    "pattern": <pattern>
  }
}

Example

{
  "capture": {
    "zeroOrMore": "a"
  }
}

corresponding regular expression: (a*)

Description

matches the given pattern and captures the matched text.
If the value is an object with a "name" property, the capture is named.
A named capture can be retrieved by name through the Morilib Rei API, such as execWithName.
A named capture is also numbered in the same way as an ordinary capture.

raw

alias: regex, regexp

Format

{
  "raw": <raw regex>
}

Example

{
  "raw": "a+b"
}

corresponding regular expression: a+b

Description

embeds the given raw regular expression as it is.

charset

alias: characterSet

Format

{
  "charset": <name of character set>
}

Example

{
  "charset": "all"
}

corresponding regular expression: [\s\S]

Description

matches a character in the given character set.

complementCharset

alias: complementSet, complementaryCharset, complementarySet, complementCharacterSet, complementaryCharacterSet

Format

{
  "complementCharset": <name of character set>
}

Example

{
  "complementCharset": "digit"
}

corresponding regular expression: [^\d]

Description

matches a character not in the given character set.

anchor

alias: bound

Format

{
  "anchor": <name of anchor>
}

Example

[
  {
    "anchor": "beginOfLine"
  },
  "abc"
]

corresponding regular expression: ^abc
matches abc for input abc
no match for input dabc

Description

matches a boundary. The available anchor names are as follows.

begin, start, beginOfLine, startOfLine

matches the beginning of a line or of the input

end, endOfLine

matches the end of a line or of the input

word, wordBound, wordBoundary

matches a word boundary

nonWord, nonWordBound, nonWordBoundary, notWord, notWordBound, notWordBoundary

matches a non-word boundary

charCode

alias: characterCode

Format

{
  "charCode": <character code>
}

Example

{
  "charCode": 65
}

corresponding regular expression: \u0041 (the character A)

Description

matches the character with the given UTF-16 code.
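Assuming the value of charCode is an ordinary JavaScript number, the \uXXXX escape it corresponds to is the code written as four hexadecimal digits, so decimal 65 (0x41) is A while decimal 41 (0x29) is a closing parenthesis. charCodeSource is a hypothetical helper, not part of the Morilib Rei API.

```javascript
// Hypothetical helper: build a \uXXXX escape for a UTF-16 code unit.
function charCodeSource(code) {
  return "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
}

console.log(charCodeSource(65));                         // \u0041
console.log(charCodeSource(41));                         // \u0029
console.log(new RegExp(charCodeSource(65)).test("A"));  // true
console.log(new RegExp(charCodeSource(41)).test(")"));  // true
```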

lookahead

alias: positiveLookahead, lookaheadAssertion, positiveLookaheadAssertion

Format

{
  "lookahead": <pattern>
}

Example

[
  "765",
  {
    "lookahead": "pro"
  }
]

corresponding regular expression: 765(?=pro)
matches 765 for input 765pro
no match for input 765

Description

matches the given pattern without consuming input.

negativeLookahead

alias: negativeLookaheadAssertion

Format

{
  "negativeLookahead": <pattern>
}

Example

[
  "765",
  {
    "negativeLookahead": "?"
  }
]

corresponding regular expression: 765(?!\?)
matches 765 for input 765!
no match for input 765?

Description

matches if the given pattern does not match at this position, without consuming input.
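Both lookahead forms can be checked against the native expressions listed above. The match for 765(?=pro) ends right after 765, which shows that the asserted text is not consumed. This is plain RegExp, shown for comparison only.

```javascript
// Native lookahead assertions: the asserted text is not consumed.
var positive = /765(?=pro)/;
var negative = /765(?!\?)/;

var m = positive.exec("765pro");
console.log(m[0], m.index + m[0].length);  // 765 3
console.log(positive.test("765"));          // false
console.log(negative.test("765!"));         // true
console.log(negative.test("765?"));         // false
```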

unicode

alias: unicodeProperty

Format

{
  "unicode": <unicode property>
}

Example

{
  "unicode": "L"
}

corresponding regular expression: \p{L}

Description

matches a character which is in the given Unicode property.

complementUnicode

alias: complementaryUnicode, complementUnicodeProperty, complementaryUnicodeProperty

Format

{
  "complementUnicode": <unicode property>
}

Example

{
  "complementUnicode": "L"
}

corresponding regular expression: \P{L}

Description

matches a character which is not in the given Unicode property.

sequence

alias: seq

Format

{
  "sequence": <name of sequence>
}

Example

{
  "sequence": "real"
}

Description

matches the predefined pattern.

Notation of character sets

range

Format

{
  "range": {
    "from": <character>,
    "to": <character>
  }
}

Example

{
  "range": {
    "from": "a",
    "to": "z"
  }
}

Description

matches a character in the range from "from" to "to".

unicode

alias: unicodeProperty

Format

{
  "unicode": <unicode property>
}

Example

{
  "unicode": "L"
}

Description

matches a character which is in the given Unicode property.

complementUnicode

alias: complementaryUnicode, complementUnicodeProperty, complementaryUnicodeProperty

Format

{
  "complementUnicode": <unicode property>
}

Example

{
  "complementUnicode": "L"
}

Description

matches a character which is not in the given Unicode property.

Predefined character sets

all

Description

matches any character.

digit

Description

matches any digit.

nonDigit

alias: notDigit

Description

matches any character that is not a digit.

word

Description

matches any word character.

nonWord

alias: notWord

Description

matches any character that is not a word character.

space

alias: whitespace

Description

matches any whitespace character.

nonSpace

alias: notSpace, nonWhitespace, notWhitespace

Description

matches any character which is not whitespace.

tab

Description

matches a tab character.

carriageReturn

alias: cr

Description

matches a carriage return.

lineFeed

alias: lf

Description

matches a line feed.

verticalTab

alias: vt

Description

matches a vertical tab.

formFeed

alias: ff

Description

matches a form feed.

backspace

alias: bs

Description

matches a backspace.
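The chapter "Notation" gives [\s\S] for all and [^\d] for the complement of digit. Assuming the other predefined character sets follow the usual JavaScript escapes, they correspond roughly to the table below. Note that native \s also matches Unicode spaces, which Morilib Rei's space set may or may not include.

```javascript
// Assumed native counterparts of the predefined character sets.
var predefined = {
  all: /[\s\S]/,
  digit: /\d/,
  nonDigit: /\D/,
  word: /\w/,
  nonWord: /\W/,
  space: /\s/,
  nonSpace: /\S/,
  tab: /\t/,
  carriageReturn: /\r/,
  lineFeed: /\n/,
  verticalTab: /\v/,
  formFeed: /\f/,
  backspace: /[\b]/  // inside a class, \b means backspace
};

console.log(predefined.word.test("_"));        // true
console.log(predefined.backspace.test("\b"));  // true
console.log(predefined.backspace.test("b"));   // false
```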

Predefined sequences

all

Description

matches any character.

exceptNewline

alias: allExceptNewline

Description

matches any character except a newline.

newline

alias: nl, br

Description

matches a newline sequence, including a two-character sequence such as CRLF.
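A newline sequence that accepts CRLF has to try the two-character form first; otherwise CR and LF are treated as two separate newlines. The expression below is a hypothetical approximation for illustration, not the definition used by Morilib Rei.

```javascript
// Hypothetical approximation of a newline sequence. CRLF must come
// first; otherwise "\r\n" is split into two separate newlines.
var newline = /\r\n|\r|\n/;
var wrongOrder = /\r|\n|\r\n/;

console.log("a\r\nb".split(newline));     // [ 'a', 'b' ]
console.log("a\r\nb".split(wrongOrder));  // [ 'a', '', 'b' ]
```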

real

alias: float, realNumber, floatNumber, realNumberWithSign, floatNumberWithSign

Description

matches a floating-point number, including an optional sign.

realWithoutSign

alias: floatWithoutSign, realNumberWithoutSign, floatNumberWithoutSign

Description

matches a floating-point number without a sign.

API

Rei.execWithName

Parameter

Name       Description
aString    string to match

Return

the match result; named captures are available as properties

Description

matches the given string against this pattern.

Example Code

var result = Re.i([
  {
    "capture": {
       "name": "a",
       "pattern": {
         "oneOrMoreNonGreedy": {
           "charset": "all"
         }
       }
    }
  },
  ";"
]).execWithName("abc;");
console.log(result.a);  // output "abc"
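The result of execWithName looks like an exec() result with the named captures added as extra properties (compare the raw regex example earlier). The sketch below reproduces that shape with a hypothetical execWithNames function that takes an explicit map from names to group numbers; it is not Morilib Rei's implementation.

```javascript
// Hypothetical sketch, not Morilib Rei's code: attach named captures
// to a RegExp.prototype.exec() result, given a name-to-number map.
function execWithNames(regex, names, aString) {
  var result = regex.exec(aString);
  if (result === null) {
    return null;
  }
  Object.keys(names).forEach(function(name) {
    result[name] = result[names[name]];
  });
  return result;
}

var result = execWithNames(/([\s\S]+?);/, { a: 1 }, "abc;");
console.log(result[1]);  // abc
console.log(result.a);   // abc
```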

Rei.matcher

Parameter

Name       Description
aString    string to match

Return

the created matcher

Description

creates a matcher for the given string.

Example Code

var matcher = Re.i([
  {
    "capture": {
       "name": "a",
       "pattern": {
         "oneOrMoreNonGreedy": {
           "charset": "all"
         }
       }
    }
  },
  ";"
]).matcher("abc;");

Rei.find

Return

matched result

Description

finds the next occurrence of the pattern.
The return value has the same form as the return value of RegExp.prototype.exec().

Example Code

var json = [
  {
    "capture": {
       "name": "a",
       "pattern": {
         "oneOrMoreNonGreedy": {
           "charset": "word"
         }
       }
    }
  },
  ";"
];
var matcher = Re.i(json).matcher("@@@@@abc;@@@@@def;");
var result = matcher.find();
console.log(result.a);  // output abc
result = matcher.find();
console.log(result.a);  // output def

Rei.lookingAt

Return

matched result

Description

matches the pattern starting at the index of the last match.
If the match fails, the index of the last match is unchanged.
The return value has the same form as the return value of RegExp.prototype.exec().

Example Code

var json = [
  {
    "capture": {
       "name": "a",
       "pattern": {
         "oneOrMoreNonGreedy": {
           "charset": "word"
         }
       }
    }
  },
  ";"
];
var matcher = Re.i(json).matcher("abc;@@@@@def;");
var result = matcher.lookingAt();
console.log(result.a);  // output abc
result = matcher.lookingAt();
console.log(result);  // output null

Rei.matches

Return

matched result

Description

matches the pattern from the index of the last match to the end of the input.
If the match fails, the index of the last match is unchanged.
The return value has the same form as the return value of RegExp.prototype.exec().

Example Code

var json = [
  {
    "capture": {
       "name": "a",
       "pattern": {
         "oneOrMoreNonGreedy": {
           "charset": "word"
         }
       }
    }
  },
  ";"
];
var matcher = Re.i(json).matcher("abc;");
var result = matcher.matches();
console.log(result.a);  // output abc
matcher = Re.i(json).matcher("@@@@abc;");
result = matcher.matches();
console.log(result);  // output null

Rei.usePattern

Parameter

Name           Description
regexOrJson    regular expression or Morilib Rei formed JavaScript object

Return

this matcher (so calls can be chained)

Description

changes the matching pattern to the given pattern.
The index of the last match is not changed.

Example Code

var json1 = [
  {
    "capture": {
       "name": "a",
       "pattern": {
         "oneOrMoreNonGreedy": {
           "charset": "word"
         }
       }
    }
  },
  ";"
];
var json2 = {
  "capture": {
     "name": "a",
     "pattern": {
       "oneOrMoreNonGreedy": {
         "charset": "digit"
       }
     }
  }
};
var matcher = Re.i(json1).matcher("@@@@@abc;01");
var result = matcher.find();
console.log(result.a);  // output abc
matcher.usePattern(json2);
result = matcher.matches();
console.log(result.a);  // output 01