This module provides an interface to the Oniguruma regular expression
library. Use require("onig")
to load it.
Create an instance of a regular-expression
Onig_Type onig_new (pattern [,options [,encoding [,syntax]]])
String_Type pattern;
Int_Type options;
String_Type encoding;
String_Type syntax;
The onig_new
function compiles the specified regular
expression (pattern
) and returns the result. The other
parameters are optional and may be used to specify compilation
options, the character-set encoding, and the regular expression
syntax. Upon success, this function returns an Onig_Type
object representing the compiled pattern. If compilation fails, a
OnigError
exception will be thrown.
The options
parameters is a bit mapped value that may be
formed from the bitwise-or of zero or more of the following constants:
ONIG_OPTION_NONE
ONIG_OPTION_IGNORECASE
ONIG_OPTION_EXTEND
ONIG_OPTION_MULTILINE
ONIG_OPTION_SINGLELINE
ONIG_OPTION_FIND_LONGEST
ONIG_OPTION_FIND_NOT_EMPTY
ONIG_OPTION_NEGATE_SINGLELINE
ONIG_OPTION_DONT_CAPTURE_GROUP
ONIG_OPTION_CAPTURE_GROUP
The character-set encoding may be specified using one of the following strings:
"ascii" "iso_8859_1" "iso_8859_2" "iso_8859_3"
"iso_8859_4" "iso_8859_5" "iso_8859_6" "iso_8859_7"
"iso_8859_8", "iso_8859_9" "iso_8859_10" "iso_8859_11"
"iso_8859_13" "iso_8859_14" "iso_8859_15" "iso_8859_16"
"utf8" "utf16_be" "utf16_le" "utf32_be"
"utf32_le" "euc_jp" "euc_tw" "euc_kr"
"euc_cn" "sjis" "koi8_r" "cp1251"
"big5" "gb18030"
If not specified, "utf8" will be used if the interpreter is UTF-8
mode and "iso_8859_1" if not in UTF-8 mode.
The regular expression syntax of the pattern may be specified by
setting the syntax
parameter to one of the following:
"asis" "posix_basic" "posix_extended" "emacs"
"grep" "gnu_regex" "java" "perl"
"perl_ng" "ruby"
If unspecified, the syntax defaults to "perl"
.
onig_search, onig_nth_match, onig_nth_substr
Search a string using an Onig compiled pattern
Int_Type onig_search (p, str [,start_pos, end_pos] [,option])
Onig_Type p;
String_Type str;
Int_Type start_pos, end_pos;
The onig_search
function applies a pre-compiled pattern p
to a
string str
and returns the result of the match. The optional
third and fourth arguments may be used to constrain the search region
to the specified byte-offsets in the string. The option
parameter is also optional and may be used to control how the search
is to be performed. Its value may be specified as a bitwise-or of zero or
more of the following flags:
ONIG_OPTION_NOTBOL
ONIG_OPTION_NOTEOL
ONIG_OPTION_POSIX_REGION
See the onig library documentation for more information about the meaning
of these flags.
Upon success, this function returns a positive integer equal to 1 plus the number of so-called captured substrings. It will return 0 if the pattern failed to match the string.
onig_new, onig_nth_match, onig_nth_substr
Return the location of the nth match of an onig regular expression
Int_Type[2] onig_nth_match (Onig_Type p, Int_Type nth)
The onig_nth_match
function returns an integer array whose
values specify the locations as byte-offsets to the beginning and end
of the nth
captured substring of the most recent call to
onig_search
with the compiled pattern. A value of nth
equal to 0 represents the substring representing the entire match of
the pattern.
If the nth
match did not take place, the function returns NULL
.
After the execution of:
str = "Error in file foo.c, line 127, column 10";
pattern = "file ([^,]+), line ([0-9]+)";
p = onig_new (pattern);
if (onig_search (p, str))
{
match_pos = onig_nth_match (p, 0);
file_pos = onig_nth_match (p, 1);
line_pos = onig_nth_match (p, 2);
}
match_pos
will be set to [9,29]
, file_pos
to [14,19,]
and line_pos
to [26,29]
. These integer arrays may be used to
extract the substrings matched by the pattern, e.g.,
file = substr (str, file_pos[0]+1, file_pos[1]-file_pos[0]);
line = str[[line_pos[0]:line_pos[1]-1]];
Alternatively, the function onig_nth_substr
may be used to get the
matched substrings:
file = onig_nth_substr (p, str, 0);
onig_new, onig_search, onig_nth_substr
Extract the nth matched substring from an Onig regular expression search
String_Type onig_nth_substr (Onig_Type p, String_Type str, Int_Type nth)
This function may be used to extract the nth
captured substring
resulting from the most recent use of the compiled pattern p
by
the onig_search
function. Unlike onig_nth_match
, this
function returns the specified captured substring itself and not the
position of the substring. For this reason, the subject string of the
pattern is a required argument.
onig_new, onig_search, onig_nth_match