Next Previous Contents

8. Oniguruma Regular Expression Module

This module provides an interface to the Oniguruma regular expression library. Use require("onig") to load it.

8.1 onig_new

Synopsis

Create an instance of a regular-expression

Usage

Onig_Type onig_new (pattern [,options [,encoding [,syntax]]])

   String_Type pattern;
   Int_Type options;
   String_Type encoding;
   String_Type syntax;

Description

The onig_new function compiles the specified regular expression (pattern) and returns the result. The other parameters are optional and may be used to specify compilation options, the character-set encoding, and the regular expression syntax. Upon success, this function returns an Onig_Type object representing the compiled pattern. If compilation fails, a OnigError exception will be thrown.

The options parameters is a bit mapped value that may be formed from the bitwise-or of zero or more of the following constants:

   ONIG_OPTION_NONE
   ONIG_OPTION_IGNORECASE
   ONIG_OPTION_EXTEND
   ONIG_OPTION_MULTILINE
   ONIG_OPTION_SINGLELINE
   ONIG_OPTION_FIND_LONGEST
   ONIG_OPTION_FIND_NOT_EMPTY
   ONIG_OPTION_NEGATE_SINGLELINE
   ONIG_OPTION_DONT_CAPTURE_GROUP
   ONIG_OPTION_CAPTURE_GROUP

The character-set encoding may be specified using one of the following strings:

  "ascii"         "iso_8859_1"     "iso_8859_2"   "iso_8859_3"
  "iso_8859_4"    "iso_8859_5"     "iso_8859_6"   "iso_8859_7"
  "iso_8859_8",   "iso_8859_9"     "iso_8859_10"  "iso_8859_11"
  "iso_8859_13"   "iso_8859_14"    "iso_8859_15"  "iso_8859_16"
  "utf8"          "utf16_be"       "utf16_le"     "utf32_be"
  "utf32_le"      "euc_jp"         "euc_tw"       "euc_kr"
  "euc_cn"        "sjis"           "koi8_r"       "cp1251"
  "big5"          "gb18030"
If not specified, "utf8" will be used if the interpreter is UTF-8 mode and "iso_8859_1" if not in UTF-8 mode.

The regular expression syntax of the pattern may be specified by setting the syntax parameter to one of the following:

  "asis"    "posix_basic"  "posix_extended"  "emacs"
  "grep"    "gnu_regex"    "java"            "perl"
  "perl_ng" "ruby"
If unspecified, the syntax defaults to "perl".

See Also

onig_search, onig_nth_match, onig_nth_substr

8.2 onig_search

Synopsis

Search a string using an Onig compiled pattern

Usage

Int_Type onig_search (p, str [,start_pos, end_pos] [,option])

   Onig_Type p;
   String_Type str;
   Int_Type start_pos, end_pos;

Description

The onig_search function applies a pre-compiled pattern p to a string str and returns the result of the match. The optional third and fourth arguments may be used to constrain the search region to the specified byte-offsets in the string. The option parameter is also optional and may be used to control how the search is to be performed. Its value may be specified as a bitwise-or of zero or more of the following flags:

   ONIG_OPTION_NOTBOL
   ONIG_OPTION_NOTEOL
   ONIG_OPTION_POSIX_REGION
See the onig library documentation for more information about the meaning of these flags.

Upon success, this function returns a positive integer equal to 1 plus the number of so-called captured substrings. It will return 0 if the pattern failed to match the string.

See Also

onig_new, onig_nth_match, onig_nth_substr

8.3 onig_nth_match

Synopsis

Return the location of the nth match of an onig regular expression

Usage

Int_Type[2] onig_nth_match (Onig_Type p, Int_Type nth)

Description

The onig_nth_match function returns an integer array whose values specify the locations as byte-offsets to the beginning and end of the nth captured substring of the most recent call to onig_search with the compiled pattern. A value of nth equal to 0 represents the substring representing the entire match of the pattern.

If the nth match did not take place, the function returns NULL.

Example

After the execution of:

    str = "Error in file foo.c, line 127, column 10";
    pattern = "file ([^,]+), line ([0-9]+)";
    p = onig_new (pattern);
    if (onig_search (p, str))
      {
         match_pos = onig_nth_match (p, 0);
         file_pos = onig_nth_match (p, 1);
         line_pos = onig_nth_match (p, 2);
      }
match_pos will be set to [9,29], file_pos to [14,19,] and line_pos to [26,29]. These integer arrays may be used to extract the substrings matched by the pattern, e.g.,
     file = substr (str, file_pos[0]+1, file_pos[1]-file_pos[0]);
     line = str[[line_pos[0]:line_pos[1]-1]];
Alternatively, the function onig_nth_substr may be used to get the matched substrings:
     file = onig_nth_substr (p, str, 0);

See Also

onig_new, onig_search, onig_nth_substr

8.4 onig_nth_substr

Synopsis

Extract the nth matched substring from an Onig regular expression search

Usage

String_Type onig_nth_substr (Onig_Type p, String_Type str, Int_Type nth)

Description

This function may be used to extract the nth captured substring resulting from the most recent use of the compiled pattern p by the onig_search function. Unlike onig_nth_match, this function returns the specified captured substring itself and not the position of the substring. For this reason, the subject string of the pattern is a required argument.

See Also

onig_new, onig_search, onig_nth_match


Next Previous Contents