S-Lang Library Intrinsic Function Reference (v2.3.0): Functions that Operate on Strings

5.1 count_char_occurrences

Synopsis: Count the number of occurrences of a character in a string
Usage: UInt_Type count_char_occurrences (str, ch)
Description: This function returns the number of times the specified character ch occurs in the string str.
Notes: If UTF-8 mode is in effect, then the character may correspond to more than one byte. In such a case, the function returns the number of such byte-sequences in the string. To count actual bytes, use the count_byte_occurrences function.
See Also: count_byte_occurrences
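
As a minimal sketch of the usage above (the variable names path and n are illustrative only), the following counts the slash characters in a path:

```slang
% Count how many times '/' occurs in an ASCII string.
variable path = "/usr/local/bin";
variable n = count_char_occurrences (path, '/');   % n is 3
```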

5.2 create_delimited_string

Synopsis

Concatenate strings using a delimiter

Usage

String_Type create_delimited_string (delim, s_1, s_2, ..., s_n, n)


    String_Type delim, s_1, ..., s_n
    Int_Type n

Description

create_delimited_string performs a concatenation operation on the n strings s_1, ..., s_n, using the string delim as a delimiter. The resulting string is equivalent to one obtained via


      s_1 + delim + s_2 + delim + ... + s_n

Example


    create_delimited_string ("/", "usr", "local", "bin", 3);

will produce "usr/local/bin".

Notes

New code should use the strjoin function, which performs a similar task.
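
A short sketch of the suggested replacement, assuming the strings are first collected into an array (the variable names are illustrative): both calls below should build the same path.

```slang
% Old style: the strings and their count are passed as separate arguments.
variable a = create_delimited_string ("/", "usr", "local", "bin", 3);

% Preferred style: join the elements of a string array.
variable b = strjoin (["usr", "local", "bin"], "/");

% Both a and b are "usr/local/bin".
```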

5.3 extract_element

Synopsis

Extract the nth element of a string with delimiters

Usage

String_Type extract_element (String_Type list, Int_Type nth, Int_Type delim)

Description

The extract_element function may be used to extract the nth substring of a string delimited by the character given by the delim parameter. If the string contains fewer than nth+1 substrings, the function will return NULL. Substring elements are numbered from 0.

Example

The expression


     extract_element ("element 0, element 1, element 2", 1, ',')

returns the string " element 1", whereas


     extract_element ("element 0, element 1, element 2", 1, ' ')

returns "0,".

The following function may be used to compute the number of elements in the list:


     define num_elements (list, delim)
     {
        variable nth = 0;
        while (NULL != extract_element (list, nth, delim))
          nth++;
        return nth;
     }

Alternatively, the strchop function may be more useful. In fact, extract_element may be expressed in terms of the function strchop as


    define extract_element (list, nth, delim)
    {
       list = strchop(list, delim, 0);
       if (nth >= length (list))
         return NULL;
       else
         return list[nth];
    }

and the num_elements function used above may be recoded more simply as:


    define num_elements (list, delim)
    {
       return length (strchop (list, delim, 0));
    }

Notes

New code should make use of the List_Type object for lists.

5.4 glob_to_regexp

Synopsis: Convert a globbing expression to a regular expression
Usage: String_Type glob_to_regexp (String_Type g)
Description: This function may be used to convert a so-called globbing expression to a regular expression. A globbing expression is frequently used for matching filenames where '?' represents a single character and '*' represents 0 or more characters.
Notes: The slsh program that is distributed with the S-Lang library includes a function called glob that is a wrapper around glob_to_regexp and listdir. It returns a list of filenames matching a globbing expression.
See Also: string_match, listdir
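
The following sketch shows one way the converted expression might be used with string_match. It assumes that the generated regular expression is anchored to the whole string; the exact text of the expression is not shown because it is not specified here. The variable names are illustrative.

```slang
% Convert a glob pattern once, then test several names against it.
variable re = glob_to_regexp ("*.sl");

variable is_sl  = string_match ("foo.sl", re, 1);    % non-zero: matches
variable is_txt = string_match ("foo.txt", re, 1);   % 0: no match
```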

5.5 is_list_element

Synopsis

Test whether a delimited string contains a specific element

Usage

Int_Type is_list_element (String_Type list, String_Type elem, Int_Type delim)

Description

The is_list_element function may be used to determine whether or not a delimited list of substrings, list, contains the element elem. If elem is not an element of list, the function will return zero; otherwise, it returns 1 plus the matching element number.

Example

The expression


     is_list_element ("element 0, element 1, element 2", "0,", ' ');

returns 2 since "0," is element number one of the list (numbered from zero).
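
Because the return value is one more than the element number, it has to be decremented before it is passed to extract_element. A minimal sketch, with illustrative variable names:

```slang
variable list = "red,green,blue";
variable elem = NULL;

% n is 2: "green" is element number 1, and 1 is added to it.
variable n = is_list_element (list, "green", ',');
if (n)
  elem = extract_element (list, n - 1, ',');   % elem is "green"
```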

5.6 is_substr

Synopsis

Test for a specified substring within a string

Usage

Int_Type is_substr (String_Type a, String_Type b)

Description

This function may be used to determine if a contains the string b. If it does not, the function returns 0; otherwise it returns the position of the first occurrence of b in a expressed in terms of characters, not bytes.

Notes

This function regards the first character of a string to be given by a position value of 1.

The distinction between characters and bytes is significant in UTF-8 mode.

This function has been vectorized in the sense that if an array of strings is passed for either of the string-valued arguments, then a corresponding array of integers will be returned. If two arrays are passed then the arrays must have the same length.
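
The following sketch illustrates both the 1-based return value and the vectorized form using ASCII strings (the variable names are illustrative):

```slang
% Scalar form: "world" starts at character position 7.
variable p = is_substr ("hello world", "world");

% Vectorized form: one result per array element.
% "apple" does not contain "an"; in "banana" it starts at position 2.
variable ps = is_substr (["apple", "banana"], "an");   % [0, 2]
```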

See Also

substr, string_match, strreplace

5.7 make_printable_string

Synopsis: Format a string suitable for parsing
Usage: String_Type make_printable_string(String_Type str)
Description: This function formats a string in such a way that it may be used as an argument to the eval function. The resulting string is identical to str except that it is enclosed in double quotes and the backslash, newline, control, and double quote characters are expanded.
See Also: eval, str_quote_string

5.8 Sprintf

Synopsis: Format objects into a string (deprecated)
Usage: String_Type Sprintf (String_Type format, ..., Int_Type n)
Description: This function performs a task similar to that of the sprintf function but requires an additional argument that specifies the number of items to format. For this reason, the sprintf function should be used instead.
See Also: sprintf, string, sscanf, vmessage

5.9 strbskipchar

Synopsis

Get an index to the previous character in a UTF-8 encoded string

Usage

(p1, wch) = strbskipchar (str, p0 [,skip_combining])

Description

This function moves backward from the 0-based byte-offset p0 in the string str to the previous character in the string. It returns the byte-offset (p1) of the previous character and the decoded character value at that byte-offset.

The optional third argument specifies the handling of combining characters. If it is non-zero, combining characters will be ignored; otherwise, a combining character will not be treated differently from other characters. The default is to ignore such characters.

If the byte-offset p0 corresponds to the beginning of the string (p0=0), then (p0,0) will be returned. Otherwise, if the byte-offset specifies a value that lies outside the string, an IndexError exception will be thrown. Finally, if the byte-offset corresponds to an illegally coded character, the character returned will be the negative byte-value at the position.
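
As a rough sketch, the function below returns the last character of a string by stepping back once from the end. The name last_char is hypothetical, and the code assumes that the byte-offset just past the final byte (strbytelen (s)) is accepted as a starting point.

```slang
% Hypothetical helper: return the last character of s, or 0 if s is empty.
define last_char (s)
{
   variable p, ch;
   % Start one past the final byte and move back one character.
   (p, ch) = strbskipchar (s, strbytelen (s));
   return ch;
}
```

For the ASCII string "abc", last_char would be expected to return 'c'.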

See Also

strskipchar, strskipbytes

5.10 sprintf

Synopsis

Format objects into a string

Usage

String_Type sprintf (String_Type fmt, ...)

Description

The sprintf function formats a string from a variable number of arguments according to the format specification string fmt.

The format string is a C library sprintf style format descriptor. Briefly, the format string may consist of ordinary characters (not including the % character), which are copied into the output string as-is, and conversion specification sequences introduced by the % character. The number of additional arguments passed to the sprintf function must be consistent with the number required by the format string.

The % character in the format string starts a conversion specification that indicates how an object is to be formatted. Usually the percent character is followed immediately by a conversion specification character. However, it may optionally be followed by flag characters, field width characters, and precision modifiers, as described below.

The character immediately following the % character may be one or more of the following flag characters:


    -         Use left-justification
    #         Use alternate form for formatting.
    0         Use 0 padding
    +         Precede a number by a plus or minus sign.
    (space)   Use a blank instead of a plus sign.

The flag characters (if any) may be followed by an optional field width specification string represented by one or more digit characters. If the size of the formatted object is less than the field width, it will be right-justified in the specified field width, unless the - flag was given, in which case it will be left justified.

If the next character in the control sequence is a period, then it introduces a precision specification sequence. The precision is given by the digit characters following the period. If none are given, the precision is taken to be 0. The meaning of the precision specifier depends upon the type of conversion: For integer conversions, it gives the minimum number of digits to appear in the output. For e and f floating point conversions, it gives the number of digits to appear after the decimal point. For the g floating point conversion, it gives the maximum number of significant digits to appear. Finally, for the s and S conversions, it specifies the maximum number of characters to be copied to the output string.

The next character in the sequence may be a modifier that controls the size of object to be formatted. It may consist of the following characters:


     h    This character is ignored in the current implementation.
     l    The integer is to be formatted as a long integer, or a
          character as a wide character.

Finally the conversion specification sequence ends with the conversion specification character that describes how the object is to be formatted:


     s    as a string
     f    as a floating point number
     e    as a float using exponential form, e.g., 2.345e08
     g    format as e or f, depending upon its value
     c    as a character
     b    as a byte
     %    a literal percent character
     d    as a signed decimal integer
     u    as an unsigned decimal integer
     o    as an octal integer
     X,x  as hexadecimal
     B    as a binary integer
     S    convert object to a string and format accordingly

The S conversion specifier is an S-Lang extension that causes the corresponding object to be converted to a string using the string function and then formatted as a string using the s conversion. In fact, sprintf("%S",x) is equivalent to sprintf("%s",string(x)).

Example


    sprintf("%s","hello")               ===> "hello"
    sprintf("%s %s","hello", "world")   ===> "hello world"
    sprintf("Agent %.3d",7)             ===> "Agent 007"
    sprintf("%S",PI)                    ===> "3.141592653589793"
    sprintf("%g",PI)                    ===> "3.14159"
    sprintf("%.2g",PI)                  ===> "3.1"
    sprintf("%.2e",PI)                  ===> "3.14e+00"
    sprintf("%.2f",PI)                  ===> "3.14"
    sprintf("|% 8.2f|",PI)              ===> "|    3.14|"
    sprintf("|%-8.2f|",PI)              ===> "|3.14    |"
    sprintf("|%+8.2f|",PI)              ===> "|   +3.14|"
    sprintf("|%8B|", 21)                ===> "|   10101|"
    sprintf("|%.8B|", 21)               ===> "|00010101|"
    sprintf("|%#.8B|", 21)              ===> "|0b00010101|"
    sprintf("%S",{1,2,3})               ===> "List_Type with 3 elements"
    sprintf("%S",1+2i)                  ===> "(1 + 2i)"

Notes

The set_float_format function controls the format for the S conversion of floating point numbers.
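
The flags, field width, and precision described above can be combined within a single format string. The following sketch (with illustrative variable names) formats one row of a fixed-width table and a zero-padded hexadecimal value:

```slang
% Left-justified string in 10 columns, integer in 5, float in 8 with
% 3 digits after the decimal point.
variable row = sprintf ("%-10s|%5d|%8.3f", "abc", 42, 2.5);
% row is "abc       |   42|   2.500"

% Zero-padded, upper-case hexadecimal in a field of width 4.
variable hex = sprintf ("0x%04X", 255);
% hex is "0x00FF"
```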

5.11 sscanf

Synopsis

Parse a formatted string

Usage

Int_Type sscanf (s, fmt, r1, ... rN)


    String_Type s, fmt;
    Ref_Type r1, ..., rN

Description

The sscanf function parses the string s according to the format fmt and sets the variables whose references are given by r1, ..., rN. The function returns the number of references assigned, or throws an exception upon error.

The format string fmt consists of ordinary characters and conversion specifiers. A conversion specifier begins with the special character % and is described more fully below. A white space character in the format string matches any amount of whitespace in the input string. Parsing of the format string stops whenever a match fails.

The % character is used to denote a conversion specifier whose general form is given by %[*][width][type]format where the brackets indicate optional items. If * is present, then the conversion will be performed but no assignment to a reference will be made. The width specifier specifies the maximum field width to use for the conversion. The type modifier is used to indicate the size of the object, e.g., a short integer, as follows.

If type is given as the character h, then if the format conversion is for an integer (dioux), the object assigned will be a short integer. If type is l, then the conversion will be to a long integer for integer conversions, or to a double precision floating point number for floating point conversions.

The format specifier is a character that specifies the conversion:


       %     Matches a literal percent character.  No assignment is
             performed.
       d     Matches a signed decimal integer.
       D     Matches a long decimal integer (equiv to `ld')
       u     Matches an unsigned decimal integer
       U     Matches an unsigned long decimal integer (equiv to `lu')
       i     Matches either a hexadecimal integer, decimal integer, or
             octal integer.
       I     Equivalent to `li'.
       x     Matches a hexadecimal integer.
       X     Matches a long hexadecimal integer (same as `lx').
       e,f,g Matches a decimal floating point number (Float_Type).
       E,F,G Matches a double precision floating point number, same as `lf'.
       s     Matches a string of non-whitespace characters (String_Type).
       c     Matches one character.  If width is given, width
             characters are matched.
       n     Assigns the number of characters scanned so far.
       [...] Matches zero or more characters from the set of characters
             enclosed by the square brackets.  If '^' is given as the
             first character, then the complement set is matched.

Example

Suppose that s is "Coffee: (3,4,12.4)". Then


    n = sscanf (s, "%[a-zA-Z]: (%d,%d,%lf)", &item, &x, &y, &z);

will set n to 4, item to "Coffee", x to 3, y to 4, and z to the double precision number 12.4. However,


    n = sscanf (s, "%s: (%d,%d,%lf)", &item, &x, &y, &z);

will set n to 1, item to "Coffee:" and the remaining variables will not be assigned.
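
Since the return value is the number of references assigned, it can be used to check whether the whole format matched. The sketch below uses illustrative variable names and also shows the * modifier suppressing an assignment:

```slang
variable h, m, s, name;

% All three fields match, so n1 is 3.
variable n1 = sscanf ("12:34:56", "%d:%d:%d", &h, &m, &s);

% %*d parses the integer but assigns nothing, so only name is set
% and n2 is 1.
variable n2 = sscanf ("id=42 name=bob", "id=%*d name=%s", &name);
```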

5.12 strbytelen

Synopsis: Get the number of bytes in a string
Usage: Int_Type strbytelen (String_Type s)
Description: This function returns the number of bytes in a string. In UTF-8 mode, this value is generally different from the number of characters in a string. For the latter information, the strlen or strcharlen functions should be used.
Notes: This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of integers will be returned.
See Also: strlen, strcharlen, length

5.13 strbytesub

Synopsis: Replace a byte with another in a string.
Usage: String_Type strbytesub (String_Type s, Int_Type pos, UChar_Type b)
Description: The strbytesub function may be used to substitute the byte b for the byte at byte position pos of the string s. The resulting string is returned.
Notes: The first byte in the string s is specified by pos equal to 1. This function uses byte semantics, not character semantics.
See Also: strsub, is_substr, strreplace, strbytelen
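
A minimal sketch of the 1-based byte position, using an ASCII string so that bytes and characters coincide (the variable name is illustrative):

```slang
% Replace the first byte (position 1) of "hello" with 'j'.
variable t = strbytesub ("hello", 1, 'j');   % t is "jello"
```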

5.14 strcat

Synopsis

Concatenate strings

Usage

String_Type strcat (String_Type a_1, ..., String_Type a_N)

Description

The strcat function concatenates its N string arguments a_1, ... a_N together and returns the result.

Example


    strcat ("Hello", " ", "World");

produces the string "Hello World".

Notes

This function is equivalent to the binary operation a_1+...+a_N. However, strcat is much faster, making it the preferred method to concatenate strings.

See Also

sprintf, strjoin

5.15 strcharlen

Synopsis: Get the number of characters in a string including combining characters
Usage: Int_Type strcharlen (String_Type s)
Description: The strcharlen function returns the number of characters in a string. If the string contains combining characters, then they are also counted. Use the strlen function to obtain the character count ignoring combining characters.
Notes: This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of integers will be returned.
See Also: strlen, strbytelen

5.16 strchop

Synopsis

Chop or split a string into substrings.

Usage

String_Type[] strchop (String_Type str, Int_Type delim, Int_Type quote)

Description

The strchop function may be used to split up a string str that consists of substrings delimited by the character specified by delim. If the integer quote is non-zero, it will be taken as a quote character for the delimiter. The function returns the substrings as an array.

Example

The following function illustrates how to sort a comma separated list of strings:


     define sort_string_list (a)
     {
        variable i, b;
        b = strchop (a, ',', 0);

        i = array_sort (b);
        b = b[i];   % rearrange

        % Convert array back into comma separated form
        return strjoin (b, ",");
     }
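
One point worth illustrating is that strchop keeps empty substrings between adjacent delimiters, whereas strtok treats a run of delimiters as a single separator. A small sketch with illustrative variable names:

```slang
% strchop splits at every delimiter, so the empty field is kept.
variable a = strchop ("a,b,,c", ',', 0);   % ["a", "b", "", "c"]

% strtok skips runs of delimiter characters, so no empty token appears.
variable b = strtok ("a,b,,c", ",");       % ["a", "b", "c"]
```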

See Also

strchopr, strjoin, strtok

5.17 strchopr

Synopsis: Chop or split a string into substrings.
Usage: String_Type[] strchopr (String_Type str, Int_Type delim, Int_Type quote)
Description: This routine performs exactly the same function as strchop except that it returns the substrings in the reverse order. See the documentation for strchop for more information.
See Also: strchop, strtok, strjoin

5.18 strcmp

Synopsis

Compare two strings

Usage

Int_Type strcmp (String_Type a, String_Type b)

Description

The strcmp function may be used to perform a case-sensitive string comparison, in the lexicographic sense, on strings a and b. It returns 0 if the strings are identical, a negative integer if a is less than b, or a positive integer if a is greater than b.

Example

The strup function may be used to perform a case-insensitive string comparison:


    define case_insensitive_strcmp (a, b)
    {
      return strcmp (strup(a), strup(b));
    }

Notes

One may also use one of the binary comparison operators, e.g., a > b.

This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of integers will be returned.

See Also

strup, strncmp

5.19 strcompress

Synopsis

Remove excess whitespace characters from a string

Usage

String_Type strcompress (String_Type s, String_Type white)

Description

The strcompress function compresses the string s by replacing a sequence of one or more characters from the set white by the first character of white. In addition, it also removes all leading and trailing characters from s that are part of white.

Example

The expression


    strcompress (",;apple,,cherry;,banana", ",;");

returns the string "apple,cherry,banana".

Notes

This function has been vectorized in the sense that if an array of strings is passed as the first argument then a corresponding array of strings will be returned. Array values are not supported for the remaining arguments.

See Also

strtrim, strtrans, str_delete_chars

5.20 string_match

Synopsis

Match a string against a regular expression

Usage

Int_Type string_match(String_Type str, String_Type pat [,Int_Type pos])

Description

The string_match function returns zero if str does not match the regular expression specified by pat. This function performs the match starting at the first byte of the string. The optional pos argument may be used to specify a different byte offset (numbered from 1). This function returns the position in bytes (numbered from 1) of the start of the match in str. The exact substring matched may be found using string_match_nth.

Notes

Positions in the string are specified using byte-offsets not character offsets. The value returned by this function is measured from the beginning of the string str.

The function is not yet UTF-8 aware. If possible, consider using the pcre module for better, more sophisticated regular expressions.

The pos argument was made optional in version 2.2.3.
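
The following sketch illustrates the 1-based return value and the effect of the optional pos argument on an ASCII string (the variable names are illustrative):

```slang
variable s = "hello world";

variable p1 = string_match (s, "wor", 1);   % 7: "wor" starts at byte 7
variable p2 = string_match (s, "xyz", 1);   % 0: no match

% Starting the search at byte 6 skips the 'o' in "hello"; the result
% is still measured from the beginning of s.
variable p3 = string_match (s, "o", 1);     % 5
variable p4 = string_match (s, "o", 6);     % 8
```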

5.21 string_match_nth

Synopsis

Get the result of the last call to string_match

Usage

(Int_Type pos, Int_Type len) = string_match_nth(Int_Type nth)

Description

The string_match_nth function returns two integers describing the result of the last call to string_match. It returns both the zero-based byte-position of the nth submatch and the length of the match.

By convention, nth equal to zero means the entire match. Otherwise, nth must be an integer with a value 1 through 9, and refers to the set of characters matched by the nth regular expression enclosed by the pairs \(, \).

Example

Consider:


     variable matched, pos, len;
     matched = string_match("hello world", "\([a-z]+\) \([a-z]+\)"R, 1);
     if (matched)
       (pos, len) = string_match_nth(2);

This will set matched to 1 since a match will be found at the first byte position, pos to 6 since w is offset 6 bytes from the beginning of the string, and len to 5 since "world" is 5 bytes long.

Notes

The position offset is not affected by the value of the offset parameter to the string_match function. For example, if the value of the last parameter to the string_match function had been 3, pos would still have been set to 6.

The string_matches function may be used as an alternative to string_match_nth.
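
Since pos is a zero-based byte offset, it must be incremented before being handed to a function that counts from 1. The sketch below extracts the second submatch with substrbytes, which is assumed here to take a 1-based byte position and a byte length; the variable names are illustrative.

```slang
variable s = "hello world";
variable pos, len, word = NULL;

if (string_match (s, "\([a-z]+\) \([a-z]+\)"R, 1))
  {
     (pos, len) = string_match_nth (2);     % pos is 6, len is 5
     word = substrbytes (s, pos + 1, len);  % word is "world"
  }
```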

See Also

string_match, string_matches

5.22 string_matches

Synopsis

Match a string against a regular expression and return the matches

Usage

String_Type[] string_matches(String_Type str, String_Type pat [,Int_Type pos])

Description

The string_matches function combines the functionality of string_match and string_match_nth. Like string_match, it matches the string str against the regular expression pat. If the string does not match the pattern, the function will return NULL. Otherwise, the function will return an array of strings whose ith element is the substring described by the return value of string_match_nth(i).

Example


    strs = string_matches ("p0.5keV_27deg.dat",
                           "p\([0-9.]+\)keV_\([0-9.]+\)deg\.dat"R, 1);
    % ==> strs[0] = "p0.5keV_27deg.dat"
    %     strs[1] = "0.5"
    %     strs[2] = "27"

    strs = string_matches ("q0.5keV_27deg.dat",
                           "p\([0-9.]+\)keV_\([0-9.]+\)deg\.dat"R);
    % ==> strs = NULL

Notes

The function is not yet UTF-8 aware. If possible, consider using the pcre module for better, more sophisticated regular expressions.

The pos argument was made optional in version 2.2.3.

5.23 strjoin

Synopsis

Concatenate elements of a string array

Usage

String_Type strjoin (Array_Type a [, String_Type delim])

Description

The strjoin function operates on an array of strings by joining successive elements together separated with the optional delimiter delim. If delim is not specified, then the empty string "" will be used, resulting in a concatenation of the elements.

Example

Suppose that


      days = ["Sun","Mon","Tue","Wed","Thu","Fri","Sat","Sun"];

Then strjoin (days,"+") will produce "Sun+Mon+Tue+Wed+Thu+Fri+Sat+Sun". Similarly, strjoin (["","",""], "X") will produce "XX".

See Also

strchop, strcat

5.24 strlen

Synopsis

Compute the length of a string

Usage

Int_Type strlen (String_Type a)

Description

The strlen function may be used to compute the character length of a string ignoring the presence of combining characters. The strcharlen function may be used to count combining characters as distinct characters. For byte-semantics, use the strbytelen function.

Example

After execution of


   variable len = strlen ("hello");

len will have a value of 5.

Notes

This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of integers will be returned.
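
A brief sketch of the vectorized form, using ASCII strings so that the character and byte lengths agree (the variable name is illustrative):

```slang
% One length per array element, including the empty string.
variable lens = strlen (["a", "bcd", ""]);   % [1, 3, 0]
```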

5.25 strlow

Synopsis

Convert a string to lowercase

Usage

String_Type strlow (String_Type s)

Description

The strlow function takes a string s and returns another string identical to s except that all upper case characters that are contained in s are converted to lower case.

Example

The function


    define Strcmp (a, b)
    {
      return strcmp (strlow (a), strlow (b));
    }

performs a case-insensitive comparison operation of two strings by converting them to lower case first.

Notes

This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of strings will be returned.

5.26 strnbytecmp

Synopsis: Compare the first n bytes of two strings
Usage: Int_Type strnbytecmp (String_Type a, String_Type b, Int_Type n)
Description: This function compares the first n bytes of the strings a and b. See the documentation for strcmp for information about the return value.
Notes: This function has been vectorized in the sense that if an array of strings is passed for either of the string-valued arguments, then a corresponding array of integers will be returned. If two arrays are passed then the arrays must have the same length.
See Also: strncmp, strncharcmp, strcmp

5.27 strncharcmp

Synopsis: Compare the first n characters of two strings
Usage: Int_Type strncharcmp (String_Type a, String_Type b, Int_Type n)
Description: This function compares the first n characters of the strings a and b counting combining characters as distinct characters. See the documentation for strcmp for information about the return value.
Notes: This function has been vectorized in the sense that if an array of strings is passed for either of the string-valued arguments, then a corresponding array of integers will be returned. If two arrays are passed then the arrays must have the same length.
See Also: strncmp, strnbytecmp, strcmp

5.28 strncmp

Synopsis

Compare the first few characters of two strings

Usage

Int_Type strncmp (String_Type a, String_Type b, Int_Type n)

Description

This function behaves like strcmp except that it compares only the first n characters in the strings a and b. See the documentation for strcmp for information about the return value.

In counting characters, combining characters are not counted, although they are used in the comparison. Use the strncharcmp function if you want combining characters to be included in the character count. The strnbytecmp function should be used to compare bytes.

Example

The expression


     strncmp ("apple", "appliance", 3);

will return zero since the first three characters match.

Notes

This function uses character semantics.
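
A common use of strncmp is a prefix test. The helper below is a hypothetical sketch rather than a library function:

```slang
% Hypothetical helper: return non-zero if s begins with prefix.
define has_prefix (s, prefix)
{
   return (0 == strncmp (s, prefix, strlen (prefix)));
}
```

With this definition, has_prefix ("appliance", "app") is non-zero and has_prefix ("apple", "bat") is zero.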

5.29 strreplace

Synopsis

Replace one or more substrings

Usage

(new,n) = strreplace(a, b, c, max_n)

Usage

new = strreplace(a, b, c)

Description

The strreplace function may be used to replace one or more occurrences of b in a with c. This function supports two calling interfaces.

The first form may be used to replace a specified number of substrings. If max_n is positive, then the first max_n occurrences of b in a will be replaced. Otherwise, if max_n is negative, then the last abs(max_n) occurrences will be replaced. The function returns the resulting string and an integer indicating how many replacements were made.

The second calling form may be used to replace all occurrences of b in a with c. In this case, only the resulting string will be returned.

Example

The following function illustrates how strreplace may be used to remove all occurrences of a specified substring:


     define delete_substrings (a, b)
     {
        return strreplace (a, b, "");
     }
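
The first calling form can be sketched as follows, with illustrative variable names. A positive max_n counts replacements from the start of the string and a negative one counts from the end:

```slang
variable r1, n1, r2, n2;

% Replace only the first two occurrences.
(r1, n1) = strreplace ("a-b-c-d", "-", "+", 2);    % "a+b+c-d", 2

% Replace only the last occurrence.
(r2, n2) = strreplace ("a-b-c-d", "-", "+", -1);   % "a-b-c+d", 1
```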

5.30 strskipbytes

Synopsis

Skip a range of bytes in a byte string

Usage

Int_Type strskipbytes (str, range [,n0 [,nmax]])


   String_Type str;
   String_Type range;
   Int_Type n0, nmax;

Description

This function skips over a range of bytes in a string str. The byte range to be skipped is specified by the range parameter. Optional start (n0) and stop (nmax) (0-based) parameters may be used to specify the part of the input string to be processed. The function returns a 0-based offset from the beginning of the string where processing stopped.

See the documentation for the strtrans function for the format of the range parameter.
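
As a sketch of the return value, the calls below skip leading digits and leading non-space bytes in ASCII strings. The variable names are illustrative, and the second call assumes that the ^ complement form described for strtrans applies here as well.

```slang
% Skip the leading digits; processing stops at offset 3 (the 'a').
variable n1 = strskipbytes ("123abc", "0-9");

% Skip everything that is not a space; processing stops at the space.
variable n2 = strskipbytes ("abc 123", "^ ");
```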

See Also

strskipchar, strbskipchar, strtrans

5.31 strskipchar

Synopsis

Get an index to the next character in a UTF-8 encoded string

Usage

(p1, wch) = strskipchar (str, p0 [,skip_combining])

Description

This function decodes the character at the 0-based byte-offset p0 in the string str. It returns the byte-offset (p1) of the next character in the string and the decoded character at byte-offset p0.

If the byte-offset p0 corresponds to the end of the string, then (p0,0) will be returned. Otherwise if the byte-offset specifies a value that lies outside the string, an IndexError exception will be thrown. Finally, if the byte-offset corresponds to an illegally coded character, the character returned will be the negative byte-value at the position.

Example

The following is an example of a function that skips alphanumeric characters and returns the new byte-offset.


    private define skip_word_chars (line, p)
    {
       variable p1 = p, ch;
       do
         {
            p = p1;
            (p1, ch) = strskipchar (line, p, 1);
         }
       while (isalnum(ch));
       return p;
    }

Notes

In non-UTF-8 mode (_slang_utf8_ok=0), this function is equivalent to:


     define strskipchar (s, p)
     {
        if ((p < 0) || (p > strlen(s)))
          throw IndexError;
        if (p == strlen(s))
          return (p, s[p]);
        return (p+1, s[p]);
     }

It is important to understand that the above code relies upon byte-semantics, which are invalid for multi-byte characters.

See Also

strbskipchar, strskipbytes

5.32 strsub

Synopsis

Replace a character with another in a string.

Usage

String_Type strsub (String_Type s, Int_Type pos, Int_Type ch)

Description

The strsub function may be used to substitute the character ch for the character at character position pos of the string s. The resulting string is returned.

Example


    define replace_spaces_with_comma (s)
    {
      variable n;
      while (n = is_substr (s, " "), n) s = strsub (s, n, ',');
      return s;
    }

For uses such as this, the strtrans function is a better choice.
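
A sketch of the same helper written with strtrans, which replaces every space in a single call instead of looping:

```slang
% Same result as the loop above, using a one-character old and new set.
define replace_spaces_with_comma (s)
{
   return strtrans (s, " ", ",");
}
```

For example, replace_spaces_with_comma ("a b c") yields "a,b,c".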

Notes

The first character in the string s is specified by pos equal to 1. This function uses character semantics, not byte semantics.

See Also

is_substr, strreplace, strlen

5.33 strtok

Synopsis

Extract tokens from a string

Usage

String_Type[] strtok (String_Type str [,String_Type white])

Description

strtok breaks the string str into a series of tokens and returns them as an array of strings. If the second parameter white is present, then it specifies the set of characters that are to be regarded as whitespace when extracting the tokens, and may consist of the whitespace characters or a range of such characters. If the first character of white is '^', then the whitespace characters consist of all characters except those in white. For example, if white is " \t\n,;.", then those characters specify the whitespace characters. However, if white is given by "^a-zA-Z0-9_", then any character is a whitespace character except those in the ranges a-z, A-Z, 0-9, and the underscore character. To specify the hyphen character as a whitespace character, it should be the first character of the whitespace string. In addition to ranges, the whitespace specifier may also include character classes:


    \w matches a unicode "word" character, taken to be alphanumeric.
    \a alphabetic character, excluding digits
    \s matches whitespace
    \l matches lowercase
    \u matches uppercase
    \d matches a digit
    \\ matches a backslash
    \^ matches a ^ character

If the second parameter is not present, then it defaults to "\s".

Example

The following example may be used to count the words in a text file:


    define count_words (file)
    {
       variable fp, line, count;

       fp = fopen (file, "r");
       if (fp == NULL) return -1;

       count = 0;
       while (-1 != fgets (&line, fp))
         {
           line = strtok (line, "^\\a");
           count += length (line);
         }
       () = fclose (fp);
       return count;
    }

Here a word is assumed to consist only of alphabetic characters.
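
The rules for the white argument can also be seen on short literal strings. The following is a minimal sketch with made-up sample data; it relies on strtok treating a run of adjacent separator characters as a single separator, so no empty tokens are expected.

    % With no second argument, runs of whitespace separate the tokens.
    variable a = strtok ("  one two\tthree ");    % ["one", "two", "three"]

    % Commas and semicolons act as the "whitespace" here.
    variable b = strtok ("x,y;;z", ",;");         % ["x", "y", "z"]

    % The hyphen is listed first so that it is taken literally
    % rather than as part of a range.
    variable c = strtok ("2024-01-15 09:30", "- :");
    % c is ["2024", "01", "15", "09", "30"]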

See Also

strchop, strcompress, strjoin

5.34 strtrans

Synopsis

Replace characters in a string

Usage

String_Type strtrans (str, old_set, new_set)


   String_Type str, old_set, new_set;

Description

The strtrans function may be used to replace all the characters from the set old_set with the corresponding characters from new_set in the string str. If new_set is empty, then the characters in old_set will be removed from str.

If new_set is not empty, then old_set and new_set must be commensurate. Each set may consist of character ranges such as A-Z and character classes:


    \, matches a punctuation character
    \7 matches any 7bit ascii character
    \\ matches a backslash
    \^ matches the ^ character
    \a matches an alphabetic character, excluding digits
    \c matches a control character
    \d matches a digit
    \g matches a graphic character
    \l matches lowercase
    \p matches a printable character
    \s matches whitespace
    \u matches uppercase
    \w matches a unicode "word" character, taken to be alphanumeric.
    \x matches hex digit (a-fA-F0-9)

If the first character of a set is ^ then the set is taken to be the complement set.

Example


    str = strtrans (str, "\\u", "\\l");   % lower-case str
    str = strtrans (str, "^0-9", " ");    % Replace anything but 0-9 by space
    str = strtrans (str, "\\^0-9", " ");  % Replace '^' and 0-9 by a space
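
Two further cases from the description, deletion with an empty new_set and a range-to-range mapping, might look as follows. This is a small sketch; the sample strings and variable names are illustrative.

    % An empty new_set deletes every character found in old_set.
    variable letters = strtrans ("a1b2c3", "0-9", "");    % "abc"

    % Two ranges of equal size map character by character.
    variable upper = strtrans ("hello", "a-z", "A-Z");    % "HELLO"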

See Also

strreplace, strtrim, strup, strlow

5.35 strtrim

Synopsis

Remove whitespace from the ends of a string

Usage

String_Type strtrim (String_Type s [,String_Type w])

Description

The strtrim function removes all leading and trailing whitespace characters from the string s and returns the result. The optional second parameter specifies the set of whitespace characters. If the argument is not present, then the set defaults to "\s". The whitespace specification may consist of character ranges such as A-Z and character classes:


    \w matches a unicode "word" character, taken to be alphanumeric.
    \a matches an alphabetic character, excluding digits
    \s matches whitespace
    \l matches lowercase
    \u matches uppercase
    \d matches a digit
    \\ matches a backslash
    \^ matches a ^ character

If the first character of a set is ^ then the set is taken to be the complement set.
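
Example

The following sketch shows the default whitespace set, a custom set, and an array argument. The sample strings and variable names are illustrative only.

    % Default: leading and trailing whitespace is removed,
    % interior whitespace is kept.
    variable a = strtrim ("  \tsome text \n");    % "some text"

    % A custom set: only '*' is trimmed from the ends.
    variable b = strtrim ("**title**", "*");      % "title"

    % An array of strings yields an array of trimmed strings.
    variable c = strtrim (["  x ", "y  "]);       % ["x", "y"]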

Notes

This function has been vectorized in the sense that if the first argument is an array of strings, then a corresponding array of strings will be returned. An array value for the optional whitespace argument is not supported.

See Also

strtrim_beg, strtrim_end, strcompress

5.36 strtrim_beg

Synopsis: Remove leading whitespace from a string
Usage: String_Type strtrim_beg (String_Type s [,String_Type w])
Description: The strtrim_beg function removes all leading whitespace characters from the string s and returns the result. The optional second parameter specifies the set of whitespace characters. See the documentation for the strtrim function for more information about the whitespace parameter.
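Example: As a brief sketch (the sample strings are illustrative), only the leading characters are removed and the trailing ones are left in place:

    variable a = strtrim_beg ("  text  ");     % "text  "
    variable b = strtrim_beg ("0042", "0");    % "42"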
Notes: This function has been vectorized in the sense that if the first argument is an array of strings, then a corresponding array of strings will be returned. An array value for the optional whitespace argument is not supported.
See Also: strtrim, strtrim_end, strcompress

5.37 strtrim_end

Synopsis: Remove trailing whitespace from a string
Usage: String_Type strtrim_end (String_Type s [,String_Type w])
Description: The strtrim_end function removes all trailing whitespace characters from the string s and returns the result. The optional second parameter specifies the set of whitespace characters. See the documentation for the strtrim function for more information about the whitespace parameter.
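Example: As a brief sketch (the sample strings are illustrative), only the trailing characters are removed and the leading ones are left in place:

    variable a = strtrim_end ("  text  ");       % "  text"
    variable b = strtrim_end ("3.1400", "0");    % "3.14"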
Notes: This function has been vectorized in the sense that if the first argument is an array of strings, then a corresponding array of strings will be returned. An array value for the optional whitespace argument is not supported.
See Also: strtrim, strtrim_beg, strcompress

5.38 strup

Synopsis

Convert a string to uppercase

Usage

String_Type strup (String_Type s)

Description

The strup function takes a string s and returns another string identical to s except that all lowercase characters contained in s are converted to uppercase.

Example

The function


    define Strcmp (a, b)
    {
      return strcmp (strup (a), strup (b));
    }

performs a case-insensitive comparison operation of two strings by converting them to upper case first.
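
Direct calls might look like the following sketch, which also passes an array of strings; the sample values are illustrative.

    % Non-alphabetic characters are left unchanged.
    variable s = strup ("Hello, World");    % "HELLO, WORLD"

    % An array argument yields an array of converted strings.
    variable a = strup (["abc", "Def"]);    % ["ABC", "DEF"]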

Notes

This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of strings will be returned.

5.39 str_delete_chars

Synopsis

Delete characters from a string

Usage

String_Type str_delete_chars (String_Type str [, String_Type del_set])

Description

This function may be used to delete the set of characters specified by the optional argument del_set from the string str. If del_set is not given, "\s" will be used. The modified string is returned.

The set of characters to be deleted may include ranges such as A-Z and character classes:


    \w matches a unicode "word" character, taken to be alphanumeric.
    \a matches an alphabetic character, excluding digits
    \s matches whitespace
    \l matches lowercase
    \u matches uppercase
    \d matches a digit
    \\ matches a backslash
    \^ matches a ^ character

If the first character of del_set is ^, then the set is taken to be the complement of the remaining string.

Example


    str = str_delete_chars (str, "^A-Za-z");

will remove all characters except A-Z and a-z from str. Similarly,


    str = str_delete_chars (str, "^\\a");

will remove all but the alphabetic characters.
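
The same calls applied to literal strings give concrete results. This is a small sketch with made-up sample data:

    % With no del_set, whitespace ("\s") is deleted.
    variable a = str_delete_chars ("a b\tc\n");                 % "abc"

    % A complemented range keeps only the digits.
    variable b = str_delete_chars ("Tel: 555-1234", "^0-9");    % "5551234"

    % A plain list of characters deletes each of them.
    variable c = str_delete_chars ("hello world", "lo");        % "he wrd"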

See Also

strtrans, strreplace, strcompress

5.40 str_quote_string

Synopsis

Escape characters in a string

Usage

String_Type str_quote_string(String_Type str, String_Type qlis, Int_Type quote)

Description

The str_quote_string function returns a string identical to str except that all characters contained in the string qlis are escaped with the quote character, including the quote character itself. This function is useful for making a string that can be used in a regular expression.

Example

Execution of the statements


   node = "Is it [the coat] really worth $100?";
   tag = str_quote_string (node, "\\^$[]*.+?", '\\');

will result in tag having the value:


    Is it \[the coat\] really worth \$100\?
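
One way to see the effect is to compare a quoted pattern with its unquoted form. The sketch below assumes S-Lang's string_match intrinsic and its regular expression syntax, in which an unescaped . matches any character and * repeats the preceding item. The sample strings and variable names are illustrative.

    variable raw = "a.b*c";
    variable pat = str_quote_string (raw, "\\^$[]*.+?", '\\');
    % pat now holds a\.b\*c

    % Unquoted, the pattern is expected to match "axbbc" as well.
    variable m1 = string_match ("axbbc", raw, 1);

    % Quoted, it should match only the literal text "a.b*c".
    variable m2 = string_match ("axbbc", pat, 1);
    variable m3 = string_match ("a.b*c", pat, 1);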

5.41 str_replace

Synopsis: Replace a substring of a string (deprecated)
Usage: Int_Type str_replace (String_Type a, String_Type b, String_Type c)
Description: The str_replace function replaces the first occurrence of b in a with c and returns an integer that indicates whether a replacement was made. If b does not occur in a, zero is returned. However, if b occurs in a, a non-zero integer is returned as well as the new string resulting from the replacement.
Notes: This function has been superseded by strreplace. It should no longer be used.
See Also: strreplace
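Example: The following sketch shows how the same "first occurrence only" replacement might be written with strreplace. It assumes strreplace's four-argument form, which returns the new string together with the number of replacements made; the sample values and variable names are illustrative.

    variable a = "one-two-three";
    variable new_a, n;

    % A limit of 1 restricts the replacement to the first occurrence.
    (new_a, n) = strreplace (a, "-", "+", 1);
    % new_a is "one+two-three" and n is 1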

5.42 str_uncomment_string

Synopsis

Remove comments from a string

Usage

String_Type str_uncomment_string(String_Type s, String_Type beg, String_Type end)

Description

This function may be used to remove simple forms of comments from a string s. The parameters, beg and end, are strings of equal length whose corresponding characters specify the begin and end comment characters, respectively. It returns the uncommented string.

Example

The expression


     str_uncomment_string ("Hello (testing) 'example' World", "'(", "')")

returns the string "Hello   World". Only the comments are removed, so the spaces that surrounded them remain.

Notes

This routine does not handle multi-character comment delimiters and it assumes that comments are not nested.

5.43 substr

Synopsis

Extract a substring from a string

Usage

String_Type substr (String_Type s, Int_Type n, Int_Type len)

Description

The substr function returns a substring with character length len of the string s beginning at the character position n. If len is -1, the substring extends to the end of s. The first character of s is given by n equal to 1.

Example


     substr ("To be or not to be", 7, 5);

returns "or no".
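
The 1-based position and the special len value of -1 can be sketched as follows; the variable names are illustrative.

    variable s = "To be or not to be";

    % The first character is at position 1.
    variable head = substr (s, 1, 5);      % "To be"

    % A len of -1 takes everything from position 14 onward.
    variable tail = substr (s, 14, -1);    % "to be"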

Notes

This function assumes character semantics and not byte semantics. Use the substrbytes function to extract bytes from a string.

See Also

is_substr, substrbytes, strlen

5.44 substrbytes

Synopsis

Extract a byte sequence from a string

Usage

String_Type substrbytes (String_Type s, Int_Type n, Int_Type len)

Description

The substrbytes function returns a substring with byte length len of the string s beginning at the byte position n. If len is -1, the substring extends to the end of s. The first byte of s is given by n equal to 1.

Example


     substrbytes ("To be or not to be", 7, 5);

returns "or no".

Notes

In many cases it is more convenient to use array indexing rather than the substrbytes function. In fact substrbytes(s,i+1,-1) is equivalent to s[[i:]].
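
The correspondence can be sketched as follows. Note that substrbytes counts bytes from 1 while array indexing counts from 0; the variable names are illustrative.

    variable s = "To be or not to be";

    % Five bytes starting at byte 7 (1-based) ...
    variable a = substrbytes (s, 7, 5);      % "or no"
    % ... are bytes 6 through 10 in 0-based indexing.
    variable b = s[[6:10]];

    % From byte 14 to the end of the string, both ways.
    variable c = substrbytes (s, 14, -1);    % "to be"
    variable d = s[[13:]];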

The function substr may be used if character semantics are desired.

See Also

substr, strbytelen
