Count the number of occurrences of a character in a string
UInt_Type count_char_occurrences (str, ch)
This function returns the number of times the specified character ch occurs in the string str.
If UTF-8 mode is in effect, then the character may correspond to more than one byte. In such a case, the function returns the number of such byte-sequences in the string. To count actual bytes, use the count_byte_occurrences function.
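As a minimal sketch of building on count_char_occurrences, the hypothetical helper count_vowels below sums the counts for each ASCII vowel. It assumes plain ASCII input, where characters and bytes coincide.

```slang
% count_vowels is a hypothetical helper, not an S-Lang intrinsic.
% It sums count_char_occurrences over the five ASCII vowels.
define count_vowels (s)
{
   variable ch, total = 0;
   foreach ch (['a', 'e', 'i', 'o', 'u'])
     total += count_char_occurrences (s, ch);
   return total;
}

variable n = count_vowels ("banana");
% n is 3
```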
Concatenate strings using a delimiter
String_Type create_delimited_string (delim, s_1, s_2, ..., s_n, n)
String_Type delim, s_1, ..., s_n
Int_Type n
create_delimited_string performs a concatenation operation on the n strings s_1, ..., s_n, using the string delim as a delimiter. The resulting string is equivalent to one obtained via
s_1 + delim + s_2 + delim + ... + s_n
create_delimited_string ("/", "usr", "local", "bin", 3);
will produce "usr/local/bin".
New code should use the strjoin function, which performs a similar task.
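Because new code should use strjoin, the sketch below compares the legacy call with its strjoin equivalent. The strings to be joined are placed in an array rather than passed as separate arguments followed by a count.

```slang
% Legacy form: the number of strings is passed as the final argument.
variable old_way = create_delimited_string ("/", "usr", "local", "bin", 3);

% Preferred form: pass an array of strings and the delimiter.
variable new_way = strjoin (["usr", "local", "bin"], "/");

% Both variables now hold "usr/local/bin".
```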
Extract the nth element of a string with delimiters
String_Type extract_element (String_Type list, Int_Type nth, Int_Type delim)
The extract_element function may be used to extract the nth substring of a string delimited by the character given by the delim parameter. If the string does not contain the requested substring, the function returns NULL. Substring elements are numbered from 0.
The expression
extract_element ("element 0, element 1, element 2", 1, ',')
returns the string " element 1", whereas
extract_element ("element 0, element 1, element 2", 1, ' ')
returns "0,".
The following function may be used to compute the number of elements in the list:
define num_elements (list, delim)
{
variable nth = 0;
while (NULL != extract_element (list, nth, delim))
nth++;
return nth;
}
Alternatively, the strchop function may be more useful. In fact, extract_element may be expressed in terms of the function strchop as
define extract_element (list, nth, delim)
{
list = strchop(list, delim, 0);
if (nth >= length (list))
return NULL;
else
return list[nth];
}
and the num_elements function used above may be recoded more simply as:
define num_elements (list, delim)
{
return length (strchop (list, delim, 0));
}
New code should make use of the List_Type object for lists.
See also: is_list_element, is_substr, strtok, strchop, create_delimited_string
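Following the advice to use List_Type for lists, here is a minimal sketch that turns a delimited string into a List_Type object by way of strchop. The helper name delimited_to_list is hypothetical.

```slang
% delimited_to_list is a hypothetical helper: it splits s on the
% character delim (with no quote character) and collects the
% pieces into a List_Type object.
define delimited_to_list (s, delim)
{
   variable elem, lst = {};
   foreach elem (strchop (s, delim, 0))
     list_append (lst, elem);
   return lst;
}

variable items = delimited_to_list ("red,green,blue", ',');
% length (items) is 3 and items[1] is "green"
```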
Convert a globbing expression to a regular expression
String_Type glob_to_regexp (String_Type g)
This function may be used to convert a so-called globbing expression to a regular expression. A globbing expression is frequently used for matching filenames where '?' represents a single character and '*' represents 0 or more characters.
The slsh program that is distributed with the S-Lang library includes a function called glob that is a wrapper around glob_to_regexp and listdir. It returns a list of filenames matching a globbing expression.
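A sketch of testing a single filename against a globbing expression, in the spirit of the slsh glob function but without listdir. The helper glob_matches is hypothetical; it simply passes the converted pattern to string_match.

```slang
% glob_matches is a hypothetical helper: it returns 1 if filename
% matches the globbing expression pattern, and 0 otherwise.
define glob_matches (filename, pattern)
{
   variable re = glob_to_regexp (pattern);
   return (string_match (filename, re, 1) != 0);
}

% glob_matches ("main.c", "*.c") is 1
% glob_matches ("main.h", "*.c") is 0
```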
Test whether a delimited string contains a specific element
Int_Type is_list_element (String_Type list, String_Type elem, Int_Type delim)
The is_list_element function may be used to determine whether or not a delimited list of substrings, list, contains the element elem. If elem is not an element of list, the function will return zero; otherwise, it returns 1 plus the matching element number.
The expression
is_list_element ("element 0, element 1, element 2", "0,", ' ');
returns 2 since "0," is element number one of the list (numbered from zero).
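The return value doubles as a membership test, since it is nonzero exactly when the element is present. A short sketch with a comma-delimited list:

```slang
variable langs = "c,python,slang";

% A nonzero result means the element is present.
if (is_list_element (langs, "python", ','))
  message ("python is listed");

% The return value is 1 plus the 0-based element number:
% is_list_element (langs, "slang", ',') is 3
% is_list_element (langs, "rust", ',') is 0
```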
Test for a specified substring within a string
Int_Type is_substr (String_Type a, String_Type b)
This function may be used to determine if a contains the string b. If it does not, the function returns 0; otherwise it returns the position of the first occurrence of b in a, expressed in terms of characters, not bytes.
This function regards the first character of a string to be given by a position value of 1.
The distinction between characters and bytes is significant in UTF-8 mode.
This function has been vectorized in the sense that if an array of strings is passed for either of the string-valued arguments, then a corresponding array of integers will be returned. If two arrays are passed then the arrays must have the same length.
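The sketch below illustrates both the 1-based character position returned by is_substr and its vectorized form, where an array of strings yields an array of positions.

```slang
variable pos = is_substr ("hello world", "world");
% pos is 7: "world" starts at the seventh character

variable hits = is_substr (["apple", "grape", "melon"], "ap");
% hits is [1, 3, 0]; 0 means "ap" does not occur in "melon"
```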
Format a string suitable for parsing
String_Type make_printable_string(String_Type str)
This function formats a string in such a way that it may be used as an argument to the eval function. The resulting string is identical to str except that it is enclosed in double quotes and the backslash, newline, control, and double quote characters are expanded.
Format objects into a string (deprecated)
String_Type Sprintf (String_Type format, ..., Int_Type n)
This function performs a task similar to that of the sprintf function but requires an additional argument that specifies the number of items to format. For this reason, the sprintf function should be used instead.
Get an index to the previous character in a UTF-8 encoded string
(p1, wch) = strbskipchar (str, p0 [,skip_combining])
This function moves backward from the 0-based byte-offset p0 in the string str to the previous character in the string. It returns the byte-offset (p1) of the previous character and the decoded character value at that byte-offset.
The optional third argument specifies the handling of combining characters. If it is non-zero, combining characters will be ignored, otherwise a combining character will not be treated differently from other characters. The default is to ignore such characters.
If the byte-offset p0 corresponds to the beginning of the string (p0=0), then (p0,0) will be returned. Otherwise, if the byte-offset specifies a value that lies outside the string, an IndexError exception will be thrown. Finally, if the byte-offset corresponds to an illegally coded character, the character returned will be the negative byte-value at the position.
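As an illustration of walking backward through a string, the hypothetical helper reverse_string below starts at the byte length of the string and steps back one character at a time until it reaches offset 0. Passing 0 as the third argument keeps combining characters as separate characters, and the char intrinsic turns each decoded value back into a string.

```slang
% reverse_string is a hypothetical helper that reverses s
% character by character using strbskipchar.
define reverse_string (s)
{
   variable p = strbytelen (s), ch, r = "";
   while (p > 0)
     {
        % step back to the previous character and decode it
        (p, ch) = strbskipchar (s, p, 0);
        r = strcat (r, char (ch));
     }
   return r;
}

% reverse_string ("abc") is "cba"
```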
Format objects into a string
String_Type sprintf (String_Type fmt, ...)
The sprintf function formats a string from a variable number of arguments according to the format specification string fmt.
The format string is a C library sprintf style format descriptor. Briefly, the format string may consist of ordinary characters (not including the % character), which are copied into the output string as-is, and conversion specification sequences introduced by the % character. The number of additional arguments passed to the sprintf function must be consistent with the number required by the format string.
The % character in the format string starts a conversion specification that indicates how an object is to be formatted. Usually the percent character is followed immediately by a conversion specification character. However, it may optionally be followed by flag characters, field width characters, and precision modifiers, as described below.
The character immediately following the % character may be one or more of the following flag characters:
- Use left-justification
# Use alternate form for formatting.
0 Use 0 padding
+ Precede a number by a plus or minus sign.
(space) Use a blank instead of a plus sign.
The flag characters (if any) may be followed by an optional field width specification string represented by one or more digit characters. If the size of the formatted object is less than the field width, it will be right-justified in the specified field width, unless the - flag was given, in which case it will be left-justified.
If the next character in the control sequence is a period, then it introduces a precision specification sequence. The precision is given by the digit characters following the period. If none are given, the precision is taken to be 0. The meaning of the precision specifier depends upon the type of conversion: for integer conversions, it gives the minimum number of digits to appear in the output. For e and f floating point conversions, it gives the number of digits to appear after the decimal point. For the g floating point conversion, it gives the maximum number of significant digits to appear. Finally, for the s and S conversions, it specifies the maximum number of characters to be copied to the output string.
The next character in the sequence may be a modifier that controls the size of the object to be formatted. It may consist of the following characters:
h This character is ignored in the current implementation.
l The integer is formatted as a long integer, or a character as a wide character.
Finally the conversion specification sequence ends with the conversion specification character that describes how the object is to be formatted:
s as a string
f as a floating point number
e as a float using exponential form, e.g., 2.345e08
g format as e or f, depending upon its value
c as a character
b as a byte
% a literal percent character
d as a signed decimal integer
u as an unsigned decimal integer
o as an octal integer
X,x as hexadecimal
B as a binary integer
S convert object to a string and format accordingly
The S conversion specifier is an S-Lang extension which will cause the corresponding object to be converted to a string using the string function, and then formatted as with the s conversion. In fact, sprintf("%S",x) is equivalent to sprintf("%s",string(x)).
sprintf("%s","hello") ===> "hello"
sprintf("%s %s","hello", "world") ===> "hello world"
sprintf("Agent %.3d",7) ===> "Agent 007"
sprintf("%S",PI) ===> "3.141592653589793"
sprintf("%g",PI) ===> "3.14159"
sprintf("%.2g",PI) ===> "3.1"
sprintf("%.2e",PI) ===> "3.14e+00"
sprintf("%.2f",PI) ===> "3.14"
sprintf("|% 8.2f|",PI) ===> "|    3.14|"
sprintf("|%-8.2f|",PI) ===> "|3.14    |"
sprintf("|%+8.2f|",PI) ===> "|   +3.14|"
sprintf("|%8B|", 21) ===> "|   10101|"
sprintf("|%.8B|", 21) ===> "|00010101|"
sprintf("|%#.8B|", 21) ===> "|0b00010101|"
sprintf("%S",{1,2,3}) ===> "List_Type with 3 elements"
sprintf("%S",1+2i) ===> "(1 + 2i)"
The set_float_format function controls the format for the S conversion of floating point numbers.
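The flag, width, and precision fields are often combined to produce aligned columns. The following sketch left-justifies a label in a 6-character field and right-justifies a number in a 5-character field with one digit after the decimal point.

```slang
% "%-6s": left-justify the string in 6 columns
% "%5.1f": right-justify the number in 5 columns, 1 decimal digit
variable row = sprintf ("%-6s|%5.1f", "pi", PI);
% row is "pi    |  3.1"
```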
Parse a formatted string
Int_Type sscanf (s, fmt, r1, ... rN)
String_Type s, fmt;
Ref_Type r1, ..., rN
The sscanf function parses the string s according to the format fmt and sets the variables whose references are given by r1, ..., rN. The function returns the number of references assigned, or throws an exception upon error.
The format string fmt consists of ordinary characters and conversion specifiers. A conversion specifier begins with the special character % and is described more fully below. A whitespace character in the format string matches any amount of whitespace in the input string. Parsing of the format string stops whenever a match fails.
The % character is used to denote a conversion specifier whose general form is given by %[*][width][type]format where the brackets indicate optional items. If * is present, then the conversion will be performed but no assignment to a reference will be made. The width specifier specifies the maximum field width to use for the conversion. The type modifier is used to indicate the size of the object, e.g., a short integer, as follows.
If type is given as the character h, then if the format conversion is for an integer (dioux), the object assigned will be a short integer. If type is l, then the conversion will be to a long integer for integer conversions, or to a double precision floating point number for floating point conversions.
The format specifier is a character that specifies the conversion:
% Matches a literal percent character. No assignment is
performed.
d Matches a signed decimal integer.
D Matches a long decimal integer (equiv to `ld')
u Matches an unsigned decimal integer
U Matches an unsigned long decimal integer (equiv to `lu')
i Matches either a hexadecimal integer, decimal integer, or
octal integer.
I Equivalent to `li'.
x Matches a hexadecimal integer.
X Matches a long hexadecimal integer (same as `lx').
e,f,g Matches a decimal floating point number (Float_Type).
E,F,G Matches a double precision floating point number, same as `lf'.
s Matches a string of non-whitespace characters (String_Type).
c Matches one character. If width is given, width
characters are matched.
n Assigns the number of characters scanned so far.
[...] Matches zero or more characters from the set of characters
enclosed by the square brackets. If '^' is given as the
first character, then the complement set is matched.
Suppose that s is "Coffee: (3,4,12.4)". Then
n = sscanf (s, "%[a-zA-Z]: (%d,%d,%lf)", &item, &x, &y, &z);
will set n to 4, item to "Coffee", x to 3, y to 4, and z to the double precision number 12.4. However,
n = sscanf (s, "%s: (%d,%d,%lf)", &item, &x, &y, &z);
will set n to 1 and item to "Coffee:", and the remaining variables will not be assigned.
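A further sketch, assuming input of the form YYYY-MM-DD: ordinary characters in the format (here the hyphens) must match the input literally, and the return value reports how many references were assigned before parsing stopped.

```slang
variable year, month, day;
variable n = sscanf ("2024-01-15", "%d-%d-%d", &year, &month, &day);
% n is 3; year is 2024, month is 1, day is 15

% A mismatch stops parsing early: "/" does not match "-",
% so only year is assigned and n2 is 1.
variable n2 = sscanf ("2024/01/15", "%d-%d-%d", &year, &month, &day);
```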
Get the number of bytes in a string
Int_Type strbytelen (String_Type s)
This function returns the number of bytes in a string. In UTF-8 mode, this value is generally different from the number of characters in a string. For the latter information, the strlen or strcharlen functions should be used.
This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of integers will be returned.
Replace a byte with another in a string.
String_Type strbytesub (String_Type s, Int_Type pos, UChar_Type b)
The strbytesub function may be used to substitute the byte b for the byte at byte position pos of the string s. The resulting string is returned.
The first byte in the string s is specified by pos equal to 1. This function uses byte semantics, not character semantics.
Concatenate strings
String_Type strcat (String_Type a_1, ..., String_Type a_N)
The strcat function concatenates its N string arguments a_1, ..., a_N together and returns the result.
strcat ("Hello", " ", "World");
produces the string "Hello World".
This function is equivalent to the binary operation a_1+...+a_N. However, strcat is much faster, making it the preferred method to concatenate strings.
Get the number of characters in a string including combining characters
Int_Type strcharlen (String_Type s)
The strcharlen function returns the number of characters in a string. If the string contains combining characters, then they are also counted. Use the strlen function to obtain the character count ignoring combining characters.
This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of integers will be returned.
Chop or split a string into substrings.
String_Type[] strchop (String_Type str, Int_Type delim, Int_Type quote)
The strchop function may be used to split up a string str that consists of substrings delimited by the character specified by delim. If the integer quote is non-zero, it will be taken as a quote character for the delimiter. The function returns the substrings as an array.
The following function illustrates how to sort a comma separated list of strings:
define sort_string_list (a)
{
variable i, b;
b = strchop (a, ',', 0);
i = array_sort (b);
b = b[i]; % rearrange
% Convert array back into comma separated form
return strjoin (b, ",");
}
Chop or split a string into substrings.
String_Type[] strchopr (String_Type str, Int_Type delim, Int_Type quote)
This routine performs exactly the same function as strchop except that it returns the substrings in the reverse order. See the documentation for strchop for more information.
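A short sketch contrasting strchop and strchopr on the same input. Note that an empty field between two adjacent delimiters is kept as an empty string.

```slang
variable fields = strchop ("a,b,,c", ',', 0);
% fields is ["a", "b", "", "c"]

variable rfields = strchopr ("a,b,,c", ',', 0);
% rfields is ["c", "", "b", "a"]
```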
Compare two strings
Int_Type strcmp (String_Type a, String_Type b)
The strcmp function may be used to perform a case-sensitive string comparison, in the lexicographic sense, on strings a and b. It returns 0 if the strings are identical, a negative integer if a is less than b, or a positive integer if a is greater than b.
The strup
function may be used to perform a case-insensitive
string comparison:
define case_insensitive_strcmp (a, b)
{
return strcmp (strup(a), strup(b));
}
One may also use one of the binary comparison operators, e.g., a > b.
This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of integers will be returned.
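Because strcmp is vectorized, a single call can compare several strings against one reference string. The sketch relies only on the sign of each result, since the description above specifies the sign rather than a particular magnitude.

```slang
variable r = strcmp (["apple", "pear", "banana"], "banana");
% r[0] is negative ("apple" sorts before "banana"),
% r[1] is positive ("pear" sorts after "banana"),
% r[2] is 0 (identical strings)
```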
Remove excess whitespace characters from a string
String_Type strcompress (String_Type s, String_Type white)
The strcompress function compresses the string s by replacing a sequence of one or more characters from the set white by the first character of white. It also removes all leading and trailing characters from s that are part of white.
The expression
strcompress (",;apple,,cherry;,banana", ",;");
returns the string "apple,cherry,banana".
This function has been vectorized in the sense that if an array of strings is passed as the first argument then a corresponding array of strings will be returned. Array values are not supported for the remaining arguments.
Match a string against a regular expression
Int_Type string_match(String_Type str, String_Type pat [,Int_Type pos])
The string_match function returns zero if str does not match the regular expression specified by pat. This function performs the match starting at the first byte of the string. The optional pos argument may be used to specify a different byte offset (numbered from 1). This function returns the position in bytes (numbered from 1) of the start of the match in str.
The exact substring matched may be found using string_match_nth.
Positions in the string are specified using byte-offsets, not character offsets. The value returned by this function is measured from the beginning of the string str.
The function is not yet UTF-8 aware. If possible, consider using the pcre module for better, more sophisticated regular expressions.
The pos argument was made optional in version 2.2.3.
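The sketch below shows how the optional pos argument shifts where the search begins while the returned position is still measured from the start of the string, and that 0 signals no match.

```slang
variable s = "hello world";

variable p1 = string_match (s, "o", 1);
% p1 is 5: the first "o" is the fifth byte

variable p2 = string_match (s, "o", 6);
% p2 is 8: the search starts at byte 6, but the result is
% still measured from the beginning of s

variable p3 = string_match (s, "xyz", 1);
% p3 is 0: no match
```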
Get the result of the last call to string_match
(Int_Type pos, Int_Type len) = string_match_nth(Int_Type nth)
The string_match_nth function returns two integers describing the result of the last call to string_match. It returns both the zero-based byte-position of the nth submatch and the length of the match.
By convention, nth equal to zero means the entire match. Otherwise, nth must be an integer with a value from 1 through 9, and refers to the set of characters matched by the nth regular expression enclosed by the pairs \(, \).
Consider:
variable matched, pos, len;
matched = string_match("hello world", "\([a-z]+\) \([a-z]+\)"R, 1);
if (matched)
(pos, len) = string_match_nth(2);
This will set matched to 1 since a match will be found at the first byte position, pos to 6 since w is offset 6 bytes from the beginning of the string, and len to 5 since "world" is 5 bytes long.
The position offset is not affected by the value of the offset parameter to the string_match function. For example, if the value of the last parameter to the string_match function had been 3, pos would still have been set to 6.
The string_matches function may be used as an alternative to string_match_nth.
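Since string_match_nth reports a 0-based byte offset while the substrbytes intrinsic takes a 1-based byte position, extracting a submatch as a string needs a +1 adjustment. The helper submatch_string below is hypothetical, and it assumes string_match has just been called on the same string.

```slang
% submatch_string is a hypothetical helper: it returns the text of
% the nth submatch from the most recent call to string_match on str.
define submatch_string (str, nth)
{
   variable pos, len;
   (pos, len) = string_match_nth (nth);
   % pos is 0-based, but substrbytes expects a 1-based byte position
   return substrbytes (str, pos + 1, len);
}

variable s = "hello world";
if (string_match (s, "\([a-z]+\) \([a-z]+\)"R, 1))
  message (submatch_string (s, 2));   % prints world
```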
Match a string against a regular expression and return the matches
String_Type[] string_matches(String_Type str, String_Type pat [,Int_Type pos])
The string_matches function combines the functionality of string_match and string_match_nth. Like string_match, it matches the string str against the regular expression pat. If the string does not match the pattern, the function will return NULL. Otherwise, the function will return an array of strings whose ith element is the substring described by the value that string_match_nth returns for nth equal to i.
strs = string_matches ("p0.5keV_27deg.dat",
"p\([0-9.]+\)keV_\([0-9.]+\)deg\.dat"R, 1);
% ==> strs[0] = "p0.5keV_27deg.dat"
% strs[1] = "0.5"
% strs[2] = "27"
strs = string_matches ("q0.5keV_27deg.dat",
"p\([0-9.]+\)keV_\([0-9.]+\)deg\.dat"R);
% ==> strs = NULL
The function is not yet UTF-8 aware. If possible, consider using the pcre module for better, more sophisticated regular expressions.
The pos argument was made optional in version 2.2.3.
Concatenate elements of a string array
String_Type strjoin (Array_Type a [, String_Type delim])
The strjoin function operates on an array of strings by joining successive elements together separated with the optional delimiter delim. If delim is not specified, then the empty string "" will be used, resulting in a concatenation of the elements.
Suppose that
days = ["Sun","Mon","Tue","Wed","Thu","Fri","Sat","Sun"];
Then strjoin (days,"+") will produce "Sun+Mon+Tue+Wed+Thu+Fri+Sat+Sun". Similarly, strjoin (["","",""], "X") will produce "XX".
Compute the length of a string
Int_Type strlen (String_Type a)
The strlen function may be used to compute the character length of a string ignoring the presence of combining characters. The strcharlen function may be used to count combining characters as distinct characters. For byte-semantics, use the strbytelen function.
After execution of
variable len = strlen ("hello");
len will have a value of 5.
This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of integers will be returned.
Convert a string to lowercase
String_Type strlow (String_Type s)
The strlow function takes a string s and returns another string identical to s except that all upper case characters that are contained in s are converted to lower case.
The function
define Strcmp (a, b)
{
return strcmp (strlow (a), strlow (b));
}
performs a case-insensitive comparison operation of two strings by
converting them to lower case first.
This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of strings will be returned.
Compare the first n bytes of two strings
Int_Type strnbytecmp (String_Type a, String_Type b, Int_Type n)
This function compares the first n bytes of the strings a and b. See the documentation for strcmp for information about the return value.
This function has been vectorized in the sense that if an array of strings is passed for either of the string-valued arguments, then a corresponding array of integers will be returned. If two arrays are passed then the arrays must have the same length.
Compare the first n characters of two strings
Int_Type strncharcmp (String_Type a, String_Type b, Int_Type n)
This function compares the first n characters of the strings a and b, counting combining characters as distinct characters. See the documentation for strcmp for information about the return value.
This function has been vectorized in the sense that if an array of strings is passed for either of the string-valued arguments, then a corresponding array of integers will be returned. If two arrays are passed then the arrays must have the same length.
Compare the first few characters of two strings
Int_Type strncmp (String_Type a, String_Type b, Int_Type n)
This function behaves like strcmp except that it compares only the first n characters in the strings a and b. See the documentation for strcmp for information about the return value.
In counting characters, combining characters are not counted, although they are used in the comparison. Use the strncharcmp function if you want combining characters to be included in the character count. The strnbytecmp function should be used to compare bytes.
The expression
strncmp ("apple", "appliance", 3);
will return zero since the first three characters match.
This function uses character semantics.
This function has been vectorized in the sense that if an array of strings is passed for either of the string-valued arguments, then a corresponding array of integers will be returned. If two arrays are passed then the arrays must have the same length.
Replace one or more substrings
(new,n) = strreplace(a, b, c, max_n)
new = strreplace(a, b, c)
The strreplace function may be used to replace one or more occurrences of b in a with c. This function supports two calling interfaces.
The first form may be used to replace a specified number of substrings. If max_n is positive, then the first max_n occurrences of b in a will be replaced. Otherwise, if max_n is negative, then the last abs(max_n) occurrences will be replaced. The function returns the resulting string and an integer indicating how many replacements were made.
The second calling form may be used to replace all occurrences of b in a with c. In this case, only the resulting string will be returned.
The following function illustrates how strreplace may be used to remove all occurrences of a specified substring:
define delete_substrings (a, b)
{
return strreplace (a, b, "");
}
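A sketch of both calling interfaces on the same input, showing a positive max_n (first occurrences), a negative max_n (last occurrences), and the three-argument form that replaces everything.

```slang
variable s1, n1, s2, n2, s3;

(s1, n1) = strreplace ("a-b-c-d", "-", "+", 2);
% s1 is "a+b+c-d", n1 is 2 (first two occurrences)

(s2, n2) = strreplace ("a-b-c-d", "-", "+", -1);
% s2 is "a-b-c+d", n2 is 1 (last occurrence only)

s3 = strreplace ("a-b-c-d", "-", "+");
% s3 is "a+b+c+d" (all occurrences)
```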
Skip a range of bytes in a byte string
Int_Type strskipbytes (str, range [,n0 [,nmax]])
String_Type str;
String_Type range;
Int_Type n0, nmax;
This function skips over a range of bytes in a string str. The byte range to be skipped is specified by the range parameter. Optional start (n0) and stop (nmax) (0-based) parameters may be used to specify the part of the input string to be processed. The function returns a 0-based offset from the beginning of the string where processing stopped.
See the documentation for the strtrans function for the format of the range parameter.
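A minimal sketch of strskipbytes, assuming the range "0-9" (a character range in the strtrans syntax) to skip decimal digits. The second call uses the optional start offset n0; in both cases the returned offset is 0-based and measured from the beginning of the string.

```slang
variable off1 = strskipbytes ("123abc", "0-9");
% off1 is 3: bytes 0, 1, 2 are digits and "a" stops the scan

variable off2 = strskipbytes ("ab123cd", "0-9", 2);
% off2 is 5: the scan starts at offset 2 and stops at "c"
```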
Get an index to the next character in a UTF-8 encoded string
(p1, wch) = strskipchar (str, p0 [,skip_combining])
This function decodes the character at the 0-based byte-offset p0 in the string str. It returns the byte-offset (p1) of the next character in the string and the decoded character at byte-offset p0.
The optional third argument specifies the handling of combining characters. If it is non-zero, combining characters will be ignored, otherwise a combining character will not be treated differently from other characters. The default is to ignore such characters.
If the byte-offset p0 corresponds to the end of the string, then (p0,0) will be returned. Otherwise, if the byte-offset specifies a value that lies outside the string, an IndexError exception will be thrown. Finally, if the byte-offset corresponds to an illegally coded character, the character returned will be the negative byte-value at the position.
The following is an example of a function that skips alphanumeric characters and returns the new byte-offset.
private define skip_word_chars (line, p)
{
variable p1 = p, ch;
do
{
p = p1;
(p1, ch) = strskipchar (line, p, 1);
}
while (isalnum(ch));
return p;
}
In non-UTF-8 mode (_slang_utf8_ok=0), this function is equivalent to:
define strskipchar (s, p)
{
if ((p < 0) || (p > strlen(s)))
throw IndexError;
% at the end of the string, return (p, 0)
if (p == strlen(s))
return (p, 0);
return (p+1, s[p]);
}
It is important to understand that the above code relies upon
byte-semantics, which are invalid for multi-byte characters.
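The forward counterpart of walking a string: the hypothetical helper char_codes below collects each decoded character value into a List_Type object, advancing with strskipchar until the end of the string is reached.

```slang
% char_codes is a hypothetical helper returning a list of the
% character values in s, in order.
define char_codes (s)
{
   variable p = 0, n = strbytelen (s), ch, codes = {};
   while (p < n)
     {
        % decode the character at p and advance p to the next one
        (p, ch) = strskipchar (s, p, 0);
        list_append (codes, ch);
     }
   return codes;
}

% char_codes ("AB") is {65, 66}, i.e. {'A', 'B'}
```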
Replace a character with another in a string.
String_Type strsub (String_Type s, Int_Type pos, Int_Type ch)
The strsub function may be used to substitute the character ch for the character at character position pos of the string s. The resulting string is returned.
define replace_spaces_with_comma (s)
{
variable n;
while (n = is_substr (s, " "), n) s = strsub (s, n, ',');
return s;
}
For uses such as this, the strtrans function is a better choice.
The first character in the string s is specified by pos equal to 1. This function uses character semantics, not byte semantics.
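Two short illustrations: strsub replacing a single character by its 1-based position, and the strtrans alternative for replacing every space at once. The helper name spaces_to_commas is hypothetical.

```slang
variable s = strsub ("hello", 1, 'J');
% s is "Jello": position 1 is the first character

% spaces_to_commas is a hypothetical helper; strtrans maps each
% character of " " to the corresponding character of "," in one pass.
define spaces_to_commas (str)
{
   return strtrans (str, " ", ",");
}

% spaces_to_commas ("a b c") is "a,b,c"
```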
Extract tokens from a string
String_Type[] strtok (String_Type str [,String_Type white])
strtok breaks the string str into a series of tokens and returns them as an array of strings. If the second parameter white is present, then it specifies the set of characters that are to be regarded as whitespace when extracting the tokens, and may consist of the whitespace characters or a range of such characters. If the first character of white is '^', then the whitespace characters consist of all characters except those in white. For example, if white is " \t\n,;.", then those characters specify the whitespace characters. However, if white is given by "^a-zA-Z0-9_", then any character is a whitespace character except those in the ranges a-z, A-Z, 0-9, and the underscore character. To include the hyphen character as a whitespace character, it should be the first character of the whitespace string. In addition to ranges, the whitespace specifier may also include character classes:
\w matches a unicode "word" character, taken to be alphanumeric.
\a alphabetic character, excluding digits
\s matches whitespace
\l matches lowercase
\u matches uppercase
\d matches a digit
\\ matches a backslash
\^ matches a ^ character
If the second parameter is not present, then it defaults to "\s".
The following example may be used to count the words in a text file:
define count_words (file)
{
variable fp, line, count;
fp = fopen (file, "r");
if (fp == NULL) return -1;
count = 0;
while (-1 != fgets (&line, fp))
{
line = strtok (line, "^\\a");
count += length (line);
}
() = fclose (fp);
return count;
}
Here a word was assumed to consist only of alphabetic characters.
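A smaller sketch of the two calling forms: an explicit whitespace set made of a space, a comma, and a semicolon, and the default "\s" set. Runs of whitespace characters never produce empty tokens.

```slang
variable words = strtok ("  alpha, beta;gamma ", " ,;");
% words is ["alpha", "beta", "gamma"]

variable plain = strtok ("  one  two ");
% with the default "\s" set: ["one", "two"]
```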
Replace characters in a string
String_Type strtrans (str, old_set, new_set)
String_Type str, old_set, new_set;
The strtrans function may be used to replace all the characters from the set old_set with the corresponding characters from new_set in the string str. If new_set is empty, then the characters in old_set will be removed from str.
If new_set is not empty, then old_set and new_set must be commensurate. Each set may consist of character ranges such as A-Z and character classes:
\, matches a punctuation character
\7 matches any 7-bit ASCII character
\\ matches a backslash
\^ matches the ^ character
\a matches an alphabetic character, excluding digits
\c matches a control character
\d matches a digit
\g matches a graphic character
\l matches lowercase
\p matches a printable character
\s matches whitespace
\u matches uppercase
\w matches a unicode "word" character, taken to be alphanumeric.
\x matches hex digit (a-fA-F0-9)
If the first character of a set is ^, then the set is taken to be the complement set.
str = strtrans (str, "\\u", "\\l"); % lower-case str
str = strtrans (str, "^0-9", " "); % Replace anything but 0-9 by space
str = strtrans (str, "\\^0-9", " "); % Replace '^' and 0-9 by a space
This function has been vectorized in the sense that if an array of strings is passed as the first argument then a corresponding array of strings will be returned. Array values are not supported for the remaining arguments.
Remove whitespace from the ends of a string
String_Type strtrim (String_Type s [,String_Type w])
The strtrim function removes all leading and trailing whitespace characters from the string s and returns the result. The optional second parameter specifies the set of whitespace characters. If the argument is not present, then the set defaults to "\s". The whitespace specification may consist of character ranges such as A-Z and character classes:
\w matches a unicode "word" character, taken to be alphanumeric.
\a alphabetic character, excluding digits
\s matches whitespace
\l matches lowercase
\u matches uppercase
\d matches a digit
\\ matches a backslash
\^ matches a ^ character
If the first character of a set is ^, then the set is taken to be the complement set.
This function has been vectorized in the sense that if the first argument is an array of strings, then a corresponding array of strings will be returned. An array value for the optional whitespace argument is not supported.
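A sketch of the default whitespace set, a custom set containing only the hyphen, and the vectorized form applied to an array of strings.

```slang
variable a = strtrim ("  padded  ");
% a is "padded"

variable b = strtrim ("--title--", "-");
% b is "title": only "-" is treated as whitespace here

variable c = strtrim (["  x", "y  "]);
% c is ["x", "y"] (vectorized over the first argument)
```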
Remove leading whitespace from a string
String_Type strtrim_beg (String_Type s [,String_Type w])
The strtrim_beg function removes all leading whitespace characters from the string s and returns the result. The optional second parameter specifies the set of whitespace characters. See the documentation for the strtrim function for more information about the whitespace parameter.
This function has been vectorized in the sense that if the first argument is an array of strings, then a corresponding array of strings will be returned. An array value for the optional whitespace argument is not supported.
Remove trailing whitespace from a string
String_Type strtrim_end (String_Type s [,String_Type w])
The strtrim_end function removes all trailing whitespace characters from the string s and returns the result. The optional second parameter specifies the set of whitespace characters. See the documentation for the strtrim function for more information about the whitespace parameter.
This function has been vectorized in the sense that if the first argument is an array of strings, then a corresponding array of strings will be returned. An array value for the optional whitespace argument is not supported.
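A short sketch, assuming an S-Lang 2 interpreter; the inputs are hypothetical. This is the mirror image of strtrim_beg: only the end of the string is trimmed.

```slang
% Trailing blanks are removed; the leading blanks are kept
variable a = strtrim_end ("   padded   ");      % "   padded"

% Use a custom set to drop trailing path separators
variable b = strtrim_end ("/usr/local///", "/"); % "/usr/local"
```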
Convert a string to uppercase
String_Type strup (String_Type s)
The strup function takes a string s and returns another string
identical to s except that all lowercase characters contained in s
are converted to uppercase.
The function
define Strcmp (a, b)
{
   return strcmp (strup (a), strup (b));
}
performs a case-insensitive comparison of two strings by converting
them to uppercase first.
This function has been vectorized in the sense that if an array of strings is passed to the function, then a corresponding array of strings will be returned.
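A brief sketch of the vectorized form and of the case-insensitive comparison idiom, assuming an S-Lang 2 interpreter; the sample strings are made up.

```slang
% An array of strings is converted element by element;
% digits and spaces are not affected
variable names = strup (["alpha", "Beta", "GAMMA 3"]);
% names is now ["ALPHA", "BETA", "GAMMA 3"]

% Comparing uppercased copies ignores case differences
variable diff = strcmp (strup ("Hello"), strup ("hELLO"));   % 0
```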
Delete characters from a string
String_Type str_delete_chars (String_Type str [, String_Type del_set])
This function deletes from the string str every character that
belongs to the set specified by the optional argument del_set, and
returns the modified string. If del_set is not given, "\s"
(whitespace) will be used.
The set of characters to be deleted may include ranges such as A-Z
and character classes:
\w matches a Unicode "word" character, taken to be alphanumeric
\a matches an alphabetic character, excluding digits
\s matches whitespace
\l matches a lowercase character
\u matches an uppercase character
\d matches a digit
\\ matches a backslash
\^ matches a ^ character
If the first character of del_set is ^, then the set is taken to be
the complement of the remaining characters. For example,
str = str_delete_chars (str, "^A-Za-z");
will remove all characters except A-Z and a-z from str. Similarly,
str = str_delete_chars (str, "^\\a");
will remove all but the alphabetic characters.
This function has been vectorized in the sense that if an array of strings is passed as the first argument then a corresponding array of strings will be returned. Array values are not supported for the remaining arguments.
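A minimal sketch, assuming an S-Lang 2 interpreter; the inputs are hypothetical. Unlike strtrim, str_delete_chars removes matching characters anywhere in the string, not only at the ends.

```slang
% Default set "\s": every whitespace character is removed
variable a = str_delete_chars (" a b\tc\n");               % "abc"

% Complemented class: keep only the digits of a phone number
variable b = str_delete_chars ("tel: 555-1234", "^\\d");  % "5551234"

% Explicit character list: remove the vowels
variable c = str_delete_chars ("banana split", "aeiou");  % "bnn splt"
```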
Escape characters in a string
String_Type str_quote_string(String_Type str, String_Type qlis, Int_Type quote)
The str_quote_string function returns a string identical to str
except that all characters contained in the string qlis are escaped
with the quote character, including the quote character itself.
This function is useful for making a string that can be used
literally in a regular expression.
Execution of the statements
node = "Is it [the coat] really worth $100?";
tag = str_quote_string (node, "\\^$[]*.+?", '\\');
will result in tag
having the value:
Is it \[the coat\] really worth \$100\?
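A second small sketch, assuming an S-Lang 2 interpreter; the file name is hypothetical. Here only the regular-expression metacharacters "." and "*" are escaped, so that the name could be matched literally.

```slang
% Escape '.' and '*' with a backslash
variable pat = str_quote_string ("core.*.log", ".*", '\\');
% pat now contains:  core\.\*\.log
```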
Replace a substring of a string (deprecated)
Int_Type str_replace (String_Type a, String_Type b, String_Type c)
The str_replace function replaces the first occurrence of b in a
with c and returns an integer that indicates whether a replacement
was made. If b does not occur in a, zero is returned. Otherwise, a
non-zero integer is returned together with the new string resulting
from the replacement.
This function has been superseded by strreplace. It should no
longer be used.
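A sketch of the replacement call, assuming the S-Lang 2 form of strreplace that accepts an optional fourth argument max_n limiting the number of replacements; the strings are hypothetical. Passing max_n equal to 1 reproduces the "first occurrence only" behavior of str_replace.

```slang
variable s = "one two two";
variable n;

% Replace only the first occurrence; n receives the number of
% replacements actually made
(s, n) = strreplace (s, "two", "2", 1);   % s is "one 2 two", n is 1

% Without max_n, every occurrence is replaced and only the
% new string is returned
variable t = strreplace ("one two two", "two", "2");   % "one 2 2"
```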
Remove comments from a string
String_Type str_uncomment_string(String_Type s, String_Type beg, String_Type end)
This function may be used to remove simple forms of comments from a
string s. The parameters beg and end are strings of equal length
whose corresponding characters specify the begin and end comment
characters, respectively. It returns the uncommented string.
The expression
str_uncomment_string ("Hello (testing) 'example' World", "'(", "')")
returns the string "Hello   World" (the spaces that surrounded the
removed comments are preserved).
This routine does not handle multi-character comment delimiters and it assumes that comments are not nested.
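A minimal sketch, assuming an S-Lang 2 interpreter; the input is hypothetical. It shows the positional pairing of beg and end: the first character of beg is closed by the first character of end, the second by the second, and so on.

```slang
% '[' is closed by ']' and '<' is closed by '>'
variable s = str_uncomment_string ("a[x]b<y>c", "[<", "]>");   % "abc"
```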
Extract a substring from a string
String_Type substr (String_Type s, Int_Type n, Int_Type len)
The substr function returns the substring of s that begins at
character position n and has a length of len characters. If len is
-1, the entire length of the string s will be used for len.
Positions are counted from 1, so the first character of s is given
by n equal to 1.
substr ("To be or not to be", 7, 5);
returns "or no".
This function assumes character semantics rather than byte
semantics. Use the substrbytes function to extract bytes from a
string.
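A short sketch of the 1-based positions and of len equal to -1, assuming an S-Lang 2 interpreter and the same sample sentence as above.

```slang
variable s = "To be or not to be";

% Positions start at 1, so this takes the first two characters
variable first = substr (s, 1, 2);    % "To"

% len = -1: take everything from position 14 to the end
variable tail = substr (s, 14, -1);   % "to be"
```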
Extract a byte sequence from a string
String_Type substrbytes (String_Type s, Int_Type n, Int_Type len)
The substrbytes function returns the substring of s that begins at
byte position n and has a length of len bytes. If len is -1, the
entire byte-length of the string s will be used for len. Byte
positions are counted from 1, so the first byte of s is given by n
equal to 1.
substrbytes ("To be or not to be", 7, 5);
returns "or no".
In many cases it is more convenient to use array indexing rather
than the substrbytes function. In fact,
substrbytes(s,i+1,-1)
is equivalent to
s[[i:]]
The function substr may be used if character semantics are desired.
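A sketch of the equivalence stated above, assuming an S-Lang 2 interpreter; the offset i is an arbitrary example value. Note that array indexing is 0-based while substrbytes positions are 1-based, hence the i+1.

```slang
variable s = "To be or not to be";
variable i = 6;

% Both expressions yield the bytes from 0-based offset i to the end
variable a = substrbytes (s, i + 1, -1);   % "or not to be"
variable b = s[[i:]];                      % "or not to be"
```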