This module allows an S-Lang script to read and write
comma-separated-value (CSV) files, tab-delimited files, etc. Use
require("csv")
to load it.
Instantiate a parser for CSV data
obj = csv_decoder_new (filename|File_Type|Strings[])
This function instantiates an object that may be used to parse and read so-called comma-separated-value (CSV) data. It requires a single argument, which may be the name of a file, an open file pointer, or an array of strings.
delim
: character used for the delimiter (Default: ',')
quote
: character used for quoting fields (Default: '"')
skiplines
: number of lines to skip before parsing (Default: 0)
comment
: lines beginning with this string will be skipped
blankrows
: default for how blank rows should be handled (Default: "skip")
readrow
: Read and parse a row from the CSV object
readcol
: Read one or more columns from the CSV object
See the documentation for the csv.readcol and csv.readrow methods for examples.
The current implementation assumes the CSV format specified according to RFC 4180.
It is important to understand the difference between a ROW and a LINE
in a CSV formatted file: a row may span more than one line in a file. The
skiplines qualifier specifies the number of LINES to be skipped, not ROWS.
CSV files have no notion of data-types: all field values are strings.
For this reason, the type qualifier introduces an extra layer
that is not part of the CSV format.
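To make the distinction between rows and lines concrete, here is a minimal sketch that feeds the decoder an in-memory array of strings. The sample lines are invented for illustration, and the sketch assumes that each array element represents one line of input terminated by a newline.

```slang
require ("csv");

% Hypothetical input: 4 lines, but the quoted field that starts on the
% second line contains a newline, so lines 2 and 3 form a single row.
variable lines = ["id,note\n",
                  "1,\"first half\n",
                  "second half\"\n",
                  "2,plain\n"];

variable csv = csv_decoder_new (lines);
variable nrows = 0, row;
forever
  {
     row = csv.readrow ();
     if (row == NULL) break;       % NULL signals the end of input
     nrows++;
     vmessage ("row %d has %d field(s)", nrows, length (row));
  }
vmessage ("%d lines, %d rows", length (lines), nrows);
```

If the decoder behaves as described above, the number of rows reported should be one less than the number of lines.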
csv.readcol, csv.readrow
Read one or more columns from a CSV file
datastruct = csv.readcol([columns])
This method may be used to read one or more columns from a comma-separated-value file. If called with no arguments, all columns of the file will be returned. Otherwise, only those columns specified by the columns argument will be returned.
The return value is a structure with fields that correspond to the
desired columns. The default is for the structure to have field
names col1, col2, etc., where the integer suffix
specifies the column number. The fields and header
qualifiers may be used to specify a different set of names.
fields
: An array of field names to use for the returned structure
header
: Array of strings that correspond to the header row
type
: A scalar or array type-specifier, see below
typeN
: Type-specifier for column N
snan
: String value to use for an empty string element (Default: "")
inan
: Integer value to use for an empty integer element (Default: 0)
lnan
: Long int value to use for an empty long int element (Default: 0L)
fnan
: Float value to use for an empty float element (Default: _NaN)
dnan
: Double value to use for an empty double element (Default: _NaN)
nanN
: Value used for an empty element in column N
blankrows
: How a blank row should be handled (Default: "skip")
The type-specifier is used to specify the type of a field. It must be one of the following characters:
's' (String_Type)
'i' (Int_Type)
'l' (Long_Type)
'f' (Float_Type)
'd' (Double_Type)
If the value of the type qualifier is scalar, then all
columns will default to use the corresponding type. If different
types are desired, then an array of type-specifiers may be used.
The length of the array must be the same as the number of columns to
be returned. The typeN qualifier may be used to give the
type of column N.
If the columns argument is string-valued, then the
header qualifier must be supplied to provide a mapping
from column names to column numbers. If it is present, it will also
be used to give normalized field names to the returned structure.
For normalization, the column name is first lower-cased, then all
non-alphanumeric characters are converted to "_", and excess underscore
characters are removed.
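The normalization steps just described can be sketched with standard S-Lang string functions. The helper normalize_name below is hypothetical: it is not part of the csv module, and the module's own implementation may differ in detail.

```slang
% Hypothetical helper that mimics the described normalization; it is
% NOT a function provided by the csv module.
define normalize_name (name)
{
   name = strlow (name);                     % lower-case the name
   name = strtrans (name, "^a-z0-9", "_");   % non-alphanumerics -> "_"
   return strcompress (name, "_");           % drop excess underscores
}

vmessage ("%s", normalize_name ("Notes - or - Comments"));
```

Here strcompress both collapses runs of underscores and strips leading and trailing ones, which is one reasonable reading of "excess underscore characters".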
See the documentation for the csv.readrow method for more
information about how blank rows should be handled.
Suppose that data.csv is a file that contains
# The data below are from runs 6 and 7
x,y,errx,erry,Notes - or - Comments
10.2,0.5,,0.1,
13.4,0.9,0.1,0.16,
20.7,18.2,,0.3,Vacuum leak in beam line
29.6,1.3,,0.31,
31.2,1.2,0.11,0.33,"This data point
taken from run 7"
This file consists of 8 lines and forms a CSV file with 7 rows. The
first row consists of a single column, and the subsequent rows
consist of 5 columns. Note that the last row is split
across two lines. The row with the single column will be regarded as
a comment in what follows.
The first step is to instantiate a parser object using:
csv = csv_decoder_new ("data.csv" ;comment="#");
The use of the comment qualifier will cause all lines
beginning with "#" to be skipped. Alternatively, the first
line could have been skipped using
csv = csv_decoder_new ("data.csv" ;skiplines=1);
The second row (also second line) in the file is the header line: it
gives the names of the columns. It may be read using
header = csv.readrow ();
The rest of the file consists of the data values. We want to read
the first 4 columns as single precision (Float_Type) values,
and the 5th as a string. One way to do this is
table = csv.readcol (;type=['f','f','f','f','s']);
This will result in table being set to a structure of the form
struct { col1 = Float_Type[5],
col2 = Float_Type[5],
col3 = Float_Type[5],
col4 = Float_Type[5],
col5 = String_Type[5]
}
The same result could also have been achieved using
table = csv.readcol (;type='f', type5='s');
If the header qualifier is used, then
table = csv.readcol (;type='f', type5='s', header=header);
would produce the structure
struct {x=Float_Type[5],
y=Float_Type[5],
errx=Float_Type[5],
erry=Float_Type[5],
notes_or_comments=String_Type[5]
}
Note how the "Notes - or - Comments" value was normalized.
To read just the x and y columns, either of the
following may be used:
table = csv.readcol ([1,2] ;type='f');
table = csv.readcol (["x","y"] ;type='f', header=header);
The header qualifier was required in the last form to map the
column names to numbers.
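As a further illustration, the following sketch combines name-based column selection with the dnan qualifier so that the empty errx fields are read as 0.0 instead of NaN. It first recreates the sample data.csv so that it can be run on its own; the choice of 0.0 as the replacement value is only an assumption made for this example.

```slang
require ("csv");

% Recreate the sample file shown above so the sketch is self-contained.
variable sample = ["# The data below are from runs 6 and 7\n",
                   "x,y,errx,erry,Notes - or - Comments\n",
                   "10.2,0.5,,0.1,\n",
                   "13.4,0.9,0.1,0.16,\n",
                   "20.7,18.2,,0.3,Vacuum leak in beam line\n",
                   "29.6,1.3,,0.31,\n",
                   "31.2,1.2,0.11,0.33,\"This data point\n",
                   "taken from run 7\"\n"];
variable fp = fopen ("data.csv", "w");
if (fp == NULL) throw OpenError, "Unable to create data.csv";
variable line;
foreach line (sample) () = fputs (line, fp);
() = fclose (fp);

variable csv = csv_decoder_new ("data.csv" ; comment="#");
variable header = csv.readrow ();

% Read x and errx as doubles; empty fields become 0.0 (example choice).
variable table = csv.readcol (["x", "errx"]
                              ; type='d', header=header, dnan=0.0);
vmessage ("%d values of x, sum of errx = %g",
          length (table.x), sum (table.errx));
```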
csv_decoder_new, csv_readcol, readascii
Read a row from a CSV file
row = csv.readrow ()
The csv.readrow method may be used to read the next
row from the underlying CSV (comma-separated-value) parser object. The
object must have already been instantiated using the
csv_decoder_new function. It returns the row data in the form
of an array of strings. If the end of input is reached, NULL will
be returned.
blankrows
: How a blank row should be handled (Default: "skip")
The blankrows qualifier is used to specify how a blank row
should be handled. A blank row is defined as a row made up of no
characters except for the newline or carriage-return sequence. For
example, the following 9 lines have one blank row that occurs on
line 8:
"12.3"
"4
5"
"5.1"
""
"7.2"
"6.2"
If the value of the blankrows qualifier is "skip", then blank rows
will be ignored by the parser. If the value is "stop", then the row
will be returned as an empty array of strings (length equal to 0).
Otherwise the row will be treated as if it contained the empty
string and returned as an array of length 1 with a value of "". The
default behavior is to skip such rows.
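The "stop" setting can be used to read one block of rows from input in which blocks are separated by blank rows. The sketch below is illustrative only: the in-memory sample data are invented, and it assumes that each array element is one newline-terminated line.

```slang
require ("csv");

% Hypothetical input whose third line is a blank row.
variable lines = ["1,2\n", "3,4\n", "\n", "5,6\n"];
variable csv = csv_decoder_new (lines);

variable nread = 0, row;
forever
  {
     row = csv.readrow (; blankrows="stop");
     % NULL means end of input; a zero-length array marks a blank row.
     if ((row == NULL) || (length (row) == 0)) break;
     nread++;
  }
vmessage ("read %d rows before the blank row", nread);
```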
csv_decoder_new, csv.readcol, csv_readcol
Read one or more columns from a CSV file
Struct_Type csv_readcol (file|fp [,columns] ;qualifiers)
This function may be used to read one or more of the columns in the specified
CSV file. If the columns argument is present, then only those
columns will be read; otherwise all columns in the file will be read.
The columns will be returned in the form of a structure.
This function supports all of the qualifiers supported by the
csv_decoder_new function and the csv.readcol method. In
addition, if the has_header qualifier is present, the first
line processed (after skipping any lines implied by the
skiplines and comment qualifiers) will be regarded as
the header.
If the rdb qualifier is present, then the file is assumed to be
in the so-called RDB file format. This is a tab-delimited format
that consists of a line that contains the names of the fields,
followed by a line that specifies the data types of the columns.
data = csv_readcol ("mirror.csv" ;comment="#", has_header, delim='|');
data = csv_readcol ("foo.rdb" ; rdb);
csv_decoder_new, csv.readcol, csv.readrow, csv.writecol, csv_encoder_new
Create an object for writing CSV files
csv = csv_encoder_new ()
The csv_encoder_new function returns an object that may
be used for creating a CSV file.
delim
: Character used for the field delimiter (Default: ',')
quote
: Character used for quoting fields (Default: '"')
quoteall
: Quote all field values
quotesome
: Quote only those fields where quoting is necessary
writecol
: write one or more columns to a file. For more information
about this method, see the documentation for csv.writecol.
x = [0:2*PI:#100];
csv = csv_encoder_new (;delim='|');
csv.writecol ("sinx.csv", x, sin(x) ; names=["x", "sin of x"]);
The set_float_format function may be used to specify the
format used when writing floating point numbers to the CSV file.
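A minimal sketch of this, assuming a fixed three-decimal format is wanted. The file name sqrt_demo.csv and the column names are hypothetical. Note that set_float_format changes interpreter-wide state, so a script may want to set it back afterwards.

```slang
require ("csv");

variable x = [0:1:#5];          % 5 evenly spaced values from 0 to 1

% Ask for three digits after the decimal point; this setting is global.
set_float_format ("%.3f");

variable csv = csv_encoder_new ();
csv.writecol ("sqrt_demo.csv", x, sqrt (x) ; names=["x", "sqrt_x"]);
```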
csv_writecol, csv_encoder_new, csv_readcol
Write to a file using a CSV format
csv_writecol(file|fp, datalist | datastruct | col1,...,colN)
This function writes one or more data columns to a file or open
file descriptor using a comma-separated-value format. The data
values may be expressed in several ways: as a list of
column values (datalist), a structure whose fields specify the
column values (datastruct), or passed explicitly as
individual values (col1,...,colN).
delim
: character used for the delimiter (Default: ',')
names
: An array of strings used to name the columns
noheader
: Do not write a header to the file
quote
: character used for quoting fields (Default: '"')
quoteall
: Quote all field values
quotesome
: Quote only those fields where quoting is necessary
rdb
: Write in an rdb-style TAB delimited format
Unless the noheader qualifier is given, a header containing
the names of the columns will be written if either the names
qualifier is present, or the data values are passed as a structure.
In the latter case, the structure field names will be used for the
column names.
The function will throw an IOError exception if a write error occurs.
For the purposes of this example, assume that three arrays are
given: time, temp, humidity, which represent a
time series of temperature and humidity. Here are three equivalent
ways of writing the arrays to a CSV file:
% Via a structure
data = struct {time=time, temperature=temp, humidity=humidity};
csv_writecol ("weather.dat", data);
% Via a list
data = {time, temp, humidity};
csv_writecol ("weather.dat", data
;names=["time","temperature","humidity"]);
% Via explicit arguments
csv_writecol ("weather.dat", time, temp, humidity
;names=["time","temperature","humidity"]);
This function is a simple wrapper around csv_encoder_new and
csv.writecol.
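To check what was written, the file can be read back with csv_readcol. The following round-trip sketch uses invented sample values and a hypothetical file name, weather_demo.csv. The array is called t rather than time only to avoid any clash with the intrinsic time function.

```slang
require ("csv");

% Hypothetical sample data.
variable t = [0.0, 1.0, 2.0];
variable temp = [20.5, 21.0, 21.4];

% Passing a structure lets its field names serve as the header.
variable out = struct { t = t, temperature = temp };
csv_writecol ("weather_demo.csv", out);

% Read the file back, treating the first line as the header and
% converting every column to Double_Type.
variable data = csv_readcol ("weather_demo.csv" ; has_header, type='d');
vmessage ("read %d rows; last temperature = %g",
          length (data.t), data.temperature[-1]);
```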
csv_encoder_new, csv.writecol, csv_readcol
Write to a file using a CSV format
csv.writecol(file|fp, datalist | datastruct | col1,...,colN)
The csv.writecol method may be used to write one or
more data columns to a file or file descriptor via a CSV encoder
object instantiated using the csv_encoder_new function.
The data values may be expressed in several ways: as a list of
column values (datalist), a structure whose fields specify
the column values (datastruct), or passed explicitly as
individual values (col1,...,colN).
An IOError exception will be thrown if a write error occurs.
delim
: character used for the delimiter (Default: ',')
names
: An array of strings used for names of the columns
noheader
: Do not write a header to the file
quoteall
: Quote all field values
quotesome
: Quote only those fields where quoting is necessary
rdb
: Write in an rdb-style TAB delimited format
Unless the noheader qualifier is given, a header containing
the names of the columns will be written if either the names
qualifier is present, or the data values are passed as a structure.
In the latter case, the structure field names will be used for the
column names.
For the purposes of this example, assume that three arrays are
given: time, temp, humidity, which represent a
time series of temperature and humidity. Here are three equivalent
ways of writing the arrays to a TAB delimited file:
csv = csv_encoder_new (;delim='\t');
% Via a structure
data = struct {time=time, temperature=temp, humidity=humidity};
csv.writecol ("weather.dat", data);
% Via a list
data = {time, temp, humidity};
csv.writecol ("weather.dat", data
;names=["time","temperature","humidity"]);
% Via explicit arguments
csv.writecol ("weather.dat", time, temp, humidity
;names=["time","temperature","humidity"]);
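One way to confirm that the output really is TAB delimited is to inspect the first line of the file. The sketch below uses invented sample values and a hypothetical file name, weather_demo.tsv, and passes the TAB character via the delim qualifier.

```slang
require ("csv");

% Hypothetical sample data.
variable t = [0.0, 1.0, 2.0];
variable humidity = [40.0, 42.5, 41.0];

variable enc = csv_encoder_new (; delim='\t');
enc.writecol ("weather_demo.tsv", t, humidity ; names=["t", "humidity"]);

% Read the header line back and look for the TAB character.
variable fp = fopen ("weather_demo.tsv", "r");
if (fp == NULL) throw OpenError, "Unable to open weather_demo.tsv";
variable first_line;
() = fgets (&first_line, fp);
() = fclose (fp);

variable has_tab = (is_substr (first_line, "\t") != 0);
vmessage ("header line contains a TAB: %d", has_tab);
```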
csv_encoder_new, csv_writecol, csv_readcol