4. CSV Module

This module allows an S-Lang script to read and write comma-separated-value (CSV) files, tab-delimited files, and similar formats. Use require("csv") to load it.

4.1 csv_decoder_new

Synopsis

Instantiate a parser for CSV data

Usage

obj = csv_decoder_new (filename|File_Type|Strings[])

Description

This function instantiates an object that may be used to parse and read so-called comma-separated-value (CSV) data. It requires a single argument, which may be the name of a file, an open file pointer, or an array of strings.

Qualifiers

delim : character used for the delimiter (Default: ',')

quote : character used for quoting fields (Default: '"')

skiplines : number of lines to skip before parsing (Default: 0)

comment : lines beginning with this string will be skipped

blankrows : default for how blank rows should be handled (Default: "skip")
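As a rough illustration of the delim qualifier, the sketch below parses tab-delimited input supplied as an array of strings. The sample data is invented for this example, and each string carries an explicit trailing newline as a precaution.

```slang
require ("csv");

% Invented sample data: a header line followed by two records,
% with TAB rather than comma separating the fields.
variable lines = ["name\tvalue\n", "alpha\t1\n", "beta\t2\n"];

variable csv = csv_decoder_new (lines ;delim='\t');
variable header = csv.readrow ();     % expected: ["name", "value"]
variable first = csv.readrow ();      % expected: ["alpha", "1"]
vmessage ("%s=%s", first[0], first[1]);
```

The same call with a file name or an open file pointer in place of the array should behave the same way.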

Methods

readrow : Read and parse a row from the CSV object

readcol : Read one or more columns from the CSV object

Example

See the documentation for the csv.readcol and csv.readrow methods for examples.

Notes

The current implementation assumes the CSV format specified according to RFC 4180.

It is important to understand the difference between a ROW and a LINE in a CSV formatted file: a row may span more than one line in a file. The skiplines qualifier specifies the number of LINES to be skipped, not ROWS.
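The distinction between rows and lines can be seen in a small sketch like the following. The input is invented for illustration: it has three lines, but the quoted field spans the second and third, so the parser is expected to report two rows.

```slang
require ("csv");

% Three input lines; the quoted field spans lines 2 and 3.
variable lines = ["id,note\n", "1,\"first half\n", "second half\"\n"];
variable csv = csv_decoder_new (lines);

variable nrows = 0, row;
forever
  {
     row = csv.readrow ();
     if (row == NULL)
       break;                  % end of input
     nrows++;
     vmessage ("row %d has %d fields", nrows, length (row));
  }
```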

CSV files have no notion of data types: all field values are strings. For this reason, the type qualifier of the csv.readcol method introduces an extra layer that is not part of the CSV format.

See Also

csv.readcol, csv.readrow

4.2 csv.readcol

Synopsis

Read one or more columns from a CSV file

Usage

datastruct = csv.readcol([columns])

Description

This method may be used to read one or more columns from a comma-separated-value file. If called with no arguments, all columns of the file will be returned. Otherwise, only those columns specified by the columns argument will be returned.

The return value is a structure with fields that correspond to the desired columns. The default is for the structure to have field names col1, col2, etc., where the integer suffix specifies the column number. The fields and header qualifiers may be used to specify a different set of names.

Qualifiers

fields : An array of field names to use for the returned structure

header : Array of strings that correspond to the header row

type : A scalar or array type-specifier, see below

typeN : Type-specifier for column N

snan : String value to use for an empty string element (Default: "")

inan : Integer value to use for an empty integer element (Default: 0)

lnan : Long int value to use for an empty long int element (Default: 0L)

fnan : Float value to use for an empty float element (Default: _NaN)

dnan : Double value to use for an empty double element (Default: _NaN)

nanN : Value used for an empty element in the column N

blankrows : How a blank row should be handled (Default: "skip")

The type-specifier is used to specify the type of a field. It must be one of the following characters:

     's' (String_Type)
     'i' (Int_Type)
     'l' (Long_Type)
     'f' (Float_Type)
     'd' (Double_Type)

If the value of the type qualifier is scalar, then all columns will default to use the corresponding type. If different types are desired, then an array of type-specifiers may be used. The length of the array must be the same as the number of columns to be returned. The typeN qualifier may be used to give the type of column N.

If the columns argument is string-valued, then the header qualifier must be supplied to provide a mapping from column names to column numbers. If it is present, it will also be used to give normalized field names to the returned structure. For normalization, the column name is first lower-cased, then all non-alphanumeric characters are converted to "_", and excess underscore characters are removed.
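The normalization rule can be approximated with standard S-Lang string functions. The helper below, normalize_name, is hypothetical and not part of the csv module; it is only a sketch of the rule as described here, assuming ASCII column names, and the module's own implementation may differ in detail.

```slang
% Hypothetical helper that mimics the normalization rule described above.
define normalize_name (name)
{
   name = strlow (name);                      % lower-case the name
   name = strtrans (name, "^a-z0-9", "_");    % anything else becomes "_"
   return strcompress (name, "_");            % trim and collapse underscores
}

variable normalized = normalize_name ("Notes - or - Comments");
vmessage ("%s", normalized);
```

Under these assumptions a header such as "Notes - or - Comments" should come out as notes_or_comments.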

See the documentation for csv.readrow for more information about how blank rows should be handled.

Example

Suppose that data.csv is a file that contains

    # The data below are from runs 6 and 7
    x,y,errx,erry,Notes - or - Comments
    10.2,0.5,,0.1,
    13.4,0.9,0.1,0.16,
    20.7,18.2,,0.3,Vacuum leak in beam line
    29.6,1.3,,0.31,
    31.2,1.2,0.11,0.33,"This data point
    taken from run 7"

This file consists of 8 lines and forms a CSV file with 7 rows. The first row consists of a single column, and each of the subsequent rows consists of 5 columns. Note that the last row is split across two lines. The row with the single column will be regarded as a comment in what follows.

The first step is to instantiate a parser object using:

    csv = csv_decoder_new ("data.csv" ;comment="#");

The use of the comment qualifier will cause all lines beginning with "#" to be skipped. Alternatively, the first line could have been skipped using

    csv = csv_decoder_new ("data.csv" ;skiplines=1);

The second row (also second line) in the file is the header line: it gives the names of the columns. It may be read using

    header = csv.readrow ();

The rest of the file consists of the data values. We want to read the first 4 columns as single precision (Float_Type) values, and the 5th as a string. One way to do this is

    table = csv.readcol (;type=['f','f','f','f','s']);

This will result in table set to a structure of the form

   struct { col1 = Float_Type[5],
            col2 = Float_Type[5],
            col3 = Float_Type[5],
            col4 = Float_Type[5],
            col5 = String_Type[5]
          }

The same result could also have been achieved using

    table = csv.readcol (;type='f', type5='s');

If the header qualifier is used, then

    table = csv.readcol (;type='f', type5='s', header=header);

would produce the structure

   struct {x=Float_Type[5],
           y=Float_Type[5],
           errx=Float_Type[5],
           erry=Float_Type[5],
           notes_or_comments=String_Type[5]
          }

Note how the "Notes - or - Comments" value was normalized.

To read just the x and y columns, either of the following may be used:

    table = csv.readcol ([1,2] ;type='f');
    table = csv.readcol (["x","y"] ;type='f', header=header);

The header qualifier was required in the last form to map the column names to numbers.
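Empty fields, such as the missing errx values above, are filled in according to the nan qualifiers. The sketch below uses a cut-down, invented version of the data, passed as an array of strings, and relies on the documented default of _NaN for empty double elements to locate the missing entries.

```slang
require ("csv");

% Invented two-column excerpt; the errx field is empty in two rows.
variable lines = ["x,errx\n", "10.2,\n", "13.4,0.1\n", "20.7,\n"];

variable csv = csv_decoder_new (lines);
variable header = csv.readrow ();
variable table = csv.readcol (;type='d', header=header);

% Empty double fields should come back as NaN by default.
variable missing = where (isnan (table.errx));
vmessage ("%d of %d errx values are missing",
          length (missing), length (table.errx));
```

Passing, for example, dnan=0.0 to csv.readcol would substitute zero for the empty fields instead.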

See Also

csv_decoder_new, csv_readcol, readascii

4.3 csv.readrow

Synopsis

Read a row from a CSV file

Usage

row = csv.readrow ()

Description

The csv.readrow method may be used to read the next row from the underlying CSV (comma-separated-value) parser object. The object must have already been instantiated using the csv_decoder_new function. It returns the row data in the form of an array of strings. If the end of input is reached, NULL will be returned.

Qualifiers

blankrows : How a blank row should be handled (Default: "skip")

The blankrows qualifier is used to specify how a blank row should be handled. A blank row is defined as a row made up of no characters except for the newline or carriage-return sequence. For example, the following 9 lines contain one blank row, which occurs on line 8 (line 3 is also empty, but it lies inside a quoted field and so belongs to a row):

     "12.3"
     "4

     5"
     "5.1"
     ""
     "7.2"

     "6.2"

If the value of blankrows is "skip", then blank rows will be ignored by the parser. If the value is "stop", then the row will be returned as an empty array of strings (length equal to 0). Otherwise the row will be treated as if it contained the empty string and returned as an array of length 1 with a value of "". The default behavior is to skip such rows.
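A short sketch of the "skip" and "stop" behaviors follows. The three-line input is invented for illustration, and the loop for "stop" treats a zero-length row as a signal to stop reading, which is one plausible way to use that setting.

```slang
require ("csv");

variable lines = ["1\n", "\n", "2\n"];   % the second line is a blank row
variable csv, row;

% Default ("skip"): the blank row is ignored, so two rows are counted.
csv = csv_decoder_new (lines);
variable nskip = 0;
forever
  {
     row = csv.readrow ();
     if (row == NULL) break;
     nskip++;
  }

% "stop": the blank row is returned as a zero-length array,
% which this loop uses as its cue to stop reading.
csv = csv_decoder_new (lines);
variable nstop = 0;
forever
  {
     row = csv.readrow (;blankrows="stop");
     if (row == NULL) break;
     if (length (row) == 0) break;
     nstop++;
  }
vmessage ("skip: %d rows, stop: %d rows before the blank row", nskip, nstop);
```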

See Also

csv_decoder_new, csv.readcol, csv_readcol

4.4 csv_readcol

Synopsis

Read one or more columns from a CSV file

Usage

Struct_Type csv_readcol (file|fp [,columns] ;qualifiers)

Description

This function may be used to read one or more of the columns in the specified CSV file. If the columns argument is present, then only those columns will be read; otherwise all columns in the file will be read. The columns will be returned in the form of a structure.

Qualifiers

This function supports all of the qualifiers supported by the csv_decoder_new function and the csv.readcol method. In addition, if the has_header qualifier is present, the first line processed (after skipping any lines implied by the skiplines and comment qualifiers) will be regarded as the header.

If the rdb qualifier is present, then assume that the file is in the so-called RDB file format. This is a tab-delimited format that consists of a line that contains the names of the fields, followed by a line that specifies the data types of the columns.

Example

   data = csv_readcol ("mirror.csv" ;comment="#", has_header, delim='|');
   data = csv_readcol ("foo.rdb" ; rdb);

See Also

csv_decoder_new, csv.readcol, csv.readrow, csv.writecol, csv_encoder_new

4.5 csv_encoder_new

Synopsis

Create an object for writing CSV files

Usage

csv = csv_encoder_new ()

Description

The csv_encoder_new function returns an object that may be used for creating a CSV file.

Qualifiers

delim : Character used for the field delimiter (Default: ',')

quote : Character used for quoting fields (Default: '"')

quoteall : Quote all field values

quotesome : Quote only those fields where quoting is necessary

Methods

writecol : Write one or more columns to a file. For more information about this method, see the documentation for csv.writecol.

Example

    x = [0:2*PI:#100];
    csv = csv_encoder_new (;delim='|');
    csv.writecol ("sinx.csv", x, sin(x) ; names=["x", "sin of x"]);

Notes

The set_float_format function may be used to specify the format used when writing floating point numbers to the CSV file.
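For instance, a sketch along the following lines should limit the written values to four decimal places. The file name and the format string are arbitrary choices for this illustration.

```slang
require ("csv");

% Ask for four digits after the decimal point when floats are
% converted to text.
set_float_format ("%.4f");

variable x = [0:2*PI:#5];
variable csv = csv_encoder_new ();
csv.writecol ("sinx_short.csv", x, sin(x) ;names=["x", "sinx"]);
```

Note that set_float_format changes a global setting, so it also affects how floating point numbers are formatted elsewhere in the script.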

See Also

csv.writecol, csv_writecol, csv_readcol

4.6 csv_writecol

Synopsis

Write to a file using a CSV format

Usage

csv_writecol(file|fp, datalist | datastruct | col1,...,colN)

Description

This function writes one or more data columns to a file or open file descriptor using a comma-separated-value format. The data values may be expressed in several ways: as a list of column values (datalist), a structure whose fields specify the column values (datastruct), or passed explicitly as individual values (col1,...,colN).

Qualifiers

delim : character used for the delimiter (Default: ',')

names : An array of strings used to name the columns

noheader : Do not write a header to the file

quote : character used for quoting fields (Default: '"')

quoteall : Quote all field values

quotesome : Quote only those fields where quoting is necessary

rdb : Write in an rdb-style TAB delimited format

Unless the noheader qualifier is given, a header containing the names of the columns will be written if either the names qualifier is present, or the data values are passed as a structure. In the latter case, the structure field names will be used for the column names.

The function will throw an IOError exception if a write error occurs.

Example

For the purposes of this example, assume that three arrays are given: time, temp, humidity, which represent a time series of temperature and humidity. Here are three equivalent ways of writing the arrays to a CSV file:

     % Via a structure
     data = struct {time=time, temperature=temp, humidity=humidity};
     csv_writecol ("weather.dat", data);

     % Via a list
     data = {time, temp, humidity};
     csv_writecol ("weather.dat", data
                   ;names=["time","temperature","humidity"]);

     % Via explicit arguments
     csv_writecol ("weather.dat", time, temp, humidity
                   ;names=["time","temperature","humidity"]);
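As a quick check of the written file, the data can be read back with csv_readcol. The sketch below uses small invented arrays and an arbitrary file name, and assumes that the header written from the structure field names is picked up by the has_header qualifier.

```slang
require ("csv");

% Invented sample data.
variable time = [0.0, 1.0, 2.0];
variable temp = [20.5, 21.0, 21.5];

% Passing a structure causes its field names to be written as the header.
csv_writecol ("roundtrip.csv", struct {time=time, temperature=temp});

% Read the columns back as doubles, using the header for the field names.
variable back = csv_readcol ("roundtrip.csv" ;has_header, type='d');
vmessage ("read back %d rows", length (back.time));
```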

Notes

This function is a simple wrapper around csv_encoder_new and csv.writecol.

See Also

csv_encoder_new, csv.writecol, csv_readcol

4.7 csv.writecol

Synopsis

Write to a file using a CSV format

Usage

csv.writecol(file|fp, datalist | datastruct | col1,...,colN)

Description

The csv.writecol method may be used to write one or more data columns to a file or file descriptor via a CSV encoder object instantiated using the csv_encoder_new function.

The data values may be expressed in several ways: as a list of column values (datalist), a structure whose fields specify the column values (datastruct), or passed explicitly as individual values (col1,...,colN).

An IOError exception will be thrown if a write error occurs.
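A caller that wants to recover from such a failure can wrap the call in a try/catch block. The following is only a sketch: the file name and data are invented, and the handler simply records that the write failed.

```slang
require ("csv");

variable csv = csv_encoder_new ();
variable x = [1, 2, 3], y = ["a", "b", "c"];
variable ok = 1;

try
  {
     csv.writecol ("columns.csv", x, y ;names=["x", "y"]);
  }
catch IOError:
  {
     % A write error occurred; the file may be incomplete.
     ok = 0;
     message ("write failed; the file may be incomplete");
  }
```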

Qualifiers

delim : character used for the delimiter (Default: ',')

names : An array of strings used for names of the columns

noheader : Do not write a header to the file

quoteall : Quote all field values

quotesome : Quote only those fields where quoting is necessary

rdb : Write in an rdb-style TAB delimited format

Unless the noheader qualifier is given, a header containing the names of the columns will be written if either the names qualifier is present, or the data values are passed as a structure. In the latter case, the structure field names will be used for the column names.

Example

For the purposes of this example, assume that three arrays are given: time, temp, humidity, which represent a time series of temperature and humidity. Here are three equivalent ways of writing the arrays to a TAB delimited file:

     csv = csv_encoder_new (;delim='\t');

     % Via a structure
     data = struct {time=time, temperature=temp, humidity=humidity};
     csv.writecol ("weather.dat", data);

     % Via a list
     data = {time, temp, humidity};
     csv.writecol ("weather.dat", data
                   ;names=["time","temperature","humidity"]);

     % Via explicit arguments
     csv.writecol ("weather.dat", time, temp, humidity
                   ;names=["time","temperature","humidity"]);

See Also

csv_encoder_new, csv_writecol, csv_readcol
