Difference between revisions of "HOWTO get ns-3 data into SciPy"

Latest revision as of 18:47, 1 February 2011

Introduction

SciPy is open-source Python software for mathematics, science, and engineering. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines.

Ways to get data into the framework

Files

Text Files

Use numpy.loadtxt to load arrays from text files.

Here is a description of numpy.loadtxt from its help message:

  loadtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False)
      Load data from a text file.
      
      Each row in the text file must have the same number of values.
      
      Parameters
      ----------
      fname : file or string
          File or filename to read.  If the filename extension is ``.gz`` or
          ``.bz2``, the file is first decompressed.
      dtype : data-type
          Data type of the resulting array.  If this is a record data-type,
          the resulting array will be 1-dimensional, and each row will be
          interpreted as an element of the array.   In this case, the number
          of columns used must match the number of fields in the data-type.
      comments : string, optional
          The character used to indicate the start of a comment.
      delimiter : string, optional
          The string used to separate values.  By default, this is any
          whitespace.
      converters : {}
          A dictionary mapping column number to a function that will convert
          that column to a float.  E.g., if column 0 is a date string:
          ``converters = {0: datestr2num}``. Converters can also be used to
          provide a default value for missing data:
          ``converters = {3: lambda s: float(s or 0)}``.
      skiprows : int
          Skip the first `skiprows` lines.
      usecols : sequence
          Which columns to read, with 0 being the first.  For example,
          ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
      unpack : bool
          If True, the returned array is transposed, so that arguments may be
          unpacked using ``x, y, z = loadtxt(...)``
      
      Returns
      -------
      out : ndarray
          Data read from the text file.
      
      See Also
      --------
      scipy.io.loadmat : reads Matlab(R) data files
      
      Examples
      --------
      >>> from StringIO import StringIO   # StringIO behaves like a file object
      >>> c = StringIO("0 1\n2 3")
      >>> np.loadtxt(c)
      array([[ 0.,  1.],
             [ 2.,  3.]])
      
      >>> d = StringIO("M 21 72\nF 35 58")
      >>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
      ...                      'formats': ('S1', 'i4', 'f4')})
      array([('M', 21, 72.0), ('F', 35, 58.0)],
            dtype=[('gender', '|S1'), ('age', '<i4'), ('weight', '<f4')])
      
      >>> c = StringIO("1,0,2\n3,0,4")
      >>> x,y = np.loadtxt(c, delimiter=',', usecols=(0,2), unpack=True)
      >>> x
      array([ 1.,  3.])
      >>> y
      array([ 2.,  4.])

Note that comma-separated value (CSV) files can be read by passing the comma character as the delimiter argument to numpy.loadtxt, i.e. delimiter=','.
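As a minimal sketch of the CSV case, the following reads two columns from comma-separated text. The column layout (time, node id, throughput) and the sample values are made up for illustration and are not an ns-3 output format. On Python 3, StringIO lives in the io module rather than the StringIO module used in the help text above.

```python
import io

import numpy as np

# Hypothetical CSV trace: time (s), node id, throughput (Mbps).
# The header line starts with '#', so loadtxt treats it as a comment.
csv_text = """# time,node,throughput
0.0,1,5.2
0.5,1,5.6
1.0,2,4.9
"""


def load_csv_columns(source):
    """Return the time and throughput columns (0 and 2) as two 1-D arrays."""
    return np.loadtxt(source, delimiter=',', comments='#',
                      usecols=(0, 2), unpack=True)


# io.StringIO stands in for a file on disk; a filename works the same way.
time_s, throughput = load_csv_columns(io.StringIO(csv_text))
```

With unpack=True each selected column comes back as its own array, which is convenient when the columns are plotted or analyzed separately.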

NetCDF Files

SciPy can handle Network Common Data Form (NetCDF) files, which use a self-describing, machine-independent data format that supports the creation, access, and sharing of array-oriented scientific data.

NetCDF data is:

  • Self-Describing. A netCDF file includes information about the data it contains.
  • Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
  • Scalable. A small subset of a large dataset may be accessed efficiently.
  • Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
  • Sharable. One writer and multiple readers may simultaneously access the same netCDF file.
  • Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.

The following is an ASCII representation of a netCDF file that shows the type of information that can be stored in netCDF files:

    netcdf sfc_pres_temp {
    dimensions:
    	latitude = 6 ;
    	longitude = 12 ;
    variables:
    	float latitude(latitude) ;
    		latitude:units = "degrees_north" ;
    	float longitude(longitude) ;
    		longitude:units = "degrees_east" ;
    	float pressure(latitude, longitude) ;
    		pressure:units = "hPa" ;
    	float temperature(latitude, longitude) ;
    		temperature:units = "celsius" ;
    data:
    
     latitude = 25, 30, 35, 40, 45, 50 ;
    
     longitude = -125, -120, -115, -110, -105, -100, -95, -90, -85, -80, -75, -70 ;
    
     pressure =
      900, 906, 912, 918, 924, 930, 936, 942, 948, 954, 960, 966,
      901, 907, 913, 919, 925, 931, 937, 943, 949, 955, 961, 967,
      902, 908, 914, 920, 926, 932, 938, 944, 950, 956, 962, 968,
      903, 909, 915, 921, 927, 933, 939, 945, 951, 957, 963, 969,
      904, 910, 916, 922, 928, 934, 940, 946, 952, 958, 964, 970,
      905, 911, 917, 923, 929, 935, 941, 947, 953, 959, 965, 971 ;
    
     temperature =
      9, 10.5, 12, 13.5, 15, 16.5, 18, 19.5, 21, 22.5, 24, 25.5,
      9.25, 10.75, 12.25, 13.75, 15.25, 16.75, 18.25, 19.75, 21.25, 22.75, 24.25,
        25.75,
      9.5, 11, 12.5, 14, 15.5, 17, 18.5, 20, 21.5, 23, 24.5, 26,
      9.75, 11.25, 12.75, 14.25, 15.75, 17.25, 18.75, 20.25, 21.75, 23.25, 24.75,
        26.25,
      10, 11.5, 13, 14.5, 16, 17.5, 19, 20.5, 22, 23.5, 25, 26.5,
      10.25, 11.75, 13.25, 14.75, 16.25, 17.75, 19.25, 20.75, 22.25, 23.75,
        25.25, 26.75 ;
    }
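A file with this structure can be written and read back with scipy.io.netcdf_file, which handles the classic netCDF 3 format. The sketch below is an assumption-laden illustration rather than a recommended workflow: the function name and the temporary file path are hypothetical, and only the pressure variable from the listing is reproduced.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def write_and_read_pressure(path):
    """Write a small netCDF 3 file, then read the pressure grid back."""
    # Write: dimensions first, then variables defined over them.
    out = netcdf_file(path, 'w')
    out.createDimension('latitude', 6)
    out.createDimension('longitude', 12)

    lat = out.createVariable('latitude', 'f', ('latitude',))
    lat.units = 'degrees_north'
    lat[:] = np.arange(25, 55, 5)

    lon = out.createVariable('longitude', 'f', ('longitude',))
    lon.units = 'degrees_east'
    lon[:] = np.arange(-125, -65, 5)

    pres = out.createVariable('pressure', 'f', ('latitude', 'longitude'))
    pres.units = 'hPa'
    # Same values as the listing: 900 + row index + 6 * column index.
    pres[:] = 900 + np.arange(6)[:, None] + 6 * np.arange(12)[None, :]
    out.close()

    # Read: mmap=False loads the data into memory so the file can be closed.
    src = netcdf_file(path, 'r', mmap=False)
    pressure = src.variables['pressure'][:].copy()
    units = src.variables['pressure'].units
    src.close()

    # String attributes may come back as bytes; normalize to str.
    if isinstance(units, bytes):
        units = units.decode()
    return pressure, units


nc_path = os.path.join(tempfile.mkdtemp(), 'sfc_pres.nc')
pressure, pressure_units = write_and_read_pressure(nc_path)
```

The returned pressure array is an ordinary NumPy array, so it can be passed directly to the plotting and analysis tools described later.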

MATLAB Data Files

SciPy can read and write MATLAB format data files (.mat files) through scipy.io.loadmat and scipy.io.savemat.

See the MATLAB web site for details on MATLAB format files.
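A minimal round trip through a .mat file might look like the sketch below. The variable name delay_ms, the sample values, and the temporary file path are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def roundtrip_mat(path, array):
    """Save an array under the name 'delay_ms' and load it back."""
    savemat(path, {'delay_ms': array})
    # loadmat returns a dict keyed by MATLAB variable name.
    return loadmat(path)['delay_ms']


original = np.array([[1.0, 2.0], [3.0, 4.0]])
mat_path = os.path.join(tempfile.mkdtemp(), 'trace.mat')
restored = roundtrip_mat(mat_path, original)
```

Note that MATLAB has no true 1-D arrays, so a 1-D NumPy array saved this way comes back as a 2-D array with a single row.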

Matrix Market Files

SciPy can handle Matrix Market (MM) format files, a set of human-readable, ASCII-based file formats designed to facilitate the exchange of matrix data.

If you had the following sparse matrix,

            1    0      0       6      0     
            0   10.5    0       0      0     
            0    0    .015      0      0     
            0  250.5    0     -280    33.32  
            0    0      0       0     12     

then it would be represented as follows:

  %%MatrixMarket matrix coordinate real general
  %=================================================================================
  %
  % This ASCII file represents a sparse MxN matrix with L 
  % nonzeros in the following Matrix Market format:
  %
  % +----------------------------------------------+
  % |%%MatrixMarket matrix coordinate real general | <--- header line
  % |%                                             | <--+
  % |% comments                                    |    |-- 0 or more comment lines
  % |%                                             | <--+         
  % |    M  N  L                                   | <--- rows, columns, entries
  % |    I1  J1  A(I1, J1)                         | <--+
  % |    I2  J2  A(I2, J2)                         |    |
  % |    I3  J3  A(I3, J3)                         |    |-- L lines
  % |        . . .                                 |    |
  % |    IL JL  A(IL, JL)                          | <--+
  % +----------------------------------------------+   
  %
  % Indices are 1-based, i.e. A(1,1) is the first element.
  %
  %=================================================================================
    5  5  8
      1     1   1.000e+00
      2     2   1.050e+01
      3     3   1.500e-02
      1     4   6.000e+00
      4     2   2.505e+02
      4     4  -2.800e+02
      4     5   3.332e+01
      5     5   1.200e+01
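The file above can be loaded with scipy.io.mmread, which returns a sparse matrix in coordinate format. The sketch below writes the same eight entries to a temporary file (the path and function name are hypothetical) and converts the result to a dense array for inspection.

```python
import os
import tempfile

from scipy.io import mmread

# The same 5x5 matrix with 8 nonzeros, without the explanatory comments.
MM_TEXT = """%%MatrixMarket matrix coordinate real general
5 5 8
1 1 1.000e+00
2 2 1.050e+01
3 3 1.500e-02
1 4 6.000e+00
4 2 2.505e+02
4 4 -2.800e+02
4 5 3.332e+01
5 5 1.200e+01
"""


def load_matrix_market(path):
    """Read a Matrix Market file and return (sparse matrix, dense copy)."""
    sparse = mmread(path)
    return sparse, sparse.toarray()


mm_path = os.path.join(tempfile.mkdtemp(), 'example.mtx')
with open(mm_path, 'w') as handle:
    handle.write(MM_TEXT)

sparse_matrix, dense_matrix = load_matrix_market(mm_path)
```

The file uses 1-based indices while NumPy uses 0-based indices, so the entry written as "4 2 2.505e+02" ends up at dense_matrix[3, 1].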

Databases

SciPy can interface with PyTables, a hierarchical database package designed to efficiently manage large amounts of data using HDF5.

Pipes

SciPy does not have any direct support for pipes.

Python itself, however, can read the standard output of an external process and write to the standard input of another. The Python code that calls SciPy can therefore exchange data over pipes in the usual way.
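One way to do this is with the standard subprocess module, parsing the captured output with numpy.loadtxt. In the sketch below the producer is a stand-in: a second Python interpreter that prints two rows of numbers. In practice it would be whatever external program emits whitespace-separated values.

```python
import io
import subprocess
import sys

import numpy as np


def read_array_from_command(command):
    """Run a command and parse its standard output as a numeric table."""
    completed = subprocess.run(command, capture_output=True,
                               text=True, check=True)
    # loadtxt accepts any file-like object, so wrap the captured text.
    return np.loadtxt(io.StringIO(completed.stdout))


# Stand-in producer (an assumption for illustration): prints a 2x2 table.
producer = [sys.executable, '-c', "print('0 1'); print('2 3')"]
data = read_array_from_command(producer)
```

Passing check=True makes a failing producer raise an exception instead of silently yielding an empty or partial array.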

Sockets

SciPy does not have any direct support for sockets.

Python, however, provides sockets through its standard socket module, so the Python code that calls SciPy can send and receive data over a socket.
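A minimal sketch of moving numeric samples over a socket follows. The wire format (raw little-endian 64-bit floats with a count agreed in advance) is an assumption chosen for brevity, and socket.socketpair stands in for a real network connection.

```python
import socket

import numpy as np


def send_array(sock, values):
    """Send values as raw little-endian float64 bytes."""
    payload = np.ascontiguousarray(values, dtype='<f8').tobytes()
    sock.sendall(payload)


def recv_array(sock, count):
    """Receive exactly `count` float64 values and return them as an array."""
    expected = count * 8  # 8 bytes per float64
    chunks = []
    received = 0
    while received < expected:
        chunk = sock.recv(expected - received)
        if not chunk:
            raise ConnectionError('socket closed before all data arrived')
        chunks.append(chunk)
        received += len(chunk)
    return np.frombuffer(b''.join(chunks), dtype='<f8')


# A connected pair of sockets in one process, standing in for two hosts.
writer, reader = socket.socketpair()
try:
    send_array(writer, [1.5, 2.5, 3.5])
    samples = recv_array(reader, 3)
finally:
    writer.close()
    reader.close()
```

The receive loop matters because a single recv call may return fewer bytes than requested, especially for larger arrays.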

Ways to visualize data

There are several packages available to produce interactive screen graphics (use the mouse to zoom, orient, and fine-tune) and publication-quality printed plots, in 2D, 3D, and 4D (animations):

  • 2D Plotting with Matplotlib: Matplotlib is the preferred package for 2D graphics.
  • 3D Plotting with Matplotlib: Simple 3D plots using matplotlib and its now-included 3D capabilities.
  • 3D plotting with Mayavi: Advanced 3D data visualization with MayaVi2 (and TVTK): a very powerful interactive scientific data visualizer.
  • Python Imaging Library: Create and manipulate images as NumPy arrays.
  • Plotting with xplt: xplt is very fast but less flexible than matplotlib. It allows simple 3-d surface visualizations as well. It is based on pygist (included) and is available under the sandbox directory in SVN scipy.
  • Mat3d: Simple 3D plotting using an OpenGL backend.
  • Line Integral Convolution code in Cython: For visualizing vector fields.
  • VTK volume rendering: A simple example that shows how to use VTK to volume render three-dimensional NumPy arrays.
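For the 2D case, a Matplotlib plot of a NumPy array can be produced in a few lines. The sketch below uses made-up data and hypothetical axis labels, and it selects the non-interactive Agg backend so that it also runs without a display. Omit that line to get an interactive window.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; writes files only
import matplotlib.pyplot as plt
import numpy as np


def plot_series(x, y, path):
    """Plot y against x as a line with markers and save it to `path`."""
    fig, ax = plt.subplots()
    ax.plot(x, y, marker='o')
    ax.set_xlabel('time (s)')
    ax.set_ylabel('throughput (Mbps)')
    fig.savefig(path)
    plt.close(fig)
    return path


# Made-up sample data for illustration.
x = np.linspace(0.0, 10.0, 21)
y = 5.0 + np.sin(x)
plot_path = plot_series(x, y, os.path.join(tempfile.mkdtemp(),
                                           'throughput.png'))
```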

Ways to analyze data

SciPy has the following capabilities related to analyzing data:

  • stats module: Basic statistical functions.
  • leastsq function: Finds the parameters that give an optimal fit to real data, using the Levenberg-Marquardt algorithm for non-linear least-squares optimization.
  • Interface to R for Advanced Data Analysis: Via RPy, SciPy can interface to the R statistical package for more advanced data analysis. R is an open-source data analysis and statistics program with a very wide variety of packages for all sorts of analyses, but it has a steep learning curve.
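As an illustration of the first two items, the sketch below fits a two-parameter exponential decay with scipy.optimize.leastsq and summarizes the fit residuals with scipy.stats.describe. The model, the true parameter values, and the synthetic noise are all assumptions made up for the example.

```python
import numpy as np
from scipy import optimize, stats


def model(params, x):
    """Exponential decay a * exp(-b * x)."""
    a, b = params
    return a * np.exp(-b * x)


def residuals(params, x, y):
    """Difference between observed data and the model; leastsq minimizes
    the sum of squares of this vector."""
    return y - model(params, x)


# Synthetic data: known parameters plus a little seeded noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = model((2.5, 1.3), x) + rng.normal(scale=0.01, size=x.size)

# Fit starting from a rough initial guess; ier in 1..4 signals success.
fitted, ier = optimize.leastsq(residuals, x0=(1.0, 1.0), args=(x, y))

# Basic statistics (count, min/max, mean, variance, ...) of the residuals.
summary = stats.describe(residuals(fitted, x, y))
```

leastsq only needs a function returning the residual vector, so the same pattern applies to any model that can be written as a function of its parameters and the independent variable.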