HOWTO get ns-3 data into SciPy
Introduction
SciPy is open-source Python software for mathematics, science, and engineering. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines.
Ways to get data into the framework
Files
Text Files
Use numpy.loadtxt to load arrays from text files.
Here is a description of numpy.loadtxt from its help message:
    loadtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None,
            converters=None, skiprows=0, usecols=None, unpack=False)

    Load data from a text file.

    Each row in the text file must have the same number of values.

    Parameters
    ----------
    fname : file or string
        File or filename to read. If the filename extension is ``.gz`` or
        ``.bz2``, the file is first decompressed.
    dtype : data-type
        Data type of the resulting array. If this is a record data-type,
        the resulting array will be 1-dimensional, and each row will be
        interpreted as an element of the array. In this case, the number
        of columns used must match the number of fields in the data-type.
    comments : string, optional
        The character used to indicate the start of a comment.
    delimiter : string, optional
        The string used to separate values. By default, this is any
        whitespace.
    converters : {}
        A dictionary mapping column number to a function that will convert
        that column to a float. E.g., if column 0 is a date string:
        ``converters = {0: datestr2num}``. Converters can also be used to
        provide a default value for missing data:
        ``converters = {3: lambda s: float(s or 0)}``.
    skiprows : int
        Skip the first `skiprows` lines.
    usecols : sequence
        Which columns to read, with 0 being the first. For example,
        ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
    unpack : bool
        If True, the returned array is transposed, so that arguments may be
        unpacked using ``x, y, z = loadtxt(...)``

    Returns
    -------
    out : ndarray
        Data read from the text file.

    See Also
    --------
    scipy.io.loadmat : reads Matlab(R) data files

    Examples
    --------
    >>> from StringIO import StringIO   # StringIO behaves like a file object
    >>> c = StringIO("0 1\n2 3")
    >>> np.loadtxt(c)
    array([[ 0.,  1.],
           [ 2.,  3.]])

    >>> d = StringIO("M 21 72\nF 35 58")
    >>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
    ...                      'formats': ('S1', 'i4', 'f4')})
    array([('M', 21, 72.0), ('F', 35, 58.0)],
          dtype=[('gender', '|S1'), ('age', '<i4'), ('weight', '<f4')])

    >>> c = StringIO("1,0,2\n3,0,4")
    >>> x,y = np.loadtxt(c, delimiter=',', usecols=(0,2), unpack=True)
    >>> x
    array([ 1.,  3.])
    >>> y
    array([ 2.,  4.])
Note that Comma Separated Value (CSV) files can be read by specifying the comma character as the delimiter for numpy.loadtxt.
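As a minimal sketch of the CSV case, assume an ns-3 script has written a trace with one comment line and two comma-separated columns. The column names and values here are hypothetical, and an in-memory io.StringIO stands in for the file (in Python 3, StringIO lives in the io module rather than in a StringIO module).

```python
import io

import numpy as np

# Hypothetical trace contents; in practice this would be a file written by an ns-3 script.
trace = io.StringIO(
    "# time_s,throughput_mbps\n"
    "0.0,1.5\n"
    "1.0,2.5\n"
    "2.0,3.5\n"
)

# Lines starting with '#' are treated as comments by default.
# unpack=True transposes the result so each column becomes its own array.
time_s, throughput = np.loadtxt(trace, delimiter=",", unpack=True)

mean_throughput = throughput.mean()
print(time_s, throughput, mean_throughput)
```

Passing a filename instead of the StringIO object reads the same data from disk.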
NetCDF Files
SciPy can handle Network Common Data Form (NetCDF) files, which use a self-describing, machine-independent data format that supports the creation, access, and sharing of array-oriented scientific data.
NetCDF data is:
- Self-Describing. A netCDF file includes information about the data it contains.
- Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
- Scalable. A small subset of a large dataset may be accessed efficiently.
- Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
- Sharable. One writer and multiple readers may simultaneously access the same netCDF file.
- Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.
The following is an ASCII representation of a netCDF file that shows the type of information that can be stored in netCDF files:
    netcdf sfc_pres_temp {
    dimensions:
        latitude = 6 ;
        longitude = 12 ;
    variables:
        float latitude(latitude) ;
            latitude:units = "degrees_north" ;
        float longitude(longitude) ;
            longitude:units = "degrees_east" ;
        float pressure(latitude, longitude) ;
            pressure:units = "hPa" ;
        float temperature(latitude, longitude) ;
            temperature:units = "celsius" ;
    data:

     latitude = 25, 30, 35, 40, 45, 50 ;

     longitude = -125, -120, -115, -110, -105, -100, -95, -90, -85, -80, -75, -70 ;

     pressure =
      900, 906, 912, 918, 924, 930, 936, 942, 948, 954, 960, 966,
      901, 907, 913, 919, 925, 931, 937, 943, 949, 955, 961, 967,
      902, 908, 914, 920, 926, 932, 938, 944, 950, 956, 962, 968,
      903, 909, 915, 921, 927, 933, 939, 945, 951, 957, 963, 969,
      904, 910, 916, 922, 928, 934, 940, 946, 952, 958, 964, 970,
      905, 911, 917, 923, 929, 935, 941, 947, 953, 959, 965, 971 ;

     temperature =
      9, 10.5, 12, 13.5, 15, 16.5, 18, 19.5, 21, 22.5, 24, 25.5,
      9.25, 10.75, 12.25, 13.75, 15.25, 16.75, 18.25, 19.75, 21.25, 22.75, 24.25, 25.75,
      9.5, 11, 12.5, 14, 15.5, 17, 18.5, 20, 21.5, 23, 24.5, 26,
      9.75, 11.25, 12.75, 14.25, 15.75, 17.25, 18.75, 20.25, 21.75, 23.25, 24.75, 26.25,
      10, 11.5, 13, 14.5, 16, 17.5, 19, 20.5, 22, 23.5, 25, 26.5,
      10.25, 11.75, 13.25, 14.75, 16.25, 17.75, 19.25, 20.75, 22.25, 23.75, 25.25, 26.75 ;
    }
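One way to work with such a file from Python is scipy.io.netcdf_file, which handles the classic netCDF 3 format. The sketch below first writes a reduced version of the listing above (latitude and pressure only) to a temporary file, so that it is self-contained, and then reads it back. The file name and the use of a temporary directory are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

path = os.path.join(tempfile.mkdtemp(), "sfc_pres_temp.nc")

# Write a small file with the same dimensions as the listing above.
f = netcdf_file(path, "w")
f.createDimension("latitude", 6)
f.createDimension("longitude", 12)
lat = f.createVariable("latitude", "f", ("latitude",))
lat.units = "degrees_north"
lat[:] = np.arange(25, 55, 5)
pres = f.createVariable("pressure", "f", ("latitude", "longitude"))
pres.units = "hPa"
# Same pattern as the listing: 900 + row index + 6 * column index.
pres[:] = 900 + np.arange(6)[:, None] + 6 * np.arange(12)[None, :]
f.close()

# Read it back. mmap=False plus copy() keeps the array valid after close().
f = netcdf_file(path, "r", mmap=False)
pressure = f.variables["pressure"][:].copy()
raw_units = f.variables["pressure"].units
f.close()

# Text attributes may come back as bytes, so decode them if needed.
units = raw_units.decode() if isinstance(raw_units, bytes) else raw_units
print(pressure.shape, units)
```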
MATLAB Data Files
SciPy can handle MATLAB format data files.
See the MATLAB web site for details on MATLAB format files.
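A minimal sketch using scipy.io.savemat and scipy.io.loadmat follows. The variable name delay_ms and the file name are hypothetical, and the file is written from Python only so that the example is self-contained; normally it would come from MATLAB or a compatible tool.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

path = os.path.join(tempfile.mkdtemp(), "delays.mat")

# Hypothetical variable; a real file would usually be produced by MATLAB.
savemat(path, {"delay_ms": np.array([1.5, 2.0, 2.5])})

# loadmat returns a dict mapping variable names to NumPy arrays.
contents = loadmat(path)
delay_ms = contents["delay_ms"]

# MATLAB has no 1-D arrays, so the vector comes back as a 1 x N matrix.
flat = delay_ms.ravel()
print(delay_ms.shape, flat)
```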
Matrix Market Files
SciPy can handle Matrix Market (MM) format files, a set of human-readable, ASCII-based file formats designed to facilitate the exchange of matrix data.
If you had the following sparse matrix,
    1      0      0      6      0
    0     10.5    0      0      0
    0      0       .015  0      0
    0    250.5    0   -280     33.32
    0      0      0      0     12
then it would be represented as follows:
    %%MatrixMarket matrix coordinate real general
    %=================================================================================
    %
    % This ASCII file represents a sparse MxN matrix with L
    % nonzeros in the following Matrix Market format:
    %
    % +----------------------------------------------+
    % |%%MatrixMarket matrix coordinate real general | <--- header line
    % |%                                             | <--+
    % |% comments                                    |    |-- 0 or more comment lines
    % |%                                             | <--+
    % |    M  N  L                                   | <--- rows, columns, entries
    % |    I1  J1  A(I1, J1)                         | <--+
    % |    I2  J2  A(I2, J2)                         |    |
    % |    I3  J3  A(I3, J3)                         |    |-- L lines
    % |        . . .                                 |    |
    % |    IL JL  A(IL, JL)                          | <--+
    % +----------------------------------------------+
    %
    % Indices are 1-based, i.e. A(1,1) is the first element.
    %
    %=================================================================================
      5  5  8
        1     1   1.000e+00
        2     2   1.050e+01
        3     3   1.500e-02
        1     4   6.000e+00
        4     2   2.505e+02
        4     4  -2.800e+02
        4     5   3.332e+01
        5     5   1.200e+01
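The file shown above can be loaded with scipy.io.mmread, which returns a sparse matrix in coordinate format. In this sketch the header and the eight entries are written to a temporary file first (without the explanatory comment lines); the file name is an assumption for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import mmread

# Header, size line, and the eight nonzero entries from the example above.
mtx_text = """%%MatrixMarket matrix coordinate real general
5 5 8
1 1 1.000e+00
2 2 1.050e+01
3 3 1.500e-02
1 4 6.000e+00
4 2 2.505e+02
4 4 -2.800e+02
4 5 3.332e+01
5 5 1.200e+01
"""

path = os.path.join(tempfile.mkdtemp(), "example.mtx")
with open(path, "w") as fh:
    fh.write(mtx_text)

# mmread converts the 1-based file indices to 0-based Python indices.
sparse = mmread(path)
dense = sparse.toarray()
print(dense)
```

The entry written as "1 4" in the file therefore appears at dense[0, 3].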
Databases
SciPy can interface with PyTables, a hierarchical database package designed to efficiently manage large amounts of data using HDF5.
Pipes
SciPy does not have any direct support for pipes.
Python, however, can read the standard output of an external program and write to the standard input of another. So, the Python code that calls SciPy can interact with pipes in the normal way.
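A minimal sketch of the pipe approach, using the standard subprocess module: a child process writes whitespace-separated columns to its standard output, and the parent parses them with numpy.loadtxt. The child here is a one-line Python command standing in for a real simulation program, which is an assumption made so the example runs anywhere.

```python
import io
import subprocess
import sys

import numpy as np

# Stand-in for an external program: prints two whitespace-separated columns.
producer = [sys.executable, "-c", "for i in range(4): print(i, i * i)"]

# Run the child and capture everything it writes to standard output.
result = subprocess.run(producer, capture_output=True, text=True, check=True)

# Parse the captured text exactly as if it were a text file.
data = np.loadtxt(io.StringIO(result.stdout))
print(data)
```

When the Python script itself sits at the receiving end of a shell pipeline, passing sys.stdin to numpy.loadtxt in place of the StringIO object should work in the same way.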
Sockets
SciPy does not have any direct support for sockets.
Python, however, provides the standard socket module. So, the Python code that calls SciPy can use that module to exchange data over a socket.
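As a sketch of the socket approach, the example below uses socket.socketpair() to create two connected sockets in one process, so that no server or port number is needed. One end plays the role of a data source; the other collects the bytes and hands them to numpy.loadtxt. The sample values are made up, and a real setup would instead connect to whatever address the data source listens on.

```python
import io
import socket

import numpy as np

# Two connected sockets in one process; stands in for a real network peer.
sender, receiver = socket.socketpair()

# The "remote" side sends three rows of text and closes its end.
sender.sendall(b"0.0 10\n0.5 12\n1.0 15\n")
sender.close()

# Read until the peer closes the connection (recv returns empty bytes).
chunks = []
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
receiver.close()

# Decode the bytes and parse them as whitespace-separated columns.
samples = np.loadtxt(io.StringIO(b"".join(chunks).decode("ascii")))
print(samples)
```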
Ways to visualize data
There are several packages available to produce interactive screen graphics (use the mouse to zoom, orient, and fine-tune) and publication-quality printed plots, in 2D, 3D, and 4D (animations):
- 2D Plotting with Matplotlib: Matplotlib is the preferred package for 2D graphics.
- 3D Plotting with Matplotlib: Simple 3D plots using matplotlib and its now-included 3D capabilities.
- 3D plotting with Mayavi: Advanced 3D data visualization with MayaVi2 (and TVTK): a very powerful interactive scientific data visualizer.
- Python Imaging Library: Create/manipulate images as NumPy arrays.
- Plotting with xplt: xplt is very fast but less flexible than matplotlib. It allows simple 3-d surface visualizations as well. It is based on pygist (included) and is available under the sandbox directory in SVN scipy.
- Mat3d: Simple 3D plotting using an OpenGL backend.
- Line Integral Convolution code in Cython: For visualizing vector fields.
- VTK volume rendering: This is a simple example that shows how to use VTK to volume render your three-dimensional NumPy arrays.
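For the 2D case, a minimal Matplotlib sketch might look like the following. The data is synthetic, the axis labels are hypothetical, and the non-interactive Agg backend is selected only so that the example runs without a display; for interactive use, omit that line and call plt.show() instead of saving.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data standing in for values loaded from a simulation trace.
time_s = np.linspace(0.0, 10.0, 50)
throughput = 5.0 + np.sin(time_s)

fig, ax = plt.subplots()
ax.plot(time_s, throughput)
ax.set_xlabel("Time (s)")
ax.set_ylabel("Throughput (Mbps)")

# Save the figure to a temporary PNG file.
out_path = os.path.join(tempfile.mkdtemp(), "throughput.png")
fig.savefig(out_path)
plt.close(fig)
print(out_path)
```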
Ways to analyze data
SciPy has the following capabilities related to analyzing data:
- stats module: Basic statistical functions.
- leastsq function: Finds the parameters that give an optimal fit to real data, using the Levenberg-Marquardt algorithm for non-linear least-squares optimization.
- Interface to R for Advanced Data Analysis: Via RPy, SciPy can interface to the R statistical package for more advanced data analysis. R is an open-source data analysis and statistics program, with an incredible variety of packages for all sorts of analyses but a steep learning curve to use.
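The first two items can be sketched briefly with scipy.stats.describe and scipy.optimize.leastsq. The exponential model, the noise-free synthetic data, and the starting guess are all assumptions chosen to keep the example small; real data would be noisy, and the fit correspondingly less exact.

```python
import numpy as np
from scipy import stats
from scipy.optimize import leastsq

# Synthetic, noise-free data following y = a * exp(-b * x) with a=3, b=0.5.
x = np.linspace(0.0, 4.0, 20)
y = 3.0 * np.exp(-0.5 * x)


def residuals(params, x, y):
    """Difference between the observed data and the model prediction."""
    a, b = params
    return y - a * np.exp(-b * x)


# leastsq minimizes the sum of squared residuals, starting from the guess (1, 1).
params, ier = leastsq(residuals, [1.0, 1.0], args=(x, y))
a_fit, b_fit = params

# Basic descriptive statistics (count, min/max, mean, variance, ...).
summary = stats.describe(y)
print(a_fit, b_fit, ier, summary.mean)
```

An ier value of 1, 2, 3, or 4 indicates that a solution was found.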