A Discrete-Event Network Simulator
csv-reader.h
1/*
2 * Copyright (c) 2019 Lawrence Livermore National Laboratory
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License version 2 as
6 * published by the Free Software Foundation;
7 *
8 * This program is distributed in the hope that it will be useful,
9 * but WITHOUT ANY WARRANTY; without even the implied warranty of
10 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 * GNU General Public License for more details.
12 *
13 * You should have received a copy of the GNU General Public License
14 * along with this program; if not, write to the Free Software
15 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
16 *
17 * Author: Mathew Bielejeski <bielejeski1@llnl.gov>
18 */
19
20#ifndef NS3_CSV_READER_H_
21#define NS3_CSV_READER_H_
22
23#include <cstddef>
24#include <cstdint>
25#include <fstream>
26#include <istream>
27#include <string>
28#include <vector>
29
30/**
31 * \file
32 * \ingroup csvreader
33 *
34 * ns3::CsvReader declaration
35 *
36 */
37namespace ns3
38{
39
40/**
41 * \ingroup core
42 * \defgroup csvreader CSV File Reader
43 *
44 * A way to extract data from simple csv files.
45 */
46
47/**
48 * \ingroup csvreader
49 *
50 * Provides functions for parsing and extracting data from
51 * Comma Separated Value (CSV) formatted text files.
52 * This parser is somewhat more relaxed than \RFC{4180};
53 * see below for a list of the differences.
54 * In particular it is possible to set the delimiting character at construction,
55 * enabling parsing of tab-delimited streams or other delimited formats.
56 *
57 * \note Excel may generate "CSV" files with either ',' or ';' delimiter
58 * depending on the locale: if ',' is the decimal mark then ';' is the list
59 * separator and used to read/write "CSV" files.
60 *
61 * To use this facility, construct a CsvReader from either a file path
62 * or a \c std::istream, then call FetchNextRow() to read each row,
63 * and finally GetValue() to extract specific values from the row.
64 *
65 * For example:
66 * \code
67 * CsvReader csv (filePath);
68 * while (csv.FetchNextRow ())
69 * {
70 * // Ignore blank lines
71 * if (csv.IsBlankRow ())
72 * {
73 * continue;
74 * }
75 *
76 * // Expecting three values
77 * double x, y, z;
78 * bool ok = csv.GetValue (0, x);
79 * ok &= csv.GetValue (1, y);
80 * ok &= csv.GetValue (2, z);
81 * if (!ok)
82 * {
83 * // Handle error, then
84 * continue;
85 * }
86 *
87 * // Do something with values
88 *
89 * } // while FetchNextRow
90 * \endcode
91 *
92 * As another example, supposing we need a vector from each row,
93 * the middle of the previous example would become:
94 * \code
95 * std::vector<double> v (n);
96 * bool ok = true;
97 * for (std::size_t i = 0; i < v.size (); ++i)
98 * {
99 * ok &= csv.GetValue (i, v[i]);
100 * }
101 * if (!ok) ...
102 * \endcode
103 *
104 *
105 * File Format
106 * ===========
107 *
108 * This parser implements \RFC{4180}, but with several restrictions removed;
109 * see below for differences. All the formatting features described next
110 * are illustrated in the examples which follow.
111 *
112 * Comments
113 * --------
114 *
115 * The hash character (#) is used to indicate the start of a comment. Comments
116 * are not parsed by the reader. Comments are treated as either an empty column
117 * or part of an existing column depending on where the comment is located.
118 * Comments that are found at the end of a line containing data are ignored.
119 *
120 * 1,2 # This comment ignored, leaving two data columns
121 *
122 * Lines that contain a comment and no data are treated as rows with a single
123 * empty column, meaning that ColumnCount() will return 1 and
124 * GetValue() will return an empty string.
125 *
126 * # This row treated as a single empty column, returning an empty string.
127 * "" # So is this
128 *
129 * IsBlankRow() will return \c true in either of these cases.
130 *
131 * Quoted Columns
132 * --------------
133 *
134 * Columns with string data which contain the delimiter character or
135 * the hash character can be wrapped in double quotes to prevent CsvReader
136 * from treating them as special characters.
137 *
138 * 3,string without delimiter,"String with comma ',' delimiter"
139 *
140 * Double quotes can be escaped
141 * by doubling up the quotes inside a quoted field. See example 6 below for
142 * a demonstration.
143 *
144 * Whitespace
145 * ----------
146 *
147 * Leading and trailing whitespace are ignored by the reader and are not
148 * stored in the column data.
149 *
150 * 4,5 , 6 # Columns contain '4', '5', '6'
151 *
152 * If leading or trailing whitespace are important
153 * for a column, wrap the column in double quotes as discussed above.
154 *
155 * 7,"8 "," 9" # Columns contain '7', '8 ', ' 9'
156 *
157 * Trailing Delimiter
158 * ------------------
159 *
160 * Trailing delimiters are ignored; they do _not_ result in an empty column.
161 *
162 *
163 * Differences from RFC 4180
164 * -------------------------
165 * Section 2.1
166 * - Line break can be LF or CRLF
167 *
168 * Section 2.3
169 * - Non-parsed lines are allowed anywhere, not just as a header.
170 * - Lines do not all have to contain the same number of fields.
171 *
172 * Section 2.4
173 * - Characters other than comma can be used to separate fields.
174 * - Lines do not all have to contain the same number of fields.
175 * - Leading/trailing spaces are stripped from the field
176 * unless the whitespace is wrapped in double quotes.
177 * - A trailing delimiter on a line is not an error.
178 *
179 * Section 2.6
180 * - Quoted fields cannot contain line breaks
181 *
182 * Examples
183 * --------
184 * \par Example 1: Basic
185 * \code
186 * # Column 1: Product
187 * # Column 2: Price
188 * widget, 12.5
189 * \endcode
190 *
191 * \par Example 2: Comment at end of line
192 * \code
193 * # Column 1: Product
194 * # Column 2: Price
195 * broken widget, 12.5 # this widget is broken
196 * \endcode
197 *
198 * \par Example 3: Delimiter in double quotes
199 * \code
200 * # Column 1: Product
201 * # Column 2: Price
202 * # Column 3: Count
203 * # Column 4: Date
204 * widget, 12.5, 100, "November 6, 2018"
205 * \endcode
206 *
207 * \par Example 4: Hash character in double quotes
208 * \code
209 * # Column 1: Key
210 * # Column 2: Value
211 * # Column 3: Description
212 * count, 5, "# of widgets currently in stock"
213 * \endcode
214 *
215 * \par Example 5: Extra whitespace
216 * \code
217 * # Column 1: Key
218 * # Column 2: Value
219 * # Column 3: Description
220 * count , 5 ,"# of widgets in stock"
221 * \endcode
222 *
223 * \par Example 6: Escaped quotes
224 * \code
225 * # Column 1: Key
226 * # Column 2: Description
227 * # The value returned for Column 2 will be: String with "embedded" quotes
228 * foo, "String with ""embedded"" quotes"
229 * \endcode
230 */
231class CsvReader
232{
233 public:
234 /**
235 * Constructor
236 *
237 * Opens the file specified in the filepath argument and
238 * reads data from it.
239 *
240 * \param filepath Path to a file containing CSV data.
241 * \param delimiter Character used to separate fields in the data file.
242 */
243 CsvReader(const std::string& filepath, char delimiter = ',');
244
245 /**
246 * Constructor
247 *
248 * Reads csv data from the supplied input stream.
249 *
250 * \param stream Input stream containing csv data.
251 * \param delimiter Character used to separate fields in the data stream.
252 */
253 CsvReader(std::istream& stream, char delimiter = ',');
254
255 /**
256 * Destructor
257 */
258 virtual ~CsvReader();
259
260 /**
261 * Returns the number of columns in the csv data.
262 *
263 * \return Number of columns
264 */
265 std::size_t ColumnCount() const;
266
267 /**
268 * The number of lines that have been read.
269 *
270 * \return The number of lines that have been read.
271 */
272 std::size_t RowNumber() const;
273
274 /**
275 * Returns the delimiter character specified during object construction.
276 *
277 * \return Character used as the column separator.
278 */
279 char Delimiter() const;
280
281 /**
282 * Reads one line from the input until a new line is encountered.
283 * The read data is stored in a cache which is accessed by the
284 * GetValue functions to extract fields from the data.
285 *
286 * \return \c true if a line was read successfully or \c false if the
287 * read failed or reached the end of the file.
288 */
289 bool FetchNextRow();
290
291 /**
292 * Attempt to convert from the string data in the specified column
293 * to the specified data type.
294 *
295 * \tparam T The data type of the output variable.
296 *
297 * \param [in] columnIndex Index of the column to fetch.
298 * \param [out] value Location where the converted data will be stored.
299 *
300 * \return \c true if the specified column has data and the data
301 * was converted to the specified data type.
302 */
303 template <class T>
304 bool GetValue(std::size_t columnIndex, T& value) const;
305
306 /**
307 * Check if the current row is blank.
308 * A blank row can consist of any combination of
309 *
310 * - Whitespace
311 * - Comment
312 * - Quoted empty string `""`
313 *
314 * \returns \c true if the input row is a blank line.
315 */
316 bool IsBlankRow() const;
317
318 private:
319 /**
320 * Attempt to convert the supplied string data
321 * into the specified type.
322 *
323 * \param [in] input String value to be converted.
324 * \param [out] value Location where the converted value will be stored.
325 *
326 * \return \c true if the conversion succeeded,
327 * \c false otherwise.
328 */
329 /** @{ */
330 bool GetValueAs(std::string input, double& value) const;
331
332 bool GetValueAs(std::string input, float& value) const;
333
334 bool GetValueAs(std::string input, signed char& value) const;
335
336 bool GetValueAs(std::string input, short& value) const;
337
338 bool GetValueAs(std::string input, int& value) const;
339
340 bool GetValueAs(std::string input, long& value) const;
341
342 bool GetValueAs(std::string input, long long& value) const;
343
344 bool GetValueAs(std::string input, std::string& value) const;
345
346 bool GetValueAs(std::string input, unsigned char& value) const;
347
348 bool GetValueAs(std::string input, unsigned short& value) const;
349
350 bool GetValueAs(std::string input, unsigned int& value) const;
351
352 bool GetValueAs(std::string input, unsigned long& value) const;
353
354 bool GetValueAs(std::string input, unsigned long long& value) const;
355 /** @} */
356
357 /**
358 * Returns \c true if the supplied character matches the delimiter.
359 *
360 * \param c Character to check.
361 * \return \c true if \pname{c} is the delimiter character,
362 * \c false otherwise.
363 */
364 bool IsDelimiter(char c) const;
365
366 /**
367 * Scans the string and splits it into individual columns based on the delimiter.
368 *
369 * \param [in] line String containing delimiter separated data.
370 */
371 void ParseLine(const std::string& line);
372
373 /**
374 * Extracts the data for one column in a csv row.
375 *
376 * \param begin Iterator to the first character in the row.
377 * \param end Iterator to the last character in the row.
378 * \return A tuple containing the content of the column and an iterator
379 * pointing to the position in the row where the column ended.
380 */
381 std::tuple<std::string, std::string::const_iterator> ParseColumn(
382 std::string::const_iterator begin,
383 std::string::const_iterator end);
384
385 /**
386 * Container of CSV data. Each entry represents one field in a row
387 * of data. The fields are stored in the same order that they are
388 * encountered in the CSV data.
389 */
390 typedef std::vector<std::string> Columns;
391
392 char m_delimiter; //!< Character used to separate fields.
393 std::size_t m_rowsRead; //!< Number of lines processed.
394 Columns m_columns; //!< Fields extracted from the current line.
395 bool m_blankRow; //!< Line contains no data (blank line or comment only).
396 std::ifstream m_fileStream; //!< File stream containing the data.
397
398 /**
399 * Pointer to the input stream containing the data.
400 */
401 std::istream* m_stream;
402
403}; // class CsvReader
404
405/****************************************************
406 * Template implementations.
407 ***************************************************/
408
409template <class T>
410bool
411CsvReader::GetValue(std::size_t columnIndex, T& value) const
412{
413 if (columnIndex >= ColumnCount())
414 {
415 return false;
416 }
417
418 std::string cell = m_columns[columnIndex];
419
420 return GetValueAs(std::move(cell), value);
421}
422
423} // namespace ns3
424
425#endif // NS3_CSV_READER_H_
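The file format rules in the class documentation (comments, quoted columns, whitespace trimming, trailing delimiters) can be illustrated with a small standalone splitter. This is only a sketch under stated assumptions: `SplitCsvLine` is a hypothetical helper written for this illustration, it is not the ns-3 `ParseLine()` or `ParseColumn()` implementation, and it may differ from CsvReader on edge cases that the documentation does not describe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only; NOT the ns-3 implementation.
// Splits one line into columns using the rules documented for CsvReader.
std::vector<std::string>
SplitCsvLine(const std::string& line, char delimiter = ',')
{
    // Assumption of this sketch: the delimiter is neither a quote nor a hash.
    assert(delimiter != '"' && delimiter != '#');

    std::vector<std::string> columns;
    std::string field;     // text collected for the current column
    std::size_t keep = 0;  // length of `field` up to the last quoted character
    bool inQuotes = false; // currently between double quotes
    bool quoted = false;   // current column contained a quoted part

    // Store the current column, dropping unquoted trailing whitespace.
    auto flush = [&]() {
        while (field.size() > keep && (field.back() == ' ' || field.back() == '\t'))
        {
            field.pop_back();
        }
        columns.push_back(field);
        field.clear();
        keep = 0;
        quoted = false;
    };

    for (std::size_t i = 0; i < line.size(); ++i)
    {
        const char c = line[i];
        if (inQuotes)
        {
            if (c != '"')
            {
                field += c; // delimiter, hash and spaces are literal here
            }
            else if (i + 1 < line.size() && line[i + 1] == '"')
            {
                field += '"'; // a doubled quote is an escaped quote
                ++i;
            }
            else
            {
                inQuotes = false; // closing quote
            }
            keep = field.size(); // protect quoted text from trimming
        }
        else if (c == '"')
        {
            inQuotes = true;
            quoted = true;
        }
        else if (c == '#')
        {
            break; // the rest of the line is a comment
        }
        else if (c == delimiter)
        {
            flush();
        }
        else if (field.empty() && (c == ' ' || c == '\t'))
        {
            // skip unquoted leading whitespace
        }
        else
        {
            field += c;
        }
    }

    // A trailing delimiter adds no column, but a blank or comment-only
    // line still yields a single empty column.
    if (!field.empty() || quoted || columns.empty())
    {
        flush();
    }
    return columns;
}
```

With this sketch, `SplitCsvLine("4,5 , 6 # note")` yields the three columns `4`, `5` and `6`, while `SplitCsvLine("1,2,")` yields two columns because the trailing delimiter is ignored.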
Provides functions for parsing and extracting data from Comma Separated Value (CSV) formatted text fi...
Definition: csv-reader.h:232
virtual ~CsvReader()
Destructor.
Definition: csv-reader.cc:92
bool GetValue(std::size_t columnIndex, T &value) const
Attempt to convert from the string data in the specified column to the specified data type.
Definition: csv-reader.h:411
std::size_t RowNumber() const
The number of lines that have been read.
Definition: csv-reader.cc:105
char Delimiter() const
Returns the delimiter character specified during object construction.
Definition: csv-reader.cc:113
std::istream * m_stream
Pointer to the input stream containing the data.
Definition: csv-reader.h:401
bool IsDelimiter(char c) const
Returns true if the supplied character matches the delimiter.
Definition: csv-reader.cc:298
void ParseLine(const std::string &line)
Scans the string and splits it into individual columns based on the delimiter.
Definition: csv-reader.cc:306
std::size_t ColumnCount() const
Returns the number of columns in the csv data.
Definition: csv-reader.cc:97
std::size_t m_rowsRead
Number of lines processed.
Definition: csv-reader.h:393
std::ifstream m_fileStream
File stream containing the data.
Definition: csv-reader.h:396
bool m_blankRow
Line contains no data (blank line or comment only).
Definition: csv-reader.h:395
bool FetchNextRow()
Reads one line from the input until a new line is encountered.
Definition: csv-reader.cc:121
std::vector< std::string > Columns
Container of CSV data.
Definition: csv-reader.h:390
bool IsBlankRow() const
Check if the current row is blank.
Definition: csv-reader.cc:152
Columns m_columns
Fields extracted from the current line.
Definition: csv-reader.h:394
bool GetValueAs(std::string input, double &value) const
Attempt to convert from the string data stored at the specified column index into the specified type.
Definition: csv-reader.cc:158
char m_delimiter
Character used to separate fields.
Definition: csv-reader.h:392
std::tuple< std::string, std::string::const_iterator > ParseColumn(std::string::const_iterator begin, std::string::const_iterator end)
Extracts the data for one column in a csv row.
Definition: csv-reader.cc:336
Every class exported by the ns3 library is enclosed in the ns3 namespace.
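The usage examples in the class documentation accumulate the results of several GetValue() calls into one flag, which is then tested with `if (!ok)`. For that test to catch any failed column, the flag has to be combined with logical AND, starting from `true`. The sketch below shows the pattern on an already-split row. `ToDouble`, `GetColumn` and `ReadRow` are hypothetical stand-ins introduced for this illustration; they are not ns-3 functions, and the strict `std::strtod` check is an assumption rather than a description of how `GetValueAs()` converts values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical strict conversion (not the ns-3 GetValueAs):
// succeeds only if the entire string parses as a double.
bool
ToDouble(const std::string& input, double& value)
{
    if (input.empty())
    {
        return false;
    }
    char* end = nullptr;
    const double result = std::strtod(input.c_str(), &end);
    if (end != input.c_str() + input.size())
    {
        return false; // nothing parsed, or trailing characters were left over
    }
    value = result;
    return true;
}

// Hypothetical stand-in for CsvReader::GetValue on one parsed row:
// fails when the column is missing or cannot be converted.
bool
GetColumn(const std::vector<std::string>& row, std::size_t index, double& value)
{
    if (index >= row.size())
    {
        return false;
    }
    return ToDouble(row[index], value);
}

// Fill v from the first v.size() columns of the row.
// Returns true only if every column was present and converted.
bool
ReadRow(const std::vector<std::string>& row, std::vector<double>& v)
{
    bool ok = true;
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        // AND keeps `ok` false once any column has failed;
        // OR would report success if at least one column converted.
        ok &= GetColumn(row, i, v[i]);
    }
    return ok;
}
```

Because `&=` does not short-circuit, every column is still attempted after a failure, so the columns that did convert are written to `v` even when `ReadRow` returns `false`.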