randolf.ca  1.00
Randolf Richardson's C++ classes
Atomize
1#pragma once
2
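#include <algorithm>
#include <atomic>
#include <cstdlib>   // ::calloc, ::free
#include <cstring>
#include <exception>
#include <iostream>
#include <stdexcept> // std::invalid_argument, std::out_of_range
#include <string>
#include <vector>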
9
10namespace randolf {
11
12 /*======================================================================*//**
13 @brief
14 The @ref Atomize class provides an object-oriented interface with array-style
15 access to a string, that was efficiently separated into atoms, and with more
16 granularity and functionality through the use of modes (see @ref mode for
 details) and certain API methods, hence one could say "Atomize can split your
18 atoms safely."
19
20 When parsing a line or block of text, the following is assumed:
21
22 - parameters are separated by one or multiple consecutive whitespace
23 characters (space, null, tab, linefeed, carriage return)
24
25 - values that are enclosed within a set of quotation marks may include
26 whitespace characters that will then not be interpreted as delimiters
27
28 Data is interpreted in a single pass during instantiation or assignment, and
29 the interpretation algorithm is written in an optimized programming style to
30 ensure high efficiency.
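
 The assumptions above can be illustrated with a minimal, self-contained
 sketch. It is not this class's implementation: the function name
 sketch_split is hypothetical, it copies characters into new strings rather
 than recording offsets, and it simply keeps quotation marks inside the atom.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of randolf::Atomize): split text into atoms in
// a single pass, treating whitespace as a delimiter unless it appears between
// a pair of quotation marks.
std::vector<std::string> sketch_split(const std::string& text) {
  std::vector<std::string> atoms;
  std::string current;
  bool in_atom = false;   // True while characters are being collected
  bool in_quotes = false; // True between an opening and a closing quotation mark

  for (char c : text) {
    const bool is_space = c == ' ' || c == '\0' || c == '\t' || c == '\r' || c == '\n';
    if (is_space && !in_quotes) {
      if (in_atom) { // End of atom
        atoms.push_back(current);
        current.clear();
        in_atom = false;
      }
      continue;
    }
    if (c == '"') in_quotes = !in_quotes; // Quotation marks stay in the atom
    current.push_back(c);
    in_atom = true;
  }
  if (in_atom) atoms.push_back(current); // Save final atom
  return atoms;
}
```

 Under these assumptions, leading and repeated whitespace produce no empty
 atoms, and a quoted value that contains a space stays in one atom.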
31
32 There are no memory leaks, which, as it turns out, is particularly important
33 not only because the specialized parsing involved is often utilized in heavy
34 data processing loops where speed and reliability are needed, but also
35 because some of the libraries I've tested that provide similar functionality
36 leak memory or fail in other ways that were triggered by peculiar or fuzzed
37 data, which this class is not impacted by (primarily because I don't trust
38 data to always be "as expected").
39
40 @par Use case
41
42 Parsing command lines or configuration settings can become challenging when
43 multiple parameters are provided on a single line, and some of those
44 parameters include quoted text that contains spaces. This class handles all
45 of these scenarios and makes it easy to access each parameter in the same
46 manner that arrays and vectors are accessed.
47
48 @par Background
49
50 I created this class to make it easier to write internet server daemons.
51
52 @par Getting started
53
54 @author Randolf Richardson
55 @version 1.00
56 @par History
57 2025-Jan-20 v1.00 Initial version
58
59 @par Conventions
 Lower-case letter "a" is regularly used in partial example code to represent
 an instantiated Atomize object.
62
63 An ASCIIZ string is a C-string (char* array) that includes a terminating null
64 (0) character at the end.
65
66 @par Notes
67
68 I use the term "ASCIIZ string" to indicate an array of characters that's
69 terminated by a 0 (a.k.a., null). Although this is very much the same as a
70 C-string, the difference is that in many API functions a C-string must often
71 be accompanied by its length value. When referring to an ASCIIZ string, I'm
72 intentionally indicating that the length of the string is not needed because
73 the string is null-terminated. (This term was also commonly used in assembly
74 language programming in the 1970s, 1980s, and 1990s, and as far as I know is
75 still used by machine language programmers today.)
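
 As a small illustration of the distinction, the sketch below mirrors the
 "-1 = measure ASCIIZ string" convention used by the length parameters in this
 class. The helper name effective_length is hypothetical and exists only for
 this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: a negative length means "measure the ASCIIZ string",
// while a non-negative length is used as provided (no terminator needed).
std::size_t effective_length(const char* text, int len = -1) {
  return len >= 0 ? static_cast<std::size_t>(len) : std::strlen(text);
}
```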
76
77 @par Examples
78
 @code{.cpp}
 #include <cstdlib>  // EXIT_SUCCESS
 #include <iostream> // std::cout, std::cerr, std::endl, etc.

 #include <randolf/Atomize>

 int main(int argc, char *argv[]) {
   randolf::Atomize a("parameters key=value");
   std::cout << "atom0: " << a.at(0) << std::endl;
   std::cout << "atom1: " << a.at(1) << std::endl;
   std::cout << "key1: " << a.at(1, "k") << std::endl; // Mode is an ASCIIZ string, not a char
   std::cout << "val1: " << a.at(1, "v") << std::endl;
   return EXIT_SUCCESS;
 } // -x- int main -x-
 @endcode
93
94 Parameter stacking is also supported (with methods that return @c Atomize*).
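
 The stacking pattern itself can be sketched with a stand-in class. The name
 StackingSketch and its members are hypothetical and only illustrate how
 methods that return a pointer to the same object can be chained with the
 arrow operator.

```cpp
#include <string>

// Hypothetical stand-in class (not randolf::Atomize) whose setters return a
// pointer to the same object so that calls can be stacked.
class StackingSketch {
  int flags_ = 0;
  std::string text_;

 public:
  StackingSketch* flags(int flags) { flags_ = flags; return this; }
  StackingSketch* assign(const std::string& text) { text_ = text; return this; }
  StackingSketch* clear() { text_.clear(); return this; }
  int flags() const { return flags_; }
  const std::string& text() const { return text_; }
};
```

 With this sketch, a call such as s.flags(4)->assign("one two") updates both
 members in one statement.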
95 *///=========================================================================
96 class Atomize {
97
98 private:
99 // --------------------------------------------------------------------------
100 // Internal structures.
101 // --------------------------------------------------------------------------
102 struct __atom {
103 uint fpos = 0; // First position/offset (key also begins here)
104 uint flen = 0; // Full length
105 uint klen = 0; // Key length
106 uint vpos = 0; // Value position/offset (0 == not present)
107 uint vlen = 0; // Value length (do not use this to test for presence of value)
108 }; // -x- struct __atom -x-
109
110 // --------------------------------------------------------------------------
111 // Internal variables.
112 // --------------------------------------------------------------------------
113 char* __origin = nullptr; // Copy of original string
114 std::vector<__atom*> __data_points;
115 int __flags = 0;
116
117 // --------------------------------------------------------------------------
118 // Operator modes (all are disabled by default).
119 // --------------------------------------------------------------------------
120 bool mode_k = false; // Key
121 bool mode_v = false; // Value
122 bool mode_p = false; // Presence of key-value pair
123 bool mode_c = false; // Camel_Case
124 bool mode_f = false; // First letter upper-case
125 bool mode_l = false; // All lower-case
126 bool mode_u = false; // All upper-case
127
128 public:
129 /*======================================================================*//**
130 @brief
131 Optional flags that alter, modify, or enhance the operation of atomization
132 intake.
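
 Because the flag values are distinct powers of two, they can presumably be
 combined with a bitwise OR and tested with a bitwise AND. The sketch below
 assumes that usage; the SketchFlags names and the has_flag helper are
 hypothetical, with values copied from this enum so that the example compiles
 on its own.

```cpp
// Values copied from ATOMIZE_FLAGS so that this sketch is self-contained.
enum SketchFlags : int {
  SKETCH_DEFAULT = 0,
  SKETCH_USE_ALL_QUOTES = 1,
  SKETCH_IGNORE_QUOTES = 2,
  SKETCH_DELETE_QUOTES = 4,
};

// Hypothetical helper: test whether one flag bit is set in a combined value.
constexpr bool has_flag(int flags, SketchFlags flag) {
  return (flags & flag) != 0;
}
```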
133 *///=========================================================================
134 enum ATOMIZE_FLAGS: int {
135
136 /*----------------------------------------------------------------------*//**
137 The ATOMIZE_DEFAULT flag isn't necessary, but it's included here for
 completeness as it accommodates programming styles that prefer to emphasize
139 when defaults are being relied upon.
140 *///-------------------------------------------------------------------------
141 ATOMIZE_DEFAULT = 0,
142
143 /*----------------------------------------------------------------------*//**
144 Interpret all quotation marks (the default is to only utilize enclosing
145 quotation marks).
146 *///-------------------------------------------------------------------------
147 ATOMIZE_USE_ALL_QUOTES = 1,
148
149 /*----------------------------------------------------------------------*//**
150 Don't interpret quotation marks as grouping characters.
151 *///-------------------------------------------------------------------------
152 ATOMIZE_IGNORE_QUOTES = 2,
153
154 /*----------------------------------------------------------------------*//**
155 Delete quotation marks that function as grouping characters (this flag has no
156 effect when @ref ATOMIZE_IGNORE_QUOTES is set).
157 *///-------------------------------------------------------------------------
158 ATOMIZE_DELETE_QUOTES = 4,
159
160 }; // -x- enum ATOMIZE_FLAGS -x-
161
162 private:
163 /*======================================================================*//**
164 @brief
165 Assign new string.
166 *///=========================================================================
167 void __assign(
168 /// The intake ASCIIZ string
169 const char* intake,
170 /// The length of the intake string
171 const int len) {
172
173 // --------------------------------------------------------------------------
174 // Internal variables.
175 // --------------------------------------------------------------------------
176 __atom* atom = new __atom{}; // Data structure that will keep being replaced
177 bool k = false; // Key-value pair detection flag
178 bool q = false; // Begin in non-quote mode
179 bool w = true; // Begin in whitespace mode to skip any leading whitespace
180
181 // --------------------------------------------------------------------------
182 // Allocate internal memory.
183 // --------------------------------------------------------------------------
184 if (__origin != nullptr) __clear(); //::free(__origin); // Memory management
185 __origin = (char*)::calloc(1, len + 1); // Allocate memory
186
187 // --------------------------------------------------------------------------
188 // Primary loop.
189 // --------------------------------------------------------------------------
190 for (int i = 0; i < len; i++) {
191
192 // --------------------------------------------------------------------------
 // Extract current character and copy original string at the same time, which
194 // also creates an optimization opportunity for compilers to store "c" in a
195 // register, which is faster than accessing a memory location.
196 // --------------------------------------------------------------------------
197 char c = __origin[i] = intake[i];
198
199 // --------------------------------------------------------------------------
200 // Process character.
201 // --------------------------------------------------------------------------
202 switch (c) {
203 case '\0': // Whitespace: NULL
204 //if (len == 0) break; // End of ASCIIZ string
205 [[fallthrough]];
206 case ' ': // Whitespace: Space
207 [[fallthrough]];
208 case '\t': // Whitespace: Tab
209 [[fallthrough]];
210 case '\r': // Whitespace: Carriage Return
211 [[fallthrough]];
212 case '\n': // Whitespace: Linefeed
213 if (!w) { // End of atom
214 if (k) atom->vlen = i - atom->vpos;
215 else atom->klen = i - atom->fpos;
216 __data_points.push_back(atom); // Save current atom to vector
217 atom = new __atom{.fpos = (uint)i}; // Create new atom
218 k = false; // Disable key-value pair mode
219 w = true; // Enable whitespace mode
220 } // -x- if !w -x-
221 break;
222 case '"': //
223 goto __assign_default;
224 case '=': // Key-value pair detected
225 if (!k) { // Key-value pair mode is not already detected
226 k = true; // Enable key-value pair mode
227 atom->klen = w ? 0 : i - atom->fpos;
228 atom->vpos = i + 1; // Save position of value
229 } // -x- if !k -x-
230 __assign_default:
231 [[fallthrough]];
232 default: // Non-whitespace characters
233 if (w) { // White space mode is enabled
234 atom->fpos = i; // Save new starting position
235 w = false; // Disable whitespace mode
236 } // -x- if w -x-
237 if (k) atom->vlen++;
238 else atom->klen++;
239 atom->flen++;
 } // -x- switch c -x-
241
242 } // -x- for i -x-
243
244 // --------------------------------------------------------------------------
245 // Save final atom if it's not empty.
246 // --------------------------------------------------------------------------
 if (atom->flen != 0) {
   if (k) atom->vlen = len - atom->vpos; // Recalculate, since the loop also counted the '='
   __data_points.push_back(atom);
 } else delete atom;
249
250 } // -x- void __assign -x-
251
252 /*======================================================================*//**
253 @brief
 Clear internal data, but not flags or modes. Called by the destructor, the
 clear() method, and the various assign() methods.
256 *///=========================================================================
257 void __clear() {
258
259 // --------------------------------------------------------------------------
260 // Delete all data entry points. These need to be deleted separately because
261 // the std::vector won't automatically delete structures.
262 // --------------------------------------------------------------------------
263 for (int i = __data_points.size() - 1; i >= 0; i--)
264 delete __data_points.at(i);
265 __data_points.clear();
266
267 // --------------------------------------------------------------------------
268 // Free the original string, if one has been allocated (in most cases, there
269 // will be allocated memory here that needs to be freed).
270 // --------------------------------------------------------------------------
271 if (__origin != nullptr) {
272 ::free(__origin); // Memory management
273 __origin = nullptr;
274 } // -x- if !__origin -x-
275
276 } // -x- void __clear -x-
277
278 public:
279 /*======================================================================*//**
280 @brief
281 Instantiate an empty Atomize object, which is expected to be used with the
282 @ref assign method at some later point. (This is particularly useful for
283 defining a local Atomize object in a header file in a way that won't throw an
284 exception, including invalid mode codes {which will just be ignored}.)
285 *///=========================================================================
286 Atomize(
287 /// See @ref ATOMIZE_FLAGS for a list of options
288 const int flags = ATOMIZE_DEFAULT,
289 /// Set the modes (@c nullptr default means don't set the modes) @n
 /// Granularity (default is to return the entire atom): @n
291 /// @c "\0" = entire atom (default) @n
292 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
293 /// @c "v" = value (will be empty if no key-value pair was detected) @n
294 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
295 /// Conversion options (default is for no conversion): @n
296 /// @c "c" = Camel_Case @n
297 /// @c "f" = First character in upper-case @n
298 /// @c "l" = all lower case @n
299 /// @c "u" = ALL UPPER CASE
300 const char* mode = nullptr) noexcept {
301 __flags = flags;
302 if (mode != nullptr) __mode(mode, false);
303 } // -x- constructor Atomize -x-
304
305 /*======================================================================*//**
306 @brief
307 Instantiate an empty Atomize object, which is expected to be used with the
308 @ref assign method at some later point. (This is particularly useful for
309 defining a local Atomize object in a header file in a way that won't throw an
310 exception, including invalid mode codes {which will just be ignored}.)
311 *///=========================================================================
312 Atomize(
313 /// See @ref ATOMIZE_FLAGS for a list of options
314 const int flags,
315 /// Set the modes (@c 0 default means don't set the modes) @n
 /// Granularity (default is to return the entire atom): @n
317 /// @c "\0" = entire atom (default) @n
318 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
319 /// @c "v" = value (will be empty if no key-value pair was detected) @n
320 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
321 /// Conversion options (default is for no conversion): @n
322 /// @c "c" = Camel_Case @n
323 /// @c "f" = First character in upper-case @n
324 /// @c "l" = all lower case @n
325 /// @c "u" = ALL UPPER CASE
326 const char mode) noexcept {
327 __flags = flags;
328 if (mode != 0) {
329 char new_mode[]{mode, 0};
330 __mode(new_mode, false);
331 } // -x- if mode -x-
332 } // -x- constructor Atomize -x-
333
334 /*======================================================================*//**
335 @brief
336 Instantiate an Atomize object using the specified ASCIIZ string for intake.
337 @throws std::invalid_argument If the parameters are malformed in some way.
338 *///=========================================================================
339 Atomize(
340 /// The intake ASCIIZ string
341 const char* intake,
342 /// The length of the intake string@n
343 /// -1 = Measure ASCIIZ string
344 const int len = -1,
345 /// See @ref ATOMIZE_FLAGS for a list of options
346 const int flags = ATOMIZE_DEFAULT,
347 /// Set the modes (@c nullptr default means don't set the modes) @n
 /// Granularity (default is to return the entire atom): @n
349 /// @c "\0" = entire atom (default) @n
350 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
351 /// @c "v" = value (will be empty if no key-value pair was detected) @n
352 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
353 /// Conversion options (default is for no conversion): @n
354 /// @c "c" = Camel_Case @n
355 /// @c "f" = First character in upper-case @n
356 /// @c "l" = all lower case @n
357 /// @c "u" = ALL UPPER CASE
358 const char* mode = nullptr) {
359 __flags = flags;
360 if (mode != nullptr) __mode(mode);
361 __assign(intake, len >= 0 ? len : std::strlen(intake));
362 } // -x- constructor Atomize -x-
363
364 /*======================================================================*//**
365 @brief
366 Instantiate an Atomize object using the specified ASCIIZ string for intake.
367 @throws std::invalid_argument If the parameters are malformed in some way.
368 *///=========================================================================
369 Atomize(
370 /// The intake ASCIIZ string
371 const char* intake,
372 /// The length of the intake string@n
373 /// -1 = Measure ASCIIZ string
374 const int len,
375 /// See @ref ATOMIZE_FLAGS for a list of options
376 const int flags,
377 /// Set the modes (@c 0 default means don't set the modes) @n
 /// Granularity (default is to return the entire atom): @n
379 /// @c "\0" = entire atom (default) @n
380 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
381 /// @c "v" = value (will be empty if no key-value pair was detected) @n
382 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
383 /// Conversion options (default is for no conversion): @n
384 /// @c "c" = Camel_Case @n
385 /// @c "f" = First character in upper-case @n
386 /// @c "l" = all lower case @n
387 /// @c "u" = ALL UPPER CASE
388 const char mode) {
389 __flags = flags;
390 if (mode != 0) {
391 char new_mode[]{mode, 0};
392 __mode(new_mode, false);
393 } // -x- if mode -x-
394 __assign(intake, len >= 0 ? len : std::strlen(intake));
395 } // -x- constructor Atomize -x-
396
397 /*======================================================================*//**
398 @brief
399 Instantiate an Atomize object using the specified string for intake.
400 @throws std::invalid_argument If the parameters are malformed in some way.
401 *///=========================================================================
402 Atomize(
403 /// The intake C++ string
404 const std::string intake,
405 /// The length of the intake string@n
406 /// -1 = Obtain length from @c intake.size() method
407 const int len = -1,
408 /// See @ref ATOMIZE_FLAGS for a list of options
409 const int flags = ATOMIZE_DEFAULT,
410 /// Set the modes (@c nullptr default means don't set the modes) @n
 /// Granularity (default is to return the entire atom): @n
412 /// @c "\0" = entire atom (default) @n
413 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
414 /// @c "v" = value (will be empty if no key-value pair was detected) @n
415 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
416 /// Conversion options (default is for no conversion): @n
417 /// @c "c" = Camel_Case @n
418 /// @c "f" = First character in upper-case @n
419 /// @c "l" = all lower case @n
420 /// @c "u" = ALL UPPER CASE
421 const char* mode = nullptr) {
422 __flags = flags;
423 if (mode != nullptr) __mode(mode);
424 __assign(intake.data(), len >= 0 ? len : intake.size());
425 } // -x- constructor Atomize -x-
426
427 /*======================================================================*//**
428 @brief
429 Instantiate an Atomize object using the specified string for intake.
430 @throws std::invalid_argument If the parameters are malformed in some way.
431 *///=========================================================================
432 Atomize(
433 /// The intake C++ string
434 const std::string intake,
435 /// The length of the intake string@n
436 /// -1 = Obtain length from @c intake.size() method
437 const int len,
438 /// See @ref ATOMIZE_FLAGS for a list of options
439 const int flags,
440 /// Set the modes (@c 0 default means don't set the modes) @n
 /// Granularity (default is to return the entire atom): @n
442 /// @c "\0" = entire atom (default) @n
443 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
444 /// @c "v" = value (will be empty if no key-value pair was detected) @n
445 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
446 /// Conversion options (default is for no conversion): @n
447 /// @c "c" = Camel_Case @n
448 /// @c "f" = First character in upper-case @n
449 /// @c "l" = all lower case @n
450 /// @c "u" = ALL UPPER CASE
451 const char mode) {
452 __flags = flags;
453 if (mode != 0) {
454 char new_mode[]{mode, 0};
455 __mode(new_mode, false);
456 } // -x- if mode -x-
457 __assign(intake.data(), len >= 0 ? len : intake.size());
458 } // -x- constructor Atomize -x-
459
460 /*======================================================================*//**
461 @brief
462 Destructor.
463 *///=========================================================================
464 ~Atomize() noexcept {
465 __clear();
 } // -x- destructor Atomize -x-
467
468 /*======================================================================*//**
469 @brief
470 Assign (and interpret) a new ASCIIZ string (flags and modes are inherited).
471 @throws std::invalid_argument If the parameters are malformed in some way.
472 @returns The same Atomize object so as to facilitate stacking
473 *///=========================================================================
474 Atomize* assign(
475 /// The intake ASCIIZ string
476 const char* intake,
477 /// The length of the intake string@n
478 /// -1 = Measure ASCIIZ string
479 const int len = -1) {
480 __clear();
481 __assign(intake, len >= 0 ? len : std::strlen(intake));
482 return this;
483 } // -x- Atomize* assign -x-
484
485 /*======================================================================*//**
486 @brief
487 Assign (and interpret) a new string (flags and modes are inherited).
488 @throws std::invalid_argument If the parameters are malformed in some way.
489 @returns The same Atomize object so as to facilitate stacking
490 *///=========================================================================
491 Atomize* assign(
492 /// The intake C++ string
493 const std::string intake,
494 /// The length of the intake string@n
495 /// -1 = Obtain length from @c intake.size() method
496 const int len = -1) {
497 __clear();
498 __assign(intake.data(), len >= 0 ? len : intake.size());
499 return this;
500 } // -x- Atomize* assign -x-
501
502 /*======================================================================*//**
503 @brief
 Access an atom, applying the operator modes that were configured using the
 @ref mode method (or the temporary override provided in @c mode).
 @throws std::out_of_range if the index is out-of-range
 @returns The atom, or the portion of it that is selected by the modes in
 effect
509 @see get
510 @see get_key
511 @see get_value
512 @see operator[]
513 *///=========================================================================
514 std::string at(
515 /// Which atom to obtain (0 = first atom; negative values count backward from
516 /// the last atom in the internal array)
517 int index,
518 /// Temporarily override the current modes (@c nullptr default means don't
519 /// change modes) @n
 /// Granularity (default is to return the entire atom): @n
521 /// @c "\0" = entire atom (default) @n
522 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
523 /// @c "v" = value (will be empty if no key-value pair was detected) @n
524 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
525 /// Conversion options (default is for no conversion): @n
526 /// @c "c" = Camel_Case @n
527 /// @c "f" = First character in upper-case @n
528 /// @c "l" = all lower case @n
529 /// @c "u" = ALL UPPER CASE
530 const char* mode = nullptr) {
531
532 // --------------------------------------------------------------------------
533 // Internal variables.
534 // --------------------------------------------------------------------------
535 if (index < 0) index = __data_points.size() + index;
 __atom* atom = __data_points.at(index); // at() throws std::out_of_range, as documented
537 std::string previous_mode;
538
539 // --------------------------------------------------------------------------
540 // Save and change modes.
541 // --------------------------------------------------------------------------
542 if (mode != nullptr) {
543 previous_mode = this->mode(); // Save current mode
544 __mode(mode);
545 } // -x- if mode -x-
546
547 // --------------------------------------------------------------------------
548 // Presence of key-value pair (results in ignoring all other modes).
549 // --------------------------------------------------------------------------
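 if (mode_p) {
   const bool has_pair = atom->vpos != 0;
   if (mode != nullptr) this->mode(previous_mode.data()); // Restore modes before the early return
   return has_pair ? "1" : "";
 } // -x- if mode_p -x-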
551
552 // --------------------------------------------------------------------------
553 // Key/Value mode.
554 // --------------------------------------------------------------------------
555 std::string str;
556 if (mode_k) str.assign(__origin, atom->fpos, atom->klen);
557 else if (mode_v) str.assign(__origin, atom->vpos, atom->vlen);
558 else str.assign(__origin, atom->fpos, atom->flen);
559
560 // --------------------------------------------------------------------------
561 // Conversion modes.
562 // --------------------------------------------------------------------------
563 char* data = str.data();
564 if (mode_l) { // All lower-case
565 for (int i = 0; i < str.size(); i++) {
566 char ch = data[i];
567 if (ch >= 'A' && ch <= 'Z') data[i] += 32; // Convert to lower-case
568 } // -x- for i -x-
569 } else if (mode_u) { // All upper-case
570 for (int i = 0; i < str.size(); i++) {
571 char ch = data[i];
572 if (ch >= 'a' && ch <= 'z') data[i] -= 32; // Convert to upper-case
573 } // -x- for i -x-
574 } else if (mode_c) { // Camel_Case
575 char pch = 0;
576 for (int i = 0; i < str.size(); i++) {
577 char ch = data[i];
578 if (pch < 'A' || (pch > 'Z' && pch < 'a') || pch > 'z') {
579 if (ch >= 'a' && ch <= 'z')
580 data[i] -= 32; // Convert to upper-case
581 } // -x- if pch -x-
582 pch = data[i];
583 } // -x- for i -x-
584 } else if (mode_f) { // First letter
585 char ch = data[0];
586 if (ch >= 'a' && ch <= 'z') data[0] -= 32; // Convert to upper-case
587 } // -x- if mode_c -x-
588
589 // --------------------------------------------------------------------------
590 // Restore previously saved modes.
591 // --------------------------------------------------------------------------
592 if (mode != nullptr) this->mode(previous_mode.data());
593
594 return str;
595 } // -x- std::string at -x-
596
597 /*======================================================================*//**
598 @brief
599 Clear this Atomize's underlying data and reset all states. This does not
600 reset nor alter flags or modes.
601 @returns The same Atomize object so as to facilitate stacking
602 *///=========================================================================
603 Atomize* clear() {
604 __clear();
605 return this;
606 } // -x- Atomize* clear -x-
607
608 /*======================================================================*//**
609 @brief
610 Confirm that there are no atoms.
611 @returns TRUE = no atoms@n
612 FALSE = at least one atom exists
613 @see size
614 *///=========================================================================
615 bool empty() {
616 return __data_points.empty();
617 } // -x- bool empty -x-
618
619 /*======================================================================*//**
620 @brief
621 Obtain current set of internal flags.
622 @returns Current flags, as defined in @ref ATOMIZE_FLAGS
623 @see flags(const int)
624 @see mode
625 *///=========================================================================
626 int flags() {
627 return __flags;
628 } // -x- int flags -x-
629
630 /*======================================================================*//**
631 @brief
 Set the internal flags (this replaces the current set of flags).
633 @returns The same Atomize object so as to facilitate stacking
634 @see flags
635 @see mode
636 *///=========================================================================
637 Atomize* flags(
638 /// See @ref ATOMIZE_FLAGS for a list of options
639 const int flags) {
640 __flags = flags;
641 return this;
642 } // -x- Atomize* flags -x-
643
644 /*======================================================================*//**
645 @brief
646 Return the entire atom.
647 @throws std::out_of_range if the index is out-of-range
 @returns Entire atom
650 @see at
651 @see get_key
652 @see get_value
653 @see has_kv
654 @see operator[int]
655 *///=========================================================================
656 std::string get(
657 /// Which atom to obtain (0 = first atom; negative values count backward from
658 /// the last atom in the internal array)
659 int index) {
660 if (index < 0) index = __data_points.size() + index;
 __atom* atom = __data_points.at(index); // at() throws std::out_of_range, as documented
662 return std::string(__origin, atom->fpos, atom->flen);
663 } // -x- std::string get -x-
664
665 /*======================================================================*//**
666 @brief
 Return the key portion of an atom, or the entire atom if a key-value pair
668 wasn't detected.
669 @throws std::out_of_range if the index is out-of-range
670 @returns Key portion of atom (or the entire atom if a key-value pair wasn't
671 detected)
672 @see at
673 @see get
674 @see get_value
675 @see has_kv
676 @see operator[int]
677 *///=========================================================================
678 std::string get_key(
679 /// Which atom to obtain (0 = first atom; negative values count backward from
680 /// the last atom in the internal array)
681 int index) {
682 if (index < 0) index = __data_points.size() + index;
 __atom* atom = __data_points.at(index); // at() throws std::out_of_range, as documented
684 return std::string(__origin, atom->fpos, atom->klen);
685 } // -x- std::string get_key -x-
686
687 /*======================================================================*//**
688 @brief
 Return the value portion of an atom, or an empty string if a key-value pair
690 wasn't detected.
691 @throws std::out_of_range if the index is out-of-range
692 @returns Value portion of atom (or an empty string if a key-value pair wasn't
693 detected)
694 @see at
695 @see get
696 @see get_key
697 @see has_kv
698 @see operator[int]
699 *///=========================================================================
700 std::string get_value(
701 /// Which atom to obtain (0 = first atom; negative values count backward from
702 /// the last atom in the internal array)
703 int index) {
704 if (index < 0) index = __data_points.size() + index;
 __atom* atom = __data_points.at(index); // at() throws std::out_of_range, as documented
706 return atom->vpos != 0 ? std::string(__origin, atom->vpos, atom->vlen) : "";
707 } // -x- std::string get_value -x-
708
709 /*======================================================================*//**
710 @brief
711 Indicates whether the specified atom was split into a key-value pair (if it
712 was, then the @c key and the @c value are delimited by the first instance of
713 an equal sign {`=`}).
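
 The "first equal sign" rule can be sketched on its own. The helper name
 split_kv is hypothetical and copies substrings for clarity, whereas this
 class records offsets and lengths into its copy of the original string.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split one atom at the first '=' only, so any later
// equal signs remain part of the value.
std::pair<std::string, std::string> split_kv(const std::string& atom) {
  const std::string::size_type eq = atom.find('=');
  if (eq == std::string::npos) return {atom, ""}; // Not a key-value pair
  return {atom.substr(0, eq), atom.substr(eq + 1)};
}
```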
714 @throws std::out_of_range if the index is out-of-range
715 @returns TRUE = key-value pair was detected by the parsing algorithm@n
716 FALSE = this atom was not split into a key-value pair
717 @see at
718 @see get
719 @see get_key
720 @see get_value
721 @see operator[int]
722 *///=========================================================================
723 bool has_kv(
724 /// Which atom to obtain (0 = first atom; negative values count backward from
725 /// the last atom in the internal array)
726 int index) {
727 if (index < 0) index = __data_points.size() + index;
 __atom* atom = __data_points.at(index); // at() throws std::out_of_range, as documented
729 return atom->vpos != 0;
730 } // -x- bool has_kv -x-
731
732 /*======================================================================*//**
733 @brief
734 Get the operator modes that are set for the @ref operator[] operator.
 @returns The mode codes that are currently enabled, concatenated in the order
 k, v, p, c, f, l, u (an empty string means no modes are set)
737 @see flags
738 @see mode(const char*)
739 *///=========================================================================
740 std::string mode() noexcept {
741 std::string modes;
742 if (mode_k) modes.append("k");
743 if (mode_v) modes.append("v");
744 if (mode_p) modes.append("p");
745 if (mode_c) modes.append("c");
746 if (mode_f) modes.append("f");
747 if (mode_l) modes.append("l");
748 if (mode_u) modes.append("u");
749 return modes;
750 } // -x- std::string mode -x-
751
752 private:
753 /*======================================================================*//**
754 @brief
755
756 Set the operator modes for use with the @ref operator[] operator (modes that
757 are not specified will be reset to their defaults).
758
759 Calling this method with @c "\0" as the parameter will result in resetting
760 all operator modes to the base defaults.
761 @throws std::invalid_argument if a mode is unrecognized and @c throw_exception is TRUE
762 @returns Nothing (this internal method does not support stacking)
763 *///=========================================================================
764 void __mode(
765 /// Granularity (default is to return the entire atom): @n
766 /// @c "\0" = entire atom (default) @n
767 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
768 /// @c "v" = value (will be empty if no key-value pair was detected) @n
769 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
770 /// Conversion options (default is for no conversion): @n
771 /// @c "c" = Camel_Case @n
772 /// @c "f" = First character in upper-case @n
773 /// @c "l" = all lower case @n
774 /// @c "u" = ALL UPPER CASE
775 const char* mode,
776 /// This is primarily used by the empty constructor@n
777 /// TRUE = invalid mode throws an std::invalid_argument exception (default) @n
778 /// FALSE = ignore invalid mode
779 const bool throw_exception = true) {
780
781 // --------------------------------------------------------------------------
782 // Clear all settings.
783 // --------------------------------------------------------------------------
784 mode_k = false;
785 mode_v = false;
786 mode_p = false;
787 mode_c = false;
788 mode_f = false;
789 mode_l = false;
790 mode_u = false;
791
792 // --------------------------------------------------------------------------
793 // Set modes (duplicates are effectively ignored, and 0 never shows up here
794 // because it terminates the string).
795 // --------------------------------------------------------------------------
796 const size_t len = std::strlen(mode);
797 for (size_t i = 0; i < len; i++) {
798 switch (mode[i]) {
799 case 'k':
800 mode_k = true;
801 break;
802 case 'v':
803 mode_v = true;
804 break;
805 case 'p':
806 mode_p = true;
807 break;
808 case 'c':
809 mode_c = true;
810 break;
811 case 'f':
812 mode_f = true;
813 break;
814 case 'l':
815 mode_l = true;
816 break;
817 case 'u':
818 mode_u = true;
819 break;
820 default:
821 if (throw_exception) throw std::invalid_argument(std::string("unrecognized operator_mode \"") + mode[i] + "\"");
822 } // -x- switch mode[i] -x-
823 } // -x- for i -x-
824
825 } // -x- void __mode -x-
826
827 public:
828 /*======================================================================*//**
829 @brief
830
831 Set the operator modes for use with the @ref operator[] operator (modes that
832 are not specified will be reset to their defaults).
833
834 Calling this method with @c "\0" as the parameter will result in resetting
835 all operator modes to the base defaults.
836 @throws std::invalid_argument if an incorrect value is provided
837 @returns The same Atomize object so as to facilitate stacking
838 @see flags
839 @see mode
840 *///=========================================================================
841 Atomize* mode(
842 /// Granularity (default is to return the entire atom): @n
843 /// @c "\0" = entire atom (default) @n
844 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
845 /// @c "v" = value (will be empty if no key-value pair was detected) @n
846 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
847 /// Conversion options (default is for no conversion): @n
848 /// @c "c" = Camel_Case @n
849 /// @c "f" = First character in upper-case @n
850 /// @c "l" = all lower case @n
851 /// @c "u" = ALL UPPER CASE
852 const char* mode) {
853 __mode(mode);
854 return this;
855 } // -x- Atomize* mode -x-
856
857 /*======================================================================*//**
858 @brief
859
860 Set the operator modes for use with the @ref operator[] operator (modes that
861 are not specified will be reset to their defaults).
862
863 Calling this method with @c "\0" as the parameter will result in resetting
864 all operator modes to the base defaults.
865 @note This overload ignores an unrecognized mode character instead of throwing
866 @returns The same Atomize object so as to facilitate stacking
867 @see flags
868 @see mode
869 *///=========================================================================
870 Atomize* mode(
871 /// Granularity (default is to return the entire atom): @n
872 /// @c "\0" = entire atom (default) @n
873 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
874 /// @c "v" = value (will be empty if no key-value pair was detected) @n
875 /// @c "p" = returns: "1" = is a key-value pair / "" = not a key-value pair@n
876 /// Conversion options (default is for no conversion): @n
877 /// @c "c" = Camel_Case @n
878 /// @c "f" = First character in upper-case @n
879 /// @c "l" = all lower case @n
880 /// @c "u" = ALL UPPER CASE
881 const char mode) {
882 char new_mode[]{mode, 0};
883 __mode(new_mode, false);
884 return this;
885 } // -x- Atomize* mode -x-
886
887 /*======================================================================*//**
888 @brief
889 Return the total quantity of atoms.
890 @returns Quantity of atoms
891 @see empty
892 *///=========================================================================
893 size_t size() {
894 return __data_points.size();
895 } // -x- size_t size -x-
896
897 /*======================================================================*//**
898 @brief
899 Generate an std::vector<std::string> that contains all atoms.
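900 @returns std::vector<std::string> with one entry per atom (or a key entry and a value entry per key-value pair when splitting)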
901 *///=========================================================================
902 std::vector<std::string> to_vector(
903 /// FALSE = don't split key-value pairs (default) @n
904 /// TRUE = split key-value pairs into separate entries (for key names, the
905 /// equal sign will be included at the end of the string)
906 bool split_kv_pairs = false) noexcept {
907 std::vector<std::string> v;
908
909 // --------------------------------------------------------------------------
910 // Splitting key-value pairs is best handled in a separate loop.
911 // --------------------------------------------------------------------------
912 if (split_kv_pairs) {
913 for (size_t i = 0; i < __data_points.size(); i++) {
914 __atom* atom = __data_points[i];
915 if (atom->vpos != 0) { // Non-zero indicates that a key-value pair was detected
916 v.push_back(std::string(__origin, atom->fpos, atom->klen + 1)); // +1 includes equal sign
917 v.push_back(std::string(__origin, atom->vpos, atom->vlen));
918 } else { // No key-value pair was detected
919 v.push_back(std::string(__origin, atom->fpos, atom->flen));
920 } // -x- if atom->vpos -x-
921 } // -x- for i -x-
922 return v;
923 } // -x- if split_kv_pairs -x-
924
925 // --------------------------------------------------------------------------
926 // Returning full atoms is straightforward.
927 // --------------------------------------------------------------------------
928 for (size_t i = 0; i < __data_points.size(); i++) {
929 __atom* atom = __data_points[i];
930 v.push_back(std::string(__origin, atom->fpos, atom->flen));
931 } // -x- for i -x-
932
933 return v;
934 } // -x- std::vector<std::string> to_vector -x-
935
936 /*======================================================================*//**
937 @brief
938 Array-style access to atoms, whilst utilizing the operator mode that was
939 configured using the @ref mode method.
940 @throws std::out_of_range if the index is out-of-range
941 @returns std::string
942 @see at
943 @see mode
944 *///=========================================================================
945 std::string operator[](
946 /// Index of atom to access (0 = first atom; negative index values are
947 /// calculated in reverse, starting with -1 as the final atom)
948 int index) {
949 return at(index);
950 } // -x- std::string operator[] -x-
951
952 }; // -x- class Atomize -x-
953
954}; // -x- namespace randolf -x-
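
The listing relies on two small techniques: negative indexes that count backward from the last atom, and a mode string in which each letter switches on one flag. The following is a minimal standalone sketch of both, not the class's own code. The names `normalize_index`, `ModeFlags`, and `parse_modes` are hypothetical and exist only for illustration. The bounds check and the choice to put the offending character itself in the exception message are assumptions about the intended behaviour.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Atomize): map a possibly negative index
// onto a container of `size` elements, where -1 refers to the last element.
inline std::size_t normalize_index(int index, std::size_t size) {
  const long long n = static_cast<long long>(size);
  long long i = index;
  if (i < 0) i += n;  // negative values count backward from the end
  if (i < 0 || i >= n) throw std::out_of_range("index out of range");
  return static_cast<std::size_t>(i);
}

// Hypothetical stand-in for the mode_k ... mode_u members used in the listing.
struct ModeFlags {
  bool k = false, v = false, p = false;            // granularity
  bool c = false, f = false, l = false, u = false; // conversion
};

// Parse a mode string such as "kv" or "u". All flags start cleared, so modes
// that are not named are reset; duplicate letters are harmless.
inline ModeFlags parse_modes(const char* mode, bool throw_exception = true) {
  ModeFlags m;
  for (const char* ch = mode; *ch != '\0'; ++ch) {
    switch (*ch) {
      case 'k': m.k = true; break;
      case 'v': m.v = true; break;
      case 'p': m.p = true; break;
      case 'c': m.c = true; break;
      case 'f': m.f = true; break;
      case 'l': m.l = true; break;
      case 'u': m.u = true; break;
      default:
        // Appending the char keeps it readable in the message;
        // std::to_string(char) would yield its numeric code instead.
        if (throw_exception)
          throw std::invalid_argument(
              std::string("unrecognized operator mode \"") + *ch + "\"");
    }
  }
  return m;
}
```

With three elements, `normalize_index(-1, 3)` yields the last position, while `normalize_index(-4, 3)` throws `std::out_of_range`. Calling `parse_modes("x", false)` returns with every flag cleared instead of throwing, which matches how the single-character `mode` overload in the listing calls `__mode`.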