12 /*======================================================================*//**
The @ref Atomize class provides an object-oriented interface with array-style
access to a string that has been efficiently separated into atoms. Finer
granularity and additional functionality are available through modes (see
@ref mode for details) and certain API methods, hence one could say "Atomize can split your
20 When parsing a line or block of text, the following is assumed:
22 - parameters are separated by one or multiple consecutive whitespace
23 characters (space, null, tab, linefeed, carriage return)
25 - values that are enclosed within a set of quotation marks may include
26 whitespace characters that will then not be interpreted as delimiters
28 Data is interpreted in a single pass during instantiation or assignment, and
29 the interpretation algorithm is written in an optimized programming style to
30 ensure high efficiency.
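
The single-pass idea can be illustrated with a minimal sketch that assumes only
the whitespace and key-value rules described above (quotation-mark handling and
flags are left out). The names @c Span and @c split_atoms are hypothetical and
are not part of this class:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of where one atom sits inside the source text.
struct Span {
  std::size_t pos = 0;                 // Offset of the atom's first character
  std::size_t len = 0;                 // Length of the whole atom
  std::size_t eq  = std::string::npos; // Offset of the first '=' (npos == none)
};

// Walk the text once, recording offsets instead of copying substrings.
std::vector<Span> split_atoms(const std::string& text) {
  std::vector<Span> atoms;
  bool in_space = true; // Begin in whitespace mode to skip leading whitespace
  Span cur;
  for (std::size_t i = 0; i <= text.size(); i++) {
    const char c = i < text.size() ? text[i] : '\0'; // Treat the end as whitespace
    const bool space = c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\0';
    if (space) {
      if (!in_space) { // End of atom
        cur.len = i - cur.pos;
        atoms.push_back(cur);
        in_space = true;
      }
    } else {
      if (in_space) { // Start of a new atom
        cur = Span{};
        cur.pos = i;
        in_space = false;
      }
      if (c == '=' && cur.eq == std::string::npos) cur.eq = i; // First '=' only
    }
  }
  return atoms;
}
```

Storing offsets and lengths rather than substrings means that the text is
scanned exactly once, and substrings are only created when they are requested.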
There are no memory leaks. This matters because the specialized parsing
involved is often used in heavy data processing loops where speed and
reliability are needed, and because some of the libraries I've tested that
provide similar functionality leak memory or fail in other ways when given
peculiar or fuzzed data. This class is not affected by such data (primarily
because I don't trust data to always be "as expected").
42 Parsing command lines or configuration settings can become challenging when
43 multiple parameters are provided on a single line, and some of those
44 parameters include quoted text that contains spaces. This class handles all
45 of these scenarios and makes it easy to access each parameter in the same
46 manner that arrays and vectors are accessed.
50 I created this class to make it easier to write internet server daemons.
54 @author Randolf Richardson
57 - 2025-Jan-20 v1.00 Initial version
58 - 2025-Feb-03 v1.00 Increased use of references and pointers
Lower-case letter "a" is regularly used in partial example code to represent
an instantiated Atomize object.
64 An ASCIIZ string is a C-string (char* array) that includes a terminating null
65 (0) character at the end.
69 I use the term "ASCIIZ string" to indicate an array of characters that's
70 terminated by a 0 (a.k.a., null). Although this is very much the same as a
71 C-string, the difference is that in many API functions a C-string must often
72 be accompanied by its length value. When referring to an ASCIIZ string, I'm
73 intentionally indicating that the length of the string is not needed because
74 the string is null-terminated. (This term was also commonly used in assembly
75 language programming in the 1970s, 1980s, and 1990s, and as far as I know is
76 still used by machine language programmers today.)
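
The distinction can be sketched as follows, assuming the same convention that
this class uses for its @c len parameters (a negative length means "measure the
ASCIIZ string"). The helper name @c effective_length is hypothetical and is not
part of this class:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: resolve the effective length of an intake buffer.
// A negative len means "this is an ASCIIZ string, so measure it".
std::size_t effective_length(const char* intake, const int len = -1) {
  return len >= 0 ? static_cast<std::size_t>(len) : std::strlen(intake);
}
```

With this convention, @c effective_length("key=value") measures the string up
to its terminating null, whereas @c effective_length("key=value", 3) trusts the
caller-supplied length.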
#include <iostream> // std::cout, std::cerr, std::endl, etc.

#include <randolf/Atomize>

int main(int argc, char *argv[]) {
  randolf::Atomize a("parameters key=value");
  std::cout << "atom0: " << a.at(0) << std::endl;
  std::cout << "atom1: " << a.at(1) << std::endl;
  std::cout << "key1: " << a.at(1, 'k') << std::endl;
  std::cout << "val1: " << a.at(1, 'v') << std::endl;
  return 0;
}
Parameter stacking is also supported (with methods that return @c Atomize&).
96 *///=========================================================================
100 // --------------------------------------------------------------------------
101 // Internal structures.
102 // --------------------------------------------------------------------------
104 uint fpos = 0; // First position/offset (key also begins here)
105 uint flen = 0; // Full length
106 uint klen = 0; // Key length
107 uint vpos = 0; // Value position/offset (0 == not present)
108 uint vlen = 0; // Value length (do not use this to test for presence of value)
109 }; // -x- struct __atom -x-
111 // --------------------------------------------------------------------------
112 // Internal variables.
113 // --------------------------------------------------------------------------
114 char* __origin = nullptr; // Copy of original string
115 std::vector<__atom*> __data_points;
118 // --------------------------------------------------------------------------
119 // Operator modes (all are disabled by default).
120 // --------------------------------------------------------------------------
121 bool mode_k = false; // Key
122 bool mode_v = false; // Value
123 bool mode_p = false; // Presence of key-value pair
124 bool mode_c = false; // Camel_Case
125 bool mode_f = false; // First letter upper-case
126 bool mode_l = false; // All lower-case
127 bool mode_u = false; // All upper-case
128 #define ATOMIZE_MAX_MODES 8 // Number of modes + 1 (so include 0 for strnlen())
131 /*======================================================================*//**
133 Optional flags that alter, modify, or enhance the operation of atomization
135 *///=========================================================================
136 enum ATOMIZE_FLAGS: int {
138 /*----------------------------------------------------------------------*//**
139 The ATOMIZE_DEFAULT flag isn't necessary, but it's included here for
completeness as it accommodates programming styles that prefer to emphasize
141 when defaults are being relied upon.
142 *///-------------------------------------------------------------------------
145 /*----------------------------------------------------------------------*//**
146 Interpret all quotation marks (the default is to only utilize enclosing
148 *///-------------------------------------------------------------------------
149 ATOMIZE_USE_ALL_QUOTES = 1,
151 /*----------------------------------------------------------------------*//**
152 Don't interpret quotation marks as grouping characters.
153 *///-------------------------------------------------------------------------
154 ATOMIZE_IGNORE_QUOTES = 2,
156 /*----------------------------------------------------------------------*//**
157 Delete quotation marks that function as grouping characters (this flag has no
158 effect when @ref ATOMIZE_IGNORE_QUOTES is set).
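
The interaction between these two flags can be sketched as a bitmask test. The
constants below copy the values 2 and 4 from this enum so that the sketch
compiles on its own, and the helper name @c deletes_quotes is hypothetical (it
is not part of this class):

```cpp
// Flag values copied from the enum so that this sketch compiles on its own.
constexpr int SKETCH_IGNORE_QUOTES = 2; // Same value as ATOMIZE_IGNORE_QUOTES
constexpr int SKETCH_DELETE_QUOTES = 4; // Same value as ATOMIZE_DELETE_QUOTES

// Hypothetical helper: quotation marks are deleted only when the delete flag
// is set and quotation marks are not being ignored altogether.
constexpr bool deletes_quotes(const int flags) {
  return (flags & SKETCH_DELETE_QUOTES) != 0
      && (flags & SKETCH_IGNORE_QUOTES) == 0;
}
```

Flags are combined with the bitwise OR operator, so a caller that passes both
flags gets the "ignore" behaviour and the "delete" flag is silently inert.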
159 *///-------------------------------------------------------------------------
160 ATOMIZE_DELETE_QUOTES = 4,
162 }; // -x- enum ATOMIZE_FLAGS -x-
165 /*======================================================================*//**
168 *///=========================================================================
170 /// The intake ASCIIZ string
172 /// The length of the intake string
175 // --------------------------------------------------------------------------
176 // Internal variables.
177 // --------------------------------------------------------------------------
178 __atom* atom = new __atom{}; // Transient data structure
179 bool k = false; // Key-value pair detection flag
180 //bool q = false; // Begin in non-quote mode
181 bool w = true; // Begin in whitespace mode to skip any leading whitespace
183 // --------------------------------------------------------------------------
184 // Allocate internal memory.
185 // --------------------------------------------------------------------------
186 if (__origin != nullptr) ::free(__origin); // Memory management
187 __origin = (char*)::calloc(1, len + 1); // Allocate memory
189 // --------------------------------------------------------------------------
191 // --------------------------------------------------------------------------
192 for (int i = 0; i < len; i++) {
194 // --------------------------------------------------------------------------
// Extract current character and copy original string at the same time, which
196 // also creates an optimization opportunity for compilers to store "c" in a
197 // register, which is faster than accessing a memory location.
198 // --------------------------------------------------------------------------
199 char c = __origin[i] = intake[i];
201 // --------------------------------------------------------------------------
202 // Process character.
203 // --------------------------------------------------------------------------
205 case '\0': // Whitespace: NULL
206 //if (len == 0) break; // End of ASCIIZ string
207 case ' ': // Whitespace: Space
208 case '\t': // Whitespace: Tab
209 case '\r': // Whitespace: Carriage Return
210 case '\n': // Whitespace: Linefeed
211 if (!w) { // End of atom
212 if (k) atom->vlen = i - atom->vpos;
213 else atom->klen = i - atom->fpos;
214 __data_points.push_back(atom); // Save current atom to vector
215 atom = new __atom{.fpos = (uint)i}; // Create new atom
216 k = false; // Disable key-value pair mode
217 w = true; // Enable whitespace mode
220 case '=': // Key-value pair detected
221 if (!k) { // Key-value pair mode is not already detected
222 k = true; // Enable key-value pair mode
223 atom->klen = w ? 0 : i - atom->fpos;
224 atom->vpos = i + 1; // Save position of value
226 default: // Non-whitespace characters
227 if (w) { // White space mode is enabled
228 atom->fpos = i; // Save new starting position
229 w = false; // Disable whitespace mode
} // -x- switch str -x-
238 // --------------------------------------------------------------------------
239 // Save final atom if it's not empty.
240 // --------------------------------------------------------------------------
241 if (atom->flen != 0) __data_points.push_back(atom);
244 } // -x- void __assign -x-
246 /*======================================================================*//**
248 Clear internal data. Called by destructor and assign() methods.
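
Clearing a std::vector of raw pointers does not free what the pointers refer
to, so each element has to be deleted first. A minimal sketch of that pattern
follows; @c Node, @c live_nodes, and @c release_all are hypothetical names used
only for illustration, and the counter exists purely to make the effect
observable:

```cpp
#include <vector>

int live_nodes = 0; // Number of Node objects currently allocated

// Hypothetical structure that tracks how many instances are alive.
struct Node {
  Node() { live_nodes++; }
  ~Node() { live_nodes--; }
};

// Delete every structure, then discard the (now dangling) pointers.
void release_all(std::vector<Node*>& nodes) {
  for (Node* node : nodes) delete node; // Free each structure first
  nodes.clear();                        // Then drop the pointers
}
```

Calling @c nodes.clear() on its own would leave every @c Node allocated, which
is the kind of leak this method is meant to prevent.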
249 *///=========================================================================
252 // --------------------------------------------------------------------------
253 // Resources management: De-allocate all structures since a std::vector of
254 // pointers to these structures won't do this automatically when we clear()
255 // the entire vector.
256 // --------------------------------------------------------------------------
257 for (int i = (int)__data_points.size() - 1; i >= 0; i--) delete __data_points.at(i);
258 __data_points.clear();
260 // --------------------------------------------------------------------------
261 // Memory management.
262 // --------------------------------------------------------------------------
266 } // -x- void __clear -x-
268 /*======================================================================*//**
271 *///=========================================================================
272 void __mode(const char mode, const bool throw_exception = true) {
if (throw_exception) throw std::invalid_argument("unrecognized operator_mode \"" + std::string(1, mode) + "\"");
297 } // -x- switch mode -x-
298 } // -x- void __mode -x-
301 /*======================================================================*//**
303 Instantiate an empty Atomize object, which is expected to be used with the
304 @ref assign method at some later point. (This is particularly useful for
305 defining a local Atomize object in a header file in a way that won't throw an
306 exception, including invalid mode codes {which will just be ignored instead
307 of throwing an exception}.)
310 *///=========================================================================
312 /// See @ref ATOMIZE_FLAGS for a list of options
/// Granularity (default is to return the entire atom): @n
315 /// @c 0 = don't set any modes (default) @n
316 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
317 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
318 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not
319 /// Conversion options (default is for no conversion): @n
320 /// @c 'c' = Camel_Case @n
321 /// @c 'f' = First character in upper-case @n
322 /// @c 'l' = all lower case @n
323 /// @c 'u' = ALL UPPER CASE
324 const char mode) noexcept {
327 } // -x- constructor Atomize -x-
329 /*======================================================================*//**
331 Instantiate an empty Atomize object, which is expected to be used with the
332 @ref assign method at some later point. (This is particularly useful for
333 defining a local Atomize object in a header file in a way that won't throw an
334 exception, including invalid mode codes {which will just be ignored instead
335 of throwing an exception}.)
338 *///=========================================================================
340 /// See @ref ATOMIZE_FLAGS for a list of options
341 const int flags = ATOMIZE_DEFAULT,
/// Granularity (default is to return the entire atom): @n
343 /// @c nullptr = don't set any modes @n
344 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
345 /// @c "v" = value (will be empty if no key-value pair was detected) @n
346 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not
347 /// Conversion options (default is for no conversion): @n
348 /// @c "c" = Camel_Case @n
349 /// @c "f" = First character in upper-case @n
350 /// @c "l" = all lower case @n
351 /// @c "u" = ALL UPPER CASE
352 const char* mode = nullptr) noexcept {
354 if (mode != nullptr) {
355 const int mode_len = ::strnlen(mode, ATOMIZE_MAX_MODES);
356 for (int i = 0; i < mode_len; i++) __mode(mode[i], false);
358 } // -x- constructor Atomize -x-
360 /*======================================================================*//**
362 Instantiate an Atomize object using the specified ASCIIZ string for intake.
363 @throws std::invalid_argument If the parameters are malformed in some way.
366 *///=========================================================================
368 /// The intake ASCIIZ string
370 /// The length of the intake string@n
371 /// -1 = Measure ASCIIZ string
/// See @ref ATOMIZE_FLAGS for a list of options
/// Granularity (default is to return the entire atom): @n
376 /// @c 0 = don't set any modes (default) @n
377 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
378 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
379 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not
380 /// Conversion options (default is for no conversion): @n
381 /// @c 'c' = Camel_Case @n
382 /// @c 'f' = First character in upper-case @n
383 /// @c 'l' = all lower case @n
384 /// @c 'u' = ALL UPPER CASE
388 __assign(intake, len >= 0 ? len : std::strlen(intake));
389 } // -x- constructor Atomize -x-
391 /*======================================================================*//**
393 Instantiate an Atomize object using the specified ASCIIZ string for intake.
394 @throws std::invalid_argument If the parameters are malformed in some way.
397 *///=========================================================================
399 /// The intake ASCIIZ string
401 /// The length of the intake string@n
402 /// -1 = Measure ASCIIZ string
/// See @ref ATOMIZE_FLAGS for a list of options
405 const int flags = ATOMIZE_DEFAULT,
/// Granularity (default is to return the entire atom): @n
407 /// @c nullptr = don't set any modes @n
408 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
409 /// @c "v" = value (will be empty if no key-value pair was detected) @n
410 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not
411 /// Conversion options (default is for no conversion): @n
412 /// @c "c" = Camel_Case @n
413 /// @c "f" = First character in upper-case @n
414 /// @c "l" = all lower case @n
415 /// @c "u" = ALL UPPER CASE
416 const char* mode = nullptr) {
418 if (mode != nullptr) {
419 const int mode_len = ::strnlen(mode, ATOMIZE_MAX_MODES);
420 for (int i = 0; i < mode_len; i++) __mode(mode[i], false);
422 __assign(intake, len >= 0 ? len : std::strlen(intake));
423 } // -x- constructor Atomize -x-
425 /*======================================================================*//**
427 Instantiate an Atomize object using the specified string for intake.
428 @throws std::invalid_argument If the parameters are malformed in some way.
431 *///=========================================================================
433 /// The intake C++ string
434 const std::string& intake,
435 /// The length of the intake string@n
436 /// -1 = Obtain length from @c intake.size() method
/// See @ref ATOMIZE_FLAGS for a list of options
/// Granularity (default is to return the entire atom): @n
441 /// @c 0 = don't set any modes (default) @n
442 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
443 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
444 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not
445 /// Conversion options (default is for no conversion): @n
446 /// @c 'c' = Camel_Case @n
447 /// @c 'f' = First character in upper-case @n
448 /// @c 'l' = all lower case @n
449 /// @c 'u' = ALL UPPER CASE
453 __assign(intake.data(), len >= 0 ? len : intake.size());
454 } // -x- constructor Atomize -x-
456 /*======================================================================*//**
458 Instantiate an Atomize object using the specified string for intake.
459 @throws std::invalid_argument If the parameters are malformed in some way.
462 *///=========================================================================
464 /// The intake C++ string
465 const std::string& intake,
466 /// The length of the intake string@n
467 /// -1 = Obtain length from @c intake.size() method
/// See @ref ATOMIZE_FLAGS for a list of options
470 const int flags = ATOMIZE_DEFAULT,
/// Granularity (default is to return the entire atom): @n
472 /// @c nullptr = don't set any modes @n
473 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
474 /// @c "v" = value (will be empty if no key-value pair was detected) @n
475 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not
476 /// Conversion options (default is for no conversion): @n
477 /// @c "c" = Camel_Case @n
478 /// @c "f" = First character in upper-case @n
479 /// @c "l" = all lower case @n
480 /// @c "u" = ALL UPPER CASE
481 const char* mode = nullptr) {
483 if (mode != nullptr) {
484 const int mode_len = ::strnlen(mode, ATOMIZE_MAX_MODES);
485 for (int i = 0; i < mode_len; i++) __mode(mode[i], false);
487 __assign(intake.data(), len >= 0 ? len : intake.size());
488 } // -x- constructor Atomize -x-
490 /*======================================================================*//**
494 *///=========================================================================
495 ~Atomize() noexcept {
497 if (__origin != nullptr) ::free(__origin); // Memory management
} // -x- destructor Atomize -x-
500 /*======================================================================*//**
502 Assign (and interpret) a new ASCIIZ string (flags and modes are inherited).
503 @throws std::invalid_argument If the parameters are malformed in some way.
504 @returns The same Atomize object so as to facilitate stacking
505 *///=========================================================================
507 /// The intake ASCIIZ string
509 /// The length of the intake string@n
510 /// -1 = Measure ASCIIZ string
511 const int len = -1) {
513 __assign(intake, len >= 0 ? len : std::strlen(intake));
515 } // -x- Atomize& assign -x-
517 /*======================================================================*//**
519 Assign (and interpret) a new string (flags and modes are inherited).
520 @throws std::invalid_argument If the parameters are malformed in some way.
521 @returns The same Atomize object so as to facilitate stacking
522 *///=========================================================================
524 /// The intake C++ string
525 const std::string& intake,
526 /// The length of the intake string@n
527 /// -1 = Obtain length from @c intake.size() method
528 const int len = -1) {
530 __assign(intake.data(), len >= 0 ? len : intake.size());
532 } // -x- Atomize& assign -x-
534 /*======================================================================*//**
536 Access to atoms, whilst utilizing the operator mode that was configured using
537 the @ref mode method.
538 Return an entire atom.
539 @throws std::out_of_range if the index is out-of-range
545 @returns Entire atom, or portion, depending on the mode
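
The conversion modes ('c', 'f', 'l', and 'u') operate on ASCII letters only.
The following is a minimal sketch of those conversions rather than the
implementation used by this class; @c convert_case is a hypothetical helper,
and its 'c' branch assumes that a lower-case letter following any non-letter
starts a new word and that all other characters are left unchanged:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper that applies one ASCII-only conversion mode to a copy
// of the text: 'l', 'u', 'c' (Camel_Case), or 'f' (first letter).
std::string convert_case(std::string s, const char mode) {
  auto is_lower = [](const char ch) { return ch >= 'a' && ch <= 'z'; };
  auto is_upper = [](const char ch) { return ch >= 'A' && ch <= 'Z'; };
  char prev = '\0'; // Previous character (a non-letter starts a new word)
  for (std::size_t i = 0; i < s.size(); i++) {
    const char ch = s[i];
    switch (mode) {
      case 'l': if (is_upper(ch)) s[i] = static_cast<char>(ch + 32); break;
      case 'u': if (is_lower(ch)) s[i] = static_cast<char>(ch - 32); break;
      case 'c': if (!is_lower(prev) && !is_upper(prev) && is_lower(ch))
                  s[i] = static_cast<char>(ch - 32);
                break;
      case 'f': if (i == 0 && is_lower(ch)) s[i] = static_cast<char>(ch - 32); break;
      default:  break; // Unknown modes leave the text unchanged
    }
    prev = ch;
  }
  return s;
}
```

Under these assumptions, the 'c' mode turns @c content_type into
@c Content_Type, which is a common shape for protocol header names.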
546 *///=========================================================================
548 /// Which atom to obtain (0 = first atom; negative values count backward from
549 /// the last atom in the internal array)
/// Granularity (default is to return the entire atom): @n
552 /// @c 0 = don't change any modes (default) @n
553 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
554 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
555 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not
556 /// Conversion options (default is for no conversion): @n
557 /// @c 'c' = Camel_Case @n
558 /// @c 'f' = First character in upper-case @n
559 /// @c 'l' = all lower case @n
560 /// @c 'u' = ALL UPPER CASE
563 // --------------------------------------------------------------------------
564 // Internal variables.
565 // --------------------------------------------------------------------------
if (index < 0) index = __data_points.size() + index;
__atom* atom = __data_points.at(index); // Throws std::out_of_range if invalid
570 // --------------------------------------------------------------------------
572 // --------------------------------------------------------------------------
573 std::string old_modes(mode == 0 ? "" : this->mode());
574 if (mode != 0) this->mode(mode);
576 // --------------------------------------------------------------------------
577 // Presence of key-value pair (results in ignoring all other modes).
578 // --------------------------------------------------------------------------
580 if (atom->vpos != 0) str.assign("1");
582 } // -x- if mode_p -x-
584 // --------------------------------------------------------------------------
586 // --------------------------------------------------------------------------
587 if (mode_k) str.assign(__origin, atom->fpos, atom->klen);
588 else if (mode_v) str.assign(__origin, atom->vpos, atom->vlen);
589 else str.assign(__origin, atom->fpos, atom->flen);
591 // --------------------------------------------------------------------------
593 // --------------------------------------------------------------------------
594 char* data = str.data();
595 if (mode_l) { // All lower-case
596 for (size_t i = 0; i < str.size(); i++) {
598 if (ch >= 'A' && ch <= 'Z') data[i] += 32; // Convert to lower-case
600 } else if (mode_u) { // All upper-case
601 for (size_t i = 0; i < str.size(); i++) {
603 if (ch >= 'a' && ch <= 'z') data[i] -= 32; // Convert to upper-case
605 } else if (mode_c) { // Camel_Case
607 for (size_t i = 0; i < str.size(); i++) {
609 if (pch < 'A' || (pch > 'Z' && pch < 'a') || pch > 'z') {
610 if (ch >= 'a' && ch <= 'z')
611 data[i] -= 32; // Convert to upper-case
615 } else if (mode_f) { // First letter
617 if (ch >= 'a' && ch <= 'z') data[0] -= 32; // Convert to upper-case
618 } // -x- if mode_c -x-
620 // --------------------------------------------------------------------------
622 // --------------------------------------------------------------------------
623 if (mode != 0) this->mode(old_modes.data());
626 } // -x- std::string at -x-
628 /*======================================================================*//**
630 Access to atoms, whilst utilizing the operator mode that was configured using
631 the @ref mode method.
632 Return an entire atom.
633 @throws std::out_of_range if the index is out-of-range
639 @returns Entire atom, or portion, depending on the mode
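
Negative index values count backward from the last atom, and an index that
cannot be mapped onto an atom results in std::out_of_range. A minimal sketch of
that mapping follows; @c resolve_index is a hypothetical helper that is not
part of this class:

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: map a possibly negative index onto 0..count-1, or
// throw std::out_of_range if it cannot be mapped.
std::size_t resolve_index(int index, const std::size_t count) {
  if (index < 0) index += static_cast<int>(count); // -1 == last atom
  if (index < 0 || static_cast<std::size_t>(index) >= count)
    throw std::out_of_range("atom index is out of range");
  return static_cast<std::size_t>(index);
}
```

With three atoms, an index of -1 resolves to 2 (the last atom) and -3 resolves
to 0 (the first atom), while both 3 and -4 are rejected.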
640 *///=========================================================================
642 /// Which atom to obtain (0 = first atom; negative values count backward from
643 /// the last atom in the internal array)
/// Granularity (default is to return the entire atom): @n
646 /// @c nullptr = don't change any modes (default) @n
647 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
648 /// @c "v" = value (will be empty if no key-value pair was detected) @n
649 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not
650 /// Conversion options (default is for no conversion): @n
651 /// @c "c" = Camel_Case @n
652 /// @c "f" = First character in upper-case @n
653 /// @c "l" = all lower case @n
654 /// @c "u" = ALL UPPER CASE
655 const char* mode = nullptr) {
657 // --------------------------------------------------------------------------
658 // Internal variables.
659 // --------------------------------------------------------------------------
if (index < 0) index = __data_points.size() + index;
__atom* atom = __data_points.at(index); // Throws std::out_of_range if invalid
664 // --------------------------------------------------------------------------
666 // --------------------------------------------------------------------------
667 std::string old_modes(mode == nullptr ? "" : this->mode());
668 if (mode != nullptr) this->mode(mode);
670 // --------------------------------------------------------------------------
671 // Presence of key-value pair (results in ignoring all other modes).
672 // --------------------------------------------------------------------------
674 if (atom->vpos != 0) str.assign("1");
676 } // -x- if mode_p -x-
678 // --------------------------------------------------------------------------
680 // --------------------------------------------------------------------------
681 if (mode_k) str.assign(__origin, atom->fpos, atom->klen);
682 else if (mode_v) str.assign(__origin, atom->vpos, atom->vlen);
683 else str.assign(__origin, atom->fpos, atom->flen);
685 // --------------------------------------------------------------------------
687 // --------------------------------------------------------------------------
688 char* data = str.data();
689 if (mode_l) { // All lower-case
690 for (size_t i = 0; i < str.size(); i++) {
692 if (ch >= 'A' && ch <= 'Z') data[i] += 32; // Convert to lower-case
694 } else if (mode_u) { // All upper-case
695 for (size_t i = 0; i < str.size(); i++) {
697 if (ch >= 'a' && ch <= 'z') data[i] -= 32; // Convert to upper-case
699 } else if (mode_c) { // Camel_Case
701 for (size_t i = 0; i < str.size(); i++) {
703 if (pch < 'A' || (pch > 'Z' && pch < 'a') || pch > 'z') {
704 if (ch >= 'a' && ch <= 'z')
705 data[i] -= 32; // Convert to upper-case
709 } else if (mode_f) { // First letter
711 if (ch >= 'a' && ch <= 'z') data[0] -= 32; // Convert to upper-case
712 } // -x- if mode_c -x-
714 // --------------------------------------------------------------------------
716 // --------------------------------------------------------------------------
717 if (mode != nullptr) this->mode(old_modes.data());
720 } // -x- std::string at -x-
722 /*======================================================================*//**
724 Clear this Atomize's underlying data and reset all states.
725 @returns The same Atomize object so as to facilitate stacking
728 *///=========================================================================
732 } // -x- Atomize& clear -x-
734 /*======================================================================*//**
736 Confirm that there are no atoms.
737 @returns TRUE = no atoms@n
738 FALSE = at least one atom exists
741 *///=========================================================================
743 return __data_points.empty();
744 } // -x- bool empty -x-
746 /*======================================================================*//**
748 Obtain current set of internal flags.
749 @returns Current flags, as defined in @ref ATOMIZE_FLAGS
751 *///=========================================================================
754 } // -x- int flags -x-
756 /*======================================================================*//**
Set the internal flags.
759 @returns The same Atomize object so as to facilitate stacking
761 *///=========================================================================
/// See @ref ATOMIZE_FLAGS for a list of options
764 const int flags = ATOMIZE_DEFAULT) {
767 } // -x- Atomize& flags -x-
769 /*======================================================================*//**
771 Return the entire atom.
772 @throws std::out_of_range if the index is out-of-range
@returns The entire atom
781 *///=========================================================================
783 /// Which atom to obtain (0 = first atom; negative values count backward from
784 /// the last atom in the internal array)
if (index < 0) index = __data_points.size() + index;
__atom* atom = __data_points.at(index); // Throws std::out_of_range if invalid
788 return std::string(__origin, atom->fpos, atom->flen);
789 } // -x- std::string get -x-
791 /*======================================================================*//**
Return the key portion of an atom, or the entire atom if a key-value pair
795 @throws std::out_of_range if the index is out-of-range
796 @returns Key portion of atom (or the entire atom if a key-value pair wasn't
804 *///=========================================================================
806 /// Which atom to obtain (0 = first atom; negative values count backward from
807 /// the last atom in the internal array)
if (index < 0) index = __data_points.size() + index;
__atom* atom = __data_points.at(index); // Throws std::out_of_range if invalid
811 return std::string(__origin, atom->fpos, atom->klen);
812 } // -x- std::string get_key -x-
814 /*======================================================================*//**
Return the value portion of an atom, or an empty string if a key-value pair
818 @throws std::out_of_range if the index is out-of-range
819 @returns Value portion of atom (or an empty string if a key-value pair wasn't
827 *///=========================================================================
828 std::string get_value(
829 /// Which atom to obtain (0 = first atom; negative values count backward from
830 /// the last atom in the internal array)
if (index < 0) index = __data_points.size() + index;
__atom* atom = __data_points.at(index); // Throws std::out_of_range if invalid
834 return std::string(atom->vpos != 0 ? std::string(__origin, atom->vpos, atom->vlen) : "");
835 } // -x- std::string get_value -x-
837 /*======================================================================*//**
839 Indicates whether the specified atom was split into a key-value pair (if it
840 was, then the @c key and the @c value are delimited by the first instance of
841 an equal sign {`=`}).
842 @throws std::out_of_range if the index is out-of-range
@returns TRUE = key-value pair was detected by the parsing algorithm@n
FALSE = this atom was not split into a key-value pair
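
Because only the first equal sign acts as the delimiter, any further equal
signs remain part of the value. A minimal sketch of that rule follows;
@c split_kv is a hypothetical helper that is not part of this class, and it
copies substrings for clarity rather than tracking offsets:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split one atom at its first '='.  When no '=' is
// present the whole atom is returned as the key and the value is empty.
std::pair<std::string, std::string> split_kv(const std::string& atom) {
  const std::string::size_type eq = atom.find('=');
  if (eq == std::string::npos) return {atom, ""};
  return {atom.substr(0, eq), atom.substr(eq + 1)};
}
```

For example, the atom @c a=b=c yields the key @c a and the value @c b=c.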
851 *///=========================================================================
853 /// Which atom to obtain (0 = first atom; negative values count backward from
854 /// the last atom in the internal array)
if (index < 0) index = __data_points.size() + index;
__atom* atom = __data_points.at(index); // Throws std::out_of_range if invalid
858 return atom->vpos != 0;
859 } // -x- bool has_kv -x-
861 /*======================================================================*//**
863 Get the operator modes that are set for the @ref operator[] operator.
@returns String containing the code of every mode that is currently set (an
empty string if no modes are set)
873 @see mode(const char*)
874 *///=========================================================================
875 std::string mode() noexcept {
877 if (mode_k) modes.append("k");
878 if (mode_v) modes.append("v");
879 if (mode_p) modes.append("p");
880 if (mode_c) modes.append("c");
881 if (mode_f) modes.append("f");
882 if (mode_l) modes.append("l");
883 if (mode_u) modes.append("u");
885 } // -x- std::string mode -x-

/*======================================================================*//**
Set the operator modes for use with the @ref operator[] operator (modes that
are not specified will be reset to their defaults).

Calling this method with @c nullptr as the parameter will result in resetting
all operator modes to their defaults.
@throws std::invalid_argument if an incorrect value is provided
@returns The same Atomize object so as to facilitate stacking
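
The reset-then-set behaviour described above can be sketched in isolation as
follows. This is only an illustration under assumptions: @c mode_flags,
@c parse_modes, and @c modes_to_string are hypothetical names that are not part
of this class, and the sketch does not apply any limit on the length of the
mode string.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical flag set for this sketch; every mode defaults to "off".
struct mode_flags {
  bool k = false, v = false, p = false;            // granularity modes
  bool c = false, f = false, l = false, u = false; // conversion modes
};

// Start from the defaults, then switch on each mode letter that is listed.
inline mode_flags parse_modes(const char* modes) {
  mode_flags flags;                   // unlisted modes stay at their defaults
  if (modes == nullptr) return flags; // nullptr = clear all modes
  for (const char* m = modes; *m != '\0'; ++m) {
    switch (*m) {
      case 'k': flags.k = true; break;
      case 'v': flags.v = true; break;
      case 'p': flags.p = true; break;
      case 'c': flags.c = true; break;
      case 'f': flags.f = true; break;
      case 'l': flags.l = true; break;
      case 'u': flags.u = true; break;
      default: throw std::invalid_argument("unknown mode letter");
    }
  }
  return flags;
}

// Report the active modes in a fixed order (k, v, p, c, f, l, u).
inline std::string modes_to_string(const mode_flags& flags) {
  std::string out;
  if (flags.k) out += 'k';
  if (flags.v) out += 'v';
  if (flags.p) out += 'p';
  if (flags.c) out += 'c';
  if (flags.f) out += 'f';
  if (flags.l) out += 'l';
  if (flags.u) out += 'u';
  return out;
}
```

In this sketch duplicate letters are harmless (setting a flag twice changes
nothing), and the order in which letters are supplied does not affect the
order in which they are reported.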
*///=========================================================================
  /// Granularity (default is to return the entire atom): @n
  /// @c nullptr = clear all modes (default) @n
  /// @c "k" = key (the entire atom if no key-value pair was detected) @n
  /// @c "v" = value (will be empty if no key-value pair was detected) @n
  /// @c "p" = returns "1" if key-value pair is present, or an empty string if not @n
  /// Conversion options (default is for no conversion): @n
  /// @c "c" = Camel_Case @n
  /// @c "f" = First character in upper-case @n
  /// @c "l" = all lower case @n
  /// @c "u" = ALL UPPER CASE

  // --------------------------------------------------------------------------
  // Clear all settings.
  // --------------------------------------------------------------------------

  // --------------------------------------------------------------------------
  // --------------------------------------------------------------------------
  if (mode == nullptr) return *this;

  // --------------------------------------------------------------------------
  // Set modes (duplicates are effectively ignored, and 0 never shows up here
  // because it terminates the string).
  // --------------------------------------------------------------------------
  const int mode_len = ::strnlen(mode, ATOMIZE_MAX_MODES);
  for (int i = 0; i < mode_len; i++) __mode(mode[i]);
} // -x- Atomize& mode -x-

/*======================================================================*//**
Set the operator modes for use with the @ref operator[] operator (modes that
are not specified will be reset to their defaults).

Calling this method with @c 0 as the parameter will result in resetting all
operator modes to their defaults.
@throws std::invalid_argument if an incorrect value is provided
@returns The same Atomize object so as to facilitate stacking
*///=========================================================================
  /// Granularity (default is to return the entire atom): @n
  /// @c 0 = entire atom (default) @n
  /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
  /// @c 'v' = value (will be empty if no key-value pair was detected) @n
  /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not @n
  /// Conversion options (default is for no conversion): @n
  /// @c 'c' = Camel_Case @n
  /// @c 'f' = First character in upper-case @n
  /// @c 'l' = all lower case @n
  /// @c 'u' = ALL UPPER CASE

  // --------------------------------------------------------------------------
  // --------------------------------------------------------------------------

  // --------------------------------------------------------------------------
  // --------------------------------------------------------------------------
} // -x- Atomize& mode -x-

/*======================================================================*//**
Return the total quantity of atoms.
@returns Quantity of atoms
*///=========================================================================
  return __data_points.size();
} // -x- size_t size -x-
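
/*======================================================================*//**
Generate a std::vector<std::string> that contains all atoms.
@returns std::vector<std::string> with one entry per atom (or, when key-value
         pairs are split, separate entries for each key and each value)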
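
To illustrate the difference that splitting key-value pairs makes, here is a
small stand-alone sketch that works on atoms that have already been separated.
The function name @c split_pairs is hypothetical, and locating the equal sign
with @c find is an assumption made for the example; the method itself relies on
positions that were recorded during parsing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Expand each "key=value" atom into "key=" and "value"; copy other atoms as-is.
inline std::vector<std::string> split_pairs(const std::vector<std::string>& atoms) {
  std::vector<std::string> out;
  for (const std::string& atom : atoms) {
    const std::string::size_type eq = atom.find('=');
    if (eq != std::string::npos) {
      out.push_back(atom.substr(0, eq + 1)); // key, including the equal sign
      out.push_back(atom.substr(eq + 1));    // value
    } else {
      out.push_back(atom); // no key-value pair detected
    }
  }
  return out;
}
```

Given the atoms `port=8080` and `verbose`, this sketch produces three entries:
`port=`, `8080`, and `verbose`. Keeping the equal sign on the key lets a caller
tell a key apart from an ordinary atom.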
*///=========================================================================
std::vector<std::string> to_vector(
  /// FALSE = don't split key-value pairs (default) @n
  /// TRUE = split key-value pairs into separate entries (for key names, the
  /// equal sign will be included at the end of the string)
  bool split_kv_pairs = false) noexcept {
  std::vector<std::string> v;

  // --------------------------------------------------------------------------
  // Splitting key-value pairs is best handled in a separate loop.
  // --------------------------------------------------------------------------
  if (split_kv_pairs) {
    for (size_t i = 0; i < __data_points.size(); i++) {
      __atom* atom = __data_points[i];
      if (atom->vpos != 0) { // Non-zero indicates that a key-value pair was detected
        v.push_back(std::string(__origin, atom->fpos, atom->klen + 1)); // +1 includes equal sign
        v.push_back(std::string(__origin, atom->vpos, atom->vlen));
      } else { // No key-value pair was detected
        v.push_back(std::string(__origin, atom->fpos, atom->flen));
      } // -x- if atom->vpos -x-
    } // -x- for i -x-
    return v;
  } // -x- if split_kv_pairs -x-

  // --------------------------------------------------------------------------
  // Returning full atoms is straightforward.
  // --------------------------------------------------------------------------
  for (size_t i = 0; i < __data_points.size(); i++) {
    __atom* atom = __data_points[i];
    v.push_back(std::string(__origin, atom->fpos, atom->flen));
  } // -x- for i -x-
  return v;
} // -x- std::vector<std::string> to_vector -x-

/*======================================================================*//**
Array-style access to atoms, whilst utilizing the operator mode that was
configured using the @ref mode method.
@throws std::out_of_range if the index is out-of-range
@returns std::string
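
The negative-index convention used for array-style access can be sketched on
its own as shown here. @c resolve_index is a hypothetical helper (not part of
this class) that maps an index onto a position within @c size elements and
throws when the result falls outside that range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Map an index to a zero-based position: 0 is the first element, -1 the last.
inline std::size_t resolve_index(long index, std::size_t size) {
  const long count = static_cast<long>(size);
  if (index < 0) index += count; // count backward from the end
  if (index < 0 || index >= count) throw std::out_of_range("index is out of range");
  return static_cast<std::size_t>(index);
}
```

Checking the range after the adjustment means that an index such as -4 for a
container of three elements is rejected rather than wrapping around.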
@see operator[](int)
*///=========================================================================
std::string operator[](
  /// Index of atom to access (0 = first atom; negative index values are
  /// calculated in reverse, starting with -1 as the final atom)
} // -x- std::string operator[] -x-

}; // -x- class Atomize -x-

}; // -x- namespace randolf -x-