randolf.ca  1.00
Randolf Richardson's C++ classes
Atomize
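#pragma once

#include <algorithm>
#include <atomic>
#include <cstdlib>   // ::calloc, ::free
#include <cstring>   // std::strlen, ::strnlen
#include <exception>
#include <iostream>
#include <stdexcept> // std::invalid_argument, std::out_of_range
#include <string>    // std::string
#include <vector>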
9
10namespace randolf {
11
12 /*======================================================================*//**
13 @brief
    The @ref Atomize class provides an object-oriented interface with array-style
    access to a string that has been efficiently separated into atoms.  Modes (see
    @ref mode for details) and certain API methods provide finer granularity and
    additional functionality, hence one could say "Atomize can split your atoms
    safely."
19
20 When parsing a line or block of text, the following is assumed:
21
22 - parameters are separated by one or multiple consecutive whitespace
23 characters (space, null, tab, linefeed, carriage return)
24
25 - values that are enclosed within a set of quotation marks may include
26 whitespace characters that will then not be interpreted as delimiters
27
28 Data is interpreted in a single pass during instantiation or assignment, and
29 the interpretation algorithm is written in an optimized programming style to
30 ensure high efficiency.
31
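    As a rough illustration of the single-pass approach described above, the
    following sketch splits text on the same whitespace characters and records
    only offsets and lengths.  It is not this class's implementation: @c Span,
    @c split_atoms, and @c atom_text are hypothetical names, and quotation marks
    and key-value detection are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record (not the class's internal layout): where one atom
// starts in the text and how long it is.
struct Span {
  std::size_t pos;
  std::size_t len;
};

// Single pass over the text: one or more consecutive whitespace characters
// (space, null, tab, linefeed, carriage return) separate atoms.
inline std::vector<Span> split_atoms(const char* text, std::size_t len) {
  assert(text != nullptr || len == 0);
  std::vector<Span> spans;
  bool in_atom = false; // Start outside an atom so leading whitespace is skipped
  for (std::size_t i = 0; i < len; i++) {
    const char c = text[i];
    const bool ws = (c == ' ' || c == '\0' || c == '\t' || c == '\n' || c == '\r');
    if (ws) {
      in_atom = false;              // End of the current atom (if any)
    } else if (!in_atom) {
      spans.push_back(Span{i, 1});  // First character of a new atom
      in_atom = true;
    } else {
      spans.back().len++;           // Another character of the current atom
    }
  }
  return spans;
}

// Copy one atom out of the original text.
inline std::string atom_text(const char* text, const Span& s) {
  return std::string(text + s.pos, s.len);
}
```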
    There are no memory leaks.  This is particularly important because the
    specialized parsing involved is often used in heavy data processing loops
    where speed and reliability are needed.  Some of the libraries I've tested
    that provide similar functionality leak memory or fail in other ways when
    given peculiar or fuzzed data; this class is not affected by such data,
    primarily because I don't trust data to always be "as expected."
39
40 @par Use case
41
42 Parsing command lines or configuration settings can become challenging when
43 multiple parameters are provided on a single line, and some of those
44 parameters include quoted text that contains spaces. This class handles all
45 of these scenarios and makes it easy to access each parameter in the same
46 manner that arrays and vectors are accessed.
47
48 @par Background
49
50 I created this class to make it easier to write internet server daemons.
51
52 @par Getting started
53
54 @author Randolf Richardson
55 @version 1.00
56 @par History
57 - 2025-Jan-20 v1.00 Initial version
58 - 2025-Feb-03 v1.00 Increased use of references and pointers
59
60 @par Conventions
    Lower-case letter "a" is regularly used in partial example code to represent
    an instantiated Atomize object.
63
64 An ASCIIZ string is a C-string (char* array) that includes a terminating null
65 (0) character at the end.
66
67 @par Notes
68
69 I use the term "ASCIIZ string" to indicate an array of characters that's
70 terminated by a 0 (a.k.a., null). Although this is very much the same as a
71 C-string, the difference is that in many API functions a C-string must often
72 be accompanied by its length value. When referring to an ASCIIZ string, I'm
73 intentionally indicating that the length of the string is not needed because
74 the string is null-terminated. (This term was also commonly used in assembly
75 language programming in the 1970s, 1980s, and 1990s, and as far as I know is
76 still used by machine language programmers today.)
77
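    To illustrate the distinction, here is a minimal sketch of how a length
    parameter of -1 can mean "measure the ASCIIZ string" while any other value
    is taken as given.  The helper name @c effective_length is hypothetical and
    is not part of this class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: a negative length means "this is an ASCIIZ string, so
// measure it up to the terminating null"; otherwise the caller's length is
// used as-is (which allows embedded nulls).
inline std::size_t effective_length(const char* text, int len = -1) {
  assert(text != nullptr);
  return len >= 0 ? static_cast<std::size_t>(len) : std::strlen(text);
}
```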
78 @par Examples
79
80 @code{.cpp}
    #include <cstdlib>  // EXIT_SUCCESS
    #include <iostream> // std::cout, std::cerr, std::endl, etc.

    #include <randolf/Atomize>

    int main(int argc, char *argv[]) {
      randolf::Atomize a("parameters key=value");
      std::cout << "atom0: " << a.at(0) << std::endl;
      std::cout << "atom1: " << a.at(1) << std::endl;
      std::cout << "key1:  " << a.at(1, 'k') << std::endl;
      std::cout << "val1:  " << a.at(1, 'v') << std::endl;
      return EXIT_SUCCESS;
    } // -x- int main -x-
93 @endcode
94
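    The 'k' and 'v' modes used above can be pictured as splitting an atom at its
    first equal sign.  The following is a sketch of that idea only, using a
    hypothetical @c split_key_value helper rather than this class's internals.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split one atom at its first '='.  When there is no
// '=', the key is the entire atom and the value is empty.
inline std::pair<std::string, std::string> split_key_value(const std::string& atom) {
  const std::string::size_type eq = atom.find('=');
  if (eq == std::string::npos) return {atom, std::string()};
  assert(eq < atom.size()); // A found position is always inside the string
  return {atom.substr(0, eq), atom.substr(eq + 1)};
}
```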
    Parameter stacking is also supported (with methods that return @c Atomize&).
96 *///=========================================================================
97 class Atomize {
98
99 private:
100 // --------------------------------------------------------------------------
101 // Internal structures.
102 // --------------------------------------------------------------------------
103 struct __atom {
104 uint fpos = 0; // First position/offset (key also begins here)
105 uint flen = 0; // Full length
106 uint klen = 0; // Key length
107 uint vpos = 0; // Value position/offset (0 == not present)
108 uint vlen = 0; // Value length (do not use this to test for presence of value)
109 }; // -x- struct __atom -x-
110
111 // --------------------------------------------------------------------------
112 // Internal variables.
113 // --------------------------------------------------------------------------
114 char* __origin = nullptr; // Copy of original string
115 std::vector<__atom*> __data_points;
116 int __flags = 0;
117
118 // --------------------------------------------------------------------------
119 // Operator modes (all are disabled by default).
120 // --------------------------------------------------------------------------
121 bool mode_k = false; // Key
122 bool mode_v = false; // Value
123 bool mode_p = false; // Presence of key-value pair
124 bool mode_c = false; // Camel_Case
125 bool mode_f = false; // First letter upper-case
126 bool mode_l = false; // All lower-case
127 bool mode_u = false; // All upper-case
128 #define ATOMIZE_MAX_MODES 8 // Number of modes + 1 (so include 0 for strnlen())
129
130 public:
131 /*======================================================================*//**
132 @brief
133 Optional flags that alter, modify, or enhance the operation of atomization
134 intake.
135 *///=========================================================================
136 enum ATOMIZE_FLAGS: int {
137
138 /*----------------------------------------------------------------------*//**
    The ATOMIZE_DEFAULT flag isn't necessary, but it's included here for
    completeness as it accommodates programming styles that prefer to emphasize
    when defaults are being relied upon.
142 *///-------------------------------------------------------------------------
143 ATOMIZE_DEFAULT = 0,
144
145 /*----------------------------------------------------------------------*//**
146 Interpret all quotation marks (the default is to only utilize enclosing
147 quotation marks).
148 *///-------------------------------------------------------------------------
149 ATOMIZE_USE_ALL_QUOTES = 1,
150
151 /*----------------------------------------------------------------------*//**
152 Don't interpret quotation marks as grouping characters.
153 *///-------------------------------------------------------------------------
154 ATOMIZE_IGNORE_QUOTES = 2,
155
156 /*----------------------------------------------------------------------*//**
157 Delete quotation marks that function as grouping characters (this flag has no
158 effect when @ref ATOMIZE_IGNORE_QUOTES is set).
159 *///-------------------------------------------------------------------------
160 ATOMIZE_DELETE_QUOTES = 4,
161
162 }; // -x- enum ATOMIZE_FLAGS -x-
163
164 private:
165 /*======================================================================*//**
166 @brief
167 Assign new string.
168 *///=========================================================================
169 void __assign(
170 /// The intake ASCIIZ string
171 const char* intake,
172 /// The length of the intake string
173 const int len) {
174
175 // --------------------------------------------------------------------------
176 // Internal variables.
177 // --------------------------------------------------------------------------
178 __atom* atom = new __atom{}; // Transient data structure
179 bool k = false; // Key-value pair detection flag
180 //bool q = false; // Begin in non-quote mode
181 bool w = true; // Begin in whitespace mode to skip any leading whitespace
182
183 // --------------------------------------------------------------------------
184 // Allocate internal memory.
185 // --------------------------------------------------------------------------
186 if (__origin != nullptr) ::free(__origin); // Memory management
187 __origin = (char*)::calloc(1, len + 1); // Allocate memory
188
189 // --------------------------------------------------------------------------
190 // Primary loop.
191 // --------------------------------------------------------------------------
192 for (int i = 0; i < len; i++) {
193
194 // --------------------------------------------------------------------------
        // Extract current character and copy original string at the same time,
        // which also creates an optimization opportunity for compilers to store "c"
        // in a register, which is faster than accessing a memory location.
198 // --------------------------------------------------------------------------
199 char c = __origin[i] = intake[i];
200
201 // --------------------------------------------------------------------------
202 // Process character.
203 // --------------------------------------------------------------------------
204 switch (c) {
205 case '\0': // Whitespace: NULL
206 //if (len == 0) break; // End of ASCIIZ string
207 case ' ': // Whitespace: Space
208 case '\t': // Whitespace: Tab
209 case '\r': // Whitespace: Carriage Return
210 case '\n': // Whitespace: Linefeed
211 if (!w) { // End of atom
212 if (k) atom->vlen = i - atom->vpos;
213 else atom->klen = i - atom->fpos;
214 __data_points.push_back(atom); // Save current atom to vector
215 atom = new __atom{.fpos = (uint)i}; // Create new atom
216 k = false; // Disable key-value pair mode
217 w = true; // Enable whitespace mode
218 } // -x- if !w -x-
219 break;
220 case '=': // Key-value pair detected
221 if (!k) { // Key-value pair mode is not already detected
222 k = true; // Enable key-value pair mode
223 atom->klen = w ? 0 : i - atom->fpos;
224 atom->vpos = i + 1; // Save position of value
225 } // -x- if !k -x-
226 default: // Non-whitespace characters
227 if (w) { // White space mode is enabled
228 atom->fpos = i; // Save new starting position
229 w = false; // Disable whitespace mode
230 } // -x- if w -x-
231 if (k) atom->vlen++;
232 else atom->klen++;
233 atom->flen++;
        } // -x- switch c -x-
235
236 } // -x- for i -x-
237
238 // --------------------------------------------------------------------------
239 // Save final atom if it's not empty.
240 // --------------------------------------------------------------------------
      if (k) atom->vlen = len - atom->vpos; // Final value length (excludes the "=" counted by the loop)
      if (atom->flen != 0) __data_points.push_back(atom);
      else delete atom;
243
244 } // -x- void __assign -x-
245
246 /*======================================================================*//**
247 @brief
248 Clear internal data. Called by destructor and assign() methods.
249 *///=========================================================================
250 void __clear() {
251
252 // --------------------------------------------------------------------------
253 // Resources management: De-allocate all structures since a std::vector of
254 // pointers to these structures won't do this automatically when we clear()
255 // the entire vector.
256 // --------------------------------------------------------------------------
257 for (int i = (int)__data_points.size() - 1; i >= 0; i--) delete __data_points.at(i);
258 __data_points.clear();
259
260 // --------------------------------------------------------------------------
261 // Memory management.
262 // --------------------------------------------------------------------------
263 ::free(__origin);
264 __origin = nullptr;
265
266 } // -x- void __clear -x-
267
268 /*======================================================================*//**
269 @brief
270 Set a mode.
271 *///=========================================================================
272 void __mode(const char mode, const bool throw_exception = true) {
273 switch (mode) {
274 case 'k':
275 mode_k = true;
276 break;
277 case 'v':
278 mode_v = true;
279 break;
280 case 'p':
281 mode_p = true;
282 break;
283 case 'c':
284 mode_c = true;
285 break;
286 case 'f':
287 mode_f = true;
288 break;
289 case 'l':
290 mode_l = true;
291 break;
292 case 'u':
293 mode_u = true;
294 break;
295 default:
          if (throw_exception) throw std::invalid_argument(std::string("unrecognized operator mode \"") + mode + "\"");
297 } // -x- switch mode -x-
298 } // -x- void __mode -x-
299
300 public:
301 /*======================================================================*//**
302 @brief
303 Instantiate an empty Atomize object, which is expected to be used with the
304 @ref assign method at some later point. (This is particularly useful for
305 defining a local Atomize object in a header file in a way that won't throw an
306 exception, including invalid mode codes {which will just be ignored instead
307 of throwing an exception}.)
308 @see assign
309 @see mode
310 *///=========================================================================
311 Atomize(
312 /// See @ref ATOMIZE_FLAGS for a list of options
313 const int flags,
          /// Granularity (default is to return the entire atom): @n
315 /// @c 0 = don't set any modes (default) @n
316 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
317 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
318 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not
319 /// Conversion options (default is for no conversion): @n
320 /// @c 'c' = Camel_Case @n
321 /// @c 'f' = First character in upper-case @n
322 /// @c 'l' = all lower case @n
323 /// @c 'u' = ALL UPPER CASE
324 const char mode) noexcept {
325 __flags = flags;
326 __mode(mode, false);
327 } // -x- constructor Atomize -x-
328
329 /*======================================================================*//**
330 @brief
331 Instantiate an empty Atomize object, which is expected to be used with the
332 @ref assign method at some later point. (This is particularly useful for
333 defining a local Atomize object in a header file in a way that won't throw an
334 exception, including invalid mode codes {which will just be ignored instead
335 of throwing an exception}.)
336 @see assign
337 @see mode
338 *///=========================================================================
339 Atomize(
340 /// See @ref ATOMIZE_FLAGS for a list of options
341 const int flags = ATOMIZE_DEFAULT,
          /// Granularity (default is to return the entire atom): @n
343 /// @c nullptr = don't set any modes @n
344 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
345 /// @c "v" = value (will be empty if no key-value pair was detected) @n
346 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not
347 /// Conversion options (default is for no conversion): @n
348 /// @c "c" = Camel_Case @n
349 /// @c "f" = First character in upper-case @n
350 /// @c "l" = all lower case @n
351 /// @c "u" = ALL UPPER CASE
352 const char* mode = nullptr) noexcept {
353 __flags = flags;
354 if (mode != nullptr) {
355 const int mode_len = ::strnlen(mode, ATOMIZE_MAX_MODES);
356 for (int i = 0; i < mode_len; i++) __mode(mode[i], false);
357 } // -x- if mode -x-
358 } // -x- constructor Atomize -x-
359
360 /*======================================================================*//**
361 @brief
362 Instantiate an Atomize object using the specified ASCIIZ string for intake.
363 @throws std::invalid_argument If the parameters are malformed in some way.
364 @see assign
365 @see mode
366 *///=========================================================================
367 Atomize(
368 /// The intake ASCIIZ string
369 const char* intake,
370 /// The length of the intake string@n
371 /// -1 = Measure ASCIIZ string
372 const int len,
          /// See @ref ATOMIZE_FLAGS for a list of options
          const int flags,
          /// Granularity (default is to return the entire atom): @n
376 /// @c 0 = don't set any modes (default) @n
377 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
378 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
379 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not
380 /// Conversion options (default is for no conversion): @n
381 /// @c 'c' = Camel_Case @n
382 /// @c 'f' = First character in upper-case @n
383 /// @c 'l' = all lower case @n
384 /// @c 'u' = ALL UPPER CASE
385 const char mode) {
386 __flags = flags;
387 __mode(mode, false);
388 __assign(intake, len >= 0 ? len : std::strlen(intake));
389 } // -x- constructor Atomize -x-
390
391 /*======================================================================*//**
392 @brief
393 Instantiate an Atomize object using the specified ASCIIZ string for intake.
394 @throws std::invalid_argument If the parameters are malformed in some way.
395 @see assign
396 @see mode
397 *///=========================================================================
398 Atomize(
399 /// The intake ASCIIZ string
400 const char* intake,
401 /// The length of the intake string@n
402 /// -1 = Measure ASCIIZ string
403 const int len = -1,
          /// See @ref ATOMIZE_FLAGS for a list of options
          const int flags = ATOMIZE_DEFAULT,
          /// Granularity (default is to return the entire atom): @n
407 /// @c nullptr = don't set any modes @n
408 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
409 /// @c "v" = value (will be empty if no key-value pair was detected) @n
410 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not
411 /// Conversion options (default is for no conversion): @n
412 /// @c "c" = Camel_Case @n
413 /// @c "f" = First character in upper-case @n
414 /// @c "l" = all lower case @n
415 /// @c "u" = ALL UPPER CASE
416 const char* mode = nullptr) {
417 __flags = flags;
418 if (mode != nullptr) {
419 const int mode_len = ::strnlen(mode, ATOMIZE_MAX_MODES);
420 for (int i = 0; i < mode_len; i++) __mode(mode[i], false);
421 } // -x- if mode -x-
422 __assign(intake, len >= 0 ? len : std::strlen(intake));
423 } // -x- constructor Atomize -x-
424
425 /*======================================================================*//**
426 @brief
427 Instantiate an Atomize object using the specified string for intake.
428 @throws std::invalid_argument If the parameters are malformed in some way.
429 @see assign
430 @see mode
431 *///=========================================================================
432 Atomize(
433 /// The intake C++ string
434 const std::string& intake,
435 /// The length of the intake string@n
436 /// -1 = Obtain length from @c intake.size() method
437 const int len,
          /// See @ref ATOMIZE_FLAGS for a list of options
          const int flags,
          /// Granularity (default is to return the entire atom): @n
441 /// @c 0 = don't set any modes (default) @n
442 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
443 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
444 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not
445 /// Conversion options (default is for no conversion): @n
446 /// @c 'c' = Camel_Case @n
447 /// @c 'f' = First character in upper-case @n
448 /// @c 'l' = all lower case @n
449 /// @c 'u' = ALL UPPER CASE
450 const char mode) {
451 __flags = flags;
452 __mode(mode, false);
453 __assign(intake.data(), len >= 0 ? len : intake.size());
454 } // -x- constructor Atomize -x-
455
456 /*======================================================================*//**
457 @brief
458 Instantiate an Atomize object using the specified string for intake.
459 @throws std::invalid_argument If the parameters are malformed in some way.
460 @see assign
461 @see mode
462 *///=========================================================================
463 Atomize(
464 /// The intake C++ string
465 const std::string& intake,
466 /// The length of the intake string@n
467 /// -1 = Obtain length from @c intake.size() method
468 const int len = -1,
          /// See @ref ATOMIZE_FLAGS for a list of options
          const int flags = ATOMIZE_DEFAULT,
          /// Granularity (default is to return the entire atom): @n
472 /// @c nullptr = don't set any modes @n
473 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
474 /// @c "v" = value (will be empty if no key-value pair was detected) @n
475 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not
476 /// Conversion options (default is for no conversion): @n
477 /// @c "c" = Camel_Case @n
478 /// @c "f" = First character in upper-case @n
479 /// @c "l" = all lower case @n
480 /// @c "u" = ALL UPPER CASE
481 const char* mode = nullptr) {
482 __flags = flags;
483 if (mode != nullptr) {
484 const int mode_len = ::strnlen(mode, ATOMIZE_MAX_MODES);
485 for (int i = 0; i < mode_len; i++) __mode(mode[i], false);
486 } // -x- if mode -x-
487 __assign(intake.data(), len >= 0 ? len : intake.size());
488 } // -x- constructor Atomize -x-
489
490 /*======================================================================*//**
491 @brief
492 Destructor.
493 @see clear
494 *///=========================================================================
495 ~Atomize() noexcept {
      __clear(); // Also releases __origin and resets it to nullptr
    } // -x- destructor Atomize -x-
499
500 /*======================================================================*//**
501 @brief
502 Assign (and interpret) a new ASCIIZ string (flags and modes are inherited).
503 @throws std::invalid_argument If the parameters are malformed in some way.
504 @returns The same Atomize object so as to facilitate stacking
505 *///=========================================================================
506 Atomize& assign(
507 /// The intake ASCIIZ string
508 const char* intake,
509 /// The length of the intake string@n
510 /// -1 = Measure ASCIIZ string
511 const int len = -1) {
512 __clear();
513 __assign(intake, len >= 0 ? len : std::strlen(intake));
514 return *this;
515 } // -x- Atomize& assign -x-
516
517 /*======================================================================*//**
518 @brief
519 Assign (and interpret) a new string (flags and modes are inherited).
520 @throws std::invalid_argument If the parameters are malformed in some way.
521 @returns The same Atomize object so as to facilitate stacking
522 *///=========================================================================
523 Atomize& assign(
524 /// The intake C++ string
525 const std::string& intake,
526 /// The length of the intake string@n
527 /// -1 = Obtain length from @c intake.size() method
528 const int len = -1) {
529 __clear();
530 __assign(intake.data(), len >= 0 ? len : intake.size());
531 return *this;
532 } // -x- Atomize& assign -x-
533
534 /*======================================================================*//**
535 @brief
    Access an atom, utilizing the operator modes that were configured using the
    @ref mode method (or the mode specified in the @c mode parameter, which
    applies to this call only).
539 @throws std::out_of_range if the index is out-of-range
540 @see get
541 @see get_key
542 @see get_value
543 @see mode
544 @see operator[](int)
545 @returns Entire atom, or portion, depending on the mode
546 *///=========================================================================
547 std::string at(
548 /// Which atom to obtain (0 = first atom; negative values count backward from
549 /// the last atom in the internal array)
550 int index,
          /// Granularity (default is to return the entire atom): @n
552 /// @c 0 = don't change any modes (default) @n
553 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
554 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
555 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not
556 /// Conversion options (default is for no conversion): @n
557 /// @c 'c' = Camel_Case @n
558 /// @c 'f' = First character in upper-case @n
559 /// @c 'l' = all lower case @n
560 /// @c 'u' = ALL UPPER CASE
561 const char mode) {
562
563 // --------------------------------------------------------------------------
564 // Internal variables.
565 // --------------------------------------------------------------------------
566 std::string str;
567 if (index < 0) index = __data_points.size() + index;
      __atom* atom = __data_points.at(index); // Throws std::out_of_range
569
570 // --------------------------------------------------------------------------
571 // Save mode.
572 // --------------------------------------------------------------------------
573 std::string old_modes(mode == 0 ? "" : this->mode());
574 if (mode != 0) this->mode(mode);
575
576 // --------------------------------------------------------------------------
577 // Presence of key-value pair (results in ignoring all other modes).
578 // --------------------------------------------------------------------------
579 if (mode_p) {
        if (atom->vpos != 0) str.assign("1");
        if (mode != 0) this->mode(old_modes.data()); // Restore mode before returning early
        return str;
582 } // -x- if mode_p -x-
583
584 // --------------------------------------------------------------------------
585 // Key/Value mode.
586 // --------------------------------------------------------------------------
587 if (mode_k) str.assign(__origin, atom->fpos, atom->klen);
588 else if (mode_v) str.assign(__origin, atom->vpos, atom->vlen);
589 else str.assign(__origin, atom->fpos, atom->flen);
590
591 // --------------------------------------------------------------------------
592 // Conversion modes.
593 // --------------------------------------------------------------------------
594 char* data = str.data();
595 if (mode_l) { // All lower-case
596 for (size_t i = 0; i < str.size(); i++) {
597 char ch = data[i];
598 if (ch >= 'A' && ch <= 'Z') data[i] += 32; // Convert to lower-case
599 } // -x- for i -x-
600 } else if (mode_u) { // All upper-case
601 for (size_t i = 0; i < str.size(); i++) {
602 char ch = data[i];
603 if (ch >= 'a' && ch <= 'z') data[i] -= 32; // Convert to upper-case
604 } // -x- for i -x-
605 } else if (mode_c) { // Camel_Case
606 char pch = 0;
607 for (size_t i = 0; i < str.size(); i++) {
608 char ch = data[i];
609 if (pch < 'A' || (pch > 'Z' && pch < 'a') || pch > 'z') {
610 if (ch >= 'a' && ch <= 'z')
611 data[i] -= 32; // Convert to upper-case
612 } // -x- if pch -x-
613 pch = data[i];
614 } // -x- for i -x-
615 } else if (mode_f) { // First letter
616 char ch = data[0];
617 if (ch >= 'a' && ch <= 'z') data[0] -= 32; // Convert to upper-case
618 } // -x- if mode_c -x-
619
620 // --------------------------------------------------------------------------
621 // Restore mode.
622 // --------------------------------------------------------------------------
623 if (mode != 0) this->mode(old_modes.data());
624
625 return str;
626 } // -x- std::string at -x-
627
628 /*======================================================================*//**
629 @brief
    Access an atom, utilizing the operator modes that were configured using the
    @ref mode method (or the modes specified in the @c mode parameter, which
    apply to this call only).
633 @throws std::out_of_range if the index is out-of-range
634 @see get
635 @see get_key
636 @see get_value
637 @see mode
638 @see operator[](int)
639 @returns Entire atom, or portion, depending on the mode
640 *///=========================================================================
641 std::string at(
642 /// Which atom to obtain (0 = first atom; negative values count backward from
643 /// the last atom in the internal array)
644 int index,
          /// Granularity (default is to return the entire atom): @n
646 /// @c nullptr = don't change any modes (default) @n
647 /// @c "k" = key (same as 0 if no key-value pair was detected) @n
648 /// @c "v" = value (will be empty if no key-value pair was detected) @n
649 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not
650 /// Conversion options (default is for no conversion): @n
651 /// @c "c" = Camel_Case @n
652 /// @c "f" = First character in upper-case @n
653 /// @c "l" = all lower case @n
654 /// @c "u" = ALL UPPER CASE
655 const char* mode = nullptr) {
656
657 // --------------------------------------------------------------------------
658 // Internal variables.
659 // --------------------------------------------------------------------------
660 std::string str;
661 if (index < 0) index = __data_points.size() + index;
      __atom* atom = __data_points.at(index); // Throws std::out_of_range
663
664 // --------------------------------------------------------------------------
665 // Save mode.
666 // --------------------------------------------------------------------------
667 std::string old_modes(mode == nullptr ? "" : this->mode());
668 if (mode != nullptr) this->mode(mode);
669
670 // --------------------------------------------------------------------------
671 // Presence of key-value pair (results in ignoring all other modes).
672 // --------------------------------------------------------------------------
673 if (mode_p) {
        if (atom->vpos != 0) str.assign("1");
        if (mode != nullptr) this->mode(old_modes.data()); // Restore mode before returning early
        return str;
676 } // -x- if mode_p -x-
677
678 // --------------------------------------------------------------------------
679 // Key/Value mode.
680 // --------------------------------------------------------------------------
681 if (mode_k) str.assign(__origin, atom->fpos, atom->klen);
682 else if (mode_v) str.assign(__origin, atom->vpos, atom->vlen);
683 else str.assign(__origin, atom->fpos, atom->flen);
684
685 // --------------------------------------------------------------------------
686 // Conversion modes.
687 // --------------------------------------------------------------------------
688 char* data = str.data();
689 if (mode_l) { // All lower-case
690 for (size_t i = 0; i < str.size(); i++) {
691 char ch = data[i];
692 if (ch >= 'A' && ch <= 'Z') data[i] += 32; // Convert to lower-case
693 } // -x- for i -x-
694 } else if (mode_u) { // All upper-case
695 for (size_t i = 0; i < str.size(); i++) {
696 char ch = data[i];
697 if (ch >= 'a' && ch <= 'z') data[i] -= 32; // Convert to upper-case
698 } // -x- for i -x-
699 } else if (mode_c) { // Camel_Case
700 char pch = 0;
701 for (size_t i = 0; i < str.size(); i++) {
702 char ch = data[i];
703 if (pch < 'A' || (pch > 'Z' && pch < 'a') || pch > 'z') {
704 if (ch >= 'a' && ch <= 'z')
705 data[i] -= 32; // Convert to upper-case
706 } // -x- if pch -x-
707 pch = data[i];
708 } // -x- for i -x-
709 } else if (mode_f) { // First letter
710 char ch = data[0];
711 if (ch >= 'a' && ch <= 'z') data[0] -= 32; // Convert to upper-case
712 } // -x- if mode_c -x-
713
714 // --------------------------------------------------------------------------
715 // Restore mode.
716 // --------------------------------------------------------------------------
717 if (mode != nullptr) this->mode(old_modes.data());
718
719 return str;
720 } // -x- std::string at -x-
721
722 /*======================================================================*//**
723 @brief
724 Clear this Atomize's underlying data and reset all states.
725 @returns The same Atomize object so as to facilitate stacking
726 @see empty
727 @see size
728 *///=========================================================================
729 Atomize& clear() {
730 __clear();
731 return *this;
732 } // -x- Atomize& clear -x-
733
734 /*======================================================================*//**
735 @brief
736 Confirm that there are no atoms.
737 @returns TRUE = no atoms@n
738 FALSE = at least one atom exists
739 @see clear
740 @see size
741 *///=========================================================================
742 bool empty() {
743 return __data_points.empty();
744 } // -x- bool empty -x-
745
746 /*======================================================================*//**
747 @brief
748 Obtain current set of internal flags.
749 @returns Current flags, as defined in @ref ATOMIZE_FLAGS
750 @see mode
751 *///=========================================================================
752 int flags() {
753 return __flags;
754 } // -x- int flags -x-
755
756 /*======================================================================*//**
757 @brief
    Set the internal flags, replacing any flags that were previously set.
759 @returns The same Atomize object so as to facilitate stacking
760 @see mode
761 *///=========================================================================
762 Atomize& flags(
763 /// See @ref ATOMIZE_FLAGS for a list of options
764 const int flags = ATOMIZE_DEFAULT) {
765 __flags = flags;
766 return *this;
767 } // -x- Atomize& flags -x-
768
769 /*======================================================================*//**
770 @brief
771 Return the entire atom.
772 @throws std::out_of_range if the index is out-of-range
773 @returns The entire atom (key, equal sign, and value when a key-value pair was
774 detected)
775 @see at
776 @see get_key
777 @see get_value
778 @see has_kv
779 @see mode
780 @see operator[](int)
781 *///=========================================================================
782 std::string get(
783 /// Which atom to obtain (0 = first atom; negative values count backward from
784 /// the last atom in the internal array)
785 int index) {
786 if (index < 0) index = __data_points.size() + index;
787 __atom* atom = __data_points.at(index); // at() throws std::out_of_range as documented (assumes std::vector)
788 return std::string(__origin, atom->fpos, atom->flen);
789 } // -x- std::string get -x-
790
791 /*======================================================================*//**
792 @brief
793 Return the key portion of an atom, or the entire atom if a key-value pair
794 wasn't detected.
795 @throws std::out_of_range if the index is out-of-range
796 @returns Key portion of atom (or the entire atom if a key-value pair wasn't
797 detected)
798 @see at
799 @see get
800 @see get_value
801 @see has_kv
802 @see mode
803 @see operator[](int)
804 *///=========================================================================
805 std::string get_key(
806 /// Which atom to obtain (0 = first atom; negative values count backward from
807 /// the last atom in the internal array)
808 int index) {
809 if (index < 0) index = __data_points.size() + index;
810 __atom* atom = __data_points.at(index); // at() throws std::out_of_range as documented (assumes std::vector)
811 return std::string(__origin, atom->fpos, atom->klen);
812 } // -x- std::string get_key -x-
813
814 /*======================================================================*//**
815 @brief
816 Return the value portion of an atom, or an empty string if a key-value pair
817 wasn't detected.
818 @throws std::out_of_range if the index is out-of-range
819 @returns Value portion of atom (or an empty string if a key-value pair wasn't
820 detected)
821 @see at
822 @see get
823 @see get_key
824 @see has_kv
825 @see mode
826 @see operator[](int)
827 *///=========================================================================
828 std::string get_value(
829 /// Which atom to obtain (0 = first atom; negative values count backward from
830 /// the last atom in the internal array)
831 int index) {
832 if (index < 0) index = __data_points.size() + index;
833 __atom* atom = __data_points.at(index); // at() throws std::out_of_range as documented (assumes std::vector)
834 return std::string(atom->vpos != 0 ? std::string(__origin, atom->vpos, atom->vlen) : "");
835 } // -x- std::string get_value -x-
836
837 /*======================================================================*//**
838 @brief
839 Indicates whether the specified atom was split into a key-value pair (if it
840 was, then the @c key and the @c value are delimited by the first instance of
841 an equal sign {`=`}).
842 @throws std::out_of_range if the index is out-of-range
843 @returns TRUE = key-value pair was detected by the parsing algorithm@n
844 FALSE = this atom was not split into a key-value pair
845 @see at
846 @see get
847 @see get_key
848 @see get_value
849 @see mode
850 @see operator[](int)
851 *///=========================================================================
852 bool has_kv(
853 /// Which atom to obtain (0 = first atom; negative values count backward from
854 /// the last atom in the internal array)
855 int index) {
856 if (index < 0) index = __data_points.size() + index;
857 __atom* atom = __data_points.at(index); // at() throws std::out_of_range as documented (assumes std::vector)
858 return atom->vpos != 0;
859 } // -x- bool has_kv -x-
860
861 /*======================================================================*//**
862 @brief
863 Get the operator modes that are currently set for the @ref operator[] operator.
864 @returns String containing one character for each active mode (an empty string
865 if no modes are set)
866 @see at
867 @see flags
868 @see get
869 @see get_key
870 @see get_value
871 @see has_kv
872 @see operator[](int)
873 @see mode(const char*)
874 *///=========================================================================
875 std::string mode() noexcept {
876 std::string modes;
877 if (mode_k) modes.append("k");
878 if (mode_v) modes.append("v");
879 if (mode_p) modes.append("p");
880 if (mode_c) modes.append("c");
881 if (mode_f) modes.append("f");
882 if (mode_l) modes.append("l");
883 if (mode_u) modes.append("u");
884 return modes;
885 } // -x- std::string mode -x-
886
887 /*======================================================================*//**
888 @brief
889 Set the operator modes for use with the @ref operator[] operator (modes that
890 are not specified will be reset to their defaults).
891
892 Calling this method with @c nullptr as the parameter will result in resetting
893 all operator modes to their defaults.
894 @throws std::invalid_argument if an incorrect value is provided
895 @returns The same Atomize object so as to facilitate stacking
896 @see at
897 @see flags
898 @see get
899 @see get_key
900 @see get_value
901 @see has_kv
902 @see operator[](int)
903 *///=========================================================================
904 Atomize& mode(
905 /// Granularity (default is to return the entire atom): @n
906 /// @c nullptr = clear all modes (default) @n
907 /// @c "k" = key (the entire atom if no key-value pair was detected) @n
908 /// @c "v" = value (will be empty if no key-value pair was detected) @n
909 /// @c "p" = returns "1" if key-value pair is present, or an empty string if not @n
910 /// Conversion options (default is for no conversion): @n
911 /// @c "c" = Camel_Case @n
912 /// @c "f" = First character in upper-case @n
913 /// @c "l" = all lower case @n
914 /// @c "u" = ALL UPPER CASE
915 const char* mode) {
916
917 // --------------------------------------------------------------------------
918 // Clear all settings.
919 // --------------------------------------------------------------------------
920 mode_k = false;
921 mode_v = false;
922 mode_p = false;
923 mode_c = false;
924 mode_f = false;
925 mode_l = false;
926 mode_u = false;
927
928 // --------------------------------------------------------------------------
929 // Syntax checks.
930 // --------------------------------------------------------------------------
931 if (mode == nullptr) return *this;
932
933 // --------------------------------------------------------------------------
934 // Set modes (duplicates are effectively ignored, and 0 never shows up here
935 // because it terminates the string).
936 // --------------------------------------------------------------------------
937 const int mode_len = ::strnlen(mode, ATOMIZE_MAX_MODES);
938 for (int i = 0; i < mode_len; i++) __mode(mode[i]);
939
940 return *this;
941 } // -x- Atomize& mode -x-
942
943 /*======================================================================*//**
944 @brief
945 Set a single operator mode for use with the @ref operator[] operator (all
946 other modes will be reset to their defaults).
947
948 Calling this method with @c 0 as the parameter will result in resetting all
949 operator modes to their defaults.
950 @throws std::invalid_argument if an incorrect value is provided
951 @returns The same Atomize object so as to facilitate stacking
952 @see at
953 @see flags
954 @see get
955 @see get_key
956 @see get_value
957 @see has_kv
958 @see operator[](int)
959 *///=========================================================================
960 Atomize& mode(
961 /// Granularity (default is to return the entire atom): @n
962 /// @c 0 = entire atom (default) @n
963 /// @c 'k' = key (same as 0 if no key-value pair was detected) @n
964 /// @c 'v' = value (will be empty if no key-value pair was detected) @n
965 /// @c 'p' = returns "1" if key-value pair is present, or an empty string if not @n
966 /// Conversion options (default is for no conversion): @n
967 /// @c 'c' = Camel_Case @n
968 /// @c 'f' = First character in upper-case @n
969 /// @c 'l' = all lower case @n
970 /// @c 'u' = ALL UPPER CASE
971 const char mode) {
972
973 // --------------------------------------------------------------------------
974 // Reset all modes.
975 // --------------------------------------------------------------------------
976 mode_k = false;
977 mode_v = false;
978 mode_p = false;
979 mode_c = false;
980 mode_f = false;
981 mode_l = false;
982 mode_u = false;
983
984 // --------------------------------------------------------------------------
985 // Set mode.
986 // --------------------------------------------------------------------------
987 __mode(mode);
988
989 return *this;
990 } // -x- Atomize& mode -x-
991
992 /*======================================================================*//**
993 @brief
994 Return the total quantity of atoms.
995 @returns Quantity of atoms
996 @see clear
997 @see empty
998 *///=========================================================================
999 size_t size() {
1000 return __data_points.size();
1001 } // -x- size_t size -x-
1002
1003 /*======================================================================*//**
1004 @brief
1005 Generate an std::vector<std::string> that contains all atoms.
1006 @returns std::vector<std::string> of atoms (split key-value pairs occupy two entries)
1007 *///=========================================================================
1008 std::vector<std::string> to_vector(
1009 /// FALSE = don't split key-value pairs (default) @n
1010 /// TRUE = split key-value pairs into separate entries (for key names, the
1011 /// equal sign will be included at the end of the string)
1012 bool split_kv_pairs = false) noexcept {
1013 std::vector<std::string> v;
1014
1015 // --------------------------------------------------------------------------
1016 // Splitting key-value pairs is best handled in a separate loop.
1017 // --------------------------------------------------------------------------
1018 if (split_kv_pairs) {
1019 for (size_t i = 0; i < __data_points.size(); i++) {
1020 __atom* atom = __data_points[i];
1021 if (atom->vpos != 0) { // Non-zero indicates that a key-value pair was detected
1022 v.push_back(std::string(__origin, atom->fpos, atom->klen + 1)); // +1 includes equal sign
1023 v.push_back(std::string(__origin, atom->vpos, atom->vlen));
1024 } else { // No key-value pair was detected
1025 v.push_back(std::string(__origin, atom->fpos, atom->flen));
1026 } // -x- if atom->vpos -x-
1027 } // -x- for i -x-
1028 return v;
1029 } // -x- if split_kv_pairs -x-
1030
1031 // --------------------------------------------------------------------------
1032 // Returning full atoms is straightforward.
1033 // --------------------------------------------------------------------------
1034 for (size_t i = 0; i < __data_points.size(); i++) {
1035 __atom* atom = __data_points[i];
1036 v.push_back(std::string(__origin, atom->fpos, atom->flen));
1037 } // -x- for i -x-
1038
1039 return v;
1040 } // -x- std::vector<std::string> to_vector -x-
1041
1042 /*======================================================================*//**
1043 @brief
1044 Array-style access to atoms, whilst utilizing the operator mode that was
1045 configured using the @ref mode method.
1046 @throws std::out_of_range if the index is out-of-range
1047 @returns std::string
1048 @see at
1049 @see get
1050 @see get_key
1051 @see get_value
1052 @see has_kv
1053 @see mode
1054 @see operator[](int)
1055 *///=========================================================================
1056 std::string operator[](
1057 /// Index of atom to access (0 = first atom; negative index values are
1058 /// calculated in reverse, starting with -1 as the final atom)
1059 int index) {
1060 return at(index);
1061 } // -x- std::string operator[] -x-
1062
1063 }; // -x- class Atomize -x-
1064
1065}; // -x- namespace randolf -x-
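
The accessors above (get, get_key, get_value, has_kv) document two behaviours: negative indexes count backward from the last atom, and an out-of-range index throws std::out_of_range. The following is a minimal standalone sketch of that index handling, not the class's actual implementation. The names normalize_index and atom_at are hypothetical, and a plain std::vector<std::string> stands in for the internal atom array.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Atomize): maps a possibly negative index
// onto the range [0, size) and throws std::out_of_range when that fails.
inline std::size_t normalize_index(int index, std::size_t size) {
  long long i = index;
  if (i < 0) i += static_cast<long long>(size); // -1 becomes size - 1
  if (i < 0 || static_cast<unsigned long long>(i) >= size)
    throw std::out_of_range("atom index " + std::to_string(index) + " is out of range");
  return static_cast<std::size_t>(i);
}

// Hypothetical stand-in for an atom lookup over a plain vector of strings.
inline const std::string& atom_at(const std::vector<std::string>& atoms, int index) {
  return atoms[normalize_index(index, atoms.size())];
}
```

With three atoms, atom_at(atoms, -1) yields the last atom and atom_at(atoms, -3) the first, while indexes 3 and -4 both throw. Doing the arithmetic in a signed type wider than int avoids relying on unsigned wrap-around when the index is still negative after adjustment.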