Skip to content

Basic String Processing🔗

Application of String Processing🔗

  • Information Processing
  • Genomics
  • Communication Books
  • Programming Systems

Rules of the Game

For clarity there are several assumptions and definitions that we will make use of ahead.

  • Characters : A String is a sequence of characters. Characters are of type char.
  • Immutability : String objects are immutable. Their values doesn't change while assigning statements and as arguments and return values.
  • Indexing : Operation to extract a specified character from a string.
  • Length : Size of the string
  • Substring : extract a specified substring operation. We expect a constant time implementation of this.
  • Concatenation : Creating a new string by appending one string to another.
  • Character Arrays : We can alternately use an array of character as direct abstraction for string.

Main point to focus here is understanding the efficiency of each operation.

Not all languages provide same implementation of String for e.g. in C it may take linear time to determine the length of String.

Basic Problem Categorization in String Processing🔗

  • Searching and Matching
    • Pattern Matching
    • Exact Match
    • Approximate Match
  • Transformation
    • Reversal
    • Substitution
  • Parsing and Tokenization
    • Splitting
    • Parsing
  • Dynamic String Construction
    • Concatenation
    • Building Substrings
  • Structural Analysis

C++/Python Strings🔗

Feature Python C++
String Mutability Immutable strings (must create new) Mutable with std::string or char[].
Indexing 0-based; supports negative indexing. 0-based; no negative indexing.
Length O(1) via len(). O(1) via string.size().
Substring Extraction O(1) with slicing (s[start:end]). O(n), substr() creates a copy.
Concatenation O(n) due to creation of new strings. O(n) via + or append().
Character Arrays Not used, strings directly used. Commonly used with char[] or std::string.

Common String Operations🔗

Checking Palindrome🔗

  • Python : s == s[::-1] (Read about python slices)
  • C++ : use std::reverse() and compare

Converting Case🔗

  • Python : s.upper(), s.lower()
  • C++: Use transform() from <algorithms>

Sorting Characters:🔗

  • Python: sorted(s) returns a list
  • C++: sort(s.begin(), s.end())

Counting Frequency🔗

  • Python: collections.Counter(s) or s.count(ch)
  • C++: Use a frequency array for faster processing

NOTES on String Efficiency🔗

  • Avoid concatenation in loops
    • in python join() is faster
    • Use ostringstream or stringstream in C++ for efficient appending
  • Use Substrings Sparingly
    • python slices are efficient
    • but avoid using substr in C++ to avoid copy overhead
  • Precompute lengths
    • repeated calls to len or size are unnecessary, better to cache in a variable.

Problems🔗

  1. https://leetcode.com/problems/roman-to-integer/description/
  2. https://leetcode.com/problems/string-to-integer-atoi/ painful to process question :)
  3. https://leetcode.com/problems/encode-and-decode-strings/description/
  4. https://leetcode.com/problems/string-compression/description/