They’re the sequences that make up text. A String is a Sequence of characters. Since they’re sequences, many of the tips and tricks that work for Arrays also work for strings.

Famous String DS and Algos

// I’ve never actually used these but good to know and read about. Common data structures for looking up strings:

Trie/Prefix Tree
Suffix Tree Common string algorithms:
Rabin Karp for efficient searching of substring using a rolling hash
KMP for efficient searching of substring

Time complexity

A strings is an array of characters, so the time complexities of basic string operations will closely resemble that of array operations.

Operation	Big-O
Access	O(1)
Search	O(n)
Insert	O(n)
Remove	O(n)

Common Operations involving another string

Here we assume the other string is of length m.

Operation	Big-O	Note
Find substring	O(n.m)	This is the most naive case. There are more efficient algorithms for string searching such as the KMP algorithm
Concatenating strings	O(n + m)
Slice	O(m)
Split (by token)	O(n + m)
Strip (remove leading and trailing whitespaces)	O(n)

Common Edge Cases

Empty strings : ""
Strings of length 1 or 2.
Strings with Distinct characters.

Common Problems and Techniques

Counting Characters

Most common and straight-forward method of counting the frequency of characters in a string using a HashMap or HashTable.

If you need to keep a counter of characters, a common mistake is to say that the space complexity required for the counter is O(n). The space required for a counter of a string of latin characters is O(1) not O(n). This is because the upper bound is the range of characters, which is usually a fixed constant of 26. The input set is just lowercase Latin characters.

String of Unique Characters

One way to count characters in a string that’s rather cool is using a BitMask, a 26-bit bitmask to be specific.

mask = 0
for c in word:
  mask |= (1 << (ord(c) - ord('a')))

To determine if two strings have common characters, perform & on the two bitmasks. If the result is non-zero, i.e.. mask_a & mask_b > 0, then the two strings have common characters.

Anagram

An Anagram is a new string produced by rearranging the letters in the original string. Most coding problems are concerned with word anagrams with no spaces in between.

Checking anagram

Sort both strings and compare them. This should be O(n*logn) time complexity and O(log(n)) space.
Map each character to a prime number and multiply them together, then anagrams should have the same product. This takes O(n) time and O(1) space. Examples: Group Anagram
Frequency counting of characters will help to determine if two strings are anagrams. This also takes O(n) time and O(1) space.

Palindrome

A Palindrome is a word, phrase, number, or other sequence of characters which reads the same backward as forward. My favorite is “racecar”. When dealing with palindrome problems, it’s best to consider lowercase only. Review this constraint with the interviewer…

Two approaches

Reverse the string and it should be equal to itself.
Have two pointers at the start and end of the string. Move the pointers inward till they meet. At every point in time, the characters at both pointers should match. //This one's my favorite, I call it the #TwoPointerSqueeze method.

Hashtables don’t help here as the whole name of the game is Order.

For the TwoPointerSqueeze method remember to account for odd and even lengths… See String Problems

Tips

For substrings, you can terminate early once there is no match
For subsequences, use dynamic programming as there are overlapping subproblems. Check out this question

Abdu's Garden

Explorer

Strings