They’re the sequences that make up text. A String is a Sequence of characters. Since they’re sequences, many of the tips and tricks that work for Arrays also work for strings.
Famous String DS and Algos
// I’ve never actually used these but good to know and read about. Common data structures for looking up strings:
- Trie/Prefix Tree
- Suffix Tree Common string algorithms:
- Rabin Karp for efficient searching of substring using a rolling hash
- KMP for efficient searching of substring
Time complexity
A strings is an array of characters, so the time complexities of basic string operations will closely resemble that of array operations.
Operation | Big-O |
---|---|
Access | O(1) |
Search | O(n) |
Insert | O(n) |
Remove | O(n) |
Common Operations involving another string
Here we assume the other string is of length m
.
Operation | Big-O | Note |
---|---|---|
Find substring | O(n.m) | This is the most naive case. There are more efficient algorithms for string searching such as the KMP algorithm |
Concatenating strings | O(n + m) | |
Slice | O(m) | |
Split (by token) | O(n + m) | |
Strip (remove leading and trailing whitespaces) | O(n) |
Common Edge Cases
- Empty strings :
""
- Strings of length 1 or 2.
- Strings with Distinct characters.
Common Problems and Techniques
Counting Characters
Most common and straight-forward method of counting the frequency of characters in a string using a HashMap or HashTable.
If you need to keep a counter of characters, a common mistake is to say that the space complexity required for the counter is O(n). The space required for a counter of a string of latin characters is O(1) not O(n). This is because the upper bound is the range of characters, which is usually a fixed constant of 26. The input set is just lowercase Latin characters.
String of Unique Characters
One way to count characters in a string that’s rather cool is using a BitMask, a 26-bit bitmask to be specific.
mask = 0
for c in word:
mask |= (1 << (ord(c) - ord('a')))
To determine if two strings have common characters, perform &
on the two bitmasks. If the result is non-zero, i.e.. mask_a & mask_b > 0
, then the two strings have common characters.
Anagram
An Anagram is a new string produced by rearranging the letters in the original string. Most coding problems are concerned with word anagrams with no spaces in between.
Checking anagram
- Sort both strings and compare them. This should be
O(n*logn)
time complexity andO(log(n))
space. - Map each character to a prime number and multiply them together, then anagrams should have the same product. This takes O(n) time and O(1) space. Examples: Group Anagram
- Frequency counting of characters will help to determine if two strings are anagrams. This also takes O(n) time and O(1) space.
Palindrome
A Palindrome is a word, phrase, number, or other sequence of characters which reads the same backward as forward. My favorite is “racecar”. When dealing with palindrome problems, it’s best to consider lowercase only. Review this constraint with the interviewer…
Two approaches
- Reverse the string and it should be equal to itself.
- Have two pointers at the start and end of the string. Move the pointers inward till they meet. At every point in time, the characters at both pointers should match. //This one's my favorite, I call it the #TwoPointerSqueeze method.
Hashtables don’t help here as the whole name of the game is Order.
For the TwoPointerSqueeze method remember to account for odd and even lengths… See Strings
Tips
- For substrings, you can terminate early once there is no match
- For subsequences, use dynamic programming as there are overlapping subproblems. Check out this question
Essential questions
Recommended practice questions
Try these after completing the essentials