In the realm of computer science and algorithms, the quest to find patterns and repetitions in data is an ever-present challenge. One such intriguing pattern is the "Longest Repeating Subsequence", which we will delve into in this article and explore its significance.
What is the Longest Repeating Subsequence?
Before we dive into the complexities of finding the longest repeating subsequence, let us first clarify what it entails. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements, without changing the order of the remaining elements. Thus, a repeating subsequence is a subsequence that occurs at least twice within the original sequence.
A repeating subsequence refers to a sequence of characters that appears more than once within a given string, while the longest repeating subsequence denotes the maximum length of such a subsequence.
For example, consider the sequence ACTFABCT. The length of the longest repeating subsequence is 3.
The longest repeating subsequence is
ACT: A C T F A B C T
A C T F A B C T
Note that repeated characters hold a different index in the input string.
The longest repeating subsequence presents holds significant value in various applications, including data compression, and plagiarism detection.
By identifying and representing the longest repeating subsequence, one can significantly reduce the storage space required while preserving the original information. In textual data analysis, the longest repeating subsequence can be used to detect similarities between documents and identify instances of plagiarism.
How to Find the Longest Repeating Subsequence?
Discovering the longest repeating subsequence is a challenging task that requires efficient algorithms. One popular approach is based on dynamic programming.
The dynamic programming approach involves constructing a table to store intermediate results and building upon them to find the longest repeating subsequence. Let's outline the steps involved:
-
Define a table, say dp, of size (n+1) x (n+1), where n is the length of the given string. Initialize all elements of the table to 0.
-
Traverse the string character by character using two pointers, i and j. Compare each pair of characters at positions i and j.
-
If the characters at positions i and j are the same and the indices i and j are distinct, update the corresponding element in the table as dp[i][j] = 1 + dp[i-1][j-1].
-
If the characters at positions i and j are different, take the maximum of dp[i-1][j] and dp[i][j-1] and store it in dp[i][j].
-
Repeat steps 3 and 4 until all characters have been compared.
-
The bottom-right element of the table, i.e., dp[n][n], will hold the length of the longest repeating subsequence.
-
To reconstruct the longest repeating subsequence, start from dp[n][n] and trace back the characters by comparing dp[i][j] with dp[i-1][j-1]. If they are equal, append the corresponding character to the result and move diagonally up-left (i.e., decrement both i and j). Repeat this step until you reach the top-left corner of the table.
By following these steps, you can find the length of the longest repeating subsequence and also obtain the actual subsequence itself.
Implementation in C++
Here's an example of a C++ code to find the longest repeating subsequence in a given string:
#include<iostream> #include<vector> using namespace std; string longestRepeatingSubsequence(const string& str) { int n = str.length(); vector<vector<int>> dp(n + 1, vector<int>(n + 1, 0)); // Fill the dp table for (int i = 1; i <= n; i++) { for (int j = 1; j <= n; j++) { if (str[i - 1] == str[j - 1] && i != j) { dp[i][j] = 1 + dp[i - 1][j - 1]; } else { dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]); } } } // Reconstruct the longest repeating subsequence string result = ""; int i = n, j = n; while (i > 0 && j > 0) { if (dp[i][j] == dp[i - 1][j - 1] + 1) { result = str[i - 1] + result; i--; j--; } else if (dp[i][j] == dp[i - 1][j]) { i--; } else { j--; } } return result; } int main() { string str = "ACTFABCT"; string longestSubsequence = longestRepeatingSubsequence(str); cout << "Longest repeating subsequence: " << longestSubsequence << endl; return 0; }
Output:
Longest repeating subsequence: ACT
In this code, we define the 'longestRepeatingSubsequence' function that takes a string as input and returns the longest repeating subsequence. We use a dynamic programming approach to fill the dp table, considering the cases where characters match and where they don't.
Then, we reconstruct the subsequence by backtracking through the dp table. Finally, in the main
function, we provide an example string "ACTFABCT" and print the result.
Please note that this code is a basic implementation and may not be optimized for efficiency.
Time & Space Complexity
The code uses nested loops to fill the dp table, iterating over all characters of the input string. Therefore, The time complexity of the dynamic programming part is O(n^2), where n is the length of the input string.
The space complexity is determined by the size of the dp table. In this case, the dp table is a 2D vector of size (n+1) x (n+1), where n is the length of the input string. Hence, the space complexity is O(n^2) as well.
Optimized Approach to Find LRS
Here's an optimized approach to the code with improved space complexity. In this optimized code, we use two vectors, prev and curr, to store the values of the dynamic programming table row by row instead of creating a complete 2D table. This reduces the space complexity to O(n), where n is the length of the input string.
The logic for filling the dp table and reconstructing the subsequence remains the same as in the previous version. Here is the C++ implementation:
#include<iostream> #include<vector> #include<algorithm> using namespace std; string longestRepeatingSubsequence(const string& str) { int n = str.length(); vector<int> prev(n + 1, 0); vector<int> curr(n + 1, 0); // Fill the dp table for (int i = 1; i <= n; i++) { for (int j = 1; j <= n; j++) { if (str[i - 1] == str[j - 1] && i != j) { curr[j] = 1 + prev[j - 1]; } else { curr[j] = max(curr[j - 1], prev[j]); } } prev = curr; } // Reconstruct the longest repeating subsequence string result = ""; int i = n, j = n; while (i > 0 && j > 0) { if (str[i - 1] == str[j - 1] && i != j) { result = str[i - 1] + result; i--; j--; } else if (curr[j] == curr[j - 1]) { j--; } else { i--; } } return result; } int main() { string str = "ACTFABCT"; string longestSubsequence = longestRepeatingSubsequence(str); cout << "Longest repeating subsequence: " << longestSubsequence << endl; return 0; }
Output:
Longest repeating subsequence: ACT
Please note that the time complexity remains O(n^2) since we still need to fill the dp table using nested loops. However, the space complexity is now optimized to O(n), which is more efficient in terms of memory usage.
Conclusion
By employing dynamic programming techniques, it becomes feasible to efficiently identify the longest repeating subsequence and utilize it in diverse domains. As the field of computer science continues to evolve, the exploration of patterns like the longest repeating subsequence contributes to a deeper understanding of data and opens new possibilities for innovative solutions.