The purpose of the algorithm is to find the shortest edit script to convert array A to array B.
The whole process can be visualized with a graph:
Moving along the X axis removes an element from array A; moving along the Y axis adds an element from array B.
The graph also contains diagonal paths that correspond to matching elements in both arrays.
The starting point at (0, 0) represents the state in which array A (in this case, an array of characters) is unchanged. Tne end point at (A.Length, B.Length) represents the state in which arrays A and B are equal. Our goal is to find the shortest path from the top-left corner to the bottom-right corner.
The process is divided into 2 jobs:
- Finding the shortest path;
- Backtracking using data from the first step to determine the exact actions needed to transform array A into array B.
Here we need to find the shortest path from point (0, 0) to (A.Length, B.Length).
First, we need to create a few variables:
int maxSize = A.Length + B.Length + 1
- this is the maximum number of steps that the algorithm can take to transform A to B, plus the starting position (i.e., by removing each element from A and inserting each element from B);int[] editGraph = new int[maxSize]
- the array to store the best possible move at each step;int[][] trace = new int[maxSize][]
- the array that stores snapshots of editGraph at the end of each turn. Required to be able to backtrack properly;int offset = B.Length
- the offset that will be used to index editGraph;int minK = -B.Length
- the lowest possible value of K;int maxK = A.Length
-the highest possible value of K;int d = 0
- the number of steps taken during iteration.
In conjunction with (X, Y) coordinates, the algorithm uses another set of coordinates: (D, K), where D is the current step and K is the difference between X and Y.
Our job for each (D, K) node is to find the optimal path from previous nodes. For most nodes, we can access them in 2 ways: by moving downward (adding an element from B) from (D - 1, K + 1) or by moving rightward (removing an element from A) from (D - 1, K - 1). We do it by taking the highest value from the editGraph at indices: editGraph[offset + K + 1] and editGraph[offset + K - 1], respectively. If both values are equal, we choose the K - 1 value, which prioritizes deletion over insertion. If we decide to move rightward, we increment X by one. Once we've figured the X value, we calculate the Y value: Y = X - K. Having both X and Y, we can check for diagonals: compare A[X] and B[Y]. If they're the same, we increment both coordinates by one; repeat. In the end, we write the X value at the editGraph[offset + K] and move to the next K.
Note
We can observe a pattern: deleting an element will always increment K by one and inserting an element will always decrement K by one.
The exact algorithm will be:
- Create a K value in the range [-D; D] with increments of 2;
- Check if the K value is in the range of [minK; maxK]. If it isn't, the node is physically unreachable and should be skipped;
- Take the highest X value from adjacent nodes at indices K - 1 and K + 1;
- Calculate the Y value: Y = X - K;
- Check if the (X, Y) coordinates are both within the ranges of [0; A.Length] and [0; B.Length], respectively. If one of them isn't, then the node is invalid and should be skipped;
- Check if the A[X] is equal to B[Y]. If they are, increment X and Y by one. Repeat this step until A[X] is not equal to B[Y].
- Write a copy of the editGraph to the trace[d];
- Check if the (X, Y) coordinates are equal to the (A.Length, B.Length). If they are, the bottom-right corner has been reached, so return the trace.
We have strings "ABCABBA" and "CBABAC".
Initialize the editGraph: [-1, -1, -1, -1, -1, -1, 0, -1, -1, -1, -1, -1, -1, -1]. The offset (B.Length) is underlined.
- D = 1; K = -1, 1.
- K = -1. Since K is equal to -d, the only way to achieve (D, K) = (1, -1) is to move down from (0, 0). editGraph[offset + 0] is set to 0. Because we move downward, we don't modify X. Y = X - K = 0 - (-1) = 1: (X, Y) = (0, 1). There are no diagonals, so we don't modify any values.
editGraph[offset + (-1)] = X = 0;
- K = 1. Since K is equal to d, the only way to achieve (D, K) = (1, 1) is to move right from (0, 0). editGraph[offset + 0] is set to 0. Because we move rightward, we increment X by 1. Y = K - X = 1 - 1 = 0: (X, Y) = (1, 0). There are no diagonals, so we don't modify any values.
editGraph[offset + 1] = X = 1.
K = 1 is the last value. The graph is [-1, -1, -1, -1, -1, 0, 0, 1, -1, -1, -1, -1, -1, -1]. Copy the graph to trace[d] and increment d by one.
- D = 2; K = -2, 0, 2.
- K = -2. Since K is equal to -d, the only way to achieve (D, K) = (2, -2) is to move down from (1, -1). editGraph[offset + (-1)] is set to 0. Because we move downward, we don't modify X. Y = X - K = 0 - (-2) = 2: (X, Y) = (0, 2). There are 2 diagonals, so we add 2 to both X and Y: (2, 4).
editGraph[offset + (-2)] = X = 2.
- K = 0. We can achieve (D, K) = (2, 0) either by moving downward from (1, 1) or by moving rightward from (1, -1). Our available values are editGraph[offset + 1] and editGraph[offset + (-1)]: 1 and 0, respectively. The value at offset + 1 is greater, so we choose it. Because we move downward, we don't modify X. Y = X - K = 1 - 0 = 1: (X, Y) = (1, 1). There is 1 diagonal, so we add 1 to both X and Y: (2, 2).
editGraph[offset + 0] = X = 2.
- K = 2. Because K is equal to d, the only way to achieve (D, K) = (2, 2) is to move right from (1, 1). editGraph[offset + 1] is set to 1. Because we move rightward, we increment X by 1. Y = K - X = 2 - 2 = 0: (X, Y) = (2, 0). There is 1 diagonal, so we add 1 to both X and Y: (3, 1).
editGraph[offset + 2] = X = 3.
K = 2 is the last value. The graph is [-1, -1, -1, -1, 2, 0, 2, 1, 3, -1, -1, -1, -1, -1]. Copy the graph to trace[d] and increment d by one.
- D = 3, K = -3, -1, 1, 3.
- K = -3. K is equal to -d, downward move from (2, -2). X = 2, Y = 5. 1 diagonal move: (3, 6).
editGraph[offset + (-3)] = 3;
- K = -1. (3, -1) is either a downward move from (2, 0) or a rightward move from (2, -2). Values in the editGraph are the same, so we chose the rightward move: X = 2, add 1 because it's a rightward move. Y = 4: (3, 4). 1 diagonal: (4, 5).
editGraph[offset + (-1)] = 4;
- K = 1. (3, 1) is either a downward move from (2, 2) or a rightward move from (2, 0). The value at offset + 2 is greater, so we choose it. A downward move, no modifications are required: X = 3. Y = 2. 2 diagonals: (5, 4).
editGraph[offset + (1)] = 5;
- K = 3. K is equal to d, rightward move from (2, 2). X = 3, add 1 because it's a rightward move. Y = 1: (4, 1). 1 diagonal: (5, 2).
editGraph[offset + 3] = 5.
The graph is [-1, -1, -1, 3, 2, 4, 2, 5, 3, 5, -1, -1, -1, -1]. Copy the graph to trace[d] and increment d by one.
- D = 4, K = -4, -2, 0, 2, 4.
- K = -4. K is equal to -d, downward move from (3, -3). X = 3, Y = 7. Y is greater than the length of B, so we skip this K;
- K = -2. (4, -2) is either a downward move from (3, -1) or a rightward move from (3, -3). (3, -1) has the greater value, it's a downward move. X = 4, Y = 6. No diagonals;
- K = 0. (4, 0) is either a downward move from (3, 1) or a rightward move from (3, -1). (3, 1) has the greater value, it's a downward move. X = 5, Y = 5. No diagonals;
- K = 2. (4, 2) is either a downward move from (3, 3) or a rightward move from (3, 1). Both values are equal, so we choose the rightward move from (3, 1). X = 5, add 1 because it's a rightward move. Y = 4: (6, 4). 1 diagonal: (7, 5);
- K = 4. K is equal to d, rightward move from (3, 3). X = 5, add 1 because it's a downward move. Y = 2: (6, 2). 1 diagonal move: (7, 3).
The graph is [-1, -1, -1, 3, 4, 4, 5, 5, 7, 5, 7, -1, -1, -1]. Copy the graph to the trace[d] and increment d by one.
- D = 5, K = -5, -3, -1, 1, 3, 5.
- K = -5. K is equal to -d, downward move from (4, -4). Because editGraph[offset + -4] is uninitialized, we skip it;
- K = -3. (5, -3) is either a downward move from (4, -2) or a rightward move from (4, -4). (4, -2) has the greater value ((4, -4) wasn't initialized), it's a downward move. X = 4, Y = 7. Y is greater than B.Length, we skip it;
- K = -1. (5, -1) is either a downward move from (4, 0) or a rightward move from (4, -2). (4, 0) has the greater value, it's a downward move. X = 5, Y = 6. No diagonals;
- K = 1. (5, 1) is either a downward move from (4, 2) or a rightward move from (4, 0). (4, 2) has the greater value, it's a downward move. X = 7, Y = 6. Here we've achieved the bottom-right corner of the graph. Write a copy of the graph to the trace and return from the function.
The graph is [-1, -1, -1, 3, 4, 5, 5, 7, 7, 5, 7, -1, -1, -1]
Our goal here is to determine the exact sequence of deletions and insertions needed to transform A to B. We do this by moving towards (0, 0) from (A.Length, B.Length), using the same rules as for finding the shortest path.
We will need an enum with values: Start, Insert, Delete; We also need an array store the actions:
var actions = new Action[trace.Length - 1]
For each (D, K) we take nodes at (D - 1, K - 1) and (D, 1, K + 1) and then choose the one with the highest X value. Then, we either increment or decrement K, repeating this process until we reach (0, 0).
The exact algorithm will be:
- Check if (D, K) is equal to (0, 0). If so, we've reached the end; return the array.
- Take the graph at trace[d];
- Compare both adjacent nodes and choose the one with the highest X;
- If the graph[offset + K + 1] is greater than graph[offset + K - 1], then the move was insertion; increment K by 1. Otherwise, the move was deletion; decrement K by 1;
- Insert the action at actions[d].
- Decrement d by one.
We will use the trace from previous example. A = "ABCABBA", B = "CBABAC". The trace is:
- D = 0: [-1, -1, -1, -1, -1, -1, 0, -1, -1, -1, -1, -1, -1, -1];
- D = 1: [-1, -1, -1, -1, -1, 0, 0, 1, -1, -1, -1, -1, -1, -1];
- D = 2: [-1, -1, -1, -1, 2, 0, 2, 1, 3, -1, -1, -1, -1, -1];
- D = 3: [-1, -1, -1, 3, 2, 4, 2, 5, 3, 5, -1, -1, -1, -1];
- D = 4: [-1, -1, -1, 3, 4, 4, 5, 5, 7, 5, 7, -1, -1, -1];
- D = 5: [-1, -1, -1, 3, 4, 5, 5, 7, 7, 5, 7, -1, -1, -1].
K = A.Length - B.Length = 7 - 6 = 1; D = trace.Length - 1 = 6 - 1 = 5.
- D = 5, K = 1, graph: [-1, -1, -1, 3, 4, 5, 5, 7, 7, 5, 7, -1, -1, -1]. We can achieve (5, 1) either by moving downward from (4, 2) or rightward from (4, 0). (4,2) has the greater value, so we choose it. Since it's a downward move, so moves[5] = Action.Insert; K += 1.
- D = 4, K = 2, graph: [-1, -1, -1, 3, 4, 4, 5, 5, 7, 5, 7, -1, -1, -1]. We can achieve (4, 2) either by moving downwards from (3, 3) or rightwards from (3, 1). Both has the same value, so we choose the rightward move from (3, 1). Moves[4] = Action.Delete; K -= 1.
- D = 3, K = 1, graph: [-1, -1, -1, 3, 2, 4, 2, 5, 3, 5, -1, -1, -1, -1]. We can achieve (3, 1) either by moving downward from (2, 2) or rightwards from (2, 0). (2, 2) has the greater value, so we choose it. It's a downward move, so moves[3] = Action.Insert; K += 1.
- D = 2, K = 2: graph: [-1, -1, -1, -1, 2, 0, 2, 1, 3, -1, -1, -1, -1, -1]. K is equal to D, so it's a rightward move from (1, 1). Moves[2] = Action.Delete; K -= 1.
- D = 1, K = 1: graph: [-1, -1, -1, -1, -1, 0, 0, 1, -1, -1, -1, -1, -1, -1]. K is equal to D, so it's a rightward move from (0, 0). Moves[1] = Action.Delete; K -= 1.
- D = 0; K = 0. This is the starting point, we've finished. Moves[0] = Action.Start.
Our final array of actions: [Start, Delete, Delete, Insert, Delete, Insert].
We can use system's console to print out each change with a simple code that will iterate through each action:
- If it's a starting point, then we don't do anything;
- If it's a deletion, then we print A[aIndex++] with red background;
- If it's an insertion, then we print B[bIndex++] with green background.
After that, we print any matching elements if there are any.