Verify the well-formed-ness of a string containing open and shut parenthesis utilizing simply SQL
Validating if a string comprises balanced parenthesis is a sensible downside ensuing from string/expression parsing/validation in varied sensible situations. On this article, we take a look at easy methods to validate such a string containing solely open ‘(’ and shut ‘)’ parenthesis utilizing simply declarative SQL.
Earlier Article: Longest Rising Subsequence of an array in SQL
Given a string containing solely open and closed parenthesis, can you identify if these parenthesis are balanced? Being balanced means:
- Each open parenthesis ‘(’ has precisely one closing parenthesis ‘)’ after it
- A closing parenthesis ‘)’ is all the time matched with precisely one open parenthesis ‘(’ that seems earlier than it
The next are examples of legitimate balanced parenthesis strings:
- ((()()))
- ()()()()
- (()()(()()))
The next are instance of invalid balanced parenthesis:
- ((()( — right here, the primary 2 and the final open parenthesis don’t have an identical closing parenthesis
- ()()()()) — right here, the final closing parenthesis doesn’t match every other unmatched open parenthesis
The enter desk has 2 columns:
- idx: The issue index. i.e. the first string to verify could have idx = 1, the 2nd string to verify could have idx = 2, and so forth.
- parens: A string containing solely open and shut parenthesis. This string must be checked for well-formedness.
CREATE TABLE inputs(idx SERIAL PRIMARY KEY, parens TEXT NOT NULL);INSERT INTO inputs(parens) VALUES
('((()()))'), ('((()('), ('()()()())'), ('()()()()'), ('(()()(()()))');
SELECT * FROM inputs;
In crucial programming languages, this downside is normally solved by sustaining a stack which receives an ‘(’ each time it’s encountered within the enter string. Therefore, the stack comprises solely ‘(’ characters. Each time a ‘)’ is seen within the enter, it’s matched with an ‘(’ on the highest of the stack, and the ‘)’ character is popped off.
Since we now have just one kind of open (and shut) brackets, we are able to eradicate the stack and preserve only a counter, which counts what number of unmatched ‘(’ characters have been seen thus far. Each time an ‘(’ character is encountered, we increment the counter, and each time a ‘)’ character is encountered, we decrement the counter.
- Destructive counter worth: If the counter ever reaches a adverse (< 0) worth, it signifies an unmatched closing parenthesis.
- Last counter worth: After all of the characters within the enter string are processed, if the worth of the counter is != 0, it signifies an issue. A constructive worth signifies an unmatched ‘(’ character within the enter, and a adverse worth is already thought of above.
The SQL code for this resolution appears like this:
WITH as_split AS (
-- First break up every enter string such that each character is on its
-- personal row. We be certain that we tag every enter row with the unique
-- index of the enter string from which it got here in order that we all know which
-- downside it is part of.
SELECT
idx,
UNNEST(
STRING_TO_ARRAY(
parens, NULL
)
) AS ch
FROM inputs
),with_annotation AS (
-- Annotate characters from every downside (distinctive index) with its
-- place utilizing ROW_NUMBER(). Additionally annotate an open paren with
-- +1 and an in depth paren with a -1 quantity, in order that we are able to preserve
-- a working sum of those counters later.
SELECT
idx,
ch,
ROW_NUMBER() OVER(PARTITION BY idx) AS row_num,
CASE WHEN ch = '(' THEN +1 ELSE -1 END AS ctr
FROM as_split
),
with_running_sum AS (
-- Discover the working sum for characters in every downside. Be aware that we're
-- fixing all the issues directly (suppose "batch API") as an alternative of feeding
-- feeding every downside as soon as into the answer machine.
SELECT
idx,
ch,
ctr,
row_num,
SUM(ctr) OVER(PARTITION BY idx ORDER BY row_num ASC) AS running_sum,
MAX(row_num) OVER(PARTITION BY idx) AS max_row_num
FROM with_annotation
),
with_result AS (
-- The result's legitimate provided that we by no means hit a adverse working sum
-- (which signifies that we now have an additional shut paren than a corresponding
-- open paren) and if we finish with a working sum of 0. If we finish with a
-- working sum > 0 then we now have an extra open paren that isn't
-- matched with a corresponding shut paren.
SELECT
idx,
CASE WHEN MIN(running_sum) < 0 THEN TRUE ELSE FALSE END AS has_negative,
CASE WHEN SUM(
CASE WHEN row_num = max_row_num THEN running_sum ELSE 0 END
) = 0 THEN TRUE ELSE FALSE END AS ends_with_zero_sum
FROM with_running_sum
GROUP BY 1
)
SELECT
lhs.idx,
rhs.parens,
CASE
WHEN has_negative OR NOT ends_with_zero_sum THEN FALSE
ELSE TRUE END
AS is_valid_parens
FROM with_result lhs INNER JOIN inputs rhs
ON lhs.idx = rhs.idx
ORDER BY lhs.idx ASC;
There are fairly a couple of intermediate tables used within the resolution above to assist readability and to separate out the assorted processing steps.
That is the primary time we now have used the UNNEST key phrase in SQL. That is additionally the primary time I’ve written an answer that batch-processes a number of inputs and solves them directly. I leverage the idx subject which signifies the index of the enter string. All intermediate tables use the idx subject to separate out options to totally different issues.
Estimated Price: The estimated value for this question on a desk with 5 distinct enter rows is 45k. Most of this value appears to be coming from the usage of the window aggregation capabilities.
Whereas I’ve tagged the runtime to be O(n), it might rely upon how the database engine internally executes the question. For instance, if the engine notices that the task of the row_num column utilizing ROW_NUMBER() leads to rows which have a strictly growing worth for that column, and the database is ready to protect that row order throughout CTE tables, then it may well keep away from doing a form later when it encounters an ORDER BY clause within the window perform execution right here.
SUM(ctr) OVER(PARTITION BY idx ORDER BY row_num ASC) AS running_sum,
The ORDER BY within the OVER() clause above is important to make sure that we get a working sum, and never an general sum for your complete partition.