Why is there no permutation in Regexes? (Even if regular languages seem to be able to do this)
up vote
10
down vote
favorite
The Problem
There is no easy way to get a permutation with a regex.
Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.
Regex: Regular expression.
For verification:
"Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.
"How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.
"Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.- This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".
The kind of solution I am searching for
It should have the form:
- »aabc« (or anything else you could use a opening and closing parentheses)
- (aabc)! (similar to (abc)? but with another symbol in the end)
- [aabc]! (similar to [abc]+ but with another symbol in the end)
Advantages of these solutions
They are:
- easy
- adaptable
- reusable
Why this should exist
- Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.
- Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
The proof
- Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.
- Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).
Having a finite number of letters I can create this automaton: (Example. Formal: see below)
Grammar that accepts permutations of "abbc":
(sry for numbers on top, maybe someone knows how to make this part looking better)
s -> ah¹
s -> bh²
s -> ch³
h¹ -> bh¹¹
h¹ -> ch¹²
h² -> ah¹¹ (no typo! equivalence)
h² -> bh²²
h² -> ch²³
h³ -> ah¹²
h³ -> bh²³
h¹¹ -> bc
h¹¹ -> cb
h¹² -> bb
h²² -> ac
h²² -> ca
h²³ -> ab
h²³ -> ba
More formal: (using a finite-state-automaton but this could be made with grammar as well)
- A word q (with finite length) to which any permutation should reach an accepting state.
- X is the finite alphabet.
- Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".
- state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.
- F is a set of that states that are exact permutations of q.
So it is possible to create a finite-state automaton for accepting permutations of a given word.
Moving on with the proof
So I have proven that regular languages have the power to check for permutations, haven't I?
So why is there no approach to reach this with Regexes? It's a useful functionality.
formal-languages regular-languages regular-expressions
add a comment |
up vote
10
down vote
favorite
The Problem
There is no easy way to get a permutation with a regex.
Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.
Regex: Regular expression.
For verification:
"Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.
"How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.
"Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.- This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".
The kind of solution I am searching for
It should have the form:
- »aabc« (or anything else you could use a opening and closing parentheses)
- (aabc)! (similar to (abc)? but with another symbol in the end)
- [aabc]! (similar to [abc]+ but with another symbol in the end)
Advantages of these solutions
They are:
- easy
- adaptable
- reusable
Why this should exist
- Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.
- Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
The proof
- Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.
- Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).
Having a finite number of letters I can create this automaton: (Example. Formal: see below)
Grammar that accepts permutations of "abbc":
(sry for numbers on top, maybe someone knows how to make this part looking better)
s -> ah¹
s -> bh²
s -> ch³
h¹ -> bh¹¹
h¹ -> ch¹²
h² -> ah¹¹ (no typo! equivalence)
h² -> bh²²
h² -> ch²³
h³ -> ah¹²
h³ -> bh²³
h¹¹ -> bc
h¹¹ -> cb
h¹² -> bb
h²² -> ac
h²² -> ca
h²³ -> ab
h²³ -> ba
More formal: (using a finite-state-automaton but this could be made with grammar as well)
- A word q (with finite length) to which any permutation should reach an accepting state.
- X is the finite alphabet.
- Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".
- state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.
- F is a set of that states that are exact permutations of q.
So it is possible to create a finite-state automaton for accepting permutations of a given word.
Moving on with the proof
So I have proven that regular languages have the power to check for permutations, haven't I?
So why is there no approach to reach this with Regexes? It's a useful functionality.
formal-languages regular-languages regular-expressions
9
You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
Nov 17 at 8:12
6
I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
Nov 17 at 8:18
The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated:^(a()|a()|b()|c()){4}2345$
seems to work (see regex101.com/r/9URPpg/4/tests).
– boboquack
Nov 18 at 6:13
5
@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
Nov 18 at 19:04
add a comment |
up vote
10
down vote
favorite
up vote
10
down vote
favorite
The Problem
There is no easy way to get a permutation with a regex.
Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.
Regex: Regular expression.
For verification:
"Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.
"How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.
"Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.- This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".
The kind of solution I am searching for
It should have the form:
- »aabc« (or anything else you could use a opening and closing parentheses)
- (aabc)! (similar to (abc)? but with another symbol in the end)
- [aabc]! (similar to [abc]+ but with another symbol in the end)
Advantages of these solutions
They are:
- easy
- adaptable
- reusable
Why this should exist
- Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.
- Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
The proof
- Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.
- Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).
Having a finite number of letters I can create this automaton: (Example. Formal: see below)
Grammar that accepts permutations of "abbc":
(sry for numbers on top, maybe someone knows how to make this part looking better)
s -> ah¹
s -> bh²
s -> ch³
h¹ -> bh¹¹
h¹ -> ch¹²
h² -> ah¹¹ (no typo! equivalence)
h² -> bh²²
h² -> ch²³
h³ -> ah¹²
h³ -> bh²³
h¹¹ -> bc
h¹¹ -> cb
h¹² -> bb
h²² -> ac
h²² -> ca
h²³ -> ab
h²³ -> ba
More formal: (using a finite-state-automaton but this could be made with grammar as well)
- A word q (with finite length) to which any permutation should reach an accepting state.
- X is the finite alphabet.
- Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".
- state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.
- F is a set of that states that are exact permutations of q.
So it is possible to create a finite-state automaton for accepting permutations of a given word.
Moving on with the proof
So I have proven that regular languages have the power to check for permutations, haven't I?
So why is there no approach to reach this with Regexes? It's a useful functionality.
formal-languages regular-languages regular-expressions
The Problem
There is no easy way to get a permutation with a regex.
Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.
Regex: Regular expression.
For verification:
"Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.
"How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.
"Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.- This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".
The kind of solution I am searching for
It should have the form:
- »aabc« (or anything else you could use a opening and closing parentheses)
- (aabc)! (similar to (abc)? but with another symbol in the end)
- [aabc]! (similar to [abc]+ but with another symbol in the end)
Advantages of these solutions
They are:
- easy
- adaptable
- reusable
Why this should exist
- Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.
- Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
The proof
- Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.
- Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).
Having a finite number of letters I can create this automaton: (Example. Formal: see below)
Grammar that accepts permutations of "abbc":
(sry for numbers on top, maybe someone knows how to make this part looking better)
s -> ah¹
s -> bh²
s -> ch³
h¹ -> bh¹¹
h¹ -> ch¹²
h² -> ah¹¹ (no typo! equivalence)
h² -> bh²²
h² -> ch²³
h³ -> ah¹²
h³ -> bh²³
h¹¹ -> bc
h¹¹ -> cb
h¹² -> bb
h²² -> ac
h²² -> ca
h²³ -> ab
h²³ -> ba
More formal: (using a finite-state-automaton but this could be made with grammar as well)
- A word q (with finite length) to which any permutation should reach an accepting state.
- X is the finite alphabet.
- Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".
- state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.
- F is a set of that states that are exact permutations of q.
So it is possible to create a finite-state automaton for accepting permutations of a given word.
Moving on with the proof
So I have proven that regular languages have the power to check for permutations, haven't I?
So why is there no approach to reach this with Regexes? It's a useful functionality.
formal-languages regular-languages regular-expressions
formal-languages regular-languages regular-expressions
edited Nov 17 at 13:08
Glorfindel
1741110
1741110
asked Nov 17 at 6:46
Asqiir
1568
1568
9
You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
Nov 17 at 8:12
6
I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
Nov 17 at 8:18
The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated:^(a()|a()|b()|c()){4}2345$
seems to work (see regex101.com/r/9URPpg/4/tests).
– boboquack
Nov 18 at 6:13
5
@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
Nov 18 at 19:04
add a comment |
9
You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
Nov 17 at 8:12
6
I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
Nov 17 at 8:18
The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated:^(a()|a()|b()|c()){4}2345$
seems to work (see regex101.com/r/9URPpg/4/tests).
– boboquack
Nov 18 at 6:13
5
@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
Nov 18 at 19:04
9
9
You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
Nov 17 at 8:12
You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
Nov 17 at 8:12
6
6
I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
Nov 17 at 8:18
I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
Nov 17 at 8:18
The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated:
^(a()|a()|b()|c()){4}2345$
seems to work (see regex101.com/r/9URPpg/4/tests).– boboquack
Nov 18 at 6:13
The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated:
^(a()|a()|b()|c()){4}2345$
seems to work (see regex101.com/r/9URPpg/4/tests).– boboquack
Nov 18 at 6:13
5
5
@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
Nov 18 at 19:04
@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
Nov 18 at 19:04
add a comment |
3 Answers
3
active
oldest
votes
up vote
34
down vote
accepted
The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.
Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.
So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.
But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
– Asqiir
Nov 17 at 13:11
7
@Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
– David Richerby
Nov 17 at 14:02
3
@Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
– David Richerby
Nov 17 at 14:08
1
You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
– Asqiir
Nov 17 at 14:12
1
Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the!
operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
– reinierpost
Nov 18 at 19:36
|
show 3 more comments
up vote
13
down vote
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
Your "proof" only looked at permutations of single words, which are finite languages.
Every finite language is regular (e.g. just by listing all of the members with a |
inbetween), but there are infinite regular languages (and those are generally the more interesting ones).
As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the *
operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).
The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.
add a comment |
up vote
7
down vote
Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).
So in some sense, there is no succinct way to specify all permutations of a word.
Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.
We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:
$x_iy_i in L$.- If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.
Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.
Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.
Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
– Asqiir
Nov 17 at 8:15
1
Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
– Yuval Filmus
Nov 17 at 8:17
1
What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
– Asqiir
Nov 17 at 8:23
1
Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
– Yuval Filmus
Nov 17 at 8:27
1
For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
– Asqiir
Nov 17 at 14:13
|
show 1 more comment
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
34
down vote
accepted
The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.
Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.
So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.
But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
– Asqiir
Nov 17 at 13:11
7
@Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
– David Richerby
Nov 17 at 14:02
3
@Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
– David Richerby
Nov 17 at 14:08
1
You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
– Asqiir
Nov 17 at 14:12
1
Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the!
operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
– reinierpost
Nov 18 at 19:36
|
show 3 more comments
up vote
34
down vote
accepted
The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.
Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.
So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.
But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
– Asqiir
Nov 17 at 13:11
7
@Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
– David Richerby
Nov 17 at 14:02
3
@Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
– David Richerby
Nov 17 at 14:08
1
You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
– Asqiir
Nov 17 at 14:12
1
Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the!
operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
– reinierpost
Nov 18 at 19:36
|
show 3 more comments
up vote
34
down vote
accepted
up vote
34
down vote
accepted
The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.
Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.
So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.
The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.
Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.
So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.
edited Nov 19 at 10:03
Laurel
1034
1034
answered Nov 17 at 10:10
David Richerby
64.8k1597186
64.8k1597186
But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
– Asqiir
Nov 17 at 13:11
7
@Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
– David Richerby
Nov 17 at 14:02
3
@Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
– David Richerby
Nov 17 at 14:08
1
You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
– Asqiir
Nov 17 at 14:12
1
Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the!
operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
– reinierpost
Nov 18 at 19:36
|
show 3 more comments
But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
– Asqiir
Nov 17 at 13:11
7
@Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
– David Richerby
Nov 17 at 14:02
3
@Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
– David Richerby
Nov 17 at 14:08
1
You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
– Asqiir
Nov 17 at 14:12
1
Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the!
operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
– reinierpost
Nov 18 at 19:36
But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
– Asqiir
Nov 17 at 13:11
But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
– Asqiir
Nov 17 at 13:11
7
7
@Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
– David Richerby
Nov 17 at 14:02
@Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
– David Richerby
Nov 17 at 14:02
3
3
@Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
– David Richerby
Nov 17 at 14:08
@Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
– David Richerby
Nov 17 at 14:08
1
1
You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
– Asqiir
Nov 17 at 14:12
You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
– Asqiir
Nov 17 at 14:12
1
1
Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the
!
operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.– reinierpost
Nov 18 at 19:36
Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the
!
operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.– reinierpost
Nov 18 at 19:36
|
show 3 more comments
up vote
13
down vote
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
Your "proof" only looked at permutations of single words, which are finite languages.
Every finite language is regular (e.g. just by listing all of the members with a |
inbetween), but there are infinite regular languages (and those are generally the more interesting ones).
As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the *
operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).
The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.
add a comment |
up vote
13
down vote
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
Your "proof" only looked at permutations of single words, which are finite languages.
Every finite language is regular (e.g. just by listing all of the members with a |
inbetween), but there are infinite regular languages (and those are generally the more interesting ones).
As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the *
operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).
The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.
add a comment |
up vote
13
down vote
up vote
13
down vote
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
Your "proof" only looked at permutations of single words, which are finite languages.
Every finite language is regular (e.g. just by listing all of the members with a |
inbetween), but there are infinite regular languages (and those are generally the more interesting ones).
As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the *
operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).
The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.
So my question is:
- (Why) Is my proof wrong?
- If it is right: Why is there no easy way to express permutations?
Your "proof" only looked at permutations of single words, which are finite languages.
Every finite language is regular (e.g. just by listing all of the members with a |
inbetween), but there are infinite regular languages (and those are generally the more interesting ones).
As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the *
operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).
The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.
answered Nov 18 at 0:23
Paŭlo Ebermann
47329
47329
add a comment |
add a comment |
up vote
7
down vote
Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).
So in some sense, there is no succinct way to specify all permutations of a word.
Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.
We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:
$x_iy_i in L$.- If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.
Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.
Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.
Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
– Asqiir
Nov 17 at 8:15
1
Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
– Yuval Filmus
Nov 17 at 8:17
1
What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
– Asqiir
Nov 17 at 8:23
1
Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
– Yuval Filmus
Nov 17 at 8:27
1
For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
– Asqiir
Nov 17 at 14:13
|
show 1 more comment
up vote
7
down vote
Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).
So in some sense, there is no succinct way to specify all permutations of a word.
Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.
We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:
$x_iy_i in L$.- If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.
Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.
Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.
Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
– Asqiir
Nov 17 at 8:15
1
Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
– Yuval Filmus
Nov 17 at 8:17
1
What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
– Asqiir
Nov 17 at 8:23
1
Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
– Yuval Filmus
Nov 17 at 8:27
1
For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
– Asqiir
Nov 17 at 14:13
|
show 1 more comment
up vote
7
down vote
up vote
7
down vote
Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).
So in some sense, there is no succinct way to specify all permutations of a word.
Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.
We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:
$x_iy_i in L$.- If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.
Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.
Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.
Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).
So in some sense, there is no succinct way to specify all permutations of a word.
Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.
We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:
$x_iy_i in L$.- If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.
Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.
Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.
edited Nov 17 at 8:36
answered Nov 17 at 8:06
Yuval Filmus
187k12176338
187k12176338
Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
– Asqiir
Nov 17 at 8:15
1
Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
– Yuval Filmus
Nov 17 at 8:17
1
What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
– Asqiir
Nov 17 at 8:23
1
Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
– Yuval Filmus
Nov 17 at 8:27
1
For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
– Asqiir
Nov 17 at 14:13
|
show 1 more comment
Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
– Asqiir
Nov 17 at 8:15
1
Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
– Yuval Filmus
Nov 17 at 8:17
1
What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
– Asqiir
Nov 17 at 8:23
1
Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
– Yuval Filmus
Nov 17 at 8:27
1
For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
– Asqiir
Nov 17 at 14:13
Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
– Asqiir
Nov 17 at 8:15
Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
– Asqiir
Nov 17 at 8:15
1
1
Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
– Yuval Filmus
Nov 17 at 8:17
Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
– Yuval Filmus
Nov 17 at 8:17
1
1
What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
– Asqiir
Nov 17 at 8:23
What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
– Asqiir
Nov 17 at 8:23
1
1
Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
– Yuval Filmus
Nov 17 at 8:27
Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
– Yuval Filmus
Nov 17 at 8:27
1
1
For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
– Asqiir
Nov 17 at 14:13
For future readers: this is not the correct answer! (Correct me if I'm wrong.) Look for the accepted one.
– Asqiir
Nov 17 at 14:13
|
show 1 more comment
Thanks for contributing an answer to Computer Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f100206%2fwhy-is-there-no-permutation-in-regexes-even-if-regular-languages-seem-to-be-ab%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
9
You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
Nov 17 at 8:12
6
I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
Nov 17 at 8:18
The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated:
^(a()|a()|b()|c()){4}2345$
seems to work (see regex101.com/r/9URPpg/4/tests).– boboquack
Nov 18 at 6:13
5
@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
Nov 18 at 19:04