Divide a set of strings into those which contain “a” and those which don't
up vote
7
down vote
favorite
I am beginning to learn Python on my own through edX and I was given this basic problem: "Given a list of cities, separate the list into two lists, namely one with cities containing the letter 'a' in their name and one with cities not containing the letter 'a'."
This is my approach. It works, but I can't help but feel like there is a much more concise way to get this result. If anybody could offer some other methods I would be interested:
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
for x in range(0, len(city)):
if city[x] == "a":
a_city.append(city)
break
else:
if x == len(city) - 1:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
I am also interested in learning more about the interaction between nested loops/conditionals and complexity if anybody has some resources.
python python-3.x
add a comment |
up vote
7
down vote
favorite
I am beginning to learn Python on my own through edX and I was given this basic problem: "Given a list of cities, separate the list into two lists, namely one with cities containing the letter 'a' in their name and one with cities not containing the letter 'a'."
This is my approach. It works, but I can't help but feel like there is a much more concise way to get this result. If anybody could offer some other methods I would be interested:
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
for x in range(0, len(city)):
if city[x] == "a":
a_city.append(city)
break
else:
if x == len(city) - 1:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
I am also interested in learning more about the interaction between nested loops/conditionals and complexity if anybody has some resources.
python python-3.x
add a comment |
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I am beginning to learn Python on my own through edX and I was given this basic problem: "Given a list of cities, separate the list into two lists, namely one with cities containing the letter 'a' in their name and one with cities not containing the letter 'a'."
This is my approach. It works, but I can't help but feel like there is a much more concise way to get this result. If anybody could offer some other methods I would be interested:
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
for x in range(0, len(city)):
if city[x] == "a":
a_city.append(city)
break
else:
if x == len(city) - 1:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
I am also interested in learning more about the interaction between nested loops/conditionals and complexity if anybody has some resources.
python python-3.x
I am beginning to learn Python on my own through edX and I was given this basic problem: "Given a list of cities, separate the list into two lists, namely one with cities containing the letter 'a' in their name and one with cities not containing the letter 'a'."
This is my approach. It works, but I can't help but feel like there is a much more concise way to get this result. If anybody could offer some other methods I would be interested:
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
for x in range(0, len(city)):
if city[x] == "a":
a_city.append(city)
break
else:
if x == len(city) - 1:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
I am also interested in learning more about the interaction between nested loops/conditionals and complexity if anybody has some resources.
python python-3.x
python python-3.x
edited Nov 21 at 12:46
Toby Speight
22.7k537109
22.7k537109
asked Nov 21 at 12:19
JacobCheverie
1386
1386
add a comment |
add a comment |
4 Answers
4
active
oldest
votes
up vote
15
down vote
accepted
This usecase is actually covered by one of the itertools
recipes. itertools
is a package in the Python standard library that supplies fast and efficient tools for iterating over things or creating certain iterable things (like the combination of all pairs and such). It is an often used library and well worth it to get acquainted with.
The recipe is as follows:
from itertools import filterfalse, tee
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
In your specific case you would use it like this:
if __name__ == "__main__":
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
no_a_city, a_city = map(list, partition(lambda city: "a" in city, cities))
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The map(list, ...)
part is needed because what the partition
function returns are generators that generate values on the fly. They can be consumed into a list
.
The predicate used is a lambda
function, an anonymous function which in this case returns truthy or falsy values. It is used to test each element of the iterable.
Instead of manually iterating over each name (even worse, over each index of each name, have a look at Loop Like A Native), I used the fact that strings support the in
operator.
I also added a if __name__ == "__main__":
guard to allow importing from this script from another script.
One thing you could have used in your code is the fact that for
loops have an optional else
clause which is run if no break
statement interrupted the loop:
a_city, no_a_city = ,
for city in cities:
for char in city:
if char == "a":
a_city.append(city)
break
else:
no_a_city.append(city)
As for complexity, this has the same complexity as your code. You have two nested for
loops, making this on average $mathcal{O}(nk)$ with $n$ being the number of cities and $k$ being the average length of the city names.
The in
operator for strings is $mathcal{O}(k)$ (it is just the same loop you wrote, but probably written in C) and it is used once per city. However, due to the tee
my code iterates twice over the cities, so would be $mathcal{O}(2nk)$, which in terms of algorithmic complexity is also $mathcal{O}(nk)$.
1
Thank you Graipher, That is a lot to take in for a beginner, but I will work on dissecting some of it to improve my coding. I briefly checked out that link that you provided and already see some issues with what I am doing (as you pointed out, iterating over each name AND each index of each name) so thanks for that. You've given me a lot to look into. On complexity: using the link that you provided, I can see a way to write this same code using only one for loop. Does this imply that it is less complex than your code?
– JacobCheverie
Nov 21 at 13:07
1
@JacobCheverie: Yes, there is still some road left ahead of you, but don't despair. And no, just because there is only one visiblefor
loop does not make it linear in time (my solution has no visiblefor
loop and yet it is not). I cannot think of a way that does not have to look at each character of each city name, but maybe you are more clever than I.
– Graipher
Nov 21 at 13:24
I'm not convinced that this approach makes sense; the main benefit of working with generators is a small memory footprint which is completely comprised by the use oftee
.
– Jared Goguen
Nov 22 at 18:51
add a comment |
up vote
12
down vote
The task here is to collate the list of cities according to a key. In this case the key can be 'a' in city
, which is True
if the city contains 'a'
and False
otherwise.
It's common to encounter processing tasks which require collation, for example to find words that are anagrams of each other we could collate the words in a dictionary according to their sorted letters.
There is a standard pattern for collation in Python, which is to use collections.defaultdict
and a loop. In the cities case, it goes like this:
from collections import defaultdict
with_a = defaultdict(list)
for city in cities:
with_a['a' in city].append(city)
After running this loop, with_a[True]
is the list of cities with 'a'
and with_a[False]
is the list of cities without.
I prefer this approach to itertools.partition
because it iterates over the input just once (whereas partition
iterates over the input twice), and it's clear how to generalize it to other kinds of key.
This is seemingly high-powered. I will look into defaultdict some more to gain a better understanding of the code. When you mention your preference of this over partition, does that imply that this would be a more efficient/less complex approach?
– JacobCheverie
Nov 21 at 18:15
1
+1 Imo, this approach is easier to digest, and has better space and time complexity than theitertools.partition
approach.
– Jared Goguen
Nov 22 at 18:47
add a comment |
up vote
9
down vote
This is not a complete review, but rather two simpler alternatives that illustrate how you can cut the some of the inner control flow using the in
keyword. This keyword tests for membership of an object in an iterable. In this case, you want to test for membership of 'a'
in the name of each element of cities
.
The first takes the same approach as your code
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
if 'a' in city:
a_city.append(city)
else:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The membership test using in
is a drop-in replacement for the harder-to-read and more error-prone explicit loop over the characters.
An even cleaner solution makes use of the built-in set
data type. In simplistic terms, a set
is like a list
, except that it is not ordered and does not contain duplicates.
# a set is constructed with {}, unlike the the used for lists
cities = {"New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"}
# we can also construct one using the `set` builtin function (analogous to `list`)
a_city = set(city for city in cities if 'a' in city)
# subtracting one set from another is well-defined and does the obvious thing
no_a_city = cities - a_city
print("a_city:", a_city)
print("no_a_city:", no_a_city)
Check out the docs for a flavor of the kinds of rich membership and comparison that sets allow. These operations can typically be expected to be more efficient than equivalent algorithms on lists, mainly due to the fact that sets are guaranteed not to have duplicate elements.
Thanks @Endulum, I like the set approach. I can see where I would have to be careful using this, however. If I was not guaranteed to have distinct members then I assume that the set approach would discard some information and thus not be appropriate. When you define a_city, do I read this as I would read a mathematical set (set of all 'city' such that for 'city' in 'cities' if 'a' in 'city')? Without reading as so, it appears to me as a typo.
– JacobCheverie
Nov 21 at 18:09
1
Regarding the appropriateness of sets versus dictionaries: yes, if is desirable that you retain duplicates, then a set is likely not the right collection to choose.
– Endulum
Nov 21 at 18:19
1
Regarding the relationship to mathematical set notation, it should be read like (the set of all 'city' in 'cities' such that 'a' in 'city'). It is an alternative syntax for consisely building lists, sets, or other collections without an explicit loop. It is called a "comprehension", in this case a "set comprehension". If you have not learned about these yet, I expect you will before long. They are very popular among Python programmers.
– Endulum
Nov 21 at 18:26
I had touched on list comprehensions briefly last night while reading but I didn't recognize the form. I only asked about the relationship to mathematics because city is written twice at the start of the set definition which seems a bit hard to remember, but I think that this is just Python syntax I must get used to. Thanks again!
– JacobCheverie
Nov 21 at 19:06
add a comment |
up vote
4
down vote
List comprehensions are very well-suited for this task. The basic syntax is the following:
new_list = [ expression(x) for x in old_list if condition(x) ]
In this case we don't want to change the list elements, simply select them according to a condition. What should be this condition? Well, Python also provides a simple syntax to check if a string (or a list) contains an element, with the keyword in
,or not in
for the negation (this is the same keyword used for iterationg through a list, so be careful to not get confused).
With these tools, your code can fit on two lines, with no import required:
a_city = [ city for city in cities if "a" in city ]
no_a_city = [ city for city in cities if "a" not in city ]
add a comment |
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
15
down vote
accepted
This usecase is actually covered by one of the itertools
recipes. itertools
is a package in the Python standard library that supplies fast and efficient tools for iterating over things or creating certain iterable things (like the combination of all pairs and such). It is an often used library and well worth it to get acquainted with.
The recipe is as follows:
from itertools import filterfalse, tee
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
In your specific case you would use it like this:
if __name__ == "__main__":
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
no_a_city, a_city = map(list, partition(lambda city: "a" in city, cities))
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The map(list, ...)
part is needed because what the partition
function returns are generators that generate values on the fly. They can be consumed into a list
.
The predicate used is a lambda
function, an anonymous function which in this case returns truthy or falsy values. It is used to test each element of the iterable.
Instead of manually iterating over each name (even worse, over each index of each name, have a look at Loop Like A Native), I used the fact that strings support the in
operator.
I also added a if __name__ == "__main__":
guard to allow importing from this script from another script.
One thing you could have used in your code is the fact that for
loops have an optional else
clause which is run if no break
statement interrupted the loop:
a_city, no_a_city = ,
for city in cities:
for char in city:
if char == "a":
a_city.append(city)
break
else:
no_a_city.append(city)
As for complexity, this has the same complexity as your code. You have two nested for
loops, making this on average $mathcal{O}(nk)$ with $n$ being the number of cities and $k$ being the average length of the city names.
The in
operator for strings is $mathcal{O}(k)$ (it is just the same loop you wrote, but probably written in C) and it is used once per city. However, due to the tee
my code iterates twice over the cities, so would be $mathcal{O}(2nk)$, which in terms of algorithmic complexity is also $mathcal{O}(nk)$.
1
Thank you Graipher, That is a lot to take in for a beginner, but I will work on dissecting some of it to improve my coding. I briefly checked out that link that you provided and already see some issues with what I am doing (as you pointed out, iterating over each name AND each index of each name) so thanks for that. You've given me a lot to look into. On complexity: using the link that you provided, I can see a way to write this same code using only one for loop. Does this imply that it is less complex than your code?
– JacobCheverie
Nov 21 at 13:07
1
@JacobCheverie: Yes, there is still some road left ahead of you, but don't despair. And no, just because there is only one visiblefor
loop does not make it linear in time (my solution has no visiblefor
loop and yet it is not). I cannot think of a way that does not have to look at each character of each city name, but maybe you are more clever than I.
– Graipher
Nov 21 at 13:24
I'm not convinced that this approach makes sense; the main benefit of working with generators is a small memory footprint which is completely comprised by the use oftee
.
– Jared Goguen
Nov 22 at 18:51
add a comment |
up vote
15
down vote
accepted
This usecase is actually covered by one of the itertools
recipes. itertools
is a package in the Python standard library that supplies fast and efficient tools for iterating over things or creating certain iterable things (like the combination of all pairs and such). It is an often used library and well worth it to get acquainted with.
The recipe is as follows:
from itertools import filterfalse, tee
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
In your specific case you would use it like this:
if __name__ == "__main__":
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
no_a_city, a_city = map(list, partition(lambda city: "a" in city, cities))
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The map(list, ...)
part is needed because what the partition
function returns are generators that generate values on the fly. They can be consumed into a list
.
The predicate used is a lambda
function, an anonymous function which in this case returns truthy or falsy values. It is used to test each element of the iterable.
Instead of manually iterating over each name (even worse, over each index of each name, have a look at Loop Like A Native), I used the fact that strings support the in
operator.
I also added a if __name__ == "__main__":
guard to allow importing from this script from another script.
One thing you could have used in your code is the fact that for
loops have an optional else
clause which is run if no break
statement interrupted the loop:
a_city, no_a_city = ,
for city in cities:
for char in city:
if char == "a":
a_city.append(city)
break
else:
no_a_city.append(city)
As for complexity, this has the same complexity as your code. You have two nested for
loops, making this on average $mathcal{O}(nk)$ with $n$ being the number of cities and $k$ being the average length of the city names.
The in
operator for strings is $mathcal{O}(k)$ (it is just the same loop you wrote, but probably written in C) and it is used once per city. However, due to the tee
my code iterates twice over the cities, so would be $mathcal{O}(2nk)$, which in terms of algorithmic complexity is also $mathcal{O}(nk)$.
1
Thank you Graipher, That is a lot to take in for a beginner, but I will work on dissecting some of it to improve my coding. I briefly checked out that link that you provided and already see some issues with what I am doing (as you pointed out, iterating over each name AND each index of each name) so thanks for that. You've given me a lot to look into. On complexity: using the link that you provided, I can see a way to write this same code using only one for loop. Does this imply that it is less complex than your code?
– JacobCheverie
Nov 21 at 13:07
1
@JacobCheverie: Yes, there is still some road left ahead of you, but don't despair. And no, just because there is only one visiblefor
loop does not make it linear in time (my solution has no visiblefor
loop and yet it is not). I cannot think of a way that does not have to look at each character of each city name, but maybe you are more clever than I.
– Graipher
Nov 21 at 13:24
I'm not convinced that this approach makes sense; the main benefit of working with generators is a small memory footprint which is completely comprised by the use oftee
.
– Jared Goguen
Nov 22 at 18:51
add a comment |
up vote
15
down vote
accepted
up vote
15
down vote
accepted
This usecase is actually covered by one of the itertools
recipes. itertools
is a package in the Python standard library that supplies fast and efficient tools for iterating over things or creating certain iterable things (like the combination of all pairs and such). It is an often used library and well worth it to get acquainted with.
The recipe is as follows:
from itertools import filterfalse, tee
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
In your specific case you would use it like this:
if __name__ == "__main__":
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
no_a_city, a_city = map(list, partition(lambda city: "a" in city, cities))
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The map(list, ...)
part is needed because what the partition
function returns are generators that generate values on the fly. They can be consumed into a list
.
The predicate used is a lambda
function, an anonymous function which in this case returns truthy or falsy values. It is used to test each element of the iterable.
Instead of manually iterating over each name (even worse, over each index of each name, have a look at Loop Like A Native), I used the fact that strings support the in
operator.
I also added a if __name__ == "__main__":
guard to allow importing from this script from another script.
One thing you could have used in your code is the fact that for
loops have an optional else
clause which is run if no break
statement interrupted the loop:
a_city, no_a_city = ,
for city in cities:
for char in city:
if char == "a":
a_city.append(city)
break
else:
no_a_city.append(city)
As for complexity, this has the same complexity as your code. You have two nested for
loops, making this on average $mathcal{O}(nk)$ with $n$ being the number of cities and $k$ being the average length of the city names.
The in
operator for strings is $mathcal{O}(k)$ (it is just the same loop you wrote, but probably written in C) and it is used once per city. However, due to the tee
my code iterates twice over the cities, so would be $mathcal{O}(2nk)$, which in terms of algorithmic complexity is also $mathcal{O}(nk)$.
This usecase is actually covered by one of the itertools
recipes. itertools
is a package in the Python standard library that supplies fast and efficient tools for iterating over things or creating certain iterable things (like the combination of all pairs and such). It is an often used library and well worth it to get acquainted with.
The recipe is as follows:
from itertools import filterfalse, tee
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
In your specific case you would use it like this:
if __name__ == "__main__":
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
no_a_city, a_city = map(list, partition(lambda city: "a" in city, cities))
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The map(list, ...)
part is needed because what the partition
function returns are generators that generate values on the fly. They can be consumed into a list
.
The predicate used is a lambda
function, an anonymous function which in this case returns truthy or falsy values. It is used to test each element of the iterable.
Instead of manually iterating over each name (even worse, over each index of each name, have a look at Loop Like A Native), I used the fact that strings support the in
operator.
I also added a if __name__ == "__main__":
guard to allow importing from this script from another script.
One thing you could have used in your code is the fact that for
loops have an optional else
clause which is run if no break
statement interrupted the loop:
a_city, no_a_city = ,
for city in cities:
for char in city:
if char == "a":
a_city.append(city)
break
else:
no_a_city.append(city)
As for complexity, this has the same complexity as your code. You have two nested for
loops, making this on average $mathcal{O}(nk)$ with $n$ being the number of cities and $k$ being the average length of the city names.
The in
operator for strings is $mathcal{O}(k)$ (it is just the same loop you wrote, but probably written in C) and it is used once per city. However, due to the tee
my code iterates twice over the cities, so would be $mathcal{O}(2nk)$, which in terms of algorithmic complexity is also $mathcal{O}(nk)$.
edited Nov 21 at 20:52
answered Nov 21 at 12:42
Graipher
22.6k53384
22.6k53384
1
Thank you Graipher, That is a lot to take in for a beginner, but I will work on dissecting some of it to improve my coding. I briefly checked out that link that you provided and already see some issues with what I am doing (as you pointed out, iterating over each name AND each index of each name) so thanks for that. You've given me a lot to look into. On complexity: using the link that you provided, I can see a way to write this same code using only one for loop. Does this imply that it is less complex than your code?
– JacobCheverie
Nov 21 at 13:07
1
@JacobCheverie: Yes, there is still some road left ahead of you, but don't despair. And no, just because there is only one visiblefor
loop does not make it linear in time (my solution has no visiblefor
loop and yet it is not). I cannot think of a way that does not have to look at each character of each city name, but maybe you are more clever than I.
– Graipher
Nov 21 at 13:24
I'm not convinced that this approach makes sense; the main benefit of working with generators is a small memory footprint which is completely comprised by the use oftee
.
– Jared Goguen
Nov 22 at 18:51
add a comment |
1
Thank you Graipher, That is a lot to take in for a beginner, but I will work on dissecting some of it to improve my coding. I briefly checked out that link that you provided and already see some issues with what I am doing (as you pointed out, iterating over each name AND each index of each name) so thanks for that. You've given me a lot to look into. On complexity: using the link that you provided, I can see a way to write this same code using only one for loop. Does this imply that it is less complex than your code?
– JacobCheverie
Nov 21 at 13:07
1
@JacobCheverie: Yes, there is still some road left ahead of you, but don't despair. And no, just because there is only one visiblefor
loop does not make it linear in time (my solution has no visiblefor
loop and yet it is not). I cannot think of a way that does not have to look at each character of each city name, but maybe you are more clever than I.
– Graipher
Nov 21 at 13:24
I'm not convinced that this approach makes sense; the main benefit of working with generators is a small memory footprint which is completely comprised by the use oftee
.
– Jared Goguen
Nov 22 at 18:51
1
1
Thank you Graipher, That is a lot to take in for a beginner, but I will work on dissecting some of it to improve my coding. I briefly checked out that link that you provided and already see some issues with what I am doing (as you pointed out, iterating over each name AND each index of each name) so thanks for that. You've given me a lot to look into. On complexity: using the link that you provided, I can see a way to write this same code using only one for loop. Does this imply that it is less complex than your code?
– JacobCheverie
Nov 21 at 13:07
Thank you Graipher, That is a lot to take in for a beginner, but I will work on dissecting some of it to improve my coding. I briefly checked out that link that you provided and already see some issues with what I am doing (as you pointed out, iterating over each name AND each index of each name) so thanks for that. You've given me a lot to look into. On complexity: using the link that you provided, I can see a way to write this same code using only one for loop. Does this imply that it is less complex than your code?
– JacobCheverie
Nov 21 at 13:07
1
1
@JacobCheverie: Yes, there is still some road left ahead of you, but don't despair. And no, just because there is only one visible
for
loop does not make it linear in time (my solution has no visible for
loop and yet it is not). I cannot think of a way that does not have to look at each character of each city name, but maybe you are more clever than I.– Graipher
Nov 21 at 13:24
@JacobCheverie: Yes, there is still some road left ahead of you, but don't despair. And no, just because there is only one visible
for
loop does not make it linear in time (my solution has no visible for
loop and yet it is not). I cannot think of a way that does not have to look at each character of each city name, but maybe you are more clever than I.– Graipher
Nov 21 at 13:24
I'm not convinced that this approach makes sense; the main benefit of working with generators is a small memory footprint which is completely comprised by the use of
tee
.– Jared Goguen
Nov 22 at 18:51
I'm not convinced that this approach makes sense; the main benefit of working with generators is a small memory footprint which is completely comprised by the use of
tee
.– Jared Goguen
Nov 22 at 18:51
add a comment |
up vote
12
down vote
The task here is to collate the list of cities according to a key. In this case the key can be 'a' in city
, which is True
if the city contains 'a'
and False
otherwise.
It's common to encounter processing tasks which require collation, for example to find words that are anagrams of each other we could collate the words in a dictionary according to their sorted letters.
There is a standard pattern for collation in Python, which is to use collections.defaultdict
and a loop. In the cities case, it goes like this:
from collections import defaultdict
with_a = defaultdict(list)
for city in cities:
with_a['a' in city].append(city)
After running this loop, with_a[True]
is the list of cities with 'a'
and with_a[False]
is the list of cities without.
I prefer this approach to itertools.partition
because it iterates over the input just once (whereas partition
iterates over the input twice), and it's clear how to generalize it to other kinds of key.
This is seemingly high-powered. I will look into defaultdict some more to gain a better understanding of the code. When you mention your preference of this over partition, does that imply that this would be a more efficient/less complex approach?
– JacobCheverie
Nov 21 at 18:15
1
+1 Imo, this approach is easier to digest, and has better space and time complexity than theitertools.partition
approach.
– Jared Goguen
Nov 22 at 18:47
add a comment |
up vote
12
down vote
The task here is to collate the list of cities according to a key. In this case the key can be 'a' in city
, which is True
if the city contains 'a'
and False
otherwise.
It's common to encounter processing tasks which require collation, for example to find words that are anagrams of each other we could collate the words in a dictionary according to their sorted letters.
There is a standard pattern for collation in Python, which is to use collections.defaultdict
and a loop. In the cities case, it goes like this:
from collections import defaultdict
with_a = defaultdict(list)
for city in cities:
with_a['a' in city].append(city)
After running this loop, with_a[True]
is the list of cities with 'a'
and with_a[False]
is the list of cities without.
I prefer this approach to itertools.partition
because it iterates over the input just once (whereas partition
iterates over the input twice), and it's clear how to generalize it to other kinds of key.
This is seemingly high-powered. I will look into defaultdict some more to gain a better understanding of the code. When you mention your preference of this over partition, does that imply that this would be a more efficient/less complex approach?
– JacobCheverie
Nov 21 at 18:15
1
+1 Imo, this approach is easier to digest, and has better space and time complexity than theitertools.partition
approach.
– Jared Goguen
Nov 22 at 18:47
add a comment |
up vote
12
down vote
up vote
12
down vote
The task here is to collate the list of cities according to a key. In this case the key can be 'a' in city
, which is True
if the city contains 'a'
and False
otherwise.
It's common to encounter processing tasks which require collation, for example to find words that are anagrams of each other we could collate the words in a dictionary according to their sorted letters.
There is a standard pattern for collation in Python, which is to use collections.defaultdict
and a loop. In the cities case, it goes like this:
from collections import defaultdict
with_a = defaultdict(list)
for city in cities:
with_a['a' in city].append(city)
After running this loop, with_a[True]
is the list of cities with 'a'
and with_a[False]
is the list of cities without.
I prefer this approach to itertools.partition
because it iterates over the input just once (whereas partition
iterates over the input twice), and it's clear how to generalize it to other kinds of key.
The task here is to collate the list of cities according to a key. In this case the key can be 'a' in city
, which is True
if the city contains 'a'
and False
otherwise.
It's common to encounter processing tasks which require collation, for example to find words that are anagrams of each other we could collate the words in a dictionary according to their sorted letters.
There is a standard pattern for collation in Python, which is to use collections.defaultdict
and a loop. In the cities case, it goes like this:
from collections import defaultdict
with_a = defaultdict(list)
for city in cities:
with_a['a' in city].append(city)
After running this loop, with_a[True]
is the list of cities with 'a'
and with_a[False]
is the list of cities without.
I prefer this approach to itertools.partition
because it iterates over the input just once (whereas partition
iterates over the input twice), and it's clear how to generalize it to other kinds of key.
answered Nov 21 at 16:54
Gareth Rees
44.6k3100180
44.6k3100180
This is seemingly high-powered. I will look into defaultdict some more to gain a better understanding of the code. When you mention your preference of this over partition, does that imply that this would be a more efficient/less complex approach?
– JacobCheverie
Nov 21 at 18:15
1
+1 Imo, this approach is easier to digest, and has better space and time complexity than theitertools.partition
approach.
– Jared Goguen
Nov 22 at 18:47
add a comment |
This is seemingly high-powered. I will look into defaultdict some more to gain a better understanding of the code. When you mention your preference of this over partition, does that imply that this would be a more efficient/less complex approach?
– JacobCheverie
Nov 21 at 18:15
1
+1 Imo, this approach is easier to digest, and has better space and time complexity than theitertools.partition
approach.
– Jared Goguen
Nov 22 at 18:47
This is seemingly high-powered. I will look into defaultdict some more to gain a better understanding of the code. When you mention your preference of this over partition, does that imply that this would be a more efficient/less complex approach?
– JacobCheverie
Nov 21 at 18:15
This is seemingly high-powered. I will look into defaultdict some more to gain a better understanding of the code. When you mention your preference of this over partition, does that imply that this would be a more efficient/less complex approach?
– JacobCheverie
Nov 21 at 18:15
1
1
+1 Imo, this approach is easier to digest, and has better space and time complexity than the
itertools.partition
approach.– Jared Goguen
Nov 22 at 18:47
+1 Imo, this approach is easier to digest, and has better space and time complexity than the
itertools.partition
approach.– Jared Goguen
Nov 22 at 18:47
add a comment |
up vote
9
down vote
This is not a complete review, but rather two simpler alternatives that illustrate how you can cut the some of the inner control flow using the in
keyword. This keyword tests for membership of an object in an iterable. In this case, you want to test for membership of 'a'
in the name of each element of cities
.
The first takes the same approach as your code
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
if 'a' in city:
a_city.append(city)
else:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The membership test using in
is a drop-in replacement for the harder-to-read and more error-prone explicit loop over the characters.
An even cleaner solution makes use of the built-in set
data type. In simplistic terms, a set
is like a list
, except that it is not ordered and does not contain duplicates.
# a set is constructed with {}, unlike the the used for lists
cities = {"New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"}
# we can also construct one using the `set` builtin function (analogous to `list`)
a_city = set(city for city in cities if 'a' in city)
# subtracting one set from another is well-defined and does the obvious thing
no_a_city = cities - a_city
print("a_city:", a_city)
print("no_a_city:", no_a_city)
Check out the docs for a flavor of the kinds of rich membership and comparison that sets allow. These operations can typically be expected to be more efficient than equivalent algorithms on lists, mainly due to the fact that sets are guaranteed not to have duplicate elements.
Thanks @Endulum, I like the set approach. I can see where I would have to be careful using this, however. If I was not guaranteed to have distinct members then I assume that the set approach would discard some information and thus not be appropriate. When you define a_city, do I read this as I would read a mathematical set (set of all 'city' such that for 'city' in 'cities' if 'a' in 'city')? Without reading as so, it appears to me as a typo.
– JacobCheverie
Nov 21 at 18:09
1
Regarding the appropriateness of sets versus dictionaries: yes, if is desirable that you retain duplicates, then a set is likely not the right collection to choose.
– Endulum
Nov 21 at 18:19
1
Regarding the relationship to mathematical set notation, it should be read like (the set of all 'city' in 'cities' such that 'a' in 'city'). It is an alternative syntax for consisely building lists, sets, or other collections without an explicit loop. It is called a "comprehension", in this case a "set comprehension". If you have not learned about these yet, I expect you will before long. They are very popular among Python programmers.
– Endulum
Nov 21 at 18:26
I had touched on list comprehensions briefly last night while reading but I didn't recognize the form. I only asked about the relationship to mathematics because city is written twice at the start of the set definition which seems a bit hard to remember, but I think that this is just Python syntax I must get used to. Thanks again!
– JacobCheverie
Nov 21 at 19:06
add a comment |
up vote
9
down vote
This is not a complete review, but rather two simpler alternatives that illustrate how you can cut the some of the inner control flow using the in
keyword. This keyword tests for membership of an object in an iterable. In this case, you want to test for membership of 'a'
in the name of each element of cities
.
The first takes the same approach as your code
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
if 'a' in city:
a_city.append(city)
else:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The membership test using in
is a drop-in replacement for the harder-to-read and more error-prone explicit loop over the characters.
An even cleaner solution makes use of the built-in set
data type. In simplistic terms, a set
is like a list
, except that it is not ordered and does not contain duplicates.
# a set is constructed with {}, unlike the the used for lists
cities = {"New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"}
# we can also construct one using the `set` builtin function (analogous to `list`)
a_city = set(city for city in cities if 'a' in city)
# subtracting one set from another is well-defined and does the obvious thing
no_a_city = cities - a_city
print("a_city:", a_city)
print("no_a_city:", no_a_city)
Check out the docs for a flavor of the kinds of rich membership and comparison that sets allow. These operations can typically be expected to be more efficient than equivalent algorithms on lists, mainly due to the fact that sets are guaranteed not to have duplicate elements.
Thanks @Endulum, I like the set approach. I can see where I would have to be careful using this, however. If I was not guaranteed to have distinct members then I assume that the set approach would discard some information and thus not be appropriate. When you define a_city, do I read this as I would read a mathematical set (set of all 'city' such that for 'city' in 'cities' if 'a' in 'city')? Without reading as so, it appears to me as a typo.
– JacobCheverie
Nov 21 at 18:09
1
Regarding the appropriateness of sets versus dictionaries: yes, if is desirable that you retain duplicates, then a set is likely not the right collection to choose.
– Endulum
Nov 21 at 18:19
1
Regarding the relationship to mathematical set notation, it should be read like (the set of all 'city' in 'cities' such that 'a' in 'city'). It is an alternative syntax for consisely building lists, sets, or other collections without an explicit loop. It is called a "comprehension", in this case a "set comprehension". If you have not learned about these yet, I expect you will before long. They are very popular among Python programmers.
– Endulum
Nov 21 at 18:26
I had touched on list comprehensions briefly last night while reading but I didn't recognize the form. I only asked about the relationship to mathematics because city is written twice at the start of the set definition which seems a bit hard to remember, but I think that this is just Python syntax I must get used to. Thanks again!
– JacobCheverie
Nov 21 at 19:06
add a comment |
up vote
9
down vote
up vote
9
down vote
This is not a complete review, but rather two simpler alternatives that illustrate how you can cut the some of the inner control flow using the in
keyword. This keyword tests for membership of an object in an iterable. In this case, you want to test for membership of 'a'
in the name of each element of cities
.
The first takes the same approach as your code
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
if 'a' in city:
a_city.append(city)
else:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The membership test using in
is a drop-in replacement for the harder-to-read and more error-prone explicit loop over the characters.
An even cleaner solution makes use of the built-in set
data type. In simplistic terms, a set
is like a list
, except that it is not ordered and does not contain duplicates.
# a set is constructed with {}, unlike the the used for lists
cities = {"New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"}
# we can also construct one using the `set` builtin function (analogous to `list`)
a_city = set(city for city in cities if 'a' in city)
# subtracting one set from another is well-defined and does the obvious thing
no_a_city = cities - a_city
print("a_city:", a_city)
print("no_a_city:", no_a_city)
Check out the docs for a flavor of the kinds of rich membership and comparison that sets allow. These operations can typically be expected to be more efficient than equivalent algorithms on lists, mainly due to the fact that sets are guaranteed not to have duplicate elements.
This is not a complete review, but rather two simpler alternatives that illustrate how you can cut the some of the inner control flow using the in
keyword. This keyword tests for membership of an object in an iterable. In this case, you want to test for membership of 'a'
in the name of each element of cities
.
The first takes the same approach as your code
cities = ["New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"]
a_city, no_a_city = ,
for city in cities:
if 'a' in city:
a_city.append(city)
else:
no_a_city.append(city)
print("a_city:", a_city)
print("no_a_city:", no_a_city)
The membership test using in
is a drop-in replacement for the harder-to-read and more error-prone explicit loop over the characters.
An even cleaner solution makes use of the built-in set
data type. In simplistic terms, a set
is like a list
, except that it is not ordered and does not contain duplicates.
# a set is constructed with {}, unlike the the used for lists
cities = {"New York", "Shanghai", "Munich", "Tokyo", "Dubai", "Mexico City", "São Paulo", "Hyderabad"}
# we can also construct one using the `set` builtin function (analogous to `list`)
a_city = set(city for city in cities if 'a' in city)
# subtracting one set from another is well-defined and does the obvious thing
no_a_city = cities - a_city
print("a_city:", a_city)
print("no_a_city:", no_a_city)
Check out the docs for a flavor of the kinds of rich membership and comparison that sets allow. These operations can typically be expected to be more efficient than equivalent algorithms on lists, mainly due to the fact that sets are guaranteed not to have duplicate elements.
answered Nov 21 at 16:29
Endulum
28817
28817
Thanks @Endulum, I like the set approach. I can see where I would have to be careful using this, however. If I was not guaranteed to have distinct members then I assume that the set approach would discard some information and thus not be appropriate. When you define a_city, do I read this as I would read a mathematical set (set of all 'city' such that for 'city' in 'cities' if 'a' in 'city')? Without reading as so, it appears to me as a typo.
– JacobCheverie
Nov 21 at 18:09
1
Regarding the appropriateness of sets versus dictionaries: yes, if is desirable that you retain duplicates, then a set is likely not the right collection to choose.
– Endulum
Nov 21 at 18:19
1
Regarding the relationship to mathematical set notation, it should be read like (the set of all 'city' in 'cities' such that 'a' in 'city'). It is an alternative syntax for consisely building lists, sets, or other collections without an explicit loop. It is called a "comprehension", in this case a "set comprehension". If you have not learned about these yet, I expect you will before long. They are very popular among Python programmers.
– Endulum
Nov 21 at 18:26
I had touched on list comprehensions briefly last night while reading but I didn't recognize the form. I only asked about the relationship to mathematics because city is written twice at the start of the set definition which seems a bit hard to remember, but I think that this is just Python syntax I must get used to. Thanks again!
– JacobCheverie
Nov 21 at 19:06
add a comment |
Thanks @Endulum, I like the set approach. I can see where I would have to be careful using this, however. If I was not guaranteed to have distinct members then I assume that the set approach would discard some information and thus not be appropriate. When you define a_city, do I read this as I would read a mathematical set (set of all 'city' such that for 'city' in 'cities' if 'a' in 'city')? Without reading as so, it appears to me as a typo.
– JacobCheverie
Nov 21 at 18:09
1
Regarding the appropriateness of sets versus dictionaries: yes, if is desirable that you retain duplicates, then a set is likely not the right collection to choose.
– Endulum
Nov 21 at 18:19
1
Regarding the relationship to mathematical set notation, it should be read like (the set of all 'city' in 'cities' such that 'a' in 'city'). It is an alternative syntax for consisely building lists, sets, or other collections without an explicit loop. It is called a "comprehension", in this case a "set comprehension". If you have not learned about these yet, I expect you will before long. They are very popular among Python programmers.
– Endulum
Nov 21 at 18:26
I had touched on list comprehensions briefly last night while reading but I didn't recognize the form. I only asked about the relationship to mathematics because city is written twice at the start of the set definition which seems a bit hard to remember, but I think that this is just Python syntax I must get used to. Thanks again!
– JacobCheverie
Nov 21 at 19:06
Thanks @Endulum, I like the set approach. I can see where I would have to be careful using this, however. If I was not guaranteed to have distinct members then I assume that the set approach would discard some information and thus not be appropriate. When you define a_city, do I read this as I would read a mathematical set (set of all 'city' such that for 'city' in 'cities' if 'a' in 'city')? Without reading as so, it appears to me as a typo.
– JacobCheverie
Nov 21 at 18:09
Thanks @Endulum, I like the set approach. I can see where I would have to be careful using this, however. If I was not guaranteed to have distinct members then I assume that the set approach would discard some information and thus not be appropriate. When you define a_city, do I read this as I would read a mathematical set (set of all 'city' such that for 'city' in 'cities' if 'a' in 'city')? Without reading as so, it appears to me as a typo.
– JacobCheverie
Nov 21 at 18:09
1
1
Regarding the appropriateness of sets versus dictionaries: yes, if is desirable that you retain duplicates, then a set is likely not the right collection to choose.
– Endulum
Nov 21 at 18:19
Regarding the appropriateness of sets versus dictionaries: yes, if is desirable that you retain duplicates, then a set is likely not the right collection to choose.
– Endulum
Nov 21 at 18:19
1
1
Regarding the relationship to mathematical set notation, it should be read like (the set of all 'city' in 'cities' such that 'a' in 'city'). It is an alternative syntax for consisely building lists, sets, or other collections without an explicit loop. It is called a "comprehension", in this case a "set comprehension". If you have not learned about these yet, I expect you will before long. They are very popular among Python programmers.
– Endulum
Nov 21 at 18:26
Regarding the relationship to mathematical set notation, it should be read like (the set of all 'city' in 'cities' such that 'a' in 'city'). It is an alternative syntax for consisely building lists, sets, or other collections without an explicit loop. It is called a "comprehension", in this case a "set comprehension". If you have not learned about these yet, I expect you will before long. They are very popular among Python programmers.
– Endulum
Nov 21 at 18:26
I had touched on list comprehensions briefly last night while reading but I didn't recognize the form. I only asked about the relationship to mathematics because city is written twice at the start of the set definition which seems a bit hard to remember, but I think that this is just Python syntax I must get used to. Thanks again!
– JacobCheverie
Nov 21 at 19:06
I had touched on list comprehensions briefly last night while reading but I didn't recognize the form. I only asked about the relationship to mathematics because city is written twice at the start of the set definition which seems a bit hard to remember, but I think that this is just Python syntax I must get used to. Thanks again!
– JacobCheverie
Nov 21 at 19:06
add a comment |
up vote
4
down vote
List comprehensions are very well-suited for this task. The basic syntax is the following:
new_list = [ expression(x) for x in old_list if condition(x) ]
In this case we don't want to change the list elements, simply select them according to a condition. What should be this condition? Well, Python also provides a simple syntax to check if a string (or a list) contains an element, with the keyword in
,or not in
for the negation (this is the same keyword used for iterationg through a list, so be careful to not get confused).
With these tools, your code can fit on two lines, with no import required:
a_city = [ city for city in cities if "a" in city ]
no_a_city = [ city for city in cities if "a" not in city ]
add a comment |
up vote
4
down vote
List comprehensions are very well-suited for this task. The basic syntax is the following:
new_list = [ expression(x) for x in old_list if condition(x) ]
In this case we don't want to change the list elements, simply select them according to a condition. What should be this condition? Well, Python also provides a simple syntax to check if a string (or a list) contains an element, with the keyword in
,or not in
for the negation (this is the same keyword used for iterationg through a list, so be careful to not get confused).
With these tools, your code can fit on two lines, with no import required:
a_city = [ city for city in cities if "a" in city ]
no_a_city = [ city for city in cities if "a" not in city ]
add a comment |
up vote
4
down vote
up vote
4
down vote
List comprehensions are very well-suited for this task. The basic syntax is the following:
new_list = [ expression(x) for x in old_list if condition(x) ]
In this case we don't want to change the list elements, simply select them according to a condition. What should be this condition? Well, Python also provides a simple syntax to check if a string (or a list) contains an element, with the keyword in
,or not in
for the negation (this is the same keyword used for iterationg through a list, so be careful to not get confused).
With these tools, your code can fit on two lines, with no import required:
a_city = [ city for city in cities if "a" in city ]
no_a_city = [ city for city in cities if "a" not in city ]
List comprehensions are very well-suited for this task. The basic syntax is the following:
new_list = [ expression(x) for x in old_list if condition(x) ]
In this case we don't want to change the list elements, simply select them according to a condition. What should be this condition? Well, Python also provides a simple syntax to check if a string (or a list) contains an element, with the keyword in
,or not in
for the negation (this is the same keyword used for iterationg through a list, so be careful to not get confused).
With these tools, your code can fit on two lines, with no import required:
a_city = [ city for city in cities if "a" in city ]
no_a_city = [ city for city in cities if "a" not in city ]
answered Nov 22 at 10:22
Aetol
411
411
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208130%2fdivide-a-set-of-strings-into-those-which-contain-a-and-those-which-dont%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown