Glossaire

Sélectionnez l'un des mots clés à gauche…

Programming in PythonSets and Dictionaries

Temps de lecture: ~25 min

Sets

Sets are unordered collections of unique values. The main advantage of having a special type for sets is that the design of the data structure can be optimized for membership checking. Figuring out whether a given value is in a list requires going through each element in the list, so the amount of time it takes increases with the length of the list. By contrast, checking membership in a set can be done very quickly even if the set is large.

A = [1,2,3]
S = set(A)
2 in S # evaluates to true
S.remove(2) # removes 2
S.add(11) # puts 11 in the set
2 in S # evaluates to False now

Exercise
Make a set which contains the first 10,000 prime numbers.

Hint: It suffices to look for primes among the first 110,000 integers. Compare how long it takes to check whether a given number is in that set to the time it takes to compute whether the number is prime using sympy.isprime.

Note 1: The most reliable and efficient way to figure out how the timeit function works is to .

Note 2: The computation below takes some time to run (20 seconds, say). It returns a tuple when it's done.

import timeit
SETUP = """
from sympy import isprime
primes = [] # add your code
primesSet = set(primes)
"""
a = timeit.timeit("98779 in primes", setup = SETUP)
b = timeit.timeit("98779 in primesSet", setup = SETUP)
c = timeit.timeit("isprime(98779)", setup = SETUP)
a,b,c

Solution. To get exactly 10,000 primes, we index the list obtained by filtering out the composite numbers:

import timeit
SETUP = """
from sympy import isprime
primes = [k for k in range(2,110_000) if isprime(k)][:10000]
primesSet = set(primes)
"""
a = timeit.timeit("98779 in primes", setup = SETUP)
b = timeit.timeit("98779 in primesSet", setup = SETUP)
c = timeit.timeit("isprime(98779)", setup = SETUP)
a,b,c

Put the three methods in order from fastest to slowest:

List membership checking
Set membership checking
Computing from scratch

Dictionaries

The internal mechanism that sets use to check membership extremely fast is also useful when the information you want to retrieve is more complex than just True or False.

For example, suppose you want to store a collection of color names together with the RGB values for each one. We'll store the names as and the RGB triples as .

It's possible to do this by putting the names in a list and the values in a list of the same length:

names = ["fuchsia", "firebrick", "goldenrod"]
rgbs = [(256, 0, 256), (178, 34, 34), (218, 165, 32)]

However, this solution gets very tedious quickly. For example, modifying this structure requires .

The Python data structure tailored to the problem of encoding a map from one finite set to another is called a dictionary. Dictionary literals consist of a comma separated list of the desired input-output pairs (with each input and output separated by a colon) delimited by curly braces. For example, the dictionary encoding the map described above looks like this:

rgb = {
  "fuchsia": (256, 0, 256),
  "firebrick": (178, 34, 34),
  "goldenrod": (218, 165, 32)
}

The domain elements "fuchsia", "firebrick" and "goldenrod" are called the keys of the dictionary, and the codomain elements (256,0,256), (178,34,34), and (218,165,32) are called the values.

We can also form new dictionaries from lists of pairs using the dict function:

dict([
  ("fuchsia", (256, 0, 256)),
  ("firebrick", (178, 34, 34)),
  ("goldenrod", (218, 165, 32))
])

We can perform a dictionary lookup using indexing syntax: rgb["fuchsia"] returns (256,0,256). We can also change the value associated with a given key or introduce a new key-value pair using indexing and assignment:

rgb = {
  "fuchsia": (256, 0, 256),
  "firebrick": (178, 34, 34),
  "goldenrod": (218, 165, 32)
}
rgb["crimson"] = (220, 20, 60)
len(rgb)

The dict methods, keys and values, may be used to access the keys and values of a dictionary.

rgb = {
  "fuchsia": (256, 0, 256),
  "firebrick": (178, 34, 34),
  "goldenrod": (218, 165, 32)
}
rgb.keys()

Exercise
Consider a dictionary which encodes flight arrival times:

import datetime
arrival_times = {
  "JetBlue 924": datetime.time(7,9),
  "United 1282": datetime.time(7,42),
  "Southwest 196": datetime.time(7,3)
}

You can most easily use this dictionary to .

Suppose you want to reverse the lookup direction: for any given time, you want to see which flight arrives at that time. One problem is that .

Assuming that the codomain values are distinct, however, you can form a new dictionary that allows you to look up keys for values by mapping the reversed function over the key-value pairs of the dictionary (obtainable through items method).

Implement this idea in the block below. Check that your dictionary works by indexing it with datetime.time(7,9).

import datetime
arrival_times = {
  "JetBlue 924": datetime.time(7,9),
  "United 1282": datetime.time(7,42),
  "Southwest 196": datetime.time(7,3)
}

Solution. We use the dict function to convert the list of pairs back into a dictionary: dict(map(reversed, arrival_times.items())).

Exercises

Exercise
Python supports a dict comprehension construct which is very similar to a list comprehension. Here's a dictionary that maps each one-digit positive integer to its square:

square_dict = {k: k*k for k in range(1, 10)}

Use a dict comprehension to make a dictionary which maps each of the first 100 powers of 2 to its units digit.

Solution. We convert to a string, get the last character, and convert back to an integer:

  {2**k: int(str(2**k)[-1]) for k in range(100)}

Exercise
Suppose you want to store student IDs in a part of a web application where the main thing you need to do is check whether an ID input by a student is a valid student ID (so you can flag it if it has been mistyped). Among the given options, the best data structure for this purpose would be a .

Solution. This is an ideal use case for sets. Lists and tuples will be slower for checking membership, and dictionaries aren't quite appropriate because it isn't clear what the values would be.

Bruno
Bruno Bruno