Implementing, Solving, and Visualizing the Traveling Salesman Problem with Python

Contents

1.1. Installing the dependencies For people in a hurry For organized people 1.2. Math becomes code 1.2.1. Sets 1.2.2. Parameters 1.2.3. Decision variables 1.2.4. Objective function 1.2.5. Constraints 1.2.6. Final inspection of the model

📖 If you didn’t read the previous article, worry not. The mathematical formulation is also stated (but not derived) here, with each model component next to its code implementation. If you don’t understand where things come from or mean, please read the article of “sprint 2”, and if you’d like more context, read the article for “sprint 1“.

Python is used as it’s the top language in data science, and Pyomo as it’s one of the best (open-source) libraries that deal effectively with large-scale models.

In this section, I’ll go through each model component defined in the formulation, and explain how it is translated to Pyomo code. I’ve tried to not leave any gaps, but if you feel otherwise, please leave a question in the comments.

Disclaimer: The target reader is expected to be new to Pyomo, and even to modeling, so to lower their cognitive burden, concise and simple implementation is prioritized over programming best practices. The goal now is to teach optimization modeling, not software engineering. The code is incrementally improved in future iterations as this proof-of-concept evolves into a more sophisticated MVP.

1.1. Installing the dependencies

For people in a hurry

Install (or make sure you already have installed) the libraries pyomo, networkx and pandas, and the package glpk.

📝 The package glpk contains the GLPK solver, which is an (open source) external solver we will use to optimize the models we create. Pyomo is used to create models of problems and pass them to GLPK, which will run the algorithms that carry out the optimization process itself. GLPK then returns the solutions to the Pyomo model object, which are stored as model attributes, so we can use them conveniently without leaving Python.

The recommended way to install GLPK is through conda so that Pyomo can find the GLPK solver easily. To install all dependencies together in one go, run:

conda install -y -c conda-forge pyomo=6.5.0 pandas networkx glpk

For organized people

I recommend creating a separate virtual environment in which to install all the libraries needed to follow the articles in this series. Copy this text

name: ttp  # traveling tourist problem
channels:
- conda-forge
dependencies:
- python=3.9
- pyomo=6.5.0
- pandas
- networkx
- glpk  # external solver used to optimize models
- jupyterlab  # comment this line if you won't use Jupyter Lab as IDE

and save it in a YAML file named environment.yml. Open an Anaconda prompt in the same location and run the command

conda env create --file environment.yml

After some minutes, the environment is created with all the dependencies installed inside. Run conda activate ttp to “get inside” the environment, fire up Jupyter Lab (by running jupyter lab in the terminal), and start coding along!

1.2. Math becomes code

First, let’s make sure the GLPK solver is findable by Pyomo

### =====  Code block 3.1  ===== ###
import pandas as pd
import pyomo.environ as pyo
import pyomo.version
import networkx as nxsolver = pyo.SolverFactory("glpk")
solver_available = solver.available(exception_flag=False)
print(f"Solver 'solver.name' available: solver_available")
if solver_available:
print(f"Solver version: solver.version()")
print("pyomo version:", pyomo.version.__version__)
print("networkx version:", nx.__version__)

Solver 'glpk' available: True
Solver version: (5, 0, 0, 0)
pyomo version: 6.5
networkx version: 2.8.4

⛔ If you got the message 'glpk' available: False, the solver didn’t install properly. Please try one of these to fix the issue:

– re-do the installation steps carefully

– run conda install -y -c conda-forge glpk in the base (default) environment

– try to install a different solver that works for you

Then read the distance data CSV file

### =====  Code block 3.2  ===== ###
path_data = (
"https://raw.githubusercontent.com/carlosjuribe/"
"traveling-tourist-problem/main/"
"Paris_sites_spherical_distance_matrix.csv"
)
df_distances = pd.read_csv(path_data, index_col="site")df_distances

Now we enter “stage 4” of the [Agile Operations Research workflow], marked as the green block below:

**Figure 1.** Minimalist workflow to problem-solving in OR. 4th stage: **computer** **model** (Image by author)

The task is to take the mathematical model created earlier and implement it in code exactly as it was defined mathematically.

👁️ We are allowed to create as many Python objects as we want if that makes the model implementation easier, but we are not allowed to modify the underlying model in any way, while we’re coding it. It would cause the math model and the computer model to be out-of-sync, thereby making later model debugging pretty hard.

We instantiate an empty Pyomo model, in which we will store model components as attributes:

model = pyo.ConcreteModel("TSP")

1.2.1. Sets

In order to create the set of sites 𝕊 = Louvre, Tour Eiffel, …, hotel, we extract their names from the index of the dataframe and use it to create a Pyomo Set named sites:

### =====  Code block 3.3  ===== ###
list_of_sites = df_distances.index.tolist()model.sites = pyo.Set(initialize=list_of_sites, 
domain=pyo.Any, 
doc="set of all sites to be visited (𝕊)")

To create the derived set

**Expression 3.1.** Derived set of possible arcs of the tour (site-to-site trajectories).

We store the filter 𝑖 ≠ 𝑗 inside a construction rule (the Python function _rule_domain_arcs), and pass this rule to the filter keyword when initializing the Set. Note that this filter will be applied to the cross-product of the sites (𝕊 × 𝕊), and will filter out those members of the cross-product that do not satisfy the rule.

### =====  Code block 3.4  ===== ###
def _rule_domain_arcs(model, i, j):
""" All possible arcs connecting the sites (𝔸) """
# only create pair (i, j) if site i and site j are different
return (i, j) if i != j else None rule = _rule_domain_arcs
model.valid_arcs = pyo.Set(
initialize=model.sites * model.sites,  # 𝕊 × 𝕊
filter=rule, doc=rule.__doc__)

1.2.2. Parameters

The parameter

𝐷ᵢⱼ ≔ Distance between site 𝑖 and site 𝑗

is created with the constructor pyo.Param, which takes as the first (positional) argument the domain 𝔸 (model.valid_arcs) that indexes it, and as the keyword argument initialize another construction rule, _rule_distance_between_sites, that is evaluated for each pair (𝑖, 𝑗) ∈ 𝔸. In each evaluation, the numeric value of the distance is fetched from the dataframe df_distances, and linked internally to the pair (𝑖, 𝑗):

### =====  Code block 3.5  ===== ###
def _rule_distance_between_sites(model, i, j):
""" Distance between site i and site j (𝐷𝑖𝑗) """
return df_distances.at[i, j]  # fetch the distance from dataframerule = _rule_distance_between_sites
model.distance_ij = pyo.Param(model.valid_arcs, 
initialize=rule, 
doc=rule.__doc__)

1.2.3. Decision variables

Because 𝛿ᵢⱼ has the same “index domain” as 𝐷ᵢⱼ, the way to build this component is very similar, with the exception that no construction rule is needed here.

**Expression 3.2.** Binary decision variables

model.delta_ij = pyo.Var(model.valid_arcs, within=pyo.Binary, 
doc="Whether to go from site i to site j (𝛿𝑖𝑗)")

The first positional argument of pyo.Var is reserved for its index set 𝔸, and the “type” of variable is specified with the keyword argument within. With “type of variable” I mean the range of values that the variable can take. Here, the range of 𝛿ᵢⱼ is just 0 and 1, so it is of type binary. Mathematically, we would write 𝛿ᵢⱼ ∈ 0, 1, but instead of creating separate constraints to indicate this, we can indicate it directly in Pyomo by setting within=pyo.Binary at the time we create the variable.

1.2.4. Objective function

**Expression 3.3.** The objective function to be minimized: total tour distance

To construct an objective function we can “store” the expression in a function that will be used as the construction rule for the objective. This function only takes one argument, the model, which is used to fetch any model component needed to build the expression.

### =====  Code block 3.6  ===== ###
def _rule_total_distance_traveled(model):
""" total distance traveled """
return pyo.summation(model.distance_ij, model.delta_ij)rule = _rule_total_distance_traveled
model.obj_total_distance = pyo.Objective(rule=rule, 
sense=pyo.minimize, 
doc=rule.__doc__)

Observe the parallelism between the mathematical expression and the return statement of the function. We specify that this is a minimization problem with the sense keyword.

1.2.5. Constraints

If you recall from the previous article, a convenient way to enforce that each site is visited only once is by enforcing that each site is “entered” once and “exited” once, simultaneously.

Each site is entered just once

def _rule_site_is_entered_once(model, j):
""" each site j must be visited from exactly one other site """
return sum(model.delta_ij[i, j] for i in model.sites if i != j) == 1rule = _rule_site_is_entered_once
model.constr_each_site_is_entered_once = pyo.Constraint(
model.sites, 
rule=rule, 
doc=rule.__doc__)

Each site is exited just once

def _rule_site_is_exited_once(model, i):
""" each site i must departure to exactly one other site """
return sum(model.delta_ij[i, j] for j in model.sites if j != i) == 1rule = _rule_site_is_exited_once
model.constr_each_site_is_exited_once = pyo.Constraint(
model.sites, 
rule=rule, 
doc=rule.__doc__)

1.2.6. Final inspection of the model

The model implementation is done. To see what it looks like as a whole, we should execute model.pprint(), and navigate a bit around to see if we missed some declarations or made some mistakes.

To get an idea of the size of the model, we print the number of variables and constraints that it is made of:

def print_model_info(model):
print(f"Name: model.name", 
f"Num variables: model.nvariables()",
f"Num constraints: model.nconstraints()", sep="\n- ")print_model_info(model)
#[Out]:
# Name: TSP
#  - Num variables: 72
#  - Num constraints: 18

Having fewer than 100 constraints or variables, this is a small-size problem, and it will be optimized relatively fast by the solver.