I’ve been doing a lot of work in the network management space recently (on account of that’s my job) and I’ve found it really helpful to use a few randomly-generated network topologies. So I wrote a little tool which takes as input a slightly-massaged version of the cities database from geonames.org and produces as output a graph where cities are vertices and edges are defined in a pseudo-realistic fashion. The graph is defined in XML. I’ll probably get around to open-sourcing the tool, eventually.

By “pseudo-realistic” I mean that cities are much more likely to be adjacent to nearby cities, and the edges are weighted such that a shortest-path algorithm will prefer many short hops over a few long hops. Specifically, the weight of an edge is n*log(n) where n is the great-circle distance between the two incident cities.

This algorithm tends to produce disconnected graphs. It often doesn’t define any edges between the Americas and Africa or Europe. Usually I just go and insert those edges manually.

Comments are extremely welcome re. how to make this simulated topology more realistic. One thing I’m thinking of is defining a set of vertices as “aggregation nodes” and then auto-generating a certain number of devices adjacent to each aggregation node, where the number of aggregated devices per aggregation node is proportional to the population of the aggregation node city.