Territorial evolution of the United States of America, from maps analyzed with the Dynamic Color Binning algorithm.

Image Analysis: Dynamic Color Binning Applied

Continuing my little project for analyzing surface areas using maps (see my post on the territorial evolution of Poland and the previous post on Dynamic Color Binning), I decided to automate the process for a whole time series of maps.

(Just here for code? Find it on GitHub.)

We developed a method for analyzing the dominant colors and their pixel counts in a given image, which I have christened Dynamic Color Binning, and we want to know how these dominant colors change from one image to the next. A great data structure to do this with is the pandas DataFrame, which is essentially a table with a host of useful functionality. What we want to do is load each of our dominant color histograms into a row of a DataFrame.

DataFrames can be automatically generated from data in a variety of ways, but an easy method for our purpose is to load our list of colors into a dictionary with entries {'color': pixels}. The DataFrame’s columns can automatically be labeled with the dictionary keys. Let’s assume our colors are in a 2D list called colorlist, where each row contains the three BGR values and the corresponding pixel count.


import pandas as pd

...

colordict = {'('+str(x[0])+','+str(x[1])+','+str(x[2])+')':x[3] for x in colorlist}
dataframe = pd.DataFrame(colordict, index=[0])

We create a string (B,G,R) to represent the color and use it as a key in the dictionary before converting to a DataFrame. We will generate such a DataFrame for each image we want to analyze. These images may of course not all have the same dominant colors. Some maps will contain more colors than others, and peaks may be slightly shifted. This is not a problem for pandas. If we concatenate the dataframes, it will automatically create all the necessary columns, and fill any blank spaces with NaN.

Once we have built a list dfs of DataFrames for our images, we simply call


bdf = pd.concat(dfs)

Output sample:
Date,(0, 0, 0),(0, 115, 153),(0, 153, 204),(0, 38, 50), \
  (0, 38, 51),(0, 76, 102),(100, 100, 128),(102, 102, 128)...
 
1816-01,18784.0,1220.0,119749.0,NaN,669.0,1204.0,NaN,1430.0...
1822-01,18356.0,1660.0,129535.0,NaN,834.0,1769.0,1914.0,NaN...
1898-01,19099.0,1118.0,38083.0,490.0,NaN,1317.0,5002.0,NaN...
1896-01,19158.0,1081.0,37417.0,490.0,NaN,1276.0,5001.0,NaN...
1889-11,19473.0,2023.0,96369.0,1317.0,NaN,2559.0,3949.0,NaN...
1837-01,18968.0,1473.0,119237.0,NaN,747.0,1529.0,2199.0,NaN...

The Date column was generated from the filenames of this particular set of images. Each remaining column represents a color, and reading down a column shows how that color's pixel count evolves over time.
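A minimal sketch of how such a table could be assembled. It assumes filenames of the form 1816-01.png (the actual naming scheme is not shown here), uses made-up pixel counts, and the helper name frame_for_image is purely illustrative.

```python
import os
import pandas as pd

def frame_for_image(filename, colorlist):
    """Build a one-row DataFrame of pixel counts, indexed by the date in the filename."""
    # Assumption: the filename stem is the date, e.g. '1816-01.png' -> '1816-01'
    date = os.path.splitext(os.path.basename(filename))[0]
    colordict = {'({},{},{})'.format(b, g, r): n for b, g, r, n in colorlist}
    return pd.DataFrame(colordict, index=pd.Index([date], name='Date'))

# Made-up color lists, one per image: rows of [B, G, R, pixel count]
images = {
    'maps/1822-01.png': [[0, 0, 0, 18356], [100, 100, 128, 1914]],
    'maps/1816-01.png': [[0, 0, 0, 18784], [102, 102, 128, 1430]],
}

dfs = [frame_for_image(name, colors) for name, colors in images.items()]

# Colors missing from an image become NaN; sort_index puts the rows in date order
bdf = pd.concat(dfs).sort_index()
print(bdf)
```

Sorting on the index works here because year-month strings sort chronologically; other date formats would probably need converting to real dates first.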

However, we are not quite there yet. The DataFrame contains colors that should probably have been grouped together as the same color, like (100,100,128) and (102, 102, 128). This is simply due to one or the other variant being dominant in different images. Luckily, using pandas, this problem is also easily fixed.

If two colors are so close that they should have been grouped together, then for any given image only one of them will (probably) be present, so wherever one column has a value, the other holds NaN. DataFrames come with a method called fillna() to fill in any NaN values, which we can use here to copy the entries of one column into the gaps of the other:


# (Pseudocode)
for color in columns, other_color in other_columns:
  delta = distance_in_Lab_space(color, other_color)
  if ( delta < r ):
    
    # Actual Python code:
    # fill color column with entries from other_color...
    bdf[color] = bdf[color].fillna(bdf[other_color])
    
    # ...and get rid of the superfluous column
    del bdf[other_color]
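For readers who want something they can run, here is a rough sketch of the merging loop. Note the assumptions: color_distance is a stand-in that uses plain Euclidean distance on the BGR values rather than a distance in Lab space, and the function names, the sample data and the threshold r=5 are all made up for illustration.

```python
import itertools
import numpy as np
import pandas as pd

def parse_color(label):
    """Turn a column label like '(100,100,128)' into a tuple of ints."""
    return tuple(int(v) for v in label.strip('()').split(','))

def color_distance(label_a, label_b):
    # Stand-in (assumption): Euclidean distance on the raw BGR values,
    # not the Lab-space distance described in the text.
    a, b = np.array(parse_color(label_a)), np.array(parse_color(label_b))
    return float(np.linalg.norm(a - b))

def merge_similar_columns(df, r):
    """Merge each pair of columns whose colors are closer than r."""
    df = df.copy()  # leave the caller's DataFrame untouched
    for color, other_color in itertools.combinations(list(df.columns), 2):
        # Skip pairs where one column was already merged away
        if color not in df.columns or other_color not in df.columns:
            continue
        if color_distance(color, other_color) < r:
            # Fill the gaps in color with the entries from other_color...
            df[color] = df[color].fillna(df[other_color])
            # ...and drop the now superfluous column
            del df[other_color]
    return df

bdf = pd.DataFrame(
    {'(0,0,0)': [18784.0, 18356.0],
     '(100,100,128)': [np.nan, 1914.0],
     '(102,102,128)': [1430.0, np.nan]},
    index=['1816-01', '1822-01'])

merged = merge_similar_columns(bdf, r=5)
print(merged)
```

Assigning the result of fillna() back to the column, instead of calling it with inplace=True on a column selection, avoids the chained-assignment pitfalls of recent pandas versions.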

We are essentially there. We may want to strip out columns for colors that do not appear often enough to be indicative of an area of interest, and sort the columns, but I’ll not go into details.
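One possible way to do this clean-up, as a sketch only: the threshold min_images and the choice to sort by total pixel count are my assumptions here, and the numbers are a small made-up subset.

```python
import numpy as np
import pandas as pd

bdf = pd.DataFrame(
    {'(0,38,50)': [np.nan, np.nan, 490.0],
     '(0,0,0)': [18784.0, 18356.0, 19099.0],
     '(0,153,204)': [119749.0, 129535.0, 38083.0]},
    index=['1816-01', '1822-01', '1898-01'])

# Hypothetical threshold: keep only colors that appear in at least 2 maps
min_images = 2
trimmed = bdf.dropna(axis=1, thresh=min_images)

# Order the columns by total pixel count, largest first
order = trimmed.sum().sort_values(ascending=False).index
trimmed = trimmed[order]
print(trimmed)
```

Here thresh counts the non-NaN entries per column, so a color that shows up in only a single map is dropped.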

By loading all the colors into a table, we can very easily make plots for various components of a map. Let’s apply our new code to an example! Since there isn’t much more to gain from the maps of Poland, I took another set of maps, this time describing the territorial evolution of the (current) United States. Once more turning to Wikipedia, I found a beautiful set of maps, this time by user Golbez, visualizing this evolution. An example is below.

Map of the United States in 1861. Made by Wikipedia user Golbez.


The territory that is currently the United States of America has at points in its history been organized in the federal US, external territories, or the Confederacy. Of course at various times areas were not yet part of the US, or were disputed. This great set of maps shows the evolution of these areas and we can plot their surface areas to see how they evolved.

Territorial evolution of the United States of America, from maps analyzed with the Dynamic Color Binning algorithm.


We can clearly see the United States proper (pink) growing over the course of a century as it makes its way west, taking over foreign parts and turning unorganized territories into new states. Clearly visible is an important dip from 1861 to 1865 – the American Civil War.

The beauty of our new tools can be seen when comparing this plot to the one of Polish territorial surface area. There I only measured one specific area on the maps, semi-manually, and if I had wanted to analyze others, I would have had to spend as much time on them. Our new automated process simply measures all the colors, from which we can then plot as many areas of interest as we like. In this case, it beautifully allows us to see the interplay between all the different statuses that the current United States territory has had in the past.

This concludes the development of my dynamic color binning algorithm for now. If you’re interested in the actual code, written in Python, you can find it on GitHub.

Marco is a theoretical (bio)physicist, currently engaged in unraveling the sequence-dependent dynamics of DNA molecules to earn his PhD at Leiden University. Other passions include literature and history.
