Image Analysis: Territorial Evolution of Poland

The history of Poland has, through the centuries, come with ups and downs. Through numerous wars with its neighbours and invasion by foreign powers, the territory of Poland has fluctuated greatly. So much, in fact, that it is not necessarily straightforward to write down a continuous history of Poland as an entity. If you feel like plowing through its lengthy story, check out the article Territorial Evolution of Poland on Wikipedia.

I thought it might be a cute project to try to plot just how much the surface area of Poland has fluctuated. I love history, and why not combine data science with another one of my passions? A quantitative visual story can be a very powerful companion to a written one.

I couldn’t find such data readily available (and if I had, where would the challenge have been?) but I found a usable resource in the Wikipedia article I just mentioned. User Esemono furnishes this article with some wonderful maps of his own creation, and has generously released them into the public domain.

Polish territory in 1635. Map by Wikipedia user Esemono.

These maps are reasonably simple, showing just (quasi-)monochromatic states and their borders. Can we get the surface area from such a map? Of course we can: we just count how many pixels Poland takes up, by looking at the colour!

First, we need to find out what colour Poland actually is. The colour is not perfectly uniform, and unfortunately it’s also not entirely consistent from one map to the next. To find out which colours we should consider Polish, we can use ImageMagick.

convert <filename> -define histogram:unique-colors=true \
-format %c histogram:info:-

What convert does here is produce a histogram of every colour it finds in the image, along with the number of pixels set to that colour. The output is formatted for display in your terminal rather than for saving to a data file, but that can be fixed by piping it through some postprocessing.
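The postprocessing itself is not shown here. A minimal sketch of what it could look like follows, assuming the raw histogram lines have the form "count: (r,g,b) #HEX name", possibly padded with spaces. The function name top_colours is hypothetical, and only the layout of the demo lines is assumed; the counts and colours are taken from the 1635 map.

```shell
# Hypothetical helper: turn raw "histogram:info:-" lines read from stdin into
# "count (r,g,b) #HEX name" rows, most frequent colour first.
# Assumes each raw line looks like "  222311: (255,255,191) #FFFFBF srgb(...)".
top_colours() {
  awk '{
    count = $1; sub(/:$/, "", count)        # drop the colon after the count
    rest = $0;  sub(/^[^(]*/, "", rest)     # keep everything from "(" onward
    stop = index(rest, ")")
    tuple = substr(rest, 1, stop); gsub(/ /, "", tuple)   # "(  0,  0,  0)" -> "(0,0,0)"
    tail = substr(rest, stop + 1); sub(/^ +/, "", tail)
    print count, tuple, tail
  }' | sort -k1,1nr | head -n "${1:-10}"
}

# Demo on two lines written in the assumed raw layout.
printf '%s\n' \
  '      3476: (  0,  0,  0) #000000 black' \
  '    222311: (255,255,191) #FFFFBF srgb(255,255,191)' | top_colours 10
```

With ImageMagick installed, the convert command above could be piped straight into it: convert <filename> -define histogram:unique-colors=true -format %c histogram:info:- | top_colours 10 > <filename>-hist_top10.dat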

We end up with a nicely sorted list of colours and the number of times they occur. To analyze which colours are important for our purpose, let’s just look at the top 10 in the histogram for the map above.

222311 (255,255,191) #FFFFBF srgb(255,255,191)
98051 (86,163,127) #56A37F srgb(86,163,127)
70944 (127,181,241) #7FB5F1 srgb(127,181,241)
37222 (102,164,19) #66A413 srgb(102,164,19)
22575 (111,104,155) #6F689B srgb(111,104,155)
8735 (254,254,190) #FEFEBE srgb(254,254,190)
4410 (87,164,128) #57A480 srgb(87,164,128)
3476 (0,0,0) #000000 black
3446 (255,255,190) #FFFFBE srgb(255,255,190)
3436 (0,0,120) #000078 srgb(0,0,120)

Now you might be inclined to guess that the top colour represents Poland. It’s prominent, as you would expect looking at the map above, and looking at the RGB values, it’s also quite light. In fact, you’d be right, but is it the only colour that represents Poland? Let’s visualize the colours in this histogram. We could actually plot the histogram data, which would probably look cool, but we don’t really need to. It will suffice to just take a palette, squirt out some of each of these colours, and inspect them visually. ImageMagick can help us out again.

# Load up all the colours
for i in {1..10}
do
  color[$i]=$(sed -n ${i}p <filename>-hist_top10.dat | awk '{ print $3 }')
done
# Paint a pretty picture with them
convert -size 300x500 xc:white -strokewidth 50 \
  -stroke "${color[1]}" -draw "line 0,25 300,25" \
  -stroke "${color[2]}" -draw "line 0,75 300,75" \
  -stroke "${color[3]}" -draw "line 0,125 300,125" \
  -stroke "${color[4]}" -draw "line 0,175 300,175" \
  -stroke "${color[5]}" -draw "line 0,225 300,225" \
  -stroke "${color[6]}" -draw "line 0,275 300,275" \
  -stroke "${color[7]}" -draw "line 0,325 300,325" \
  -stroke "${color[8]}" -draw "line 0,375 300,375" \
  -stroke "${color[9]}" -draw "line 0,425 300,425" \
  -stroke "${color[10]}" -draw "line 0,475 300,475" \
  -gravity northwest -font "Linux-Libertine-Mono-O-Mono" -pointsize 30 \
  -strokewidth 0 -stroke "#888888" -fill "#888888" -interline-spacing 15 \
  -annotate +10+10 "${color[1]}\n${color[2]}\n${color[3]}\n${color[4]}\n\
${color[5]}\n${color[6]}\n${color[7]}\n${color[8]}\n\
${color[9]}\n${color[10]}\n" <filename>-hist_top10.png

Top colors in the 1635 map of Poland.

This just draws a horizontal band with each of the ten colours, in order, and overlays the colour code. The result is shown in the figure above.
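The ten near-identical -stroke/-draw pairs could also be generated in a loop. The following is only a sketch of that idea in bash, not the command used for the figure: build_band_args and draw_palette are hypothetical names, the band height of 50 pixels is copied from the command above, and the -font option is left out so that ImageMagick falls back to its default font.

```shell
# Sketch (bash): build the repeated -stroke/-draw arguments in a loop.
# build_band_args is a hypothetical helper; it fills two global variables:
#   band_args   - array of convert arguments, four per colour band
#   band_labels - the colour codes joined with literal "\n" for -annotate
build_band_args() {
  local hist_file=$1 band_height=${2:-50} colour
  local y=$((band_height / 2))
  band_args=()
  band_labels=""
  while read -r colour; do
    band_args+=(-stroke "$colour" -draw "line 0,$y 300,$y")
    band_labels+="$colour\n"
    y=$((y + band_height))
  done < <(awk '{ print $3 }' "$hist_file")
}

# Hypothetical wrapper around the same kind of convert call as above.
draw_palette() {
  local hist_file=$1 out_png=$2
  build_band_args "$hist_file" 50
  local height=$(( 50 * ${#band_args[@]} / 4 ))   # one 50-pixel band per colour
  convert -size "300x${height}" xc:white -strokewidth 50 "${band_args[@]}" \
    -gravity northwest -pointsize 30 \
    -strokewidth 0 -stroke "#888888" -fill "#888888" -interline-spacing 15 \
    -annotate +10+10 "$band_labels" "$out_png"
}

# Usage (needs ImageMagick and an existing top-10 file):
#   draw_palette <filename>-hist_top10.dat <filename>-hist_top10.png
```

The advantage of the loop is that the palette grows with the input file, so a top-20 list would need no changes to the drawing code.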

We see immediately that the top colour is indeed Poland, but there are two more colours represented in the top-10 that are extremely similar. It’s hard to tell them apart by eye. These additional shades are probably JPEG artefacts, or due to anti-aliasing. A quick manual check verified that these colours are indeed part of the Poland area.

The occurrences of these additional colours are not very high (some two orders of magnitude smaller than that of the top colour) but let’s take them into account anyway – at least the ones that make it into the top-10.

It’s possible to automate the extraction of these colours, and I will show you how to do that in a future post. For now, however, I won’t go into that. There aren’t that many data points, and the relevant colours change in such a way that some clever copying and pasting will do.
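To give a rough idea of what such automation might involve, here is one simple possibility: keep every colour whose red, green and blue values all lie within a small tolerance of a known reference colour. This is an illustrative sketch only, not the method from the future post. The function name similar_colours and the default tolerance of 2 are assumptions.

```shell
# Hypothetical helper: list the hex codes in a top-colours file whose R, G and B
# values all lie within `tol` of a reference colour.
# $1 = histogram file ("count (r,g,b) #HEX name" rows), $2 = reference hex code,
# $3 = per-channel tolerance (assumed default: 2).
similar_colours() {
  awk -v ref="$2" -v tol="${3:-2}" '
    function near(a, b,    d) { d = a - b; if (d < 0) d = -d; return d <= tol }
    # First pass over the file: remember the channels of the reference colour.
    NR == FNR {
      if ($3 == ref) { t = $2; gsub(/[()]/, "", t); split(t, r, ","); found = 1 }
      next
    }
    # Second pass: print every colour close to the reference.
    found {
      t = $2; gsub(/[()]/, "", t); split(t, c, ",")
      if (near(c[1], r[1]) && near(c[2], r[2]) && near(c[3], r[3])) print $3
    }
  ' "$1" "$1"
}

# Demo on the top-10 list for the 1635 map shown earlier.
hist=$(mktemp)
cat > "$hist" <<'EOF'
222311 (255,255,191) #FFFFBF srgb(255,255,191)
98051 (86,163,127) #56A37F srgb(86,163,127)
70944 (127,181,241) #7FB5F1 srgb(127,181,241)
37222 (102,164,19) #66A413 srgb(102,164,19)
22575 (111,104,155) #6F689B srgb(111,104,155)
8735 (254,254,190) #FEFEBE srgb(254,254,190)
4410 (87,164,128) #57A480 srgb(87,164,128)
3476 (0,0,0) #000000 black
3446 (255,255,190) #FFFFBE srgb(255,255,190)
3436 (0,0,120) #000078 srgb(0,0,120)
EOF
similar_colours "$hist" '#FFFFBF' 2
rm -f "$hist"
```

For the 1635 map this prints #FFFFBF, #FEFEBE and #FFFFBE, the same three shades picked by hand. It still needs the reference colour as input, which is the part that changes from one map to the next.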

So that’s what I did, and I ended up with a list in which each row holds a year (which can appear more than once) and a colour that represents Poland in that year. A sample:

1635 #FFFFBF
1635 #FEFEBE
1635 #FFFFBE
1645 #FFFFBF
1645 #FEFEBE

Now we need to match these colours with their histogram values. We can do so with some simple sed and awk manipulations.

# Number of entries
n=$(wc -l colorsofpoland.dat | awk '{print $1}')
rm -f tmp.dat
# Loop through the entries
for ((i=1; i<=$n; i++))
do
  # Get the year
  year=$(sed -n ${i}p colorsofpoland.dat | awk '{print $1}')
  # Get the colour
  color=$(sed -n ${i}p colorsofpoland.dat | awk '{print $2}')
  # Get the corresponding histogram value from the right file
  A=$(awk -v color="$color" 'BEGIN{A=0} {if($3==color)A=$1} END{print A}' \
    Territorial_changes_of_Poland_$year.jpg-hist_top10.dat)
  # Output to a file
  echo $A >> tmp.dat
done
# Add the histogram values to the original table
paste colorsofpoland.dat tmp.dat > cop_with_pixels.dat
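The same lookup can also be done in a single awk pass, without the temporary file. The sketch below is an alternative under a few assumptions: add_pixel_counts is a hypothetical name, the histogram files follow the naming pattern used above, and a colour that is missing from its file (or a file that does not exist) gets a count of 0.

```shell
# Hypothetical helper: append the pixel count to every "year colour" row.
# $1 = list of "year colour" rows, $2 = directory holding the
# Territorial_changes_of_Poland_<year>.jpg-hist_top10.dat files (default: .)
add_pixel_counts() {
  awk -v dir="${2:-.}" '{
    file = dir "/Territorial_changes_of_Poland_" $1 ".jpg-hist_top10.dat"
    count = 0
    # Scan the histogram file of this year for the wanted colour.
    while ((getline row < file) > 0) {
      split(row, field, " ")
      if (field[3] == $2) count = field[1]
    }
    close(file)   # allow the same file to be read again for the next row
    print $1, $2, count
  }' "$1"
}

# Usage, with the files from this post in the current directory:
#   add_pixel_counts colorsofpoland.dat > cop_with_pixels.dat
```

The output is separated by spaces rather than the tabs that paste produces, which makes no difference to awk's default field splitting in later steps.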

We end up with the same list, but with pixel numbers added:

1635 #FFFFBF 222311
1635 #FEFEBE 8735
1635 #FFFFBE 3446
1645 #FFFFBF 222285
1645 #FEFEBE 8703

Almost there! Now we just need to add up the pixel values corresponding to the same year. Some more awk trickery will do it.

awk '
# Add up the pixel counts per year
{arr[$1] += $3}
END {for (i in arr) print i, 4*arr[i]/1000}
' cop_with_pixels.dat | sort -n > surfaceofpoland.dat

However, we don’t just want to know the number of pixels; we want some real-life unit of surface area! To get that I just looked at the last pixel value in the list, for 2002. Assuming that Poland’s borders haven’t changed since 2002, we can just look up the real value of the surface area of Polish territory today. That figure Wikipedia knows: 312 679 km². It turns out that this number is almost exactly four times as big as the number of pixels found for 2002, so we just multiply all our pixel values by 4. That is the factor 4 in the awk script above; the division by 1000 merely expresses the result in thousands of km².
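The calibration and the summation can be separated into two small steps, as in the sketch below. The function names km2_per_pixel and area_by_year are hypothetical. The demo uses only the sample rows listed earlier and the factor of 4 quoted above, so the 1645 figure is illustrative if more shades belong to that year.

```shell
# Hypothetical helper: km² represented by one pixel, from a reference area (km²)
# and the pixel count measured on the map of the same year.
km2_per_pixel() {
  awk -v area="$1" -v pixels="$2" 'BEGIN { print area / pixels }'
}

# Hypothetical helper: read "year colour pixels" rows from stdin and print
# "year area", with the area in thousands of km², sorted by year.
# $1 = km² per pixel.
area_by_year() {
  awk -v scale="$1" '
    { pixels[$1] += $3 }      # add up all shades belonging to the same year
    END { for (year in pixels) print year, scale * pixels[year] / 1000 }
  ' | sort -n
}

# Demo with the sample rows shown earlier and the factor of 4 from the text.
printf '%s\n' \
  '1635 #FFFFBF 222311' \
  '1635 #FEFEBE 8735' \
  '1635 #FFFFBE 3446' \
  '1645 #FFFFBF 222285' \
  '1645 #FEFEBE 8703' | area_by_year 4
```

This prints 1635 937.968 and 1645 923.952, so close to a million km² for the 1635 map. With the 2002 pixel count at hand, km2_per_pixel 312679 followed by that count would give the exact scale factor rather than the rounded 4.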

And that’s it! We have successfully extracted numerical surface area data from these simple maps! The final plot is below. I highlighted a couple of interesting historical events that did some serious work shaping the country’s borders.

The surface area of Poland over time.

It’s somehow very satisfying to see history represented in terms of quantifiable data. It gives a visual sense of time and scale, and of the geopolitical impact that historical events can have. The so-called Partitions of Poland were a cataclysmic event: the total disappearance of an independent state for over a hundred years. Less devastating but still interesting to see happen is the loss of territory in the post-WWII restructuring in Eastern Europe. Poland survived the trials it faced in the course of its history, but it did not do so unscathed.

Marco is a theoretical (bio)physicist, currently engaged in unraveling the sequence-dependent dynamics of DNA molecules to earn his PhD at Leiden University. Other passions include literature and history.

1 comment

  1. Bartosz Malinowski

    Hello, changes in territory is one thing, but the socio-political cataclysms that accompanied these changes can’t go unnoticed. Both the post-Partitions and post-WW2 traumas still live on. Thanks for the beautiful plot!
