# My Data Kraken – a Shapeshifter

I wonder if Data Kraken is only used by German speakers who translate our hackneyed Datenkrake – is it a word like eigenvector?

Anyway, I need this animal metaphor, even though this post is not about Facebook or Google. It’s about my personal Data Kraken – which is a true shapeshifter, like all octopuses are:

(… because they are spineless, but I don’t want to over-interpret the metaphor…)

Data Kraken’s shapeability is a blessing, given ongoing challenges:

When the Chief Engineer is fighting with other intimidating life-forms in our habitat, he focuses on survival first and foremost … and sometimes he forgets to inform the Chief Science Officer about fundamental changes to our landscape of sensors. Then Data Kraken has to be trained again to learn how to detect if the heat pump is on or off in a specific timeslot. Use the signal sent from control to the heat pump? Or to the brine pump? Or better use brine flow and temperature difference?

It might seem like a dull and tedious exercise to calculate ‘averages’ and other performance indicators that require only very simple arithmetic. But with the exception of room or ambient temperature, most of the ‘averages’ only make sense if some condition is met, like: the heating water inlet temperature should only be averaged while the heating circuit pump is on. But the temperature of the cold water, when the same floor loops are used for cooling in summer, must not be included in this average of ‘heating water temperature’. Above all, false sensor readings – like 0, NULL, or whatever value (like 999) a vendor chooses to indicate an error – have to be excluded. And sometimes I rediscover eternal truths, like the ratio of averages not being equal to the average of ratios.
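
A minimal sketch of such conditional averaging, using awk on made-up readings (the real calculations run in the SQL database; the pump-state column and the error values 0 and 999 are just assumptions for illustration):

```shell
# Columns: temperature;pump_state - average the temperature only while
# the pump is on (state 1), skipping error readings (0, 999, empty/NULL).
avg=$(printf '32.1;1\n33.0;1\n999;1\n21.5;0\n0;1\n32.4;1\n' |
  awk -F';' '$2 == 1 && $1 != "" && $1 + 0 != 0 && $1 != 999 { sum += $1; n++ }
             END { if (n) printf "%.2f", sum / n }')
echo "$avg"
```

Only the three valid ‘pump on’ readings (32.1, 33.0, 32.4) enter the average.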

The Chief Engineer is tinkering with new sensors all the time: In parallel to using the old & robust analog sensor for measuring the water level in the tank…

… a multitude of level sensors was evaluated …

… until finally Mr. Bubble won the casting …

… and the surface level is now measured via the pressure increasing linearly with depth. For the Big Data Department this means to add some new fields to the Kraken database, calculate new averages … and to smoothly transition from the volume of ice calculated from ruler readings to the new values.
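
The physics behind Mr. Bubble’s measurement is just the hydrostatic pressure relation (with ρ the density of water and g the gravitational acceleration):

```latex
p = \rho g h \quad \Rightarrow \quad h = \frac{p}{\rho g}
```

… so roughly one meter of water column per 100 mbar of gauge pressure.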

Change is the only constant in the universe, paraphrasing Heraclitus [*]. Sensors morph in purpose: The heating circuit, formerly known (to the control unit) as the radiator circuit became a new wall heating circuit, and the radiator circuit was virtually reborn as a new circuit.

I am also guilty of adding new tentacles all the time, herding a zoo of meters added in 2015, each of them adding a new log file containing data taken at different points of time, in different intervals. This year I let Kraken put tentacles into the heat pump:

But the most challenging data source to integrate is the most unassuming source of logging data: the small list of data that The Chief Engineer had recorded manually until recently (until the advent of Miss Pi CAN Sniffer and Mr Bubble). Reason: He had refused to take data at exactly 00:00:00 every single day, so I learned things I never wanted to know about SQL to deal with the odd time intervals.

To be fair, the Chief Engineer has been dedicated to data recording! He never shunned true challenges, like a legendary white-out in our garden, at a time when measuring ground temperatures was not yet automated:

Long-term readers of this blog know that ‘elkement’ stands for a combination of nerd and luddite, so I try to merge a dinosaur scripting approach with real-world global AI Data Krakens’ wildest dreams: I wrote scripts that create scripts that create scripts [[[…]]], all based on a small proto-Kraken – a nice-to-use documentation database containing the history of sensors and calculations.

The mutated Kraken is able to eat all kinds of log files, including clients’ ones, and above all, it can be cloned easily.

I’ve added all the images and anecdotes to justify why an unpretentious user interface like the following is my true Christmas present to myself – ‘easily clickable’ calculated performance data for days, months, years, and heating seasons.

… and diagrams that can be changed automatically, by selecting interesting parameters and time frames:

The major overhaul of Data Kraken turned out to be prescient, as a seemingly innocuous firmware upgrade changed not only log file naming conventions and publication schedules but also shuffled all the fields in the log files. My Data Kraken has to be capable of rebuilding the SQL database from scratch, based on a documentation of those ever-changing fields and the raw log files.

_________________________________

[*] It was hard to find the true original quote for that, as the internet is cluttered with change management coaches using that quote, and Heraclitus speaks to us only through secondary sources. But anyway, what this philosophy website says about Heraclitus applies very well to my Data Kraken:

The exact interpretation of these doctrines is controversial, as is the inference often drawn from this theory that in the world as Heraclitus conceives it contradictory propositions must be true.

In my world, I also need to deal with intriguing ambiguity!

# Give the ‘Thing’ a Subnet of Its Own!

To my surprise, the most clicked post ever on this blog is this:

Network Sniffing for Everyone:
Getting to Know Your Things (As in Internet of Things)

… a step-by-step guide to sniffing the network traffic of your ‘things’ contacting their mothership, plus a brief introduction to networking. I wanted to show how you can trace your networked devices’ traffic without any specialized equipment, just by being creative with what many users might already have: turning a Windows PC into a router with Internet Connection Sharing.

Recently, an army of captured things took down part of the internet, and this reminded me of that post. No, this is not one more gloomy article about the Internet of Things. I just needed to use this Internet Connection Sharing feature for the very purpose it was invented for.

The Chief Engineer had finally set up the perfect test lab for programming and testing freely programmable UVR16x2 control systems (successor of UVR1611). But this test lab was in a spot not equipped with wired ethernet, and the control unit’s data logger and ethernet gateway, the so-called CMI (Control and Monitoring Interface), only has a LAN interface and no WLAN.

So an ages-old test laptop was revived to serve as a router (improving its ecological footprint in passing): this notebook connects to the standard ‘office’ network via WLAN; this wireless connection is the internet connection that can be shared with a device connected to the notebook’s LAN interface, e.g. via a cross-over cable. As explained in detail in the older article, the router-laptop then allows for sniffing the traffic – but above all it allows the ‘thing’ to connect to the internet at all.

This is the setup:

The router laptop is automatically configured with IP address 192.168.137.1 and hands out addresses in the 192.168.137.x network as a DHCP server, while using an IP address provided by the internet router for its WLAN adapter (indicated here as commonly used 192.168.0.x addresses). If Windows 10 is used on the router-notebook, you might need to re-enable ICS after a reboot.

The control unit is connected to the CMI via CAN bus – so the combination of test laptop, CMI, and UVR16x2 control unit is similar to the setup used for investigating CAN monitoring recently.

The CMI ‘thing’ is tucked away in a private subnet dedicated to it, and it cannot be accessed directly from any ‘Office PC’ – except the router PC itself. A standard office PC (green) effectively has to access the CMI via the same ‘cloud’ route as an Internet User (red). This makes the setup a realistic test for future remote support – when the CMI plus control unit has been shipped to its proud owner and is configured on the final local network.

The private subnet setup is also a simple workaround in case several things cannot get along well with each other: for example, an internet TV service flooded CMI’s predecessor BL-NET with packets that were hard to digest – so BL-NET refused to work without a further reboot. Putting the sensitive device in a private subnet – using a ‘spare part’ router – solved the problem.

# Hacking My Heat Pump – Part 2: Logging Energy Values

In the last post, I showed how to use Raspberry Pi as a CAN bus logger – using a test bus connected to control unit UVR1611. Now I have connected it to my heat pump’s bus.

Credits for software and instructions:

Special thanks to SK Pang Electronics who provided me with CAN boards for Raspberry Pi after having read my previous post!!

CAN extension boards for Raspberry Pi, by SK Pang. Left: PiCAN 2 board (40 GPIO pins), right: smaller, retired PiCAN board with 26 GPIO pins – the latter fits my older Pi. In contrast to the board I used in the first tests, these have also a serial (DB9) interface.

Wiring CAN bus

We use a Stiebel-Eltron WPF 7 basic heat pump installed in 2012. The English website now refers to model WPF 7 basic s.

The CAN bus connections described in the German manual (Section 12.2.3) and the English manual (Wiring diagram, p.25) are similar:

CAN bus connections inside WPF 7 basic heat pump. For reference, see the description of the Physical Layer of the CAN protocol. Usage of the power supply (BUS +) is optional.

H, L and GROUND wires from the Pi’s CAN board are connected to the respective terminals inside the heat pump. I don’t use the optional power supply as the CAN board is powered by Raspberry Pi, and I don’t terminate the bus correctly with 120 Ω. As with the test bus, wires are rather short and thus have low resistance.

Heat pump with cover removed – CAN High (H – red), Low (L – blue), and Ground (yellow) are connected. The CAN cable is a few meters long and connects to the Raspberry Pi CAN board.

In the first tests Raspberry Pi had the privilege to overlook the heat pump room, as the top of the buffer tank was the only spot where the WLAN signal was strong enough …

Typical, temporary nerd’s test setup.

… or I used a cross-over ethernet cable and a special office desk:

Typical, temporary nerd’s workplace.

Now Raspberry Pi has its final position on the ‘organic controller board’, next to control unit UVR16x2 – and after a major upgrade to both LAN and WLAN all connections are reliable.

Raspberry Pi with PiCAN board from SK Pang and UVR16x2 control unit from Technische Alternative (each connected to a different CAN bus).

Bringing up the interface

According to messpunkt.org the bit rate of Stiebel-Eltron’s bus is 20000 bit/s; so the interface is activated with:

```
sudo ip link set can0 type can bitrate 20000
sudo ifconfig can0 up
```

Watching the idle bus

First I simply watched with the sniffer Wireshark to see if the heat pump says anything without being triggered. It does not – only every few minutes there are two packets. So I need to learn to talk to it.

SK Pang provides an example of requesting data using the open source tool cansend: the so-called CAN ID is followed by # and the actual data. This CAN ID refers to an ‘object’ – a set of properties of the device, like the set of inputs or outputs – and it can also contain the node ID of the device on the bus. There are many CAN tutorials on the net; I found this (German) introduction and this English tutorial very useful.

I was able to follow the communications of the two nodes in my test bus as I knew their node numbers and what to expect – the data logger would ask the controller for a set of configured sensor outputs every minute. Most packets sent by either bus member are related to object 480, indicating the transmission of a set of values (Process Data Exchange Objects, PDOs; more details on UVR’s CAN communication are available in German).

Sniffing the test CAN bus – communication of UVR1611 (node no 1) and logger BL-NET (node number 62 = 3e hex). Both devices use an ID related to object ID 480 plus their respective node number, as described here.

So I need to know object ID(s) and properly formed data values to ask the heat pump for energy readings – without breaking something by changing values.

Collecting interesting heat pump parameters for monitoring

I am very grateful for Jürg’s CAN tool can_scan, which allows for querying a Stiebel-Eltron heat pump for specific values and also for learning about all possible parameters (listed in so-called Elster tables).

In order to check the list of allowed CAN IDs used by the heat pump I run:

`./can_scan can0 680`

can0 is the (default) name of the interface created earlier and 680 is my (the sender’s) CAN ID, one of the IDs allowed by can_scan.

Start of output:

```
elster-kromschroeder can-bus address scanner and test utility
copyright (c) 2014 Jürg Müller, CH-5524

scan on CAN-id: 680
list of valid can id's:

000 (8000 = 325-07)
180 (8000 = 325-07)
301 (8000 = 325-07)
480 (8000 = 325-07)
601 (8000 = 325-07)
```

In order to investigate available values and their meaning I run can_scan for each of these IDs:

`./can_scan can0 680 180`

Embedded below is part of the output, containing some of the values (and /* Comments */). This list of parameters is much longer than the list of values available via the display on the heat pump!

I am mainly interested in metered energies and current temperatures of the heat source (brine) and the ‘environment’ – to compare these values to other sensors’ output:

```
elster-kromschroeder can-bus address scanner and test utility
copyright (c) 2014 Jürg Müller, CH-5524

0001:  0000  (FEHLERMELDUNG  0)
0003:  019a  (SPEICHERSOLLTEMP  41.0)
0005:  00f0  (RAUMSOLLTEMP_I  24.0)
0006:  00c8  (RAUMSOLLTEMP_II  20.0)
0007:  00c8  (RAUMSOLLTEMP_III  20.0)
0008:  00a0  (RAUMSOLLTEMP_NACHT  16.0)
0009:  3a0e  (UHRZEIT  14:58)
000a:  1208  (DATUM  18.08.)
000c:  00e9  (AUSSENTEMP  23.3) /* Ambient temperature */
000d:  ffe6  (SAMMLERISTTEMP  -2.6)
000e:  fe70  (SPEICHERISTTEMP  -40.0)
0010:  0050  (GERAETEKONFIGURATION  80)
0013:  01e0  (EINSTELL_SPEICHERSOLLTEMP  48.0)
0016:  0140  (RUECKLAUFISTTEMP  32.0) /* Heating water return temperature */
...
01d4:  00e2  (QUELLE_IST  22.6) /* Source (brine) temperature */
...
/* Hot tap water heating energy MWh + kWh */
/* Daily totals */
092a:  030d  (WAERMEERTRAG_WW_TAG_WH  781)
092b:  0000  (WAERMEERTRAG_WW_TAG_KWH  0)
/* Total energy since system startup */
092c:  0155  (WAERMEERTRAG_WW_SUM_KWH  341)
092d:  001a  (WAERMEERTRAG_WW_SUM_MWH  26)
/* Space heating energy, MWh + kWh */
/* Daily totals */
092e:  02db  (WAERMEERTRAG_HEIZ_TAG_WH  731)
092f:  0006  (WAERMEERTRAG_HEIZ_TAG_KWH  6)
/* Total energy since system startup */
0930:  0073  (WAERMEERTRAG_HEIZ_SUM_KWH  115)
0931:  0027  (WAERMEERTRAG_HEIZ_SUM_MWH  39)
```

Querying for one value

The heating energy to date in MWh corresponds to index 0931:

`./can_scan can0 680 180.0931`

The output of can_scan already contains the sum of the MWh (0931) and kWh (0930) values:

```
elster-kromschroeder can-bus address scanner and test utility
copyright (c) 2014 Jürg Müller, CH-5524

value: 0027  (WAERMEERTRAG_HEIZ_SUM_MWH  39.115)
```

The network trace shows that the logger (using ID 680) queries for two values related to ID 180 – the kWh and the MWh part:

Network trace of Raspberry Pi CAN logger (ID 680) querying CAN ID 180. Since the returned MWh value is the sum of MWh and kWh value, two queries are needed. Detailed interpretation of packets in the text below.

Interpretation of these four packets – as explained on Jürg’s website here and here in German:

```
00 00 06 80 05 00 00 00 31 00 fa 09 31
00 00 01 80 07 00 00 00 d2 00 fa 09 31 00 27
00 00 06 80 05 00 00 00 31 00 fa 09 30
00 00 01 80 07 00 00 00 d2 00 fa 09 30 00 73
|---------| ||          |---| || |---| |---|
1)          2)          3)    4) 5)    6)

1) CAN-ID used by the sender: 180 or 680
2) No of bytes of data - 5 for queries, 8 for replies
3) CAN ID of the communications partner and type of message.
For queries the second digit is 1.
Pattern: n1 0m with n = 180 / 80 = 3 (hex) and m = 180 mod 80 = 0
(hex). Partner ID = 30 * 8 (hex) + 00 = 180
Responses follow a similar pattern using second digit 2:
Partner ID is: d0 * 8 + 00 = 680
4) fa indicates that the Elster index no is greater than or equal to ff.
5) Index (parameter) queried for: 0930 for kWh and 0931 for MWh
6) Value returned: 27h = 39, 73h = 115
```

I am not sure which node IDs my logger and the heat pump actually use: 180 seems to be an object ID without a node ID added, while 301 would refer to object ID 300 plus node ID 1. But I suppose that with only two devices on the bus, one of them just a listener, there is no ambiguity.
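
Putting this interpretation together, the query frame for a single index can be assembled by hand (a sketch; the frame string is the format cansend expects, and actually sending it requires the can0 interface and the heat pump):

```shell
# Build the query frame for one Elster index, following the
# packet interpretation above.
# Query data (5 bytes): 31 00 fa <index hi byte> <index lo byte>
#   "31 00" addresses the partner (CAN ID 180: pattern n1 0m, n=3, m=0)
#   "fa"    marks a two-byte Elster index
index="0931"                 # WAERMEERTRAG_HEIZ_SUM_MWH
frame="680#3100FA${index}"   # 680 = the sender's own CAN ID
echo "$frame"                # send with: cansend can0 "$frame"

# Decoding the replies from the trace: the last two bytes are the value
mwh=$((0x27))                # reply for index 0931
kwh=$((0x73))                # reply for index 0930
echo "$mwh MWh + $kwh kWh"   # displayed by can_scan as 39.115
```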

Logging script

I found all interesting indices listed under CAN ID 180, so I am now looping through this set once every three minutes with can_scan, cutting out the number and adding it to a new line in a text log file. The CAN interface is (re-)started every time in case something happened, and the file is sent to my local server via FTP.

Every month a new log file is started; the log files are imported into my SQL Server and processed like the log files from UVR1611 / UVR16x2, the PV generator’s inverter, or the smart meter.

(Not the most elegant script – consider it a ‘proof of concept’! Another option is to trigger the sending of data with can_scan and collect output via can_logger.)

Interesting to-be-logged parameters are added to a ‘table’ – a file called indices:

```
0016 RUECKLAUFISTTEMP
01d4 QUELLE_IST
01d6 WPVORLAUFIST
091b EL_AUFNAHMELEISTUNG_WW_TAG_KWH
091d EL_AUFNAHMELEISTUNG_WW_SUM_MWH
091f EL_AUFNAHMELEISTUNG_HEIZ_TAG_KWH
0921 EL_AUFNAHMELEISTUNG_HEIZ_SUM_MWH
092b WAERMEERTRAG_WW_TAG_KWH
092f WAERMEERTRAG_HEIZ_TAG_KWH
092d WAERMEERTRAG_WW_SUM_MWH
0931 WAERMEERTRAG_HEIZ_SUM_MWH
000c AUSSENTEMP
0923 WAERMEERTRAG_2WE_WW_TAG_KWH
0925 WAERMEERTRAG_2WE_WW_SUM_MWH
0927 WAERMEERTRAG_2WE_HEIZ_TAG_KWH
0929 WAERMEERTRAG_2WE_HEIZ_SUM_MWH
```

Script:

```
# Define folders
logdir="/CAN_LOGS"
scriptsdir="/CAN_SCRIPTS"
indexfile="$scriptsdir/indices"

# FTP parameters
ftphost="FTP_SERVER"
ftpuser="FTP_USER"
ftppw="***********"

if ! [ -d "$scriptsdir" ]
then
    echo "Directory $scriptsdir does not exist!"
    exit 1
fi

# Create log dir if it does not exist yet
if ! [ -d "$logdir" ]
then
    mkdir "$logdir"
fi

sleep 5

echo ======================================================================

# Start logging
while true
do

    # Get current date and start new logging line
    now=$(date +'%Y-%m-%d;%H:%M:%S')
    line=$now
    year=$(date +'%Y')
    month=$(date +'%m')
    logfile=$year-$month-can-log-wpf7.csv
    logfilepath=$logdir/$logfile

    # Create a new file for every month, write header line
    # (first two columns hold date and time)
    if ! [ -f "$logfilepath" ]
    then
        headers="date;time"
        while read indexline
        do
            header=$(echo $indexline | cut -d" " -f2)
            headers="$headers;$header"
        done < "$indexfile"
        echo "$headers" > "$logfilepath"
    fi

    # (Re-)start CAN interface
    sudo ip link set can0 type can bitrate 20000
    sudo ip link set can0 up

    # Loop through interesting Elster indices
    while read indexline
    do
        # Get output of can_scan for this index, search for line with output values
        index=$(echo $indexline | cut -d" " -f1)
        value=$($scriptsdir/can_scan can0 680 180.$index | grep "value" | replace ")" "" | grep -o "\<[0-9]*\.\?[0-9]*\$" | replace "." ",")
        echo "$index $value"

        # Append value to line of CSV file
        line="$line;$value"
    done < "$indexfile"
    echo "$line" >> "$logfilepath"

    # FTP log file to server
    ftp -n -v "$ftphost" << END_SCRIPT
user $ftpuser $ftppw
binary
cd RPi
ls
lcd $logdir
put $logfile
ls
bye
END_SCRIPT

    echo "------------------------------------------------------------------"

    # Wait - next logging data point
    sleep 180

# Runs forever, use Ctrl+C to stop
done
```

In order to autostart the script I added a line to the rc.local file:

`su pi -c '/CAN_SCRIPTS/pkt_can_monitor'`

Using the logged values

In contrast to brine or water temperatures, heating energies are not available on the heat pump’s CAN bus in real-time: the main MWh counter is only incremented once per day, at midnight, when the daily kWh counter is added to the previous value.

Daily or monthly energy increments are calculated from the logged values in the SQL database and for example used to determine performance factors (heating energy over electrical energy) shown in our documentation of measurement data for the heat pump system.
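
The increment calculation itself is plain differencing of successive cumulative readings; a sketch with awk on made-up daily totals in kWh since startup (the real calculation runs in the SQL database):

```shell
# Each line: date;cumulative heating energy in kWh since startup.
# Output: date;increment with respect to the previous day.
increments=$(printf '2016-12-01;39115\n2016-12-02;39163\n2016-12-03;39228\n' |
  awk -F';' 'NR > 1 { print $1 ";" $2 - prev } { prev = $2 }')
echo "$increments"
```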

# Random Things I Have Learned from My Web Development Project

It’s nearly done (previous episode here).

I have copied all the content from my personal websites, painstakingly disentangling snippets of different ‘posts’ that were physically contained in the same ‘web page’, re-assigning existing images to them, adding tags, consolidating information that was stored in different places. Raking the Virtual Zen Garden – again.

Draft of the layout, showing a ‘post’. Left and right pane vanish in responsive fashion if the screen gets too small.

… Nothing you have not seen in more elaborate fashion elsewhere. For me the pleasure is in creating the whole thing bottom-up, not using existing frameworks, content management systems, or templates – requiring only an FTP client and a text editor.

I spent a lot of time on designing my redirect strategy. For historical reasons, all my sites use the same virtual web server. Different sites have been separated just by different virtual directories. So in order to display the e-stangl.at content as one stand-alone website, a viewer accessing e-stangl.at is redirected to e-stangl.at/e/. This means that entering [personal.at]/[business] would result in showing the business content at the personal URL. In order to prevent this, the main page generation script checks for the virtual directory and redirects ‘bottom-up’ to [business.at]/[business].

In the future, I am going to use a new hostname for my website. In addition, I want to have the option to migrate only some applications while keeping the others tied to the old ASP scripts temporarily. This means more redirect logic, especially as I want to test all the redirects. I have a non-public test site on the same server, but I had never tested redirects, as that means creating loads of test host names. Due to the complexity of the redirects to come, I added names like wwwdummy for every domain, redirecting to my new main test host name, in the same way as the www URLs will redirect to my new public host name.

And, lest we forget, I am obsessed with keeping old URLs working. I don’t like it if websites are migrated to a new content management system, changing all the URLs. As I mentioned before, I already use ASP.NET Routing for having nice URLs with the new site: a request for /en/2014/10/29/some-post-title does not access a physical folder; instead, the ‘flat-file database engine’ I wrote from scratch searches for the proper content text file based on a SQL string handed to it, retrieves attributes from both file name and file content, and displays HTML content and attributes like title and thumbnail image properly.

Flat-file database: two folders, ‘pages’ and ‘posts’. Post file names include creation date, short relative URL, and category. I use the ascx extension (actually meant for .NET ‘user controls’) as the web server will not return these files directly but responds with 404 – no need to tweak permissions.

The top menu, the tag cloud, the yearly/monthly/daily archives, the list of posts on the Home page, the XML RSS feed, and the XML sitemap are also created by querying these sets of files.

File representing a post: Upper half – meta tags and attributes, lower half – after attribute ‘content’: Actual content in plain HTML.

Now I want to redirect from the old .asp files (to be deleted from the server at some point in the future) to these nice URLs. My preferred solution for this class of redirects is using a rewrite map hard-coded in the web server’s config file. From my spreadsheet documentation of the 1:n relation of old ASP pages to new ‘posts’ I have automatically created the XML tags to be inserted in the ‘rewrite map’.
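
Such a rewrite map could look like the following fragment of the web server’s config file (a sketch for IIS URL Rewrite; the key/value pair is made up, modeled on the example URL above):

```xml
<rewrite>
  <rewriteMaps>
    <rewriteMap name="OldAspRedirects">
      <add key="/en/someoldpage.asp" value="/en/2014/10/29/some-post-title" />
    </rewriteMap>
  </rewriteMaps>
  <rules>
    <rule name="RedirectOldAsp" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <add input="{OldAspRedirects:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <action type="Redirect" url="{C:1}" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>
```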

Now that the boring part is over and I have scared everybody off (but just in case, you can find more technical information on the last update on the English version of the website, e.g. here) …

… I come up with my grand insights, in click-bait X-Things-You-Need-To-Know-About-Something-You-Should-Not-Do-And-Could-Not-Care-Less style:

It is sometimes painful to read really old content, like articles, manifestos and speeches from the last century. Yet I don’t hide or change anything.

After all, this is perhaps the point of such a website. I did not go online for the interaction (of social networks – clicks, likes, comments). Putting your thoughts out there, on the internet that never forgets, is like publishing a book you cannot un-publish. It is about holding yourself accountable and aiming at self-consistency.

I am not a visual person. If I had been more courageous I’d have used plain Courier New without formatting and images. Just for the fun of it, I tested adding dedicated images to each post and creating thumbnails from them – and I admit it adds to the content. Disturbing, that is!

I truly love software development. After a day of ‘professional’ software development (simulations re physics and engineering) I am still happy to plunge into this personal web development project. I realized programming is one of the few occupations that has been part of every job I ever had. Years ago, soul-searching and preparing for the next career change, I rather figured the main common feature was teaching and know-how transfer – workshops and academic lectures etc. But I am relieved I gave that up; perhaps I just tried to live up to the expected ideal of the techie who will finally turn to a more managerial or at least ‘social’ role.

You can always find perfect rationales for irrational projects: our web server had been hacked last year (ASP pages with spammy links put into some folders), and from backlinks in the network of spammy links I conclude that classical ASP pages had been targeted. My web server was then hosted on Windows 2003, at that time still fully supported. I made use of Parent Paths (../ relative URLs), which might have eased the hack. Now I am migrating to ASP.NET with the goal of turning off Classic ASP completely, and I already got rid of the Parent Paths requirement by editing the existing pages.

This website and my obsession with keeping the old stuff intact reflect my appreciation of The Existing, of Being Creative With What You Have. Re-using my old images and articles feels like re-using our cellar as a water tank. Both of which are passions I might not share with too many people.

My websites had been an experiment in compartmentalizing my thinking and writing – ‘Personal’, ‘Science’, ‘Weird’; at the very beginning the latter two were authored pseudonymously – briefly. My wordpress.com blog has been one quick shot at a Grand Unified Theory of My Blogging, and I could not prevent my personal websites from becoming more and more intertwined, too, in the past years. So finally both reflect my reluctance to separate my personal and professional selves.

My website is self-indulgent – in content and in meta-content. I realize that the technical features I have added are exactly what I need to browse my own stuff for myself, not necessarily what readers might expect or what is considered standard practice. One example is my preference for a three-pane design, and for that infinite (no dropdown-menu) archive.

Nothing slows a website down like social media integration. My text file management is for sure not the epitome of efficient programming, but I was flabbergasted by how fast it is to display nearly 150 posts at once – compared to the endless back and forth of questionable stuff between social networks, tracking, and ad sites (watch the status bar!).

However, this gives me some ideas about the purpose of this blog versus the purpose of my website. Here, on the WordPress.com blog, I feel more challenged to write self-contained, complete, edited, shareable (?) articles – often based on extensive research and consolidation of our original(*) data (OK, there are exceptions, such as this post) – whereas the personal website is more of a container for drafts and personal announcements. This also explains why the technical sections of my personal websites contain collections of links rather than full articles.

(*)Which is why I totally use my subversive sense of humour and turn into a nitpicking furious submitter of copyright complaints if somebody steals my articles published here, on the blog. However, I wonder how I’d react if somebody infringed my rights as the ‘web artist’ featured on subversiv.at.

For 15 years I have spent a lot of time on (re-)organizing and categorizing my content. This blog has also been part of this initiative. That re-organization is what I like websites and blogs for – a place to play with structure and content, and their relationship. Again, doing this in public makes me hold myself accountable. Categories are weird – I believe they can only be done right with hindsight. Now all my websites, blogs, and social media profiles eventually use the same categories, which have evolved naturally and are very unlike what I might have planned ‘theoretically’.

Structure should be light-weight. I started my websites with the idea of first- and second-level ‘menus’ and hardly any emphasis on time stamps. But your own persona and your ideas seem to be moving targets. I started commenting on my old articles, correcting or amending what I said (as I don’t delete, see above). subversiv.at has been my Art-from-the-Scrapyard-Weird-Experiments playground, before and in addition to the Art category here, and over there I enjoyed commenting in English on German articles and vice versa. But the Temporal Structure, the Arrow of Time, was stronger; so I finally made the structure more blog-like.

Curated lists … were most often just ‘posts’. I started collecting links, like resources for specific topics or my own posts written elsewhere, but after some time I did not consider them so useful any more. Perhaps somebody noticed that I have mothballed and hidden my Reading list and my Physics Resources here (the latter moved to my ‘science site’ radices.net – the URLs do still work, of course). Again: the arrow of time wins!

I loved and I cursed the bilingual nature of all my sites. Cursed, because the old structure made it too obvious when the counterpart in the other language was ‘missing’; it felt like a translation assignment. However, I don’t like translations. I am actually not even capable of really translating the spirit of my own posts. Sometimes I feel like writing in English, sometimes I feel like writing in German. Some days or weeks or months later I feel like reflecting on the same ideas, using the other language. Now I came up with that loose connection of an English and a German article referencing each other via a meta attribute, which results in an unobtrusive URL pointing to the other version.

Quantitative analysis helps to correct distorted views. I thought I wrote ‘so much’. But the tangle of posts and pages in the old sites obscured the fact that the content translates to only 138 posts in German and 78 in English. Actually, I wrote in bursts, typically immediately before and after an important change, and the first main burst in 2004/2005 was German-only. I think the numbers would have been higher had I given up on the menu-based approach earlier and written a new, updated ‘post’ instead of adding infinitesimal amendments to the existing pseudo-static pages.

Analysing my own process of analysing puts me into this detached mode of thinking. I have shielded myself from social media timelines in the past weeks and tinkered with articles, content written long before somebody could have ‘shared’ it. I feel that it motivates me again to not care about things like word count (too long), target groups (weird mixture of armchair web psychology and technical content), and shareability.

# My Flat-File Database

A brief update on my web programming project.

I have preferred to create online text by editing simple text files; so I only need a text editor and an FTP client as management tools. My ‘old’ personal and business web pages are currently created dynamically in the following way:
[Code for including a script (including other scripts)]
[Content of the article in plain HTML = inner HTML of content div]
[Code for writing footer]

Meta information about pages or about the whole site is kept in CSV text files. There are, e.g., files with tables…

• … listing all pages in each site and their attributes – like title, keywords, or hover texts for navigation links, or
• … tabulating all main properties of all web sites – such as ‘tag lines’ or the name of the CSS file.

A bunch of CSV files / tables can be accessed like a database by defining the columns in a schema.ini file and using a text driver (on my Windows web server). I am running SQL queries against these text files, and it would be simple to migrate my CSV files to a grown-up database. But I tacked on RSS feeds later; these XML files are hand-crafted and form basically a parallel ‘database’.
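For context: the Microsoft Text Driver reads its column definitions from a schema.ini file placed in the same folder as the CSV files. A minimal sketch (file and column names here are invented for illustration, not taken from the actual site) might look like:

```ini
[pages.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=title Text
Col2=url Text
Col3=isMenu Bit
```

Each CSV file gets its own section, named after the file, with one `ColN` line per column declaring its name and data type.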

This CSV file database is not yet what I mean by a flat-file database: In my new site, the content of a typical ‘article file’ should be plain text, free from code. All meta information will be included in each file, instead of being put into separate CSV files. A typical file would look like this:

```
title: Some really catchy title
date_created: 2015-09-15 11:42
date_changed: 2015-09-15 11:45
author: elkement
[more properties and meta tags]
content:
Text in plain HTML.
```

The logic for creating formatted pages with header, footer, menus etc. has to be contained in code separate from these files; and the text files need to be parsed for meta data and content. The set of files effectively becomes ‘the database’, the plain text content being just one of many attributes of a page. Folder structure and file naming conventions are part of the ‘database logic’.
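The actual implementation is in VB.NET; purely as an illustration of the parsing step (function and key names are mine, not from the original code), splitting such a file into meta attributes and content could be sketched in Python like this:

```python
def parse_article(text):
    """Split an article file into a dict of meta attributes plus the body.

    Header lines have the form 'name: value'; everything after the
    'content:' marker line is treated as the plain-HTML article body.
    """
    meta = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "content:":
            # Everything below this marker is the article content.
            meta["content"] = "\n".join(lines[i + 1:]).strip()
            break
        name, sep, value = line.partition(":")
        if sep:  # only lines that actually contain a 'name: value' pair
            meta[name.strip()] = value.strip()
    return meta
```

Note that `partition(":")` splits only at the first colon, so values like `2015-09-15 11:42` survive intact.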

I figured this was all an unprofessional hack until I found many so-called flat-file / database-less content management systems on the internet, intended to be used with smaller sites. They comprise some folders with text files, named according to a pre-defined schema, plus parsing code that extracts meta data from the files’ contents.

Motivated by that find, I created the following structure in VB.NET from scratch:

• Retrieving a set of text files from the file system based on search criteria – e.g. for creating the menu from all pages, or for finding the one specific file that should represent the current page – current as per the URL the user entered.
• Parsing a text file for lines having a [name]: [value] structure.
• Processing the nice URL entered by the user to make the web server pick the correct text file.

Speaking of URLs, so-called ASP.NET Routing came in handy: Before, I had used a few folders whose default page redirects to an existing page (such as /heatpump/ redirecting to /somefolder/heatpump.asp). Otherwise my URLs all corresponded to existing single files.

I use a typical blogging platform’s schema for the new site: If a user enters

`/en/2015/09/15/some-cool-article/`

the server accesses a text file whose name encodes language and date, such as:

`2015-09-15_en_some-cool-article.txt`

… and displays the content at the nice URL.
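The mapping itself is a simple naming convention. As a sketch (in Python rather than the VB.NET actually used; the function name is mine), it could be implemented as:

```python
def url_to_filename(url):
    """Map a blog-style URL to the corresponding article file name, e.g.
    '/en/2015/09/15/some-cool-article/' -> '2015-09-15_en_some-cool-article.txt'
    """
    # URL segments: language / year / month / day / slug
    lang, year, month, day, slug = url.strip("/").split("/")
    return f"{year}-{month}-{day}_{lang}_{slug}.txt"
```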

‘Language’ is part of the URL: If a user with a German browser explicitly accesses a URL starting with /en/, the language is effectively set to English. However, if the main page is hit, I detect the language from the header sent by the client.

I am not overly original: I use two categories of content – posts and pages – corresponding to text files organized in two different folders in the file system and following different conventions for file names. Learning from my experience with hand-crafted menu pages in this blog here, I added:

• A summary text included in the file, to be displayed in a list of posts per category.
• A list of posts in a single category, displayed on the category / menu page.

The category is assigned to the post simply as part of the file name; moving a post to another category is done by renaming it.

Since I found that having to add each of my Google+ posts to just a single Collection was a nice exercise, I deliberately limit myself to one category per post.

Having built all the required search patterns and functions for creating lists of posts, menus, or recent posts, or for extracting information from specific pages such as the current page or its counterpart in the other language… I realized that I needed a clear-cut separation between the high level – a query for a bunch of attributes of any set of files meeting some criteria – and the lower level doing the search, file retrieval, and parsing.

So why not use genuine SQL commands at the top level – to be translated into file searches and file content parsing at the lower level?

I envisaged building the menu of all pages e.g. by executing something like

`SELECT title, url, headline from pages WHERE isMenu=TRUE`

and creating the list of recent posts on the home page by running

`SELECT * FROM posts WHERE date_created > [some date]`

This would also allow for a smooth migration to an actual relational database system if the performance of the file-based database turned out not to be that great after all.

I underestimated the effort of ‘building your own database engine’, but the main logic is finally done. My file system recordset class has this functionality (and I think I have finally got the hang of classes and objects):

• Parse a SQL string to check if it is well-formed.
• Split it into pieces and translate the pieces into names of tables (from FROM) and lists of fields (from SELECT and WHERE).
• For each field, check (against my schema) if the field is encoded in the file’s name or if it is part of the name / value attributes in the file contents.
• Build a file search pattern string with * at the right places from the file name attributes.
• Get the list of files meeting this part of the WHERE criteria.
• Parse the contents of each file and exclude those not meeting the ‘content field’ criteria specified in the WHERE clause.
• Stuff all attributes specified in the SELECT statement into a table-like structure (a DataTable in .NET) and return a recordset object that can be queried and handled like recordsets returned by standard database queries – that is: check for End Of File, MoveNext, or return the value of a specific cell in a column with a specific name.

Now I am (re-)creating all collections of pages and posts using my personal SQL engine. In parallel, I am manually sifting through old content and turning my web pages into articles. To do: the tag cloud and handling tags in general, and the generation of the RSS XML file from the database.

The new site is not publicly available yet. At the time of writing of this post, all my sites still use the old schema.

Disclaimers:

• I don’t claim this is the best way to build a web site / blog. It’s also a fun project for its own sake – exploring the limits of flat-file databases and forcing myself to deal with potential performance issues.
• It is a deliberate choice: My hosting space allows for picking from different well-known relational databases and I have done a lot of SQL Server programming in the past months in other projects.
• I have a licence for Visual Studio. Using only a text editor instead is a deliberate choice, too.

# Interrupting Regularly Scheduled Programming …

(… for programming.)

Playing with websites has been a hobby of mine for nearly two decades. What has intrigued me is the combination of different tasks, appealing to different moods – or modes:

• Designing the user interface and organizing content.
• Writing the actual content, and toggling between creative and research mode.
• Developing the backend: database and application logic.

I have distributed different classes of content between my three personal sites, noticed how they drifted apart or became similar again, and have migrated my content over and over when re-doing the underlying software.

Currently the sites run on outdated ASP scripts accessing CSV files as database tables via SQL. This was not a corporate software project – or perhaps it was too similar to one: I kept tacking on new features as I went, indulging in organically grown code. I hand-craft my XML feeds!

It is time to consolidate all this. I feel entitled, motivated, or perhaps even forced to migrate to a new ‘platform’, finally based on true object-oriented programming. Our other three sites run on the same legacy code, which I don’t want to support forever – I will migrate those sites as well in the long run.

So: I am developing a new .NET site from scratch, and I am going to merge my three personal sites into one.

However, I cannot bring myself to re-doing only the code and migrating the content unchanged, as automatically as possible. Every old article brings up memories and challenges me to comment on it and to reply to my former self. I have to deal with all three aspects listed above!

As for the layout, the challenge is to preserve the spirit and colors of all three sites – perhaps using something as silly as three different layouts that visitors (especially: myself) can pick from, changing the layout based on category, or based on something random.

This is just a first draft – building on the ‘subversive’ layout.

I will dedicate most of my ‘online time’ to this project; so I am taking a break from my usual blogging here and there – except from progress reports on this web migration project – and I will not be very active on social media.