Let Your Hyperlinks Live Forever!

It is the duty of a Webmaster to allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years. This needs thought, and organization, and commitment. (https://www.w3.org/Provider/Style/URI)

Joel Spolsky did it:

 I’m bending over backwards not to create “linkrot” — all old links to Joel on Software stories have been replaced with redirects, so they should still work. (November 2001)

More than once:

I owe a huge debt of gratitude to [several people] for weeks of hard work on creating this almost perfect port of 16 years of cruft, preserving over 1000 links with redirects… (December 2016).

Most of the outgoing URLs linked by Joel on Software have rotted, with some notable exceptions: Jakob Nielsen’s URLs do still work, so he lives what he preached – in 1998:

… linkrot contributes to dissolving the very fabric of the Web: there is a looming danger that the Web will stop being an interconnected universal hypertext and turn into a set of isolated info-islands. Anything that reduces the prevalence and usefulness of cross-site linking is a direct attack on the founding principle of the Web.

No excuses if you are not Spolsky- or Nielsen-famous – I did it too, several times. In 2015 I rewrote the application for my websites from scratch and redirected every single .asp URL to a new friendly URL at a new subdomain.

I am obsessed with keeping old URLs working. I don’t like it if websites are migrated to a new content management system, changing all the URLs.

I checked all that again when migrating to HTTPS last year.

So I am a typical nitpicking dinosaur, waxing nostalgic about the time when web pages were still pages, and when Hyperlinks Subverted Hierarchy. When browsers were not yet running an OS written in Javascript and hogging 70% of your CPU for ad-tracking or crypto-mining.

The dinosaur is grumpy when it has to fix outgoing URLs on this blog. So. Many. Times. Like every second time I test a URL that shows up in my WordPress statistics as clicked, it 404s. Then I try to find equivalent content on the same site – if the domain does still exist and has not been orphaned and hijacked by malvertizers. If I am not successful, I link to a version of this content on web.archive.org, track down the content owner’s new site, or find similar content elsewhere.

My heart breaks when I see that it’s specifically the interesting, unusual content that users want to follow from here – like hard-to-find historical information on how to build a heat pump from clay tablets and straw. My heart breaks even more when the technical content on the target site gets dumbed down more and more with every URL-breaking website overhaul. But OK – you now have this terrific header image with a happy-people-at-work stock photo that covers my whole desktop so that I have to scroll for anything, and the dumbed-down content is shown in boxes that pop up and whirl – totally responsive, though clunky on a desktop computer.

And, yes: I totally know that site owners don’t owe me anything. Just because you hosted that rare and interesting content for the last 10 years does not mean you have to do that forever.

But you marketing ninjas and website wranglers neglect an important point: We live in the age of silly gamification that makes 1990s link building pale: I like yours and you like mine. Buy Followers. Every time I read a puffed-up Case Study for a project I knew as an insider, I laughed for minutes and then checked whether it was satire.

In this era of fake word-of-mouth marketing, genuine incoming links stand out: People say something thoughtful, maybe even nice, about you just because they found your content interesting and worth linking to – not because you play silly games of reciprocation. The most valuable links are set by people you don’t know and who did not anticipate you would ever notice their link. As Nassim Taleb says: Virtue is what you do when nobody is looking.

I would go to great lengths not to break links to my sites in those obscure DIY forums whose posts are hardly indexed by search engines. At least I would make a half-hearted attempt at redirecting to a custom 404 page that explains where you might find the moved content. Or just keep the domain name intact. Which of course means not registering a catchy domain name for every product in the first place – which I consider bad practice anyway: it trains users to fall for phishing by getting them used to jumping from one weird but legit domain to another.

And, no, I don’t blame you personally, poor stressed out web admin who had to get the new site up and running before April 1st, because suits in your company said the world would come to an end otherwise. I just think that our internet culture that embraces natural linkrot so easily is as broken as the links.

I tag this as Rant, but it is a Plea: I beg you, I implore you to invest just a tiny part of the time, budget, and effort you allocated to Making the Experience of Your Website Better in making some attempt at keeping your URLs intact. They are actually valuable for others – something you should be proud of.

Reverse Engineering Fun

Recently I read a lot about reverse engineering – in relation to malware research. I, for one, simply wanted to get ancient and hardly documented HVAC engineering software to work.

The software in question should have shown a photo of the front panel of a device – knobs and displays – augmented with the current system’s data, and you could have played with settings to ‘simulate’ the control unit’s behavior.

I tested it on several machines, to rule out some typical issues quickly: Will it run on Windows 7? Will it run on a 32-bit system? Do I need to run it as Administrator? None of that helped. I actually saw the application’s user interface coming up once, on the Win 7 32-bit test machine I had not started in a while. But I could not reproduce the correct start-up, and in all other attempts on all other machines I just encountered an error message … that used an Asian character set.

I poked around the files and folders the application uses. There were some .xls and .xml files, and most text was in the foreign character set. The Asian error message was a generic Windows dialogue box: You cannot select the text within it directly, but the whole contents of such error messages can be copied using Ctrl+C. Pasting it into Google Translate told me:

Failed to read the XY device data file

Checking the files again, there was indeed an xydevice.xls file, and I wondered if the relative path from exe to xls did not work, or if it was an issue with permissions. The latter was hard to believe, given that I had simply copied the whole bunch of files, my user having the same (full) permissions on all of them.

I started Microsoft Sysinternals Process Monitor to check if the application was groping in vain for the file. It found the file just fine in the right location:

Immediately before accessing the file, the application looped through registry entries for Microsoft JET database drivers for Office files – the last one it probed was msexcl40.dll – a database driver for accessing Excel files.

There is no obvious error in this dump: The xls file was closed before the Windows error popup was brought up; so the application had handled the error somehow.

I had been tinkering a lot myself with database drivers for Excel spreadsheets, Access databases, and even text files – so that looked like a familiar engineering software hack to me 🙂 On start-up the application created a bunch of XML files – I saw them once, right after I saw the GUI once in that non-reproducible test. As far as I could decipher the content in the foreign language, the entries were taken from that problematic xls file which contained a formatted table. It seemed that the application was using a sheet in the xls file as a database table.
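For illustration – my guess at the familiar pattern, not the vendor’s actual code – reading a worksheet as a database table via the JET engine typically looks like this in VB.NET (the file path is invented; the file name is the one I found):

Imports System.Data.OleDb

Module XlsAsTableDemo
    Sub Main()
        ' JET 4.0 with the Excel ISAM driver (msexcl40.dll); the path is made up.
        Dim connStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\HvacTool\xydevice.xls;" & _
            "Extended Properties=""Excel 8.0;HDR=Yes"""

        Using conn As New OleDbConnection(connStr)
            conn.Open()
            ' A worksheet is addressed like a database table: [SheetName$]
            Dim cmd As New OleDbCommand("SELECT * FROM [Sheet1$]", conn)
            Using reader As OleDbDataReader = cmd.ExecuteReader()
                While reader.Read()
                    Console.WriteLine(reader(0).ToString())
                End While
            End Using
        End Using
    End Sub
End Module

The Jet 4.0 provider is 32-bit only, by the way – which also fits the 32-bit test machines.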

What went wrong? I started the Windows debugger WinDbg (part of the Debugging Tools for Windows). I tried to go to the next unhandled or handled exception, and I saw again that it stumbled over msexcl40.dll:

But here was finally a complete and googleable error message in nerd speak:

Unexpected error from external database driver (1).

This sounded generic and I was not very optimistic. But this recent Microsoft article was one of the few mentioning the specific error message – an overview of operating system updates and fixes, dated October 2017. It describes exactly the observed issue with using the JET database driver to access an xls file:

Finally my curious observation of the non-reproducible single successful test made sense: When I started the exe on the Win 7 test client, this computer had been booted for the first time in about three months; it was old and slow, and it was just processing Windows Updates – so at the first run the software had worked because the deadly Windows Update had not been applied yet.

Also the ‘2007 timeframe’ mentioned was consistent – as all the application’s executable files were nearly 10 years old. The recommended strategy is to use a more modern version of the database driver, but Microsoft also states they will fix it again in a future version.

So I did not get the software to run, as I obviously cannot fix somebody else’s compiled code – but I could provide the exact information needed by the developer to repair it.

But the key message in this post is that it was simply a lot of fun to track this down 🙂

The Orphaned Internet Domain Risk

I have clicked on company websites of social media acquaintances, and something was not right: slight errors in formatting, encoding errors in special German characters.

Then I notice that some of the pages contain links to other websites that advertise products in a spammy way. However, the links to the spammy sites are embedded in these alleged company websites in a subtle way: using the (nearly) correct layout, or embedding the link in a ‘news article’ that also contains legit product information – content really related to the internet domain I am visiting.

Looking up whois information tells me that these internet domains are not owned by my friends anymore – consistent with what they actually say on their social media profiles. So how come they ‘have given’ their former domains to spammers? They did not, and they didn’t need to: Spammers simply need to watch out for expired domains, seize them when they become available – and then reconstruct the former legit content from public archives and interleave it with their spammy messages.

The former content of legitimate sites is often available on the web archive. Here is the timeline of one of the sites I checked:

Clicking on the details shows:

  • Last display of legit content in 2008.
  • In 2012 and 2013 a generic message from the hosting provider was displayed: This site has been registered by one of our clients
  • After that we see mainly 403 Forbidden errors – so the spammers don’t want their site to be archived – but at one time a screen capture of the spammy site had been taken.

The new site shows the name of the former owner at the bottom, but an unobtrusive link has been added, indicating the new owner – a US-based marketing and SEO consultancy.

So my takeaway is: If you ever feel like decluttering your websites and freeing yourself of your useless digital possessions – and possibly also social media accounts – think twice: As soon as your domain or name is available, somebody might take it, and re-use and exploit your former content and possibly your former reputation for promoting their spammy stuff in a shady way.

This happened a while ago, but I know now it can get much worse: Why only distribute marketing spam if you can distribute malware through channels still considered trusted? In this blog post Malwarebytes raises the question whether such practices are illegal or not – it seems that question is not straightforward to answer.

Visitors do not even have to visit the abandoned domain explicitly to be served malware. I have seen some reports of abandoned embedded plug-ins turned into malicious zombies. Silly example: If you embed your latest tweets, Twitter goes out of business, and its domains are seized by spammers – your Follow Me icon might help to spread malware.

If a legit site runs third-party code, its owners need to trust the authors of this code. For example, Equifax’s website recently served spyware:

… the problem stemmed from a “third-party vendor that Equifax uses to collect website performance data,” and that “the vendor’s code running on an Equifax Web site was serving malicious content.”

So if you run any plug-ins, embedded widgets or the like – better check regularly whether the originating domain is still run by the expected owner; monitor your vendors often; and don’t run code you do not absolutely need in the first place. Don’t use embedded active badges if a simple link to your profile would do.

Do a painful, boring inventory and assessment regularly – then you will notice how much work it is to manage these ‘partners’, and you will rather stay away from signing up and registering for too many services.

Update 2017-10-25: And as we speak, we learn about another example – the snatching of a domain used by Dell backup software preinstalled on PCs.

Other People Have Lives – I Have Domains

These are just some boring update notifications from the elkemental Webiverse.

The elkement blog has recently celebrated its fifth anniversary, and the punktwissen blog will turn five in December. Time to celebrate this – with new domain names that say exactly what these sites are – the ‘elkement.blog‘ and the ‘punktwissen.blog‘.

Actually, I wanted to get rid of the ads on both blogs, and with the upgrade came a free domain. WordPress has a detailed cookie policy – and I am showing it dutifully using the respective widget, but they have to defer to their partners when it comes to third-party cookies. I only want to worry about research cookies set by Twitter and Facebook, not by ad providers, and I am also considering removing the social media sharing buttons and the embedded tweets. (Yes, I am thinking about this!)

On the websites under my control I went full dinosaur: the server sends only non-interactive HTML pages to the client, not requiring any client-side activity. I have now gotten rid of the last half-hearted usage of a session object and the respective cookie, and I have never used any social media buttons or other tracking.

So there are no login data or cookies to protect, and yet I finally migrated all sites to HTTPS.

It is a matter of principle: I, of all website owners, should use HTTPS. For 15 years I have been planning and building Public Key Infrastructures and troubleshooting X.509 certificates.

But of course I fear Google’s verdict: They announced long ago that HTTPS is considered a positive ranking signal by their search engine. Pages not using HTTPS will be tagged as insecure, using more and more terrifying icons – e.g. HTTP-only pages with login buttons already display a crossed-out padlock in Firefox. In the past years I migrated a lot of PKIs from SHA1 to SHA256 to fight the first wave of Insecure icons.

Finally, Let’s Encrypt has started a revolution: free SSL certificates, based on domain validation only. My hosting provider uses a solution based on Let’s Encrypt – a reverse proxy that does the actual HTTPS. I only had to re-target all my DNS records to the reverse proxy – it would have been very easy had it not been for all my already existing URL rewriting and tweaking and redirecting. I also wanted to keep the option of still using HTTP in the future for tests and special scenarios (like hosting a revocation list), so I decided on redirecting myself in the application(s) instead of using the offered automated redirect. But a code review and clean-up now and then can never hurt 🙂 For large complex sites the migration to HTTPS is anything but easy.
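The redirect-in-the-application part boils down to something like this sketch (ASP.NET / VB.NET; the class name, the header name, and the exempted path are assumptions for illustration – not my production code):

Imports System.Web

Public Class WebApplication   ' the class behind Global.asax (name invented)
    Inherits HttpApplication

    Sub Application_BeginRequest(sender As Object, e As EventArgs)
        ' Keep plain HTTP available for special scenarios, e.g. a revocation list (path invented).
        If Request.Url.AbsolutePath.StartsWith("/pki/", StringComparison.OrdinalIgnoreCase) Then Return

        ' Behind a TLS-terminating reverse proxy the backend never sees a 'secure' connection itself,
        ' so the original scheme is usually taken from a header such as X-Forwarded-Proto
        ' (the exact header name depends on the proxy).
        Dim isHttps As Boolean = Request.IsSecureConnection
        If Not isHttps Then
            isHttps = String.Equals(Request.Headers("X-Forwarded-Proto"), "https", StringComparison.OrdinalIgnoreCase)
        End If

        If Not isHttps Then
            Dim secureUrl As New UriBuilder(Request.Url)
            secureUrl.Scheme = Uri.UriSchemeHttps
            secureUrl.Port = 443
            Response.RedirectPermanent(secureUrl.Uri.AbsoluteUri)   ' HTTP 301
        End If
    End Sub
End Class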

In case I ever forget which domains and host names I use, I just need to check out this list of Subject Alternative Names again:

(And I have another certificate for the ‘test’ host names that I need for testing the sites themselves and also for testing various redirects ;-))

WordPress.com also uses Let’s Encrypt (Automattic is a sponsor), and the SAN elkement.blog is lumped together with several other blog names, allegedly the ones which needed new certificates at about the same time.

It will be interesting to see what the consequences for phishing websites will be. Malicious websites will look trusted, as they are issued certificates automatically – but revoking a certificate might provide another method for invalidating a malicious website.

Anyway, special thanks to the WordPress.com Happiness Engineers and support staff at my hosting provider Puaschitz IT. Despite all the nerdiness displayed on this blog I prefer hosted / ‘shared’ solutions when it comes to my own websites because I totally like it when somebody else has to patch the server and deal with attacks. I am an annoying client – with all kinds of special needs and questions – thanks for the great support! 🙂

Ice Storage Hierarchy of Needs

Data Kraken – the tentacled, tangled pieces of software for data analysis – has a secret theoretical sibling, an older one: Before we built our heat source from a cellar, I developed numerical simulations of the future heat pump system. Today this simulation tool comprises e.g. a model of our control system, real-life weather data, energy balances of all storage tanks, and a solution to the heat equation for the ground surrounding the water/ice tank.

I can model the change of the tank temperature and ‘peak ice’ in a heating season. But the point of these simulations is rather to find out which parameters the system’s performance is particularly sensitive to: In a worst-case scenario, will the storage tank be large enough?

A seemingly fascinating aspect was how peak ice ‘reacts’ to input parameters: It is quite sensitive to the properties of the ground and the solar/air collector. If you make either the ground or the collector just ‘a bit worse’, the ice seems to grow out of proportion. Taking a step back, I realized that I could have come to that conclusion using simple energy accounting instead of differential equations – once I had long-term data for the average energy harvesting power of the collector and ground. Caveat: The simple calculation only works if these estimates are reliable for a chosen system – and this depends e.g. on hydraulic design, control logic, the shape of the tank, and the heat transfer properties of ground and collector.

For the operation of the combined tank + collector source, the critical months are the ice months Dec/Jan/Feb, when the air temperature does not allow harvesting all energy from air. Before and after that period, the solar/air collector is nearly the only source anyway. As I have emphasized on this blog again and again, even during the ice months the collector is still the main source and delivers most of the ambient energy the heat pump needs (if properly sized) in a typical winter. The rest has to come from energy stored in the ground surrounding the tank or from freezing water.

I am finally succumbing to trends of edutainment and storytelling in science communications – here is an infographic:

Ambient energy needed in Dec/Jan/Feb – approximate contributions of collector, ground, ice

(Add analogies to psychology here.)

Using some typical numbers, I am illustrating 4 scenarios in the figure below, for a system with these parameters:

  • A cuboid tank of about 23 m³
  • Required ambient energy for the three ice months is ~7,000 kWh
    (about 9,330 kWh of heating energy at a performance factor of 4)
  • ‘Standard’ scenario: The collector delivers 75% of the ambient energy; the ground delivers about 18%.
  • ‘Worse’ scenarios: Collector and/or ground energy is reduced by 25% compared to the standard.

Contributions of the three sources add up to the total ambient energy needed – this is yet another way of combining different energies in one balance.

Contributions to ambient energy in ice months - scenarios.

Ambient energy needed by the heat pump in Dec+Jan+Feb, as delivered by the three different sources. Latent ‘ice’ energy is also translated to the percentage of water in the tank that would be frozen.

Neither collector nor ground energy changes much in relation to the baseline. But latent energy has to fill the gap: As the total collector energy is much higher than the total latent energy content of the tank, an increase in the gap is large in relation to the base ice energy.

If collector and ground both ‘underdelivered’ by 25%, the tank in this scenario would be frozen completely instead of only 23%.
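These numbers can be reproduced with a few lines of energy accounting – a sketch using the rounded figures from above; the only extra input is the latent heat of fusion of water, about 93 kWh per m³:

Module IceMonthsBalance
    Sub Main()
        Const ambient As Double = 7000       ' kWh of ambient energy needed in Dec+Jan+Feb
        Const latentPerM3 As Double = 93     ' kWh per m3 of water (heat of fusion, ~334 kJ/kg)
        Dim latentCapacity As Double = 23 * latentPerM3    ' ~2,100 kWh for the 23 m3 tank

        ' Standard scenario: collector delivers 75%, ground 18%, freezing water fills the gap.
        Dim iceStandard As Double = ambient * (1 - 0.75 - 0.18)              ' ~490 kWh
        ' 'Worse' scenario: collector and ground both deliver 25% less.
        Dim iceWorse As Double = ambient - 0.75 * (0.75 + 0.18) * ambient    ' ~2,100 kWh

        Console.WriteLine("Standard:  {0:F0} kWh from ice = {1:P0} of the tank frozen",
                          iceStandard, iceStandard / latentCapacity)
        Console.WriteLine("Both -25%: {0:F0} kWh from ice = {1:P0} of the tank frozen",
                          iceWorse, iceWorse / latentCapacity)
    End Sub
End Module

The output is about 490 kWh – 23% of the tank frozen – for the standard scenario, and about 2,100 kWh, essentially the whole tank, if both sources underdeliver by 25%.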

The ice energy is just the peak of the total ambient energy iceberg.

You could call this system an air-geothermal-ice heat pump then!

____________________________

Continued: Here are some details on simulations.

My Data Kraken – a Shapeshifter

I wonder if Data Kraken is only used by German speakers who translate our hackneyed Datenkrake – is it a word like eigenvector?

Anyway, I need this animal metaphor, even though this post is not about Facebook or Google. It’s about my personal Data Kraken – which is a true shapeshifter, like all octopuses are:

(… because they are spineless, but I don’t want to over-interpret the metaphor…)

Data Kraken’s shapeability is a blessing, given ongoing challenges:

When the Chief Engineer is fighting with other intimidating life-forms in our habitat, he focuses on survival first and foremost … and sometimes he forgets to inform the Chief Science Officer about fundamental changes to our landscape of sensors. Then Data Kraken has to be trained again to learn how to detect if the heat pump is on or off in a specific timeslot. Use the signal sent from control to the heat pump? Or to the brine pump? Or better use brine flow and temperature difference?

It might seem like a dull and tedious exercise to calculate ‘averages’ and other performance indicators that require only very simple arithmetic. But with the exception of room or ambient temperature, most of the ‘averages’ only make sense if some condition is met, like: The heating water inlet temperature should only be averaged when the heating circuit pump is on. But the temperature of the cold water, when the same floor loops are used for cooling in summer, should not be included in this average of ‘heating water temperature’. Above all, false sensor readings – like 0, NULL, or any value (like 999) a vendor chooses to indicate an error – have to be excluded. And sometimes I rediscover eternal truths, like the ratio of averages not being equal to the average of ratios.
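A typical conditional average thus ends up as a query like the following hypothetical one – table and field names are invented for illustration:

' Hypothetical example of such a conditional average - table and field names are invented:
' average heating water inlet temperature, only while the heating circuit pump was on,
' with NULLs and typical error readings excluded.
Dim sqlAvg As String = "SELECT AVG(t_heating_inlet) FROM log_half_hour " & _
    "WHERE pump_heating_on = 1 " & _
    "AND t_heating_inlet IS NOT NULL AND t_heating_inlet NOT IN (0, 999) " & _
    "AND log_time >= @periodStart AND log_time < @periodEnd"

' ... and a seasonal performance factor is a ratio of sums, not an average of daily ratios:
Dim sqlPF As String = "SELECT SUM(heating_energy) / SUM(electrical_energy) FROM log_daily " & _
    "WHERE log_time >= @periodStart AND log_time < @periodEnd"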

The Chief Engineer is tinkering with new sensors all the time: In parallel to using the old & robust analog sensor for measuring the water level in the tank…

Level sensor: The old way

… a multitude of level sensors was evaluated …

Level sensors: The precursors

… until finally Mr. Bubble won the casting …

Mr. Bubble’s measuring tube

… and the surface level is now measured via the pressure, which increases linearly with depth. For the Big Data Department this means adding some new fields to the Kraken database, calculating new averages … and smoothly transitioning from the volume of ice calculated from ruler readings to the new values.
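The conversion from measured pressure to water level is then a one-liner – a sketch that ignores sensor offsets and calibration:

' Hydrostatic level measurement: the pressure at the mouth of the bubble tube
' grows linearly with depth, p = rho * g * h, so h = p / (rho * g).
Function WaterLevelMeters(gaugePressurePascal As Double) As Double
    Const rho As Double = 1000.0    ' kg/m3, density of water
    Const g As Double = 9.81        ' m/s2
    Return gaugePressurePascal / (rho * g)
End Function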

Change is the only constant in the universe, to paraphrase Heraclitus [*]. Sensors morph in purpose: The heating circuit, formerly known (to the control unit) as the radiator circuit, became a new wall heating circuit, and the radiator circuit was virtually reborn as a new circuit.

I am guilty of adding new tentacles all the time, too: herding a zoo of meters added in 2015, each of them contributing a new log file containing data taken at different points in time and at different intervals. This year I let Kraken put tentacles into the heat pump:

Data Kraken: Tentacles in the heat pump!

But the most challenging data source to integrate is the most unassuming source of logging data: the small list of data that the Chief Engineer had recorded manually until recently (until the advent of Miss Pi CAN Sniffer and Mr Bubble). Reason: He had refused to take data at exactly 00:00:00 every single day, so I learned things I never wanted to know about SQL to deal with the odd time intervals.
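One way to tame such odd timestamps – a sketch of the idea only, not the SQL I actually ended up with – is to interpolate the manual readings linearly to midnight:

' Hypothetical helper: estimate a (cumulative) reading at midnight from the two manual
' readings taken at arbitrary times before and after 00:00:00, by linear interpolation.
Function ValueAtMidnight(tBefore As DateTime, vBefore As Double,
                         tAfter As DateTime, vAfter As Double,
                         midnight As DateTime) As Double
    Dim span As Double = (tAfter - tBefore).TotalHours
    Dim elapsed As Double = (midnight - tBefore).TotalHours
    Return vBefore + (vAfter - vBefore) * elapsed / span
End Function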

To be fair, the Chief Engineer has been dedicated to data recording! He never shunned true challenges, like a legendary white-out in our garden, at the time when measuring ground temperatures was not yet automated:

The challenge

White Out

Long-term readers of this blog know that ‘elkement’ stands for a combination of nerd and luddite, so I try to merge a dinosaur scripting approach with real-world global AI Data Krakens’ wildest dream: I wrote scripts that create scripts that create scripts [[[…]]], all based on a small proto-Kraken – a nice-to-use documentation database containing the history of sensors and calculations.

The mutated Kraken is able to eat all kinds of log files, including clients’ ones, and above all, it can be cloned easily.

I’ve added all the images and anecdotes to justify why an unpretentious user interface like the following is my true Christmas present to myself – ‘easily clickable’ calculated performance data for days, months, years, and heating seasons.

Data Kraken: UI

… and diagrams that can be changed automatically, by selecting interesting parameters and time frames:

Excel for visualization of measurement data

The major overhaul of Data Kraken turned out to be prescient, as a seemingly innocuous firmware upgrade not only changed log file naming conventions and publication schedules but also shuffled all the fields in the log files. My Data Kraken has to be capable of rebuilding the SQL database from scratch, based on a documentation of those ever-changing fields and the raw log files.

_________________________________

[*] It was hard to find the true original quote for that, as the internet is cluttered with change management coaches using that quote, and Heraclitus speaks to us only through secondary sources. But anyway, what this philosophy website says about Heraclitus applies very well to my Data Kraken:

The exact interpretation of these doctrines is controversial, as is the inference often drawn from this theory that in the world as Heraclitus conceives it contradictory propositions must be true.

In my world, I also need to deal with intriguing ambiguity!

My Flat-File Database

A brief update on my web programming project.

I have preferred to create online text by editing simple text files; so I only need a text editor and an FTP client as management tools. My ‘old’ personal and business web pages are currently created dynamically in the following way:
[Code for including a script (including other scripts)]
[Content of the article in plain HTML = inner HTML of content div]
[Code for writing footer]

The main script(s) create layout containers, meta tags, navigation menus etc.

Meta information about pages or about the whole site is kept in CSV text files. There are e.g. files with tables…

  • … listing all pages in each site and their attributes – like title, keywords, hover texts for navigation links – or
  • … tabulating all main properties of all web sites – such as ‘tag lines’ or the name of the CSS file.

A bunch of CSV files / tables can be accessed like a database by defining the columns in a schema.ini file, and using a text driver (on my Windows web server). I am running SQL queries against these text files, and it would be simple to migrate my CSV files to a grown-up database. But I tacked on RSS feeds later; these XML files are hand-crafted and basically a parallel ‘database’.
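For illustration, querying such CSV ‘tables’ looks roughly like this – a simplified sketch with invented folder, file, and column names; the column definitions live in a schema.ini file next to the CSV files:

Imports System.Data.OleDb

Module CsvAsDatabaseDemo
    Sub Main()
        ' The JET text driver treats every CSV file in the folder as a table;
        ' column names and types are declared in schema.ini in the same folder.
        Dim connStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=D:\web\meta\;" & _
            "Extended Properties=""text;HDR=Yes;FMT=Delimited"""

        Using conn As New OleDbConnection(connStr)
            conn.Open()
            Dim cmd As New OleDbCommand( _
                "SELECT title, url FROM [pages.csv] WHERE site = 'punktwissen'", conn)
            Using reader As OleDbDataReader = cmd.ExecuteReader()
                While reader.Read()
                    Console.WriteLine("{0} -> {1}", reader("title"), reader("url"))
                End While
            End Using
        End Using
    End Sub
End Module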

This CSV file database is not yet what I mean by flat-file database: In my new site the content of a typical ‘article file’ should be plain text, free from code. All meta information will be included in each file, instead of putting it into the separate CSV files. A typical file would look like this:

title: Some really catchy title
headline: Some equally catchy, but a bit longer headline
date_created: 2015-09-15 11:42
date_changed: 2015-09-15 11:45
author: elkement
[more properties and meta tags]
content:
Text in plain HTML.

The logic for creating formatted pages with header, footer, menus etc. has to be contained in code separate from these files, and the text files need to be parsed for metadata and content. The set of files has effectively become ‘the database’, the plain text content being just one of many attributes of a page. Folder structure and file naming conventions are part of the ‘database logic’.

I figured this was all an unprofessional hack until I found many so-called flat-file / database-less content management systems on the internet, intended to be used with smaller sites. They comprise some folders with text files, to be named according to a pre-defined schema, plus parsing code that extracts metadata from the files’ contents.

Motivated by that find, I created the following structure in VB.NET from scratch:

  • Retrieving a set of text files from the file system based on search criteria – e.g. for creating the menu from all pages, or for searching for one specific file that should represent the current page – current as per the URL the user entered.
  • Code for parsing a text file for lines having a [name]: [value] structure (see the sketch after this list).
  • Processing the nice URLs entered by the user to make the web server pick the correct text file.
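The parsing part is the simplest piece – a simplified sketch, with error handling omitted and helper names invented:

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module ArticleParser
    Function ParseArticleFile(path As String) As Dictionary(Of String, String)
        Dim attributes As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        Dim lines As String() = File.ReadAllLines(path)
        Dim i As Integer = 0

        ' Header: [name]: [value] lines, up to the 'content:' marker
        While i < lines.Length AndAlso Not lines(i).Trim().Equals("content:", StringComparison.OrdinalIgnoreCase)
            Dim pos As Integer = lines(i).IndexOf(":"c)
            If pos > 0 Then
                attributes(lines(i).Substring(0, pos).Trim()) = lines(i).Substring(pos + 1).Trim()
            End If
            i += 1
        End While

        ' Everything after 'content:' is the plain-HTML body - treated as just another attribute
        attributes("content") = String.Join(Environment.NewLine, lines.Skip(i + 1))
        Return attributes
    End Function
End Module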

Speaking of URLs, so-called ASP.NET Routing came in handy: Before, I had used a few folders whose default page redirects to an existing page (such as /heatpump/ redirecting to /somefolder/heatpump.asp); otherwise my URLs all corresponded to existing single files.

I use a typical blogging platform’s schema with the new site: If a user enters

/en/2015/09/15/some-cool-article/

the server accesses a text file whose name contains the language and the date, such as:

2015-09-15_en_some-cool-article.txt

… and displays the content at the nice URL.

‘Language’ is part of the URL: If a user with a German browser explicitly accesses a URL starting with /en/, the language is effectively set to English. However, if the main page is hit, I detect the language from the Accept-Language header sent by the client.
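Pieced together, the routing, the file name lookup, and the language fallback might look like this sketch – route, page, and function names are invented, not the exact production code:

Imports System.Web
Imports System.Web.Routing

Module UrlHandling
    ' Registered once at application start (route and page names invented):
    Sub RegisterRoutes(routes As RouteCollection)
        ' /en/2015/09/15/some-cool-article/ is handled by a single rendering page
        routes.MapPageRoute("post", "{lang}/{year}/{month}/{day}/{slug}", "~/RenderPost.aspx")
    End Sub

    ' In the rendering page: build the text file name from the route values,
    ' e.g. 2015-09-15_en_some-cool-article.txt
    Function PostFileName(routeData As RouteData) As String
        Return String.Format("{0}-{1}-{2}_{3}_{4}.txt",
                             routeData.Values("year"), routeData.Values("month"),
                             routeData.Values("day"), routeData.Values("lang"),
                             routeData.Values("slug"))
    End Function

    ' Fallback when the main page is hit without /en/ or /de/ in the URL:
    ' pick the language from Request.UserLanguages (the Accept-Language header).
    Function DetectLanguage(userLanguages As String()) As String
        If userLanguages IsNot Nothing AndAlso userLanguages.Length > 0 AndAlso _
           userLanguages(0).StartsWith("de", StringComparison.OrdinalIgnoreCase) Then
            Return "de"
        End If
        Return "en"
    End Function
End Module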

I am not overly original: I use two categories of content – posts and pages – corresponding to text files organized in two different folders in the file system, and following different conventions for file names. Learning from my experience with hand-crafted menu pages in this blog here, I added:

  • A summary text included in the file, to be displayed in a list of posts per category.
  • A list of posts in a single category, displayed on the category / menu page.

The category is assigned to the post simply as part of the file name; moving a post to another category is done by renaming it.

Since I found that having to add my Google+ posts to just a single Collection was a nice exercise, I deliberately limit myself to one category per post.

Having built all the required search patterns and functions for creating lists of posts, menus, or recent posts, or for extracting information from specific pages – such as the current page or the corresponding page in the other language – I realized that I needed a clear-cut separation between a high-level query for a bunch of attributes of any set of files meeting some criteria, and the lower level doing the search, file retrieval, and parsing.

So why not use genuine SQL commands at the top level – to be translated to file searches and file content parsing on the lower level?

I envisaged building the menu of all pages e.g. by executing something like

SELECT title, url, headline from pages WHERE isMenu=TRUE

and creating the list of recent posts on the home page by running

SELECT * FROM posts WHERE date_created < [some date]

This would also allow for a smooth migration to an actual relational database system if the performance of the file-based database should not be that great after all.

I underestimated the effort of ‘building your own database engine’, but finally the main logic is done. My file system recordset class has this functionality (and I think I finally got the hang of classes and objects) – a condensed sketch follows the list:

  • Parse a SQL string to check if it is well-formed.
  • Split it into pieces and translate pieces to names of tables (from FROM) and list of fields (from SELECT and WHERE).
  • For each field, check (against my schema) if the field is encoded in the file’s name or if it is part of the name / value attributes in the file contents.
  • Build a file search pattern string with * at the right places from the file name attributes.
  • Get the list of files meeting this part of the WHERE criteria.
  • Parse the contents of each file and exclude those not meeting the ‘content fields’ criteria specified in the WHERE clause.
  • Stuff all attributes specified in the SELECT statement into a table-like structure (a DataTable in .NET) and return a recordset object – one that can be queried and handled like recordsets returned by standard database queries – that is: check for End Of File, MoveNext, or return the value of a specific cell in a column with a specific name.
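Condensed to its core, the engine might look like the following sketch – greatly simplified: only AND-ed equality criteria, a single hard-coded naming convention, and an invented folder path; it reuses the ParseArticleFile sketch from further above:

Imports System.Collections.Generic
Imports System.Data
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions

Public Class FileRecordset
    ' Assumptions for this sketch: posts live in one folder and are named
    ' yyyy-mm-dd_language_slug.txt; all other fields are name:value lines inside the file.
    Private Const folder As String = "D:\web\content\posts"

    Public Function Execute(sql As String) As DataTable
        ' 1. Check that the statement is well-formed and split it into its pieces.
        Dim m As Match = Regex.Match(sql,
            "^\s*SELECT\s+(?<fields>.+?)\s+FROM\s+(?<table>\w+)(\s+WHERE\s+(?<where>.+))?\s*$",
            RegexOptions.IgnoreCase)
        If Not m.Success Then Throw New ArgumentException("Unsupported SQL statement.")

        Dim fields() As String = m.Groups("fields").Value.Split(","c).Select(Function(f) f.Trim()).ToArray()
        Dim criteria As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        If m.Groups("where").Success Then
            For Each condition In m.Groups("where").Value.Split(New String() {" AND "}, StringSplitOptions.RemoveEmptyEntries)
                Dim kv() As String = condition.Split("="c)
                criteria(kv(0).Trim()) = kv(1).Trim().Trim("'"c)
            Next
        End If

        ' 2. Criteria on fields encoded in the file name narrow the search pattern (here: language).
        Dim languagePart As String = If(criteria.ContainsKey("language"), criteria("language"), "*")
        Dim pattern As String = "*_" & languagePart & "_*.txt"

        ' 3. Parse each candidate file, drop files failing the remaining criteria,
        '    and stuff the selected attributes into a DataTable.
        Dim result As New DataTable(m.Groups("table").Value)
        For Each fieldName In fields
            result.Columns.Add(fieldName)
        Next
        For Each fileName In Directory.GetFiles(folder, pattern)
            Dim attributes = ParseArticleFile(fileName)   ' the parser sketched further above
            Dim keep As Boolean = True
            For Each c In criteria
                If attributes.ContainsKey(c.Key) AndAlso attributes(c.Key) <> c.Value Then keep = False
            Next
            If keep Then
                Dim row As DataRow = result.NewRow()
                For Each f In fields
                    row(f) = If(attributes.ContainsKey(f), attributes(f), "")
                Next
                result.Rows.Add(row)
            End If
        Next
        Return result
    End Function
End Class

Under these assumptions, feeding it a query like SELECT title, headline FROM posts WHERE language='en' would return a DataTable with one row per matching post.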

Now I am (re-)creating all collections of pages and posts using my personal SQL engine. In parallel, I am manually sifting through old content and turning my web pages into articles. To do: the tag cloud and handling tags in general, and the generation of the RSS XML file from the database.

The new site is not publicly available yet. At the time of writing of this post, all my sites still use the old schema.

Disclaimers:

  • I don’t claim this is the best way to build a web site / blog. It’s also a fun project for its own sake – exploring the limits of flat-file databases and forcing myself to deal with potential performance issues.
  • It is a deliberate choice: My hosting space allows for picking from different well-known relational databases and I have done a lot of SQL Server programming in the past months in other projects.
  • I have a licence for Visual Studio. Using only a text editor instead is a deliberate choice, too.