Infinite Loop: Theory and Practice Revisited.

I’ve unlocked a new achievement as a blogger, or a new milestone as a life-form. As a dinosaur telling the same old stories over and over again.

I started drafting a blog post, as I always do since a while: I do it in my mind only, twist and turn in for days or weeks – until I am ready to write it down in one go. Today I wanted to release a post called On Learning (2) or the like. I knew I had written an early post with a similar title, so I expected this to be a loosely related update. But then I checked the old On Learning post: I found not only the same general ideas but the same autobiographical anecdotes I wanted to use now – even  in the same order.

2014 I had looked back on being both a teacher and a student for the greater part of my professional life, and the patterns were always the same – be the field physics, engineering, or IT security. I had written this post after a major update of our software for analyzing measurement data. This update had required me to acquire new skills, which was a delightful learning experience. I tried to reconcile very different learning modes: ‘Book learning’ about so-called theory, including learning for the joy of learning, and solving problems hands-on based on the minimum knowledge absolutely required.

It seems I like to talk about the The Joys of Theory a lot – I have meta-posted about theoretical physics in general, more than oncegeneral relativity as an example, and about computer science. I searched for posts about hands-on learning now – there aren’t any. But every post about my own research and work chronicles this hands-on learning in a non-meta explicit way. These are the posts listed on the heat pump / engineering page,  the IT security / control page, and some of the physics posts about the calculations I used in my own simulations.

Now that I am wallowing in nostalgia and scrolling through my old posts I feel there is one possibly new insight: Whenever I used knowledge to achieve a result that I really needed to get some job done, I think about this knowledge as emerging from hands-on tinkering and from self-study. I once read that many seasoned software developers also said that in a survey about their background: They checked self-taught despite having university degrees or professional training.

This holds for the things I had learned theoretically – be it in a class room or via my morning routine of reading textbooks. I learned about differential equations, thermodynamics, numerical methods, heat pumps, and about object-oriented software development. Yet when I actually have to do all that, it is always like re-learning it again in a more pragmatic way, even if the ‘class’ was very ‘applied’, not much time had passed since learning only, and I had taken exams. This is even true for the archetype all self-studied disciplines – hacking. Doing it – like here  – white-hat-style 😉 – is always a self-learning exercise, and reading about pentesting and security happens in an alternate universe.

The difference between these learning modes is maybe not only in ‘the applied’ versus ‘the theoretical’, but it is your personal stake in the outcome that matters – Skin In The Game. A project done by a group of students for the final purpose of passing a grade is not equivalent to running this project for your client or for yourself. The point is not if the student project is done for a real-life client, or the task as such makes sense in the real world. The difference is whether it feels like an exercise in an gamified system, or whether the result will matter financially / ‘existentially’ as you might try to empress your future client or employer or use the project results to build your own business. The major difference is in weighing risks and rewards, efforts and long-term consequences. Even ‘applied hacking’ in Capture-the-Flag-like contests is different from real-life pentesting. It makes all the difference if you just loose ‘points’ and miss the ‘flag’, or if you inadvertently take down a production system and violate your contract.

So I wonder if the Joy of Theoretical Learning is to some extent due to its risk-free nature. As long as you just learn about all those super interesting things just because you want to know – it is innocent play. Only if you finally touch something in the real world and touching things has hard consequences – only then you know if you are truly ‘interested enough’.

Sorry, but I told you I will post stream-of-consciousness-style now and then 🙂

I think it is OK to re-use the image of my beloved pre-1900 physics book I used in the 2014 post:

Where Are the Files? [Winsol – UVR16x2]

Recently somebody has asked me where the log files are stored. This question is more interesting then it seems.

We are using the freely programmable controller UVR16x2 (and its predecessor) UVR1611) …

.. and their Control and Monitoring Interface – CMI:The CMI is a data logger and runs a web server. It logs data from the controllers (and other devices) via CAN bus – I have demonstrated this in a contrived example recently, and described the whole setup in this older post.

IT / smart home nerds asked me why there are two ‘boxes’ as other solutions only use a ‘single box’ as both controller and logger. I believe separating these functions is safer and more secure: A logger / web server should not be vital to run the controller, and any issues with these auxiliary components must impact the controller’s core functions.

Log files are stored on the CMI in a proprietary format, and they can retrieved via HTTP using the software Winsol. Winsol lets you visualize data for 1 or more days, zoom in, define views etc. – and data can be exported as CSV files. This is the tool we use for reverse engineering hydraulics and control logic (German blog post about remote hydraulics surgery):

In the latest versions of Winsol, log files are per default stored in the user’s profile on Windows:
C:\Users\[Username]\Documents\Technische Alternative\Winsol

I had never paid much attention to this; I had always changed that path in the configuration to make backup and automation easier. The current question about the log files’ location was actually about how I managed to make different users work with the same log files.

The answer might not be obvious because of the historical location of the log files:

Until some version of Winsol in use in 2017 log files were by stored in the Program Files folder, or at least Winsol tried to use that folder. Windows does not allow this anymore for security reasons.

If Winsol is upgraded from an older version, settings might be preserved. I did my tests  with Winsol 2.07 upgraded from an earlier version. I am a bit vague about versions as I did not test different upgrade paths in detail My point is users of control system’s software tend to be conservative when it comes to changing a running system – an older ‘logging PC’ with an older or upgraded version of Winsol is not an unlikely setup.

I started debugging on Windows 10 with the new security feature Controlled Folder Access enabled. CFA, of course, did not know Winsol, considered it an unfriendly app … to be white-listed.

Then I was curious about the default log file folders, and I saw this:

In the Winsol file picker dialogue (to the right) the log folders seem to be in the Program Files folder:
C:\Program Files\Technische Alternative\Winsol\LogX
But in Windows Explorer (to the left) there are no log files at that location.

What does Microsoft Sysinternals Process Monitor say?

There is a Reparse Point, and the file access is redirected to the folder:
C:\Users\[User]\AppData\Local\VirtualStore\Program Files\Technische Alternative\Winsol
Selecting this folder directly in Windows Explorer shows the missing files:

This location can be re-configured in Winsol to allow different users to access the same files (Disclaimer: Perhaps unsupported by the vendor…)

And there are also some truly user-specific configuration files in the user’s profile, in
C:\Users\[User]\AppData\Roaming\Technische Alternative\Winsol

Winsol.xml is e.g. for storing the list of ‘clients’ (logging profiles) that are included in automated processing of log files, and cookie.txt is the logon cookie for access to the online logging portal provided by Technische Alternative. If you absolutely want to switch Windows users *and* switch logging profiles often *and* sync those you have to tinker with Winsol.xml, e.g. by editing it using a script (Disclaimer again: Unlikely to be a supported way of doing things ;-))

As a summary, I describe the steps required to migrate Winsol’s configuration to a new PC and prepare it for usage by different users.

  • Install the latest version of Winsol on the target PC.
  • If you use Controlled Folder Access on Windows 10: Exempt Winsol as a friendly app.
  • Copy the contents of C:\Users\[User]\AppData\Roaming\Technische Alternative\Winsol from the user’s profile on the old machine to the new machine (user-specific config files).
  • If the log file folder shows up at a different path on the two machines – for example when using the same folder via a network share – edit the path in Winsol.xml or configure it in General Settings in Winsol.
  • Copy your existing log data to this new path. LogX contains the main log files, Infosol contain clients’ data. The logging configuration for each client, e.g. the IP address or portal name of the logger, is included in the setup.xml file in the root of each client’s folder.

Note: If you skip some Winsol versions on migrating/upgrading the structure of files might have changed – be careful! Last time that happened by the end of 2016 and Data Kraken had to re-configure some tentacles.

Cloudy Troubleshooting

Actors:

  • Cloud: Service provider delivering an application over the internet.
  • Client: Business using the Cloud
  • Telco: Service provider operating part of the network infrastructure connecting them.
  • elkement: Somebody who always ends up playing intermediary.

~

Client: Cloud logs us off ever so often! We can’t work like this!

elkement: Cloud, what timeouts do you use? Client was only idle for a short break and is logged off.

Cloud: Must be something about your infrastructure – we set the timeout to 1 hour.

Client: It’s becoming worse – Cloud logs us off every few minutes even we are in the middle of working.

[elkement does a quick test. Yes, it is true.]

elkement: Cloud, what’s going on? Any known issue?

Cloud: No issue in our side. We have thousands of happy clients online. If we’d have issues, our inboxes would be on fire.

[elkement does more tests. Different computers at Client. Different logon users. Different Client offices. Different speeds of internet connections. Computers at elkement office.]

elkement: It is difficult to reproduce. It seems like it works well for some computers or some locations for some time. But Cloud – we did not have any issues of that kind in the last year. This year the troubles started.

Cloud: The timing of our app is sensitive: If network cards in your computers turn on power saving that might appear as a disconnect to us.

[elkement learns what she never wanted to know about various power saving settings. To no avail.]

Cloud: What about your bandwidth?… Well, that’s really slow. If all people in the office are using that connection we can totally understand why our app sees your users disappearing.

[elkement on a warpath: Tracking down each application eating bandwidth. Learning what she never wanted to know about tuning the background apps, tracking down processes.]

elkement: Cloud, I’ve throttled everything. I am the only person using Clients’ computers late at night, and I still encounter these issues.

Cloud: Upgrade the internet connection! Our protocol might choke on a hardly noticeable outage.

[elkement has to agree. The late-night tests were done over a remote connections; so measurement may impact results, as in quantum physics.]

Client: Telco, we buy more internet!

[Telco installs more internet, elkement measures speed. Yeah, fast!]

Client: Nothing has changed, Clouds still kicks us out every few minutes.

elkement: Cloud, I need to badger you again….

Cloud: Check the power saving settings of your firewalls, switches, routers. Again, you are the only one reporting such problems.

[The router is a blackbox operated by Telco]

elkement: Telco, does the router use any power saving features? Could you turn that off?

Telco: No we don’t use any power saving at all.

[elkement dreams up conspiracy theories: Sometimes performance seems to degrade after business hours. Cloud running backup jobs? Telco’s lines clogged by private users streaming movies? But sometimes it’s working well even in the location with the crappiest internet connection.]

elkement: Telco, we see this weird issue. It’s either Cloud, Client’s infrastructure, or anything in between, e.g. you. Any known issues?

Telco: No, but [proposal of test that would be difficult to do]. Or send us a Wireshark trace.

elkement: … which is what I planned to do anyway…

[elkement on a warpath 2: Sniffing, tracing every process. Turning off all background stuff. Looking at every packet in the trace. Getting to the level where there are no other packets in between the stream of messages between Client’s computers and Cloud’s servers.]

elkement: Cloud, I tracked it down. This is not a timeout. Look at the trace: Server and client communicating nicely, textbook three-way handshake, server says FIN! And no other packet in the way!

Cloud: Try to connect to a specific server of us.

[elkement: Conspiracy theory about load balancers]

elkement: No – erratic as ever. Sometimes we are logged off, sometimes it works with crappy internet. Note that Client could work during vacation last summer with supper shaky wireless connections.

[Lots of small changes and tests by elkement and Cloud. No solution yet, but the collaboration is seamless. No politics and finger-pointing who to blame – just work. The thing that keeps you happy as a netadmin / sysadmin in stressful times.]

elkement: Client, there is another interface which has less features. I am going to test it…

[elkement: Conspiracy theory about protocols. More night-time testing].

elkement: Client, Other Interface has the same problems.

[elkement on a warpath 3: Testing again with all possible combinations of computers, clients, locations, internet connections. Suddenly a pattern emerges…]

elkement: I see something!! Cloud, I believe it’s user-dependent. Users X and Y are logged off all the time while A and B aren’t.

[elkement scratches head: Why was this so difficult to see? Tests were not that unambiguous until now!]

Cloud: We’ve created a replacement user – please test.

elkement: Yes – New User works reliably all the time! 🙂

Client: It works –  we are not thrown off in the middle of work anymore!

Cloud: Seems that something about the user on our servers is broken – never happened before…

elkement: But wait 😦 it’s not totally OK: Now logged off after 15 minutes of inactivity? But never mind – at least not as bad as logged off every 2 minutes in the middle of some work.

Cloud: Yeah, that could happen – an issue with Add-On Product. But only if your app looks idle to our servers!

elkement: But didn’t you tell us that every timeout ever is no less than 1 hour?

Cloud: No – that 1 hour was another timeout …

elkement: Wow – classic misunderstanding! That’s why it is was so difficult to spot the pattern. So we had two completely different problems, but both looked like unwanted logoffs after a brief period, and at the beginning both weren’t totally reproducible.

[elkement’s theory validated again: If anything qualifies elkement for such stuff at all it was experience in the applied physics lab – tracking down the impact of temperature, pressure and 1000 other parameters on the electrical properties of superconductors… and trying to tell artifacts from reproducible behavior.]

~

Cloudy

Logging Fun with UVR16x2: Photovoltaic Generator – Modbus – CAN Bus

The Data Kraken wants to grow new tentacles.

I am playing with the CMI – Control and Monitoring Interface – the logger / ‘ethernet gateway’ connected to our control units (UVR1611, UVR16x2) via CAN bus. The CMI has become a little Data Kraken itself: Inputs and outputs can be created for CAN bus and Modbus, and data from most CAN devices can also be logged via JSON.

Are these features useful to integrate the datalogger of our photovoltaics inverter – Fronius Symo 4.5-3-M? I am now logging data to an USB stick and feed the CSV files to the SQL Server Data Kraken. The USB logger’s logging interval is 5 minutes whereas Modbus TCP allows for logging every few seconds. The inverter has built-in energy management features, but it can only ‘signal’ via a relay which also requires proper wiring. Modbus TCP, on the other hand, could use the existing WLAN connection of the inverter and the control unit could do something smarter with the sensor reading of the output power.

My motivation is to test if you – as an UVR16x2 user – can re-use a logger you  already have – the CMI – as much as possible, avoiding the need to run another ‘logging server’ all the time (Also my SQL Server is for analysis, not for real-time logging). I know that there are many open source Modbus clients available and that it is easy to write a Python script.

Activate Modbus on the inverter: I prefer floating numbers to integers plus a scaling factor, and I turn off the option to make changes via Modbus:

Modbus settings, Fronius Datalogger, inverter’s local web server. 502 is the TCO standard port. The alternative to floating numbers is integers plus a varying scaling factor (SF), to be found in another register.

Check Fronius documentation of its Modbus registers: The document is currently available here. There are different sets of registers related to the inverter or associated with one string of PV panels:

PDF p.97, Common Inverter Model. For logging AC output power you need:  Address 40092, type of register 3 (read and hold), datatype float32 (‘corresponds’ to two 16bit integer register, thus size 2).

The address to be configured on a Modbus client is smaller than this address by 1 – so 40091 needs to be set to log AC power.

Using these configuration parameters an analog Modbus Input is added at the CMI. The signal is ‘digital’ – but in field-bus-language everything that is not a single bit – 0/1 – seems to be ‘analog’.

Modbus input at the CMI. Input value:  32bits read from the bus interpreted as an integer. Actual value: Integer part of the ‘true value’ = the 32bits interpreted as 32-bit float.

Yes, I checked the network trace 😉 as the byte order dropdown menu confused me: According to the Modbus protocol specification Big Endian is required, not an option.

Factors and data types: Only integer values are understood by CAN devices. Decimal places might be indicated by a scaling factor. The PV power value in Watt has enough significant digits; so the integer part of the float number is fine. But for current in Ampere – typically about 15A maximum – a Factor of 10 would be better. It would not have helped to select int + scaling factor at the inverter: The scaling factor would be stored in a second register, there is a different factor for every parameter, and you cannot configure another ‘scaling factor register’ per input at the CMI. Theoretically you could log the scaling factor separately and re-scale the value in a custom application – but then I would use a separate, custom logger.

In any case, if you screw this up, you see non-sensible numbers of the CAN bus: Slowly evolving positive values – like PV power on a sunny day – are displayed as wild variations of signed integers between -32000 and 32000 😉

Where are the ‘logged’ data? The CMI is first and foremost the data logger for the control units. The CMI does not immediately store the data from Modbus inputs in a  local ‘logging database’. All I have achieved so far is to display the value on the Settings page. The CMI can only log values from the CAN bus or DL bus. So we need an…

… Analog CAN Output at the CMI:

The CMI has the default node number 56 on the CAN bus. Other CAN devices on the bus can query it for this parameter by specifying node 56 and output no 1.

These are the devices on our CAN bus:

CAN bus displayed on the CMI’s website. UVR1611 and UVR16x2 controllers can be managed by clicking their icons – which brings up a web page that resembles the controller’s local display.

The CMI’s Logging page looks tempting – can we simply select the CMI itself as a CAN logging source – CAN 56?

Configuration of the devices the CMI logs data from, via CAN bus. CAN 1 – UVR1611, CAN 2 – UVR16x3, CAN 41 – energy meter CAN-EZ.

Nothing stops you from selecting CAN 56 in this dropdown menu, but it does not end well:

CAN error message displayed at the logger CMI when you try to configure the CMI also as a logging source.

We need a round-trip: Data needs to be sent to a supported device first – one of the controllers on the CAN bus. We need an…

… Analog CAN input / network variable at the UVR16x2:

Configuration of a CAN input at the controller UVR16x2 (via CMI’s web interface to the controller).

The value of AC power is displayed as integer without scaling. Had a factor of 10 been used at the Modbus input it would be ‘corrected’ here, using the Unit called dimensionless,1.

logging-uvr16x2-can-network-input-can-value-display

Values received by the controller UVR16x2 over CAN bus.

Result of all this: UVR16x2 knows PV power and can use it do magic smart things when controlling the heat pump. On the other hand, CMI can log this value – in the same way it logs all other sensor readings.

Log files are retrieved by Winsol, the free logging software for the CMI …

Logged visualized with Winsol. Logfiles are downloaded from the CMI on the internal LAN or via Technsche Alternative’s web portal. PV power (PV.Leistung.Watt) is displayed together with global radiation on a vertical plane (GBS, at the solar/air collector for the theat pump), ambient temperature (red), temperature of solar/air collector (orange)

… or logging is configured at the web portal cmi.ta.co.at …

Configuration of logging at cmi.ta.co.at: Supported loggers are UVR1611 and UVR16x2. Values to be logged are selected from all direct inputs / outputs / functions and from CAN network inputs and outputs.

… and data can be viewed online:

Data visualized at cmi.ta.co.at. Data logged via CAN are sent from the CMI to the web portal.

Using this kind of logging for all values the inverter provides would be costly: It’s not just a column you add to a log file, but you occupy one of the limited inputs and outputs at the CMI and the controller. If you really need to know the voltage between phase 1 and 2 or apparent power you better stick with the USB file or use a separate Modbus logger like a Rasbperry Pi. This project is great and documented very well – data acqusition from a Symo inverter using Python plus a web front end.

Sending Modbus data back and forth from the CMI to UVR controllers is only worth the efforts if you need them for control, not for ‘nice-to-have’ logging.

Things You Find in Your Hydraulic Schematic

Building an ice storage powered heat pump system is a DIY adventure – for a Leonardo da Vinci of plumbing, electrical engineering, carpentry, masonry, and computer technology.

But that holistic approach is already demonstrated clearly in our hydraulic schematics. Actually, here it is even more daring and bold:

There is Plutonium – Pu – everywhere in the heating circuit and the brine circuit …

I can’t tell if this is a hazard or if it boosts energy harvest. But I was not surprised – given that Doc Emmett Brown is our hero.

Maybe we see the impact of contamination already: How should I explain the mutated butterflies with three wings otherwise? After all, they are even tagged with M

Our default backup heating system is … Facebook Messenger!

So the big internet companies are already delivering heating-as-a-service-from-the-cloud!

But what the hell is the tennis ball needed for?

 

Cooling Potential

I had an interesting discussion about the cooling potential of our heat pump system – in a climate warmer than ours.

Recently I’ve shown data for the past heating season, including also passive cooling performance:

After the heating season, tank temperature is limited to 10°C as long as possible – the collector is bypassed in the brine circuit (‘switched off’). But with the beginning of May, the tank temperature starts to rise though as the tank is heated by the surrounding ground.

Daily cooling energy hardly exceeds 20kWh, so the average cooling power is always well below 1kW. This is much lower than the design peak cooling load – the power you would need to cool the rooms to 20°C at noon on a hot in summer day (rather ~10kW for our house.)

The blue spikes are single dots for a few days, and they make the curve look more impressive than it really is: We could use about 600kWh of cooling energy – compared to about 15.000kWh for space heating. (Note that I am from Europe – I use decimal commas and thousands dots :-))

There are three ways of ‘harvesting cold’ with this system:

(1) When water in the hygienic storage tank (for domestic hot water) is heated up in summer, the heat pump extracts heat from the underground tank.

Per summer month the heat pump needs about 170kWh of input ambient energy from the cold tank – for producing an output heating energy of about 7kWh per day – 0,3kW on average for two persons, just in line with ‘standards’. This means that nearly all the passive cooling energy we used was ‘produced’ by heating hot water.

You can see the effect on the cooling power available during a hot day here (from this article on passive cooling in the hot summer of 2015)

Blue arrows indicate hot water heating time slots – for half an hour a cooling power of about 4kW was available. But for keeping the room temperature at somewhat bearable levels, it was crucial to cool ‘low-tech style’ – by opening the windows during the night (Vent)

(2) If nights in late spring and early summer are still cool, the underground tank can be cooled via the collector during the night.

In the last season we gained about ~170kWh in total in that way – so only as much as by one month of hot water heating. The effect also depends on control details: If you start cooling early in the season when you ‘actually do not really need it’ you can harvest more cold because of the higher temperature difference between tank and cold air.

(3) You keep the cold or ice you ‘create’ during the heating season.

The set point tank temperature for summer  is a trade-off between saving as much cooling energy as possible and keeping the Coefficient of Performance (COP) reasonably high also in summer – when the heat sink temperature is 50°C because the heat pump only heats hot tap water.

20°C is the maximum heat source temperature allowed by the heat pump vendor. The temperature difference to the set point of 10°C translates to about 300kWh (only) for 25m3 of water. But cold is also transferred to ground and thus the effective store of cold is larger than the tank itself.

What are the options to increase this seasonal storage of cold?

  • Turning the collector off earlier. To store as much ice as possible, the collector could even be turned off while still in space heating mode – as we did during the Ice Storage Challenge 2015.
  • Active cooling: The store of passive cooling energy is limited – our large tank only contains about 2.000kWh even if frozen completely; If more cooling energy is required, there has to be a cooling backup. Some brine/water heat pumps[#] have a 4-way-valve built into the refrigeration cycle, and the roles of evaporator and condenser can be reversed: The room is cooled and the tank is heated up. In contrast to passive cooling the luke-warm tank and the surrounding ground are useful. The cooling COP would be fantastic because of the low temperature difference between source and sink – it might actually be so high that you need special hydraulic precautions to limit it.

The earlier / the more often the collector is turned off to create ice for passive cooling, the worse the heating COP will be. On the other hand, the more cold you save, the more economic is cooling later:

  1. Because the active cooling COP (or EER[*]) will be higher and
  2. Because the total cooling COP summed over both cooling phases will be higher as no electrical input energy is needed for passive cooling – only circulation pumps.

([*] The COP is the ratio of output heating energy and electrical energy, and the EER – energy efficiency ratio – is the ratio of output cooling energy and electrical energy. Using kWh as the unit for all energies and assuming condenser and evaporator are completely ‘symmetrical’, the EER or a heat pump used ‘in reverse’ is its heating COP minus 1.)

So there would be four distinct ways / phases of running the system in a season:

  1. Standard heating using collector and tank. In a warmer climate, the tank might not even be frozen yet.
  2. Making ice: At end of the heating season the collector might be turned off to build up ice for passive cooling. In case of an ’emergency’ / unexpected cold spell of weather, the collector could be turned on intermittently.
  3. Passive cooling: After the end of the heating season, the underground tank cools the buffer tank (via its internal heat exchanger spirals that containing cool brine) which in turn cools the heating floor loops turned ‘cooling loops’.
  4. When passive cooling power is not sufficient anymore, active cooling could be turned on. The bulk volume of the buffer tank is cooled now directly with the heat pump, and waste heat is deposited in the underground tank and ground. This will also boost the underground heat sink just right to serve as the heat source again in the upcoming heating season.

In both cooling phases the collector could be turned on in colder nights to cool the tank. This will work much better in the active cooling phase – when the tank is likely to be warmer than the air in the night. Actually, night-time cooling might be the main function the collector would have in a warmer climate.

___________________________________

[#] That seems to be valid mainly/only for domestic brine-water heat pumps from North American or Chinese vendors; they offer the reversing valve as a common option. European vendors rather offer a so called Active Cooling box, which is a cabinet that can be nearly as the heat pump itself. It contains a bunch of valves and heat exchangers that allow for ‘externally’ swapping the connections of condenser and evaporator to heat sink and source respectively.

Reverse Engineering Fun

Recently I read a lot about reverse engineering –  in relation to malware research. I for one simply wanted to get ancient and hardly documented HVAC engineering software to work.

The software in question should have shown a photo of the front panel of a device – knobs and displays – augmented with current system’s data, and you could have played with settings to ‘simulate’ the control unit’s behavior.

I tested it on several machines, to rule out some typical issues quickly: Will in run on Windows 7? Will it run on a 32bit system? Do I need to run it was Administrator? None of that helped. I actually saw the application’s user interface coming up once, on the Win 7 32bit test machine I had not started in a while. But I could not reproduce the correct start-up, and in all other attempts on all other machines I just encountered an error message … that used an Asian character set.

I poked around the files and folders the application uses. There were some .xls and .xml files, and most text was in the foreign character set. The Asian error message was a generic Windows dialogue box: You cannot select the text within it directly, but the whole contents of such error messages can be copied using Ctrl+C. Pasting it into Google Translate it told me:

Failed to read the XY device data file

Checking the files again, there was an on xydevice.xls file, and I wondered if the relative path from exe to xls did not work, or if it was an issue with permissions. The latter was hard to believe, given that I simply copied the whole bunch of files, my user having the same (full) permissions on all of them.

I started Microsoft Sysinternals Process Monitor to check if the application was groping in vain for the file. It found the file just fine in the right location:

Immediately before accessing the file, the application looped through registry entries for Microsoft JET database drivers for Office files – the last one it probed was msexcl40.dll – a  database driver for accessing Excel files.

There is no obvious error in this dump: The xls file was closed before the Windows error popup was brought up; so the application had handled the error somehow.

I had been tinkering a lot myself with database drivers for Excel spreadsheets, Access databases, and even text files – so that looked like a familiar engineering software hack to me 🙂 On start-up the application created a bunch of XML files – I saw them once, right after I saw the GUI once in that non-reproducible test. As far as I could decipher the content in the foreign language, the entries were taken from that problematic xls file which contained a formatted table. It seemed that the application was using a sheet in the xls file as a database table.

What went wrong? I started Windows debugger WinDbg (part of the Debugging tools for Windows). I tried to go the next unhandled or handled exception, and I saw again that it stumbled over msexec40.dll:

But here was finally a complete and googleable error message in nerd speak:

Unexpected error from external database driver (1).

This sounded generic and I was not very optimistic. But this recent Microsoft article was one of the few mentioning the specific error message – an overview of operating system updates and fixes, dated October 2017. It describes exactly the observed issue with using the JET database driver to access an xls file:

Finally my curious observation of the non-reproducible single successful test made sense: When I started the exe on the Win 7 test client, this computer had been started the first time after ~3 months; it was old and slow, and it was just processing Windows Updates – so at the first run the software had worked because the deadly Windows Update had not been applied yet.

Also the ‘2007 timeframe’ mentioned was consistent – as all the application’s executable files were nearly 10 years old. The recommended strategy is to use a more modern version of the database driver, but Microsoft also states they will fix it again in a future version.

So I did not get the software to to run, as I obviously cannot fix somebody else’s compiled code – but I could provide the exact information needed by the developer to repair it.

But the key message in this post is that it was simply a lot of fun to track this down 🙂