Bots, Like This! I am an Ardent Fan of HTTPS and Certificates!

This is an experiment in Machine Learning, Big Data, Artificial Intelligence, whatever.

But first I need a proper digression.

Last autumn, I turned my back on social media and went offline for a few days.

There, in that magical place, the real world was offline as well. A museum of the history of physics had to be opened just for us.

The sign says: Please call XY and we open immediately.

Scientific instruments of the past have a strange appeal: steampunk-y, artisanal, timeless. But I could not have enjoyed it had I not locked down the gates of my social media fortresses beforehand.

Last year, ‘improved’ bots and spammers seem to have invaded WordPress. Did their vigilant spam filters feel a disturbance of the force? My blog had been open for anonymous comments for more than 5 years, but I finally had to restrict access. Since last year, every commenter needs to have one manually approved comment.

But how to get attention if I block the comments? Spam your links by Liking other blogs. Anticipate that clickers will be very dedicated: Clicking on your icon only takes the viewer to your gravatar profile. The gravatar shows a link to the actual spammy website.

And how to pick suitable – likeable – target blog posts? Use your sophisticated artificial intelligence: If you want to sell SSL certificates (!), pick articles that contain keywords like SSL or domain – like this one. BTW, I take the ads for acne treatment personally. Please stick to marketing SSL certificates. Especially in the era of free certificates provided by Let’s Encrypt.

Please use a different image for your different gravatars. You have done rather well when spam-liking the post on my domains and HTTPS, but what was on your mind when you found my post on hijacking orphaned domains for malvertising?

Did statements like this attract the army of bots?

… some of the pages contain links to other websites that advertize products in a spammy way.

So what do I need to do to make you all like this post? Should I tell you that I have a bunch of internet domains? That I migrated my non-blogs to HTTPS last year? That WordPress migrated blogs to HTTPS some time ago? That they use Let’s Encrypt certificates now, just as the hosting provider of my other websites does?

[Perhaps I should quote ‘SSL’ and ‘TLS’, too.]

Or should I tell you that I once made a fool of myself by publishing my conspiracy theories about how Google had ditched my blog from its index? While I had actually missed that you need to add the HTTPS version as a separate item in Google Webmaster Tools?

So I desperately need help with Search Engine Optimization and Online Marketing. Google shows me ads for their free online marketing courses on Facebook all the time now.

Or do I need help with HTTPS (TLS/SSL)? That would be embarrassing, as for many years I did nothing but implement Public Key Infrastructures and troubleshoot certificates. I am still debugging all kinds of weird certificate chaining and browser issues. The internet is always a little bit broken, says Sir Tim Berners-Lee.

[Is X.509 certificate a good search term? No, too nerdy, I guess.]

Or maybe you are more interested in my pioneering Search Term Poetry and Spam Poetry.  I need new raw material.

Like this! Like this! Like this!

Maybe I will even approve a comment and talk to you. It would not be the first time I fail the Turing test on this blog.

Don’t let me down, bots! I count on you!

Update 2018-02-13: So far, this post has been a success. The elkemental blog has not seen this many likes in years … and right now I noticed that the omnipresent suit bot has also started to market solar energy and to like my related posts!

Update 2018-02-18: They have not given up yet – we welcome another batch of bots!


The Orphaned Internet Domain Risk

I have clicked on company websites of social media acquaintances, and something is not right: Slight errors in formatting, encoding errors for special German characters.

Then I notice that some of the pages contain links to other websites that advertize products in a spammy way. However, the links to the spammy sites are embedded in these alleged company websites in a subtle way: using the (nearly) correct layout, or embedding the link in a ‘news article’ that also contains legit product information – content really related to the internet domain I am visiting.

Looking up whois information tells me that these internet domains are no longer owned by my friends – consistent with what they actually say on their social media profiles. So how come they ‘have given’ their former domains to spammers? They did not, and they didn’t need to: Spammers simply need to watch out for expired domains, seize them when they become available – and then reconstruct the former legit content from public archives, and interleave it with their spammy messages.

The former content of legitimate sites is often available on the web archive. Here is the timeline of one of the sites I checked:

Clicking on the details shows:

  • Last display of legit content in 2008.
  • In 2012 and 2013 a generic message from the hosting provider was displayed: This site has been registered by one of our clients
  • After that we see mainly 403 Forbidden errors – so the spammers don’t want their site to be archived – but at one time a screen capture of the spammy site had been taken.

The new site shows the name of the former owner at the bottom, but an unobtrusive link has been added, indicating the new owner – a US-based marketing and SEO consultancy.

So my takeaway is: If you ever feel like decluttering your websites and freeing yourself of your useless digital possessions – and possibly also social media accounts – think twice: As soon as your domain or name is available, somebody might take it, and re-use and exploit your former content and possibly your former reputation for promoting their spammy stuff in a shady way.

This happened a while ago, but I know now it can get much worse: Why only distribute marketing spam if you can distribute malware through channels still considered trusted? In this blog post, Malwarebytes raises the question whether such practices are illegal – it seems that question is not straightforward to answer.

Visitors do not even have to visit the abandoned domain explicitly to be served malware from it. I have seen some reports of abandoned embedded plug-ins turned into malicious zombies. Silly example: If you embed your latest tweets, Twitter goes out of business, and its domains are seized by spammers – your Follow Me icon might help to spread malware.

If a legit site runs third-party code, they need to trust the authors of this code. For example, Equifax’s website recently served spyware:

… the problem stemmed from a “third-party vendor that Equifax uses to collect website performance data,” and that “the vendor’s code running on an Equifax Web site was serving malicious content.”

So if you run any plug-ins, embedded widgets or the like – better check regularly whether the originating domain is still run by the expected owner – monitor your vendors often, and don’t run code you do not absolutely need in the first place. Don’t use embedded active badges if a simple link to your profile would do.

Do a painful, boring inventory and assessment often – then you will notice how much work it is to manage these ‘partners’, and you will rather stay away from signing up and registering for too many services.
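Part of such an inventory can be scripted. The following is a minimal Python sketch, assuming you have a page's HTML at hand: it collects the hosts a page loads scripts, iframes, and images from, and flags every host that is not on a list you have already reviewed. The class name, function name, and example host names are hypothetical. Checking who actually owns a flagged domain today still requires a whois lookup, which is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceCollector(HTMLParser):
    """Collects host names of externally loaded scripts, iframes, and images."""

    # Tag -> attribute that points to the loaded resource
    RESOURCE_ATTRS = {"script": "src", "iframe": "src", "img": "src"}

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        attr_name = self.RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        for name, value in attrs:
            if name == attr_name and value:
                host = urlparse(value).hostname
                if host:  # relative URLs have no host and are first-party
                    self.hosts.add(host.lower())


def unexpected_hosts(html, expected_hosts):
    """Return third-party hosts used by the page that are not on the reviewed list."""
    collector = ExternalResourceCollector()
    collector.feed(html)
    return sorted(collector.hosts - {h.lower() for h in expected_hosts})
```

Running this regularly over all pages and comparing the result against last month's list turns "monitor your vendors often" into a diff you can actually read.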

Update 2017-10-25: And as we speak, we learn about another example – snatching a domain used by Dell backup software preinstalled on PCs.

Other People Have Lives – I Have Domains

These are just some boring update notifications from the elkemental Webiverse.

The elkement blog has recently celebrated its fifth anniversary, and the punktwissen blog will turn five in December. Time to celebrate this – with new domain names that say exactly what these sites are – the ‘elkement.blog‘ and the ‘punktwissen.blog‘.

Actually, I wanted to get rid of the ads on both blogs, and with the upgrade came a free domain. WordPress has a detailed cookie policy – and I am showing it dutifully using the respective widget, but they have to defer to their partners when it comes to third-party cookies. I only want to worry about research cookies set by Twitter and Facebook, but not by ad providers, and I am also considering removing social media sharing buttons and the embedded tweets. (Yes, I am thinking about this!)

On the websites under my control I went full dinosaur: the server sends only non-interactive HTML pages to the client, not requiring any client-side activity. I have now got rid of the last half-hearted usage of a session object and the respective cookie, and I have never used any social media buttons or other tracking.

So there are no login data or cookies to protect, and yet I finally migrated all sites to HTTPS.

It is a matter of principle: I of all website owners should use HTTPS. For 15 years I have been planning and building Public Key Infrastructures and troubleshooting X.509 certificates.

But of course I fear Google’s verdict: They announced long ago that HTTPS is considered a positive ranking signal by their search engine. Pages not using HTTPS will be tagged as insecure using more and more terrifying icons – e.g. HTTP-only pages with login buttons already display a struck-through padlock in Firefox. In the past years I migrated a lot of PKIs from SHA1 to SHA256 to fight the first wave of Insecure icons.

Finally, Let’s Encrypt has started a revolution: free SSL certificates, based on domain validation only. My hosting provider uses a solution based on Let’s Encrypt – a reverse proxy that does the actual HTTPS. I only had to re-target all my DNS records to the reverse proxy – it would have been very easy had it not been for all my existing URL rewriting, tweaking, and redirecting. I also wanted to keep the option of still using HTTP in the future for tests and special scenarios (like hosting a revocation list), so I decided to do the redirecting myself in the application(s) instead of using the offered automated redirect. But a code review and clean-up now and then can never hurt 🙂 For large complex sites the migration to HTTPS is anything but easy.
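Redirecting in the application while keeping a few paths on plain HTTP can be reduced to a small decision function. This is a hedged Python sketch, not the code running on these sites: the `/crl/` prefix is a hypothetical location for a revocation list, and how the application learns the scheme the client used is an assumption. Behind a TLS-terminating reverse proxy the application typically sees only HTTP, so the original scheme has to be passed on by the proxy, for example in an `X-Forwarded-Proto` header if the proxy sets one.

```python
from urllib.parse import urlunsplit

# Hypothetical path prefixes that must stay reachable over plain HTTP,
# e.g. a certificate revocation list that clients fetch without TLS.
HTTP_ALLOWED_PREFIXES = ("/crl/",)


def https_redirect(scheme, host, path, query=""):
    """Decide whether a request should be redirected to HTTPS.

    scheme is the scheme the client used (assumed to be reported by the
    reverse proxy). Returns (status, location), or None if the request
    should be served as-is.
    """
    if scheme.lower() != "http":
        return None  # already HTTPS: nothing to do
    if path.startswith(HTTP_ALLOWED_PREFIXES):
        return None  # deliberately kept on HTTP
    # Permanent redirect to the same host, path, and query string over HTTPS
    return 301, urlunsplit(("https", host, path, query, ""))
```

For example, `https_redirect("http", "example.at", "/about", "x=1")` yields `(301, "https://example.at/about?x=1")`, while a request for `/crl/root.crl` stays on HTTP. Keeping the exceptions in one tuple makes it easy to see what is still served unencrypted.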

In case I ever forget which domains and host names I use, I just need to check out this list of Subject Alternative Names again:

(And I have another certificate for the ‘test’ host names that I need for testing the sites themselves and also for testing various redirects ;-))
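Such a list can also be pulled from the live certificate. The Python sketch below uses only the standard `ssl` and `socket` modules; `fetch_sans` needs network access, and which host you query is up to you. With the default context the chain and host name are validated, and `getpeercert()` then returns the parsed certificate, whose `subjectAltName` entry is a tuple of `(type, value)` pairs.

```python
import socket
import ssl


def dns_sans(cert):
    """Extract the DNS Subject Alternative Names from a getpeercert() dictionary."""
    return sorted(value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS")


def fetch_sans(host, port=443, timeout=5.0):
    """Connect via TLS, validate the certificate, and return its DNS SANs."""
    context = ssl.create_default_context()  # verifies chain and host name
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return dns_sans(tls.getpeercert())
```

Non-DNS entries such as IP addresses are filtered out, so the result is just the sorted list of host names the certificate covers.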

WordPress.com also uses Let’s Encrypt (Automattic is a sponsor), and the SAN elkement.blog is lumped together with several other blog names, presumably the ones which needed new certificates at about the same time.

It will be interesting to see what the consequences for phishing websites will be. Malicious websites will look trusted, as they are issued certificates automatically, but revoking a certificate might provide another method for invalidating a malicious website.

Anyway, special thanks to the WordPress.com Happiness Engineers and support staff at my hosting provider Puaschitz IT. Despite all the nerdiness displayed on this blog I prefer hosted / ‘shared’ solutions when it comes to my own websites because I totally like it when somebody else has to patch the server and deal with attacks. I am an annoying client – with all kinds of special needs and questions – thanks for the great support! 🙂

Where to Find What?

I have confessed on this blog that I have Mr. Monk DVDs for a reason. We like to categorize, tag, painstakingly re-organize, and re-use. This is reflected in our Innovations in Agriculture …

The Seedbank: Left-over squared timber met the chopsaw.

The Nursery: Rebirth of copper tubes and newspapers.

… as well as in my periodical Raking The Virtual Zen Garden: Updating collections of web resources, especially those related to the heat pump system.

Here is a list of lists, sorted by increasing order of compactification:

But thanks to algorithms, we get helpful advice on presentation from social media platforms: Facebook, for example, encouraged me to tag products in the following photo, so here we go:

“Hand-crafted, artisanal, mobile nursery from recycled metal and wood, for holding biodegradable nursery pots.” Produced without crowd-funding and not submitted to contests concerned with The Intersection of Science, Art, and Innovation.

The Stages of Blogging – an Empirical Study

… with sample size 1.

Last year, at the 4-year anniversary, I presented a quantitative analysis – in line with the editorial policy I had silently established: My blogging had turned from quasi-philosophical ramblings on science, work, and life to no-nonsense number crunching.

But the comment threads on my recent posts exhibit my subconscious spilling over. So at this anniversary, I give myself permission to indulge in incoherent reminiscences. I have even amended the tagline with this blog’s historical title:

Theory and Practice of Trying to Combine Just Anything.

Anecdotal evidence shows that many people start a blog, or another blog, when they are in a personal or professional transition. I had been there before: My first outburst of online writing on my personal websites predated quitting my corporate job and starting our business. The creative well ran dry after I had made the decision and taken action – in the aftermath of that legendary journey.

I resurrected the old websites and I started this blog when I was in a professional no-man’s-land: Having officially left IT security, still struggling with saying No to project requests, working on our pilot heat pump system in stealth mode, and having enrolled in another degree program in renewable energies.

The pseudonymous phase: Trying out the new platform, not yet adding much About Me information. Playing. In the old times, I had a separate domain with a proper name for that (subversiv.at). This WordPress blog was again a new blank sheet of paper, and I took the other sites offline temporarily, to celebrate this moment.

The discovery of a new community: The WordPress community was distinct from all other professional communities and social circles I was part of. It seems that new bloggers always flock together in groups, perhaps WordPress’ algorithms facilitate that. I participated with glee in silly blogging award ceremonies. However, I missed my old communities, and I even joined Facebook to re-unite with some of them. Living in separate worlds, sometimes colliding in unexpected ways, was intriguing.

Echoes of the past: I write about Difficult Things That I Handled In the Past – despite or because I have resolved those issues long before. This makes all my Life / Work / Everything collections a bit negative and gloomy. I blogged about my leaving academia, and my mixed memories of being part of The Corporate World. It is especially the difficult topics that let me play with geeky humor and twisted sarcasm.

The self-referential aspect: Online writing has always been an interesting experiment: Writing about technology and life, but also using technology. As philosophers of the web have pointed out, the internet or the medium in general modifies the message. I play with websites’ structure and layout, and I watch how my online content is impacted by seemingly cosmetic details of presentation.

Series of posts – find your favorite topic: I’ve never participated in blogging challenges, like one article a day. But I can understand that such blogging goals help to keep going. I ran a series on quantum field theory, but of course my expertise was Weird Internet Poetry … yet another demonstration of self-referentiality.

The unexpected positive consequences of weird websites – perhaps called ‘authentic’ today. They are a first-class filter. Only people who share your sense of humor will contact you – and sense of humor is the single best criterion to find out if you will work well with somebody.

Writing about other people’s Big Ideas versus your own quaint microcosmos. I have written book reviews, and featured my favorite thinkers’ ideas. I focused on those fields in physics that are most popular (in popular science). My blog’s views had their all-time high. But there are thousands of people writing about those Big Things. Whatever you are going to write about, there is somebody who can not only write better, but who is also more of a subject matter expert, like a scientist also working as a science writer. This is an aspect of my empirical rule about your life being cliché. The remaining uncharted territory was my own small corner of the world.

Skin in the Game versus fence-sitting. Lots of people have opinions on many things on the internet. The preferred publication is a link to an article plus a one-liner of an opinion. Some people might really know something about the things they have opinions on. A minority has Skin in the Game, that is: They will feel the consequences of being wrong, personally and financially. I decided to focus on blogging about topics that fulfill these criteria: I have 1) related education and theoretical knowledge, 2) practical hands-on experience, 3) Skin in the Game. Priorities in reverse order.

The revolutionary experiment: Blogging without the motivational trigger of upcoming change. Now I have lacked the primary blogging impulse for a while. I am content and have been combining anything in practice for a while. But I don’t have to explain anything to anybody anymore – including myself. I resorted to playing with data – harping on engineering details. I turn technical questions I get into articles, and I spend a lot of time on ‘curating’: creating lists of links and overview pages. I have developed the software for my personal websites from scratch, and turned from creating content to structure for a while.

Leaving your comfort zone: I do edit, re-write, and scrutinize blog postings here relentlessly. I delete more content again than I finally publish, and I – as a text-only Courier New person – spend considerable time on illustrations. This is as much as I want to leave my comfort zone, and it is another ongoing experiment – just as the original stream-of-consciousness writing was.

But perhaps I will write a post like this one now and then.

Pine trees in Tenerife.

Anniversary 4 (4 Me): “Life Ends Despite Increasing Energy”

I published my first post on this blog on March 24, 2012. Back then its title and tagline were:

Theory and Practice of Trying to Combine Just Anything
Physics versus engineering
off-the-wall geek humor versus existential questions
IT versus the real thing
corporate world’s strangeness versus small business entrepreneur’s microcosmos
knowledge worker’s connectedness versus striving for independence

… which became

Theory and Practice of Trying to Combine Just Anything
I mean it

… which became

elkemental Force
Research Notes on Energy, Software, Life, the Universe, and Everything

last November. It seems I have run out of philosophical ideas and said everything I had to say about Life and Work and Culture. Now it’s not Big Ideas that make me publish a new post but my small Big Data. Recent posts on measurement data analysis or on the differential equation of heat transport are typical for my new editorial policy.

Cartoonist Scott Adams (of Dilbert fame) encourages looking for patterns in one’s life, rather than interpreting and theorizing – and being fooled by biases and fallacies. Following this advice and my new policy, I celebrate my 4th blogging anniversary by crunching this blog’s numbers.

No, this does not mean I will show off the humbling statistics of views provided by WordPress 🙂 I am rather interested in my own evolution as a blogger. Having raked my virtual Zen garden two years ago I have manually maintained lists of posts in each main category – these are my menu pages. Now I have processed each page’s HTML code automatically to count posts published per month, quarter, or year in each category. All figures in this post are based on all posts excluding reblogs and the current post.

Since I assigned two categories to some posts, I had to pick one primary category to make the height of one column reflect the total posts per month:

Statistics on blog postings: Posts per month in each main category

It seems I had too much time in May 2013. Perhaps I needed creative compensation – indulging in Poetry and pop culture (Web) – as back then I was writing a master thesis.

I had never missed a single month, but there were two summer breaks in 2012 and 2013 with only 1 post per month. It seems Life and Web have gradually been replaced by Energy, and there was a flash of IT in 2014 which I correlate with both nostalgia and a professional flashback owing to lots of cryptography-induced deadlines.

But I find it hard to see a trend, and I am not sure about the distortion I made by picking one category.

So I rather group by quarter:

Statistics on blog postings: Posts per quarter in each main category

… which shows that posts per quarter have reached a low right now in Q1 2016, even if I added the current post. Most posts now are based on original calculations or data analysis which take more time to create than search term poetry or my autobiographical vignettes. But maybe my anecdotes and opinionated posts had just been easy to write as I was drawing on ‘content’ I had in mind for years before 2012.

In order to spot my ‘paradigm shifts’ I include duplicates in the next diagram: Each post assigned to two categories is counted twice. Since the total number then does not make sense, I just depict relative category counts per quarter:

Statistics on blog postings: Posts per quarter in each category, including the assignment of more than one category.
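The relative counts in this diagram can be sketched in a few lines of Python. This is an illustration under assumptions, not the spreadsheet actually used: posts are represented as hypothetical `(publication date, list of categories)` pairs, every category assignment is counted once (so a post in two categories counts twice), and the counts are normalized within each quarter.

```python
from collections import Counter, defaultdict
from datetime import date


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2016-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def relative_shares(posts):
    """posts: iterable of (publication date, list of categories).

    Every category assignment is counted, so a post in two categories
    counts twice; the counts are then normalized per quarter.
    """
    counts = defaultdict(Counter)
    for published, categories in posts:
        counts[quarter(published)].update(categories)
    return {
        q: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for q, counter in counts.items()
    }
```

Because each quarter's shares add up to 1, such a chart shows how the mix of topics shifts, not how much was published.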

Ultimate wisdom: Life ends, although Energy is increasing. IT is increasing, too, and was just hidden in the other diagram: Recently it is often the secondary category in posts about energy systems’ data logging. Physics follows an erratic pattern. Quantum Field Theory was responsible for the maximum at the end of 2013, but was then replaced by thermodynamics.

Web is also somewhat constant, but the list of posts shows that the most recent Web posts are on average more technical and less about Web and Culture and Everything. There are exceptions.

Those trends are also visible in yearly overviews. The Decline Of Web seems to be more pronounced – so I tag this post with Web.

Statistics on blog postings: Posts per year in each main category

Statistics on blog postings: Posts per year in each category, including the assignment of more than one category.

But perhaps I was cheating. Each category was not as stable as the labels in the diagrams’ legends do imply.

Shortcut categories refer to
1) these category pages: Energy, IT, Life, Physics, Poetry, Web,
2) and these categories: Energy, IT, Life, Physics, Poetry, Web, respectively, manually kept in sync.

So somehow…

public-key-infrastructure became control-and-it

and

on-writing-blogging-and-indulging-in-web-culture is now simply web

… and should maybe be called nerdy-web-stuff-and-software-development.

In summary, I like my statistics as they confirm my hunches, but there is one exception: There was no Poetry in Q1 2016 and I have to do something about this!

________________________________

The Making Of

  • Copy the HTML content of each page with a list to a text editor (I use Notepad2).
  • Find double line breaks (\r\n\r\n) and replace them by a single one (\r\n).
  • Copy the lines to an application that lets you manipulate strings (I use Excel).
  • Tweak strings with formulas / commands to cut out date, URL, title and comment. Use the HTML tags as markers.
  • Batch-add the page’s category in a new column.
  • Indicate in a new column whether this is the primary or secondary category (find duplicates automatically beforehand, so that 1 can be assigned automatically to most posts).
  • Group the list by month, quarter, and year respectively and add the counts to new data tables that will be used for diagrams (e.g. Excel function COUNTIFS, using only the category, or category name + indicator for the primary category, as criteria).
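For readers who prefer a script to a spreadsheet, here is a rough Python equivalent of these steps. It is a sketch under explicit assumptions: the markup of a list entry (an `<li>` with a date, a link, and a trailing comment) is hypothetical, and the real menu pages may look different. Instead of a manually set primary/secondary indicator, the sketch treats the first page on which a post's URL appears as its primary category.

```python
import re
from collections import Counter

# Hypothetical markup of one list entry; the real menu pages may differ.
ENTRY = re.compile(
    r'<li>(?P<date>\d{4}-\d{2}-\d{2})\s*<a href="(?P<url>[^"]+)">(?P<title>[^<]*)</a>'
    r'\s*(?P<comment>.*?)</li>'
)


def parse_menu_page(html, category):
    """Turn one menu page's HTML into (date, url, title, comment, category) rows."""
    rows = []
    for line in html.splitlines():  # blank lines from double line breaks are skipped
        match = ENTRY.search(line)
        if match:
            rows.append((match["date"], match["url"], match["title"], match["comment"], category))
    return rows


def primary_counts(rows, period="quarter"):
    """Count each post once, under the first category it appears in (COUNTIFS-style)."""
    seen = set()
    counts = Counter()
    for date_str, url, _title, _comment, category in rows:
        if url in seen:
            continue  # duplicate URL: this row is the post's secondary category
        seen.add(url)
        year, month = int(date_str[:4]), int(date_str[5:7])
        key = {
            "month": f"{year}-{month:02d}",
            "quarter": f"{year}-Q{(month - 1) // 3 + 1}",
            "year": str(year),
        }[period]
        counts[(key, category)] += 1
    return counts
```

Since a `Counter` returns 0 for missing keys, categories without posts in a period simply count as zero when the diagram data is assembled.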

It could be automated even better – without having to maintain category pages – by simply using the category feeds (like this: https://elkement.wordpress.com/category/physics/feed) or by filtering the full blog feed for categories. I have re-categorized all my posts so that categories match menu page lists, but I chose to use my lists as

  1. I get not only date and headline, but also my own additional summary / comment that’s not part of the feed. For our German blog, I actually do this in reverse: I create the HTML code of a sitemap-style overview page on wordpress.com from an Excel list of all posts plus custom comments and then copy the auto-generated code to the HTML view of the respective menu page on the blog.
  2. the feed provided by WordPress.com can have 150 items maximum no matter which higher number you try to configure. So you need to start analyzing before you have published 150 posts.
  3. I can never resist creating a tool that manipulates text files and automates something, however weird.
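The feed-based route could look like the following minimal Python sketch. It parses an RSS 2.0 document, such as a WordPress.com category feed, into `(date, title, link)` tuples and optionally keeps only items tagged with a given category, which covers filtering the full blog feed. It assumes the feed text has already been downloaded (for example with `urllib.request.urlopen`), and it only sees what the feed contains, so the 150-item limit still applies.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_items(rss_xml, category=None):
    """Parse an RSS 2.0 feed string into (date, title, link) tuples,
    optionally keeping only items tagged with the given category."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        categories = {c.text for c in item.findall("category")}
        if category is not None and category not in categories:
            continue
        raw_date = item.findtext("pubDate")
        if raw_date is None:
            continue  # skip malformed items without a publication date
        published = parsedate_to_datetime(raw_date)  # RFC 822 date as used in RSS
        items.append((published.date(), item.findtext("title"), item.findtext("link")))
    return items
```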

Shortest Post Ever

… self-indulgent though, but just to add an update on the previous post.

My new personal website is live:

elkement.subversiv.at

I have already redirected the root URLs of the precursor sites radices.net, subversiv.at and e-stangl.at. Now I am waiting for Google’s final verdict; then I am going to add the rewrite map for the 1:n map of old ASP files and new ‘posts’. This is also the pre-requisite for informing Google about the move officially.

The blog-like structure and standardized attributes like Open Graph meta tags and an XML sitemap should make my site more Google-likeable. With the new site – and one dedicated host name only – I finally added permanent redirects (HTTP 301). Before, I used temporary (HTTP 302) redirects to send requests from the root directory to subfolders, which (so the experts say) is not search-engine-friendly.
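A rewrite map with permanent redirects boils down to a lookup table. The Python sketch below is a hypothetical illustration rather than the actual server configuration: the old host names are the precursor sites mentioned above, but the ASP paths and new post paths are placeholders, and the `https://` scheme of the new base URL is an assumption. Where one old ASP page was split into several new posts, a redirect can still point to only one target, so the map has to pick one, for example an overview page.

```python
NEW_BASE = "https://elkement.subversiv.at"  # scheme is an assumption

# Precursor sites whose requests should move to the new site
OLD_HOSTS = {"radices.net", "subversiv.at", "e-stangl.at"}

# Hypothetical entries: old ASP paths -> new post paths (placeholders)
REWRITE_MAP = {
    "/physics.asp": "/physics-overview/",
    "/about.asp": "/about/",
}


def redirect_for(host, path):
    """Return (301, location) for requests to old hosts, or None if not handled."""
    if host.lower() not in OLD_HOSTS:
        return None  # requests to the new host are served normally
    # Unknown old pages fall back to the new site's root
    new_path = REWRITE_MAP.get(path.lower(), "/")
    return 301, NEW_BASE + new_path
```

Falling back to the root for unmapped pages is a design choice; answering with 404 or 410 for pages that no longer exist would be an equally valid option.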

On the other hand the .at domain will not help: You can pick a certain country as preferred audience for a non-country domain, but I have to stick with Austria here, even if the language is set to English in all the proper places (I hope).

I have discovered that every WordPress.com Tag or Category has its own feed – just add /feed/ to the respective URLs – and I will make use of this in order to automate some of my link curation, like this. This list of physics postings has been created from this feed of selected postings:
https://elkement.wordpress.com/category/science-and-technology/physics/feed/
Of course this means re-tagging and re-categorizing here! Thanks WordPress for the Tags to Categories (and vice versa) Conversion Tools!
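Turning feed items into such a list of links is mostly string formatting. This Python sketch assumes the items have already been extracted from the feed as hypothetical `(date, title, url)` tuples. It renders them as an HTML list, newest first, escaping titles and URLs so that characters like `&` do not break the markup. The resulting code could then be pasted into the HTML view of a menu page.

```python
from datetime import date
from html import escape


def link_list_html(items):
    """Render (date, title, url) tuples as an HTML list, newest first."""
    lines = ["<ul>"]
    for published, title, url in sorted(items, reverse=True):
        lines.append(
            f'<li>{published.isoformat()} '
            f'<a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        )
    lines.append("</ul>")
    return "\n".join(lines)
```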

It is fun to watch my server’s log files more closely. Otherwise I would have missed that SQL injection attack attempt, trying to put spammy links on my website (into my database):

SQL injection by spammer-hackers
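Spotting such attempts can be roughly automated. The Python sketch below flags log lines whose URL-decoded text matches a few patterns typical of SQL injection probes. The pattern list is deliberately incomplete, and the log line layout in the example is made up rather than taken from a real server log. This is a triage aid, not a defense: the actual protection is to pass user input to the database only through parameterized queries.

```python
import re
from urllib.parse import unquote_plus

# A small, deliberately incomplete set of patterns often seen in SQL injection probes
SUSPICIOUS = re.compile(
    r"union\s+(all\s+)?select"
    r"|insert\s+into"
    r"|update\s+\w+\s+set"
    r"|declare\s+@"
    r"|exec(\s|\()"
    r"|;\s*--"
    r"|'\s*or\s*'?1'?\s*=\s*'?1",
    re.IGNORECASE,
)


def suspicious_lines(log_lines):
    """Yield log lines whose URL-decoded text matches a known injection pattern."""
    for line in log_lines:
        # Decode %20, + and similar first, since probes are usually URL-encoded
        if SUSPICIOUS.search(unquote_plus(line)):
            yield line
```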