Cloudy Troubleshooting (2)

Unrelated to part 1 – but the same genre.

Actors this time:

  • File Cloud: A cloud service for syncing and sharing files. We won’t drop a brand name, will we?
  • Client: Another user of File Cloud.
  • [Redacted]: Once known for reliability and as The Best Network.
  • Dark Platform: Wannabe hackers’ playground.
  • elkement: Somebody who sometimes just wants to be an end user, but always ends up sniffing and debugging.

There are no dialogues with human life-forms this time, only elkement’s stream of consciousness, interacting with the others by looking at things on a screen.

elkement: Time for a challenging Sunday hack!

elkement connects to The Dark Platform. Hardly notices anything in the real world anymore. But suddenly elkement looks at the clock – and at File Cloud’s icon next to it.

elkement: File Cloud, what’s going on?? Seems you have a hard time Connecting… for hours now? You have not even synced my hacker notes from yesterday evening?

elkement tries to avoid looking at File Cloud, but it gets too painful.

elkement: OK – let’s consider the File Cloud problem the real Sunday hacker’s challenge…

elkement walks through the imaginary checklist:

  • File Cloud mentioned on DownDetector website? No.
  • Users tweeting about outage? No.
  • Do the other cloudy apps work fine? Yes.
  • Do other web sites work fine? Yes.
  • Does my router need its regular reboot because its DNS server got stuck? No.
  • Should I perhaps try the usual helpdesk recommendation? Yes. (*)

(*) elkement turns router and firewall off and on again. Does not help.

elkement gets worried about Client using File Cloud, too. Connects to Client’s network – via another cloudy app (that obviously also works).

  • Does Client have the same issues? Yes and No – Yes at one site, No at another site.

elkement: Oh no – do I have to set up a multi-dimensional test matrix again to check for weird dependencies?

Coffee Break. Leaving the hacker’s cave. Gardening.

elkement: OK, let’s try something new!

elkement connects to super shaky mobile internet via USB tethering on the smartphone.

  • Does an alternative internet connection fix File Cloud? Yes!!

elkement: Huh!? Will somebody now explain to me again that a protocol (File Cloud) is particularly sensitive to hardly noticeable network disconnects? Or is it maybe really a problem with [Redacted] this time?

elkement checks out DownDetector – and there they are: the angry users and the red spots on the map. They mention that seemingly random websites and applications fail, and that [Redacted] is losing packets.

elkement: Really? Only packets for File Cloud?

elkement starts sniffing. Checks IP addresses.

(elkement: Great, whois does still work, despite the anticipated issues with GDPR!)

elkement spots communication with File Cloud. File Cloud client and server are stuck in a loop of misunderstandings. File Cloud client is rude and says: RST, then starts again. Says Hello. They never shake hands as a previous segment was not captured.

elkement: But why does all the other stuff work??

elkement googles harder. Indeed, some other sites might be slower – not The Dark Platform, fortunately. Now finally Google and DuckDuckGo stop working, too.

elkement: I can’t hack without Google.

elkement hacks something without Google after all. Manages to ignore File Cloud’s heartbreaking connection attempts.

A few hours later it’s over. File Cloud syncs hacker notes. Red spots on DownDetector start to fade out while the summer sun is setting.

~

FIN, ACK

Cloudy Troubleshooting

Actors:

  • Cloud: Service provider delivering an application over the internet.
  • Client: Business using the Cloud
  • Telco: Service provider operating part of the network infrastructure connecting them.
  • elkement: Somebody who always ends up playing intermediary.

~

Client: Cloud logs us off ever so often! We can’t work like this!

elkement: Cloud, what timeouts do you use? Client was only idle for a short break and is logged off.

Cloud: Must be something about your infrastructure – we set the timeout to 1 hour.

Client: It’s becoming worse – Cloud logs us off every few minutes even when we are in the middle of working.

[elkement does a quick test. Yes, it is true.]

elkement: Cloud, what’s going on? Any known issue?

Cloud: No issue on our side. We have thousands of happy clients online. If we had issues, our inboxes would be on fire.

[elkement does more tests. Different computers at Client. Different logon users. Different Client offices. Different speeds of internet connections. Computers at elkement office.]

elkement: It is difficult to reproduce. It seems like it works well for some computers or some locations for some time. But Cloud – we did not have any issues of that kind in the last year. This year the troubles started.

Cloud: The timing of our app is sensitive: If network cards in your computers turn on power saving that might appear as a disconnect to us.

[elkement learns what she never wanted to know about various power saving settings. To no avail.]

Cloud: What about your bandwidth?… Well, that’s really slow. If all people in the office are using that connection we can totally understand why our app sees your users disappearing.

[elkement on a warpath: Tracking down each application eating bandwidth. Learning what she never wanted to know about tuning the background apps, tracking down processes.]

elkement: Cloud, I’ve throttled everything. I am the only person using Client’s computers late at night, and I still encounter these issues.

Cloud: Upgrade the internet connection! Our protocol might choke on a hardly noticeable outage.

[elkement has to agree. The late-night tests were done over a remote connection, so the measurement itself may impact the results, as in quantum physics.]

Client: Telco, we buy more internet!

[Telco installs more internet, elkement measures speed. Yeah, fast!]

Client: Nothing has changed, Cloud still kicks us out every few minutes.

elkement: Cloud, I need to badger you again…

Cloud: Check the power saving settings of your firewalls, switches, routers. Again, you are the only one reporting such problems.

[The router is a black box operated by Telco.]

elkement: Telco, does the router use any power saving features? Could you turn that off?

Telco: No, we don’t use any power saving at all.

[elkement dreams up conspiracy theories: Sometimes performance seems to degrade after business hours. Cloud running backup jobs? Telco’s lines clogged by private users streaming movies? But sometimes it’s working well even in the location with the crappiest internet connection.]

elkement: Telco, we see this weird issue. It’s either Cloud, Client’s infrastructure, or anything in between, e.g. you. Any known issues?

Telco: No, but [proposal of test that would be difficult to do]. Or send us a Wireshark trace.

elkement: … which is what I planned to do anyway…

[elkement on a warpath 2: Sniffing, tracing every process. Turning off all background stuff. Looking at every packet in the trace. Getting to the level where there are no other packets in between the stream of messages between Client’s computers and Cloud’s servers.]

elkement: Cloud, I tracked it down. This is not a timeout. Look at the trace: Server and client communicating nicely, textbook three-way handshake, server says FIN! And no other packet in the way!
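What elkement reads off the trace boils down to two questions: did the connection open with a clean SYN, SYN/ACK, ACK, and who sent the first FIN or RST? A minimal sketch, assuming the trace has already been reduced to (sender, flags) pairs for a single TCP connection, for example by hand from a Wireshark capture; the `Segment` class, the `"client"`/`"server"` labels and the example trace are hypothetical, not taken from the original capture.

```python
from dataclasses import dataclass

# Flag bits in the TCP header.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


@dataclass
class Segment:
    sender: str  # "client" or "server" (hypothetical labels)
    flags: int   # bitwise OR of the flag constants above


def handshake_completed(segments):
    """True if the connection opens with SYN, SYN/ACK, ACK (pure ACK in this simplified sketch)."""
    if len(segments) < 3:
        return False
    syn, syn_ack, ack = segments[:3]
    return (syn.flags == SYN
            and syn_ack.flags == SYN | ACK
            and ack.flags == ACK
            and syn.sender == ack.sender != syn_ack.sender)


def first_teardown(segments):
    """Return (sender, "FIN" or "RST") for the first teardown segment, or None."""
    for seg in segments:
        if seg.flags & RST:
            return seg.sender, "RST"
        if seg.flags & FIN:
            return seg.sender, "FIN"
    return None


# Hypothetical trace: clean handshake, some data, then the server closes.
trace = [
    Segment("client", SYN),
    Segment("server", SYN | ACK),
    Segment("client", ACK),
    Segment("client", PSH | ACK),
    Segment("server", ACK),
    Segment("server", FIN | ACK),
]
print(handshake_completed(trace), first_teardown(trace))
```

Applied to the example trace, this reports a completed handshake and the server as the first to send FIN.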

Cloud: Try to connect to a specific server of us.

[elkement: Conspiracy theory about load balancers]

elkement: No – erratic as ever. Sometimes we are logged off, sometimes it works with crappy internet. Note that Client could work during vacation last summer with super shaky wireless connections.

[Lots of small changes and tests by elkement and Cloud. No solution yet, but the collaboration is seamless. No politics, no finger-pointing about who is to blame – just work. The thing that keeps you happy as a netadmin / sysadmin in stressful times.]

elkement: Client, there is another interface which has less features. I am going to test it…

[elkement: Conspiracy theory about protocols. More night-time testing].

elkement: Client, Other Interface has the same problems.

[elkement on a warpath 3: Testing again with all possible combinations of computers, clients, locations, internet connections. Suddenly a pattern emerges…]

elkement: I see something!! Cloud, I believe it’s user-dependent. Users X and Y are logged off all the time while A and B aren’t.

[elkement scratches head: Why was this so difficult to see? Tests were not that unambiguous until now!]

Cloud: We’ve created a replacement user – please test.

elkement: Yes – New User works reliably all the time! 🙂

Client: It works – we are not thrown off in the middle of work anymore!

Cloud: Seems that something about the user on our servers is broken – never happened before…

elkement: But wait 😦 it’s not totally OK: Now logged off after 15 minutes of inactivity? But never mind – at least not as bad as logged off every 2 minutes in the middle of some work.

Cloud: Yeah, that could happen – an issue with Add-On Product. But only if your app looks idle to our servers!

elkement: But didn’t you tell us that every timeout ever is no less than 1 hour?

Cloud: No – that 1 hour was another timeout …

elkement: Wow – classic misunderstanding! That’s why it was so difficult to spot the pattern. So we had two completely different problems, but both looked like unwanted logoffs after a brief period, and at the beginning neither was fully reproducible.

[elkement’s theory validated again: If anything qualifies elkement for such stuff at all, it is the experience in the applied physics lab – tracking down the impact of temperature, pressure and 1000 other parameters on the electrical properties of superconductors… and trying to tell artifacts from reproducible behavior.]

~

Cloudy

Theory and Practice of Trying to Combine Physics with Anything

You have told me you miss my physics posts. I have missed them, too, so I am giving it a try. But I cannot help turning this into a cross-over again, smashing together half-digested psychology, physics, IT networking, and badly hidden autobiographical anecdotes.

In 2005 I did research on the incorporation of physics-style thinking and mathematical models into non-science disciplines. Actually, it was a small contribution to an interdisciplinary research project, and I have / should have covered science-y ideas related to how revolutionary new ideas percolate through society.

In retrospect, my resulting (German) paper was something in between science writing, thorough research including differential equations in detail – and some bold assumptions, partly inspired by popular science, cliché and science fiction. Probably like my posts, but more long-winded and minus the very obvious rants.

I built on my work in laser-materials processing, superconductivity, and phase transitions, and I tried to relate chaos in thermodynamic systems and instabilities in fluids to the similarly unpredictable diffusion of ideas.


Simulation of Rayleigh-Taylor instabilities at the interface of fluids with different densities. You could probably test this with Caffe Latte.

I learned that there is a discipline called Network Theory:

Many networked structures obey very similar rules. Networks of WWW hyperlinks, citations of scientific papers, food chains, and airline networks are called scale-free networks, because the distribution function for the number of links per node follows a power law.

A small number of nodes has a high number of connections, and the structure of the network appears the same on every scale – it is self-similar. The power law is only valid for ever-growing networks.
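In formulas, "follows a power law" means that the fraction of nodes with k links behaves like the first expression below, and self-similarity is the statement that rescaling k only rescales P(k) by a constant factor. The exponent values are the commonly quoted ones (γ = 3 for the Barabási-Albert growth model, roughly between 2 and 3 for many measured networks), not numbers from my old paper.

```latex
P(k) \propto k^{-\gamma}, \qquad
P(c\,k) = c^{-\gamma}\, P(k) \quad \text{for any scale factor } c > 0
```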


Network following a power-law distribution of connections. The backbone of the network is established by a few strong, well-connected nodes, and the vast majority of nodes has only a few connections.

The dynamics of such networks could be modeled using the same math as esoteric Bose-Einstein condensation, which allowed me to combine anything and to relate networks to the quantum phenomena in superconductors.
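The version of this mapping I am aware of is the Bianconi-Barabási fitness model; treat it as an illustration rather than the exact formulation of my 2005 paper. Each node i gets a "fitness" η_i, new links attach with a probability weighted by fitness and degree k_i, and each node is mapped onto an energy level ε_i of a Bose gas, with links playing the role of particles and β the role of an inverse temperature:

```latex
\Pi_i = \frac{\eta_i\, k_i}{\sum_j \eta_j\, k_j}, \qquad
\varepsilon_i = -\frac{1}{\beta}\,\ln \eta_i
```

Bose-Einstein condensation then corresponds to the fittest node grabbing a finite fraction of all links, no matter how large the network grows.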

But the basic idea is really simple: The more popular nodes attract more links. This is a winner-take-all model.
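"The more popular nodes attract more links" is preferential attachment, the growth rule of the Barabási-Albert model. A minimal sketch, assuming an undirected network that starts as a small complete graph and grows by one node at a time, each new node bringing m links; the function names and parameter values are illustrative.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow a network of n nodes by preferential attachment; return its edge list."""
    rng = random.Random(seed)
    # Seed network: a complete graph on nodes 0..m, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears once per link end, so a uniform pick from this list
    # selects a node with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct targets, no duplicate links
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degrees(edges):
    """Number of links per node."""
    return Counter(v for edge in edges for v in edge)


edges = barabasi_albert(n=1000, m=2, seed=42)
deg = degrees(edges)
print("mean degree:", sum(deg.values()) / len(deg))
print("three biggest hubs (node, degree):", deg.most_common(3))
```

With n=1000 and m=2 every node has at least two links and the mean degree stays just below 4, yet the biggest hubs end up with many times that: the winner-take-all effect.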

Companies have started monetizing network research by analyzing and modelling hidden structures and unveiling the fabric underlying politics and economy.

Re-visiting that old article of mine, I spot an application of physics in something-else-dynamics I had missed: One of the classical non-academic jobs for a (theoretical) physicist is Wall Street quantitative analyst, or quant. Quants apply models taken from thermodynamics, such as diffusion in supernovas, to the finance world.

I would put The Physics of Wall Street – A Brief History of Predicting the Unpredictable on my Books-to-Read list if it were available on Kindle, as I enjoyed this review:

The author, James Owen Weatherall, is

an assistant professor of logic and philosophy of science at the University of California, Irvine, has two Ph.D.’s — one in physics and mathematics, and one in philosophy.

The book gives an overview of different models that resemble physics or are borrowed from physics – such as the Black-Scholes model that uses Brownian motion to model the dynamic development of prices of derivative financial products. Don’t ask me for details – I am just dropping keywords here.
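Since the paragraph only drops the keyword: Black-Scholes assumes the price follows geometric Brownian motion, and for a European call option this yields a closed-form price. A minimal sketch of the standard textbook formula (not taken from Weatherall's book), cross-checked against a brute-force simulation of Brownian price paths; the parameter values in the example are arbitrary.

```python
import math
import random


def norm_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call.

    s: spot price, k: strike, r: risk-free rate, sigma: volatility, t: years to expiry.
    """
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def monte_carlo_call(s, k, r, sigma, t, paths=100_000, seed=0):
    """Same price, estimated by sampling end points of geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    total_payoff = 0.0
    for _ in range(paths):
        s_t = s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total_payoff += max(s_t - k, 0.0)
    return math.exp(-r * t) * total_payoff / paths


bs = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
mc = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"closed form: {bs:.4f}, Monte Carlo: {mc:.4f}")
```

For these inputs the closed form gives about 10.45, and the Monte Carlo estimate agrees within its statistical error.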

The book seems to be based on optimistic assumptions:

Weatherall wants a new Manhattan Project to determine what’s wrong with economics, and he thinks it should be based in no small part on the contributions of physics-oriented economists, some of whom he believes have been treated unfairly by the establishment.

Here it is getting very interesting:

He has little use for Nassim Taleb, whose best-­selling book “The Black Swan” argues that the models used by traders disastrously underestimated the possibility of very negative outcomes — the black swans. To say that a model failed, Weatherall contends, is not to say that no models can work. “We use mathematical models cut from the same cloth to build bridges and to design airplane engines, to plan the electric grid and to launch spacecraft,” 

… as I am currently reading Nassim Taleb’s The Black Swan

In my outdated review article I finally came to the conclusion that some aspects of seemingly complicated systems – including those based on human beings – can be modeled using models of a baffling simplicity in relation to the alleged complexity of human nature. I am not ashamed of pointing out this glaring contradiction with my recent posts on gamification.

I would have hailed Weatherall’s book and thanked him for contributing to my confirmation bias.

But Taleb speaks to me – in particular his chapter about Ludic Fallacy.

I do enjoy the clichéd characters of Fat Tony, the intuitive deal maker who hacks the real world, versus Dr. John, the nerdy engineering PhD who is fond of building mathematical models.

Taleb says:

Have you ever wondered why so many of these straight-A students end up going nowhere in life while someone who lagged behind is now getting the shekels, buying the diamonds, and getting his phone calls returned? Or even getting the Nobel Prize in a real discipline (say, medicine)

I took all my self-irony pills in order to recover. How could I not remember my indulgence in this diagram proving the braininess / nerdiness of physicists (and philosophers) – and my straight As of course. Did I mention that I am not a high-powered executive today or an accomplished professor? So it is Dr. Jane speaking here.

How could I not remember those enlightening anecdotes in Daniel Goleman’s pop-psychology bestseller on EQ – emotional intelligence, first published in 1996. I enjoyed the story of two equally gifted students of mathematics, one becoming a rock star scientist, the other one becoming a mere computer consultant. I have read this book in German, so I will not give you a verbatim quote translated back to English. Actually, Goleman said something like: He pretended / claimed to be happy as a computer consultant. It says more about me than about Goleman that I can quote this from memory without touching the book. I could say a lot of things about the notion of pretense here, but I will not repeat my most recent loosely related rant.

Goleman and Taleb both agree on the overarching role of intuition, thinking outside-the-box, gut feeling or whatever you call this. Luckily, Taleb is not concerned so much with proving which part of the brain is responsible for what because this is the part of pop-psy books I find incredibly boring. Nobody in his right mind would disagree (with the fact that interpersonal skills are important, not with my judgement of pop-psy books).

Even I tend to say that my modest successes in Mediocristan are largely due to my social skills, whereas technical skills are needed to meet the minimum bar. Mediocristan is Taleb’s world of achievements limited by natural boundaries, such as: You will not get rich by being paid for time and material. You might get rich in Extremistan, as a best-selling author or musician, but you have to deal with the extremely low probability of such a Black Swan of a success.
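The difference between Mediocristan and Extremistan can be made tangible with a quick experiment in the spirit of Taleb's height-versus-wealth comparison: how much of the total does the single largest observation contribute? A sketch, assuming a Gaussian for the thin-tailed quantity and a Pareto distribution with tail index 1.1 for the fat-tailed one; both distributions and their parameters are illustrative choices, not data.

```python
import random


def max_share(samples):
    """Fraction of the total contributed by the single largest observation."""
    return max(samples) / sum(samples)


rng = random.Random(7)
n = 10_000
# Mediocristan: something like body height in cm, thin-tailed (Gaussian).
heights = [rng.gauss(170.0, 10.0) for _ in range(n)]
# Extremistan: something like wealth, fat-tailed (Pareto with tail index 1.1).
wealth = [rng.paretovariate(1.1) for _ in range(n)]

print(f"largest height: {max_share(heights):.5f} of the total")
print(f"largest wealth: {max_share(wealth):.5f} of the total")
```

In the Gaussian sample the tallest person is a negligible fraction of the total, while in the Pareto sample a single observation can account for a noticeable share, and rerunning with other seeds makes that share jump around wildly.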

I am trying my hand at Occam’s Razor now and attempting to sort out these contradictions.

I believe that mathematical models of society make sense, and I do so without having read more propaganda by econo-physicists. I do so even if I will go on ranting about physicists who went into finance and caused a global crisis, because they just wanted to play with nice physics (as we said at the university) – ignoring that there is more at stake than your next research grant or paper.

Models of society and networks make sense if and only if we try to determine a gross statistical property of an enormous system. This is perfect science based on numbers that are only defined in terms of statistics – such as temperature in thermodynamics.
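Why a gross statistical property of an enormous system is predictable while the individuals are not can be shown with a toy experiment: let every individual act at random and look at how much the group average wobbles from trial to trial. A sketch under the assumption of independent individuals, each acting with the same probability p (0.3 here, an arbitrary choice); real people are of course not independent coin flips.

```python
import random
import statistics


def spread_of_average(n_individuals, p=0.3, trials=100, seed=3):
    """Standard deviation of the group average over repeated trials.

    Each individual independently acts (counts as 1) with probability p.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        acting = sum(rng.random() < p for _ in range(n_individuals))
        averages.append(acting / n_individuals)
    return statistics.pstdev(averages)


for n in (10, 1_000, 10_000):
    print(n, round(spread_of_average(n), 4))
```

The spread shrinks roughly like 1/√N, which is the same reason why temperature, an average over an enormous number of molecules, is a sharp and reproducible quantity.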

Malcolm Gladwell is a master storyteller, and he provides some convincing examples proving that sometimes only context matters and turns us into automata. For example, subjects who were not informed about the experimental setup were asked about their ethical standards. Would you help the poor? Of course they would. Then the experimental (gamified!) setup urged the subjects to hurry to another location, under some pretext. On their way, they were confronted with (fake) poor persons in need. The majority of the subjects did not help; not missing the next fake meeting was the top priority. Gladwell’s conclusion is that context very often matters more, and in a simple and predictable way, than all our sophisticated ethical constructs.

This is probably similar to our predictability as social networking animals, that is: clicking, liking and sharing automata.

People in a stadium clapping their hands will synchronize, in a way similar to fireflies synchronizing their blinking. You can build very simple models and demonstrate them using electrically connected light bulbs equipped with trigger logic – and those bulbs will synchronize after a few cycles.
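Light bulbs with trigger logic are pulse-coupled oscillators; a simpler, standard textbook model of the same phenomenon is the Kuramoto model, swapped in here. Each oscillator has its own natural frequency and is nudged toward the average phase of the crowd; the order parameter r measures how synchronized the crowd is (0 = scattered, 1 = in lockstep). A minimal sketch with Euler integration; the number of oscillators, the coupling strength K and the frequency spread are illustrative assumptions.

```python
import cmath
import math
import random


def mean_field(phases):
    """Order parameter r and mean phase psi of a list of phases."""
    z = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(z), cmath.phase(z)


def kuramoto(n=100, coupling=4.0, spread=0.5, dt=0.01, steps=3000, seed=1):
    """Simulate d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i) with Euler steps.

    Returns the order parameter r at the start and at the end of the run.
    """
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, spread) for _ in range(n)]           # natural frequencies
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]  # random initial phases
    r_start, _ = mean_field(theta)
    for _ in range(steps):
        r, psi = mean_field(theta)
        theta = [th + dt * (w + coupling * r * math.sin(psi - th))
                 for th, w in zip(theta, omega)]
    r_end, _ = mean_field(theta)
    return r_start, r_end


r_start, r_end = kuramoto()
print(f"order parameter: {r_start:.2f} -> {r_end:.2f}")
```

With K = 4 the order parameter climbs from near 0 to close to 1; with `coupling=0.0` the phases stay scattered.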

Enthusiasm ends here.

I believe that by using and validating those reliable models we learn something about society that is not exactly ground-breaking.

We can model the winner-take-all behavior of successful blogs to which all the readers gravitate by Bose-Einstein condensation. But so what? What exactly did science tell us that we did not know before and considered trivial everyday wisdom?

In particular, we learn nothing that would help us, as individual nodes in these networks, to cope with the randomness we are exposed to if we aimed at success in Extremistan.

Mr. Taleb, keep preaching on!

However, I still need to wrap my head around the synthesis of:

  • not falling for the narrative fallacy, denarrating, and ignoring TV and blogs.
  • but yet: focusing on the control of my decisions and trying to grasp the abstract concepts of probability in every moment.

Black Swan (Wikimedia). I wanted to embed an image of Natalie Portman in her ballet dancer’s costume from Black Swan, but I did not find a public domain image quickly, and I am not bold enough to do so without cross-checking copyright issues.

___________________________________________________

Further reading – two related popular science books I had enjoyed in 2005:
Linked: The New Science Of Networks
The Tipping Point: How Little Things Can Make a Big Difference