# Spheres in a Space with Trillions of Dimensions

I don’t venture into speculative science writing – this is just about classical statistical mechanics; actually about a special mathematical aspect. It was one of the things I found particularly intriguing in my first encounters with statistical mechanics and thermodynamics a long time ago – a curious feature of volumes.

I was mulling over how to ‘briefly motivate’ the calculation below in a comprehensible way, a task I might have failed at years ago already, when I tried to use illustrations and metaphors (here and here). When introducing the ‘kinetic theory’ in thermodynamics, often the pressure of an ideal gas is calculated first, by considering averages over momenta transferred from particles hitting the wall of a container. This is rather easy to understand but still sort of an intermediate view – between phenomenological thermodynamics that does not explain the microscopic origin of properties like energy, and ‘true’ statistical mechanics. The latter makes use of a phase space whose number of dimensions is proportional to the number of particles. One cubic meter of gas contains ~$10^{25}$ molecules. Each possible state of the system is depicted as a point in this abstract space. For each (point-like) particle 6 numbers are added to a gigantic vector – 3 for its position and 3 for its momentum (mass times velocity), so the space has ~$6 \times 10^{25}$ dimensions. Thermodynamic properties are averages taken over the state of one system watched for a long time or over a lot of ‘comparable’ systems starting from different initial conditions. At the heart of statistical mechanics are distribution functions that describe how a set of systems described by such gigantic vectors evolves. Such a function is like the density of an incompressible fluid in hydrodynamics. I resorted to using the metaphor of a jelly in hyperspace before.

Taking averages means multiplying the ‘mechanical’ property by the density function and integrating it over the space where these functions live. The volume of interest is a generalized N-ball, defined as the volume within a generalized sphere. A ‘sphere’ is the surface of all points at a certain distance (‘radius’ R) from an origin:

$x_1^2 + x_2^2 + ... + x_ {N}^2 = R^2$

($x_n$ being the co-ordinates in phase space and assuming that all co-ordinates of the origin are zero). Why a sphere? Because states are ordered or defined by energy, and larger energy means a greater ‘radius’ in phase space. It’s all about rounded surfaces enclosing each other. The simplest example of this is the ellipse of the phase diagram of the harmonic oscillator – more energy means a larger amplitude and a larger maximum velocity.

And here is finally the curious fact I actually want to talk about: Nearly all the volume of an N-ball with so many dimensions is concentrated in an extremely thin shell beneath its surface. Therefore an integral over this thin shell can be extended over the full volume of the ball without adding much, while making integration simpler.

This can be seen immediately from plotting the volume of a ball over its radius: The volume of an N-ball is always equal to some numerical factor, times the radius to the power of the number of dimensions. In three dimensions the volume is the traditional, honest volume proportional to $r^3$; in two dimensions the ‘ball’ is a disk, and its ‘volume’ is its area. In a realistic thermodynamic system, the volume is then proportional to $r^N$ with a very large N.

The power function $r^N$ turns more and more into an L-shaped function with increasing exponent N. The volume increases enormously just by adding a small additional layer to the ball. In order to compare the function for different exponents, both ‘radius’ and ‘volume’ are shown in relation to the respective maximum values, $R$ and $R^N$.
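To make the L-shape concrete, here is a small numerical sketch in Python (the post itself contains no code; the function name is made up for illustration). It uses only $V = ar^N$ from above, so the constant a drops out and the share of the volume inside radius $(1 - \epsilon)R$ is simply $(1 - \epsilon)^N$.

```python
def shell_fraction(n_dims: int, eps: float) -> float:
    """Share of the N-ball volume in the outer shell (1 - eps) * R < r <= R.

    V(r) = a * r**N, so the constant a cancels and the inner ball of radius
    (1 - eps) * R holds the fraction (1 - eps)**N of the total volume.
    """
    return 1.0 - (1.0 - eps) ** n_dims


eps = 0.01  # outer shell: 1 % of the radius
for n in (3, 10, 100, 1_000, 10_000):
    print(f"N = {n:>6}: {shell_fraction(n, eps):.6f} of the volume "
          f"is in the outer {eps:.0%} of the radius")
```

Already for N = 1000 practically all of the volume sits in the outer 1% of the radius; for $N \approx 10^{25}$ the relevant shell is thinner still.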

The interesting layer ‘with all the volume’ is certainly much smaller than the radius R, but of course it must not be too small to contain something. How thick the substantial shell has to be can be found by investigating the volume in more detail – using a ‘trick’ that is needed often in statistical mechanics: Taylor expanding in the exponent.

A function can be replaced by its tangent if it is sufficiently ‘straight’ at this point. Mathematically it means: If dx is added to the argument x, then the function at the new target is f(x + dx), which can be approximated by f(x) + [the slope df/dx] * dx. The next, higher-order term would be proportional to the curvature, the second derivative – then the function is replaced by a 2nd order polynomial. Joseph Nebus has recently published a more comprehensible and detailed post about how this works.

So the first terms of this so-called Taylor expansion are:

$f(x + dx) = f(x) + dx{\frac{df}{dx}} + {\frac{dx^2}{2}}{\frac{d^2f}{dx^2}} + ...$

If dx is small higher-order terms can be neglected.

In the curious case of the ball in hyperspace we are interested in the ‘remaining volume’ $V(r - dr)$. This should be small compared to $V(r) = ar^N$ (a being the uninteresting constant numerical factor) after we remove a layer of thickness dr containing the substantial ‘bulk of the volume’.

However, trying to expand the volume $V(r - dr) = a(r - dr)^N$, we get:

$V(r - dr) = V(r) - a\,dr\,Nr^{N-1} + a\frac{dr^2}{2}N(N-1)r^{N-2} + \dots$
$= ar^N\left(1 - N\frac{dr}{r} + \frac{N(N-1)}{2}\left(\frac{dr}{r}\right)^2 + \dots\right)$

But this is not what we want: In the end, it is not an expansion – a polynomial – in the small ratio $dr/r$, but in $N\,dr/r$, and N is enormous, so the higher-order terms are not small at all.
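A quick numerical check of why this expansion is useless, as a hypothetical Python sketch: it compares the exact ratio $V(r - dr)/V(r) = (1 - dr/r)^N$ with the series cut off after the quadratic term, for a fixed thin layer $dr/r = 10^{-3}$.

```python
def exact_ratio(n: int, x: float) -> float:
    """V(r - dr) / V(r) = (1 - x)**N with x = dr / r."""
    return (1.0 - x) ** n


def truncated_ratio(n: int, x: float) -> float:
    """The same ratio, expanded and cut off after the (dr/r)**2 term."""
    return 1.0 - n * x + n * (n - 1) / 2.0 * x ** 2


x = 1e-3  # a thin layer: dr is 0.1 % of r
for n in (10, 1_000, 100_000):
    print(f"N = {n:>7}, N*dr/r = {n * x:g}: "
          f"exact {exact_ratio(n, x):.3e}, truncated {truncated_ratio(n, x):.3e}")
```

For N = 10 the truncated series is excellent; once $N\,dr/r$ reaches 1 it is already noticeably off, and for $N\,dr/r = 100$ it even exceeds 1, while the true remaining volume is vanishingly small.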

So here’s the trick: 1) Apply the definition of the natural logarithm ln:

$V(r - dr) = ae^{N\ln(r - dr)} = ae^{N\ln(r(1 - \frac{dr}{r}))}$
$= ae^{N(\ln(r) + \ln(1 - \frac{dr}{r}))}$
$= ar^Ne^{N\ln(1 - \frac{dr}{r})} = V(r)e^{N\ln(1 - \frac{dr}{r})}$

2) Spot a function that can be safely expanded in the exponent: The natural logarithm of 1 minus something small, $dr/r$. So we can expand near 1: The derivative of ln(x) is 1/x (thus equal to 1/1 = 1 at x = 1) and ln(1) = 0. So ln(1 – x) is about –x for small x:

$V(r - dr) \simeq V(r)e^{N(0 - 1 \cdot \frac{dr}{r})} = V(r)e^{-N\frac{dr}{r}}$

3) Re-arrange fractions …

$V(r - dr) \simeq V(r)e^{-\frac{dr}{(\frac{r}{N})}}$
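The same steps in a few lines of Python, as a sketch: computing $(1 - dr/r)^N$ via the logarithm is also the only practical way to evaluate it numerically for $N = 10^{25}$, because $1 - dr/r$ rounds to exactly 1 in double precision. The function names are illustrative only.

```python
import math


def remaining_exact(n: float, x: float) -> float:
    """(1 - x)**N evaluated as exp(N * ln(1 - x)); log1p keeps tiny x accurate."""
    return math.exp(n * math.log1p(-x))


def remaining_approx(n: float, x: float) -> float:
    """The expanded version exp(-N x) = exp(-dr / (r / N))."""
    return math.exp(-n * x)


N = 1e25
x = 5e-25  # dr = 5 * r / N
print(remaining_exact(N, x), remaining_approx(N, x))
# The naive power fails in double precision: 1 - 5e-25 rounds to exactly 1.0
print((1.0 - x) ** N)
```

Both functions agree on $e^{-5} \approx 0.0067$ for $dr = 5\,r/N$, while the naive power returns 1.0.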

This is now the remaining volume, after the thin layer dr has been removed. It is small in comparison with V(r) if the exponential function is small, thus if ${\frac{dr}{(\frac{r}{N})}}$ is large or if:

$dr \gg \frac{r}{N}$

Summarizing: The volume of the N-dimensional hyperball is contained mainly in a shell dr below the surface if the following inequalities hold:

${\frac{r}{N}} \ll dr \ll r$

The second inequality states that the shell is thin, which also allows for the expansion in the exponent; the first one makes sure the shell is thick enough to contain something.
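Solving $e^{-N\,dr/r} = 1 - \text{share}$ for dr gives the shell thickness that holds a chosen share of the volume. A small Python sketch (hypothetical helper name) shows how the two inequalities play out for $N = 10^{25}$:

```python
import math


def shell_thickness(r: float, n: float, share: float) -> float:
    """Thickness dr of the outer shell that holds `share` of the volume.

    Solves exp(-N dr / r) = 1 - share for dr, i.e. dr = (r / N) * ln(1 / (1 - share)).
    """
    return (r / n) * -math.log1p(-share)


r, N = 1.0, 1e25
for share in (0.9, 0.99999):
    dr = shell_thickness(r, N, share)
    print(f"share {share}: dr = {dr:.2e}, "
          f"dr/(r/N) = {dr / (r / N):.1f}, dr/r = {dr / r:.1e}")
```

A shell about 2 times thicker than $r/N$ already holds 90% of the volume, and about 12 times $r/N$ holds all but $10^{-5}$ of it, while being only $10^{-25}$ to $10^{-24}$ of the radius thin.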

This might help to ‘visualize’ a closely related non-intuitive fact about large numbers, like $e^N$: If you multiply such a number by a factor, ‘it does not get that much bigger’ in a sense – even if the factor is itself a large number:

Assuming N is about $10^{25}$, its natural logarithm is about 58 and …

$Ne^N = e^{\ln(N)+N} = e^{58+10^{25}}$

… 58 can be neglected compared to N itself. So a multiplicative factor becomes something to be neglected in a sum!
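The numbers can be checked with a short Python sketch; it also shows that in double-precision arithmetic, adding ln(N) to $N = 10^{25}$ does not change N at all, a very literal version of ‘negligible’.

```python
import math

N = 1e25
ln_N = math.log(N)        # about 57.56, the "58" above
exponent = ln_N + N       # exponent of N * e**N = e**(ln N + N)

print(f"ln(N) = {ln_N:.2f}, ln(N)/N = {ln_N / N:.1e}")
# In double precision the addition does not change N at all:
print(exponent == N)
```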

I used a plain number – base e – deliberately, as I am obsessed with units. ‘r’ in phase space would be associated with a unit incorporating lots of lengths and momenta. Note that I use the term ‘dimensions’ in two slightly different, but related ways here: One is the mathematical dimension of (an abstract) space, the other is about cross-checking the physical units in case a ‘number’ is something that can be measured – like meters. The co-ordinate numbers in the vector refer to measurable physical quantities. Applying the definition of the logarithm just to $r^N$ would result in the dimensionless number N side by side with the logarithm of a quantity that carries a unit.

Using r – a number with dimensions of length – as base, it has to be expressed as a plain number, a multiple of the unit length $R_0$ (like ‘1 meter’). So comparing the original volume of the ball $a{(\frac{r}{R_0})}^N$ to one a factor of N bigger …

$aN{(\frac{r}{R_0})}^N = ae^{\ln{(N)} + N\ln{(\frac{r}{R_0})}}$

… then ln(N) can be neglected as long as $\frac{r}{R_0}$ is not extreeeemely tiny. Using the same argument as for base e above, we are on the safe side (and can neglect factors) if r is of about the same order of magnitude as the ‘unit length’ $R_0$. The argument about negligible factors is an argument about plain numbers – and those ‘don’t exist’ in the real world, as one could always decide to measure the ‘radius’ in units of, say, $10^{-30}$ ‘meters’, which would make the original absolute number small and thus the additional factor non-negligible. One might save the argument by saying that we would always use units that roughly match the typical dimensions (size) of a system.

Saying everything in another way: If the volume of a hyperball ~rN is multiplied by a factor, this corresponds to multiplying the radius r by a factor very, very close to 1 – the Nth root of the factor for the volume. Only because the number of dimensions is so large, the volume is increased so much by such a small increase in radius.
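A short Python sketch of this statement (function name illustrative): the radius factor is the Nth root $f^{1/N} = e^{\ln(f)/N}$, and `math.expm1` gives its difference from 1 accurately even when that difference is far below double precision.

```python
import math


def radius_increase(volume_factor: float, n: float) -> float:
    """Relative radius increase that multiplies V ~ r**N by volume_factor.

    The radius factor is the N-th root volume_factor**(1/N) = exp(ln(f) / N);
    expm1 returns that factor minus 1 without losing the tiny difference to 1.
    """
    return math.expm1(math.log(volume_factor) / n)


print(radius_increase(8.0, 3))       # 3D: an eightfold volume needs a doubled radius
print(radius_increase(1e25, 1e25))   # a volume factor of 10**25 with N = 10**25
```

For $N = 10^{25}$, a volume factor of $10^{25}$ needs a relative radius increase of only about $5.8 \times 10^{-24}$.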

As the ‘bulk of the volume’ is contained in a thin shell, the total volume is about the product of the surface area and the thickness of the shell dr. The N-ball is bounded by a ‘sphere’ with one dimension less than the ball. Increasing the volume by a factor means that the surface area and/or the thickness have to be increased by factors whose product yields the volume increase factor. dr scales with r and thus does not change much – the two inequalities derived above still hold. Most of the volume factor ‘goes into’ the factor for increasing the surface. ‘The surface becomes the volume’.
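A one-line check of this, assuming $V(r) = ar^N$ as above: the surface area S of the bounding sphere is the derivative of the volume with respect to the radius, so

$S(r) = \frac{dV}{dr} = aNr^{N-1}$

$V(r) = S(r)\,\frac{r}{N}$

The volume equals the surface area times a shell of thickness $r/N$, the lower bound in the inequalities derived above. Multiplying V by a factor f multiplies r by $f^{1/N}$ and S by $f^{(N-1)/N}$, which for huge N is practically f itself.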

This was long-winded. My excuse: Richard Feynman, too, took great pleasure in explaining the same phenomenon in different ways. In his lectures you can hear him speak to himself when he says something along the lines of: Now let’s see if we really understood this – let’s try to derive it in another way…

And above all, he says (in a lecture that is more about math than about physics):

Now you may ask, “What is mathematics doing in a physics lecture?” We have several possible excuses: first, of course, mathematics is an important tool, but that would only excuse us for giving the formula in two minutes. On the other hand, in theoretical physics we discover that all our laws can be written in mathematical form; and that this has a certain simplicity and beauty about it. So, ultimately, in order to understand nature it may be necessary to have a deeper understanding of mathematical relationships. But the real reason is that the subject is enjoyable, and although we humans cut nature up in different ways, and we have different courses in different departments, such compartmentalization is really artificial, and we should take our intellectual pleasures where we find them.

___________________________________

Further reading / sources: Any theoretical physics textbook on classical thermodynamics / statistical mechanics. I am just re-reading mine.

# You Never Know

… when obscure knowledge comes in handy!

You can dismantle an old gutter without effort, and without any special tools:

Just by gently setting it into twisting motion, effectively applying ~1 Hz torsion waves that lead to a fatigue fracture within a few minutes.

I knew my stint in steel research in the 1990s would finally be good for something.

If you want to create a meme from this and tag it with Work Smart Not Harder, don’t forget to give me proper credits.

# Ploughing Through Theoretical Physics Textbooks Is Therapeutic

And finally science confirms it, in a sense.

Again and again, I’ve harped on this pet theory of mine – on this blog and elsewhere on the web: At the peak of my immersion in the so-called corporate world, as a super-busy bonus miles-collecting consultant, I turned to the only solace: Getting up (even) earlier, and starting to re-read all my old mathematics and physics textbooks and lecture notes.

The effect was two-fold: It made me more detached, perhaps more Stoic when facing the seemingly urgent challenges of the accelerated world. Maybe it already prepared me for a long and gradual withdrawal from that biosphere. But surprisingly, I felt it also made my work results (even ;-)) better: I clearly remember writing documentation after setting up some security infrastructure with a client. Writing precise documentation felt again like casting scientific research results into stone, carefully picking each term and trying to be as succinct as possible.

Like anybody else, I enjoy reading about psychological research that confirms my one-datapoint-based biases – and here it finally is. Thanks to Professor Gary for sharing it. Science says that Corporate-Speak Makes You Stupid. Haven’t we – Dilbert fans – always felt that this has to be true?

… I’ve met otherwise intelligent people, after working with management consultant, are convinced that infinitely-malleable concepts like “disruptive innovation,” “business ecosystem,” and “collaborative culture” have objective value.

In my post In Praise of Textbooks with Tons of Formulas I focused on possible positive explanations, like speeding up your rational System 2 ((c) Daniel Kahneman) – by getting accustomed to mathematics again, and by training yourself to recognize patterns and to think out of the box when trying to find the clever twist to solve a physics problem. Re-reading this, I cringe though: Thinking out of the box has entered the corporate vocabulary already. Disclaimer: I am talking about ways to pick a mathematical approach, by drawing on other, slightly related problems intuitively – in the way Kahneman explains the so-called intuition of experts as pattern recognition.

But perhaps the explanation is really as simple as that we just need to shield ourselves from negative effects of certain ecosystems and cultures that are particularly intrusive and mind-bending. So this is my advice to physics and math graduates: Do not rely on your infamous analytical skills forever. First, using that phrase in a job application sounds like phony hollow BS (as unfortunately any self-advertising of social skills does). Second, these skills are real, but they will decay exponentially if you don’t hone them.

# Simulating Peak Ice

This year the ice in the tank had finally melted between March 5 and March 10 – as ‘visual inspection’ showed. Level sensor Mr. Bubble was confused during the melting phase; thus it was an interesting exercise to compare simulations to measurements.

Simulations use the measured ambient temperature and solar radiation as input; data points are taken every minute. Air temperature determines the heating energy needed by the house: Simulated heat load increases linearly as air temperature drops below a ‘cut off’ temperature, above which no space heating is needed.
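One possible reading of this load model, as a minimal Python sketch: the load is zero above the cut-off temperature and rises linearly as the air gets colder. The cut-off temperature, the slope, and the sample temperatures are hypothetical placeholders, not the values used in the actual simulation.

```python
def heat_load_kw(air_temp_c: float,
                 cutoff_temp_c: float = 15.0,    # hypothetical cut-off temperature
                 kw_per_kelvin: float = 0.5) -> float:  # hypothetical slope
    """Space heating load: zero above the cut-off, linear in (cut-off - air) below it."""
    return max(0.0, kw_per_kelvin * (cutoff_temp_c - air_temp_c))


# One data point per minute, as in the simulation: sum kW over minutes, divide by 60 for kWh.
minute_temps_c = [2.0, 1.5, 1.0]  # hypothetical air temperatures
energy_kwh = sum(heat_load_kw(t) for t in minute_temps_c) / 60.0
print(f"{energy_kwh:.4f} kWh")
```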

The control logic of the real controller (UVR1611 / UVR16x2) is mirrored in the simulation: The controller’s heating curve determines the set temperature for the heating water, and it switches the virtual 3-way valves: Diverting heating water either to the hygienic storage or the buffer tank for space heating, and including the collector in the brine circuit if air temperature is high enough compared to brine temperature. In the brine circuit, three heat exchangers are connected in series: Three temperatures at different points are determined self-consistently from three equations that use underground tank temperature, air temperature, and the heat pump evaporator’s power as input parameters.

The hydraulic schematic for reference, as displayed in the controller’s visualization (See this article for details on operations.)

The Coefficient of Performance of the heat pump, its heating power, and its electrical input power are determined by heating water temperature and brine temperature – using polynomial curves fitted to the vendor’s data sheet.

So for every minute, the temperatures of tanks – hot and cold – and the volume of ice can be calculated from energy balances. The heating circuits and tap water consume energy, the heat pump delivers energy. The heat exchanger in the tank releases energy or harvests energy, and the collector exchanges energy with the environment. The heat flow between tank and ground is calculated by numerically solving the Heat Equation, using the nearly constant temperature in about 10 meters depth as a boundary condition.
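The ground part could look roughly like the following Python sketch. It swaps in the simplest textbook approach, an explicit finite-difference scheme on a 1D vertical grid; this is an assumption, as the post does not say which geometry or solver the simulation uses. Soil parameters and temperatures are hypothetical.

```python
def step_ground(temps, dt_s, dz_m, alpha_m2_s, wall_temp_c, deep_temp_c):
    """One explicit (FTCS) step of dT/dt = alpha * d2T/dz2 on a 1D grid.

    temps[0] is the ground at the tank wall, temps[-1] the ground at about
    10 m depth; both ends are held fixed (Dirichlet boundary conditions).
    """
    r = alpha_m2_s * dt_s / dz_m ** 2
    if r > 0.5:
        raise ValueError("time step too large: explicit scheme would be unstable")
    inner = [temps[i] + r * (temps[i + 1] - 2.0 * temps[i] + temps[i - 1])
             for i in range(1, len(temps) - 1)]
    return [wall_temp_c] + inner + [deep_temp_c]


def flux_into_tank_w_per_m2(temps, dz_m, conductivity_w_per_mk):
    """Fourier's law at the wall; positive when the ground is warmer than the wall."""
    return conductivity_w_per_mk * (temps[1] - temps[0]) / dz_m


# Hypothetical parameters: 0.25 m grid down to 10 m, soil diffusivity 1e-6 m²/s,
# conductivity 2 W/(m K), tank wall at 0 °C, ground at 10 m held at 10 °C.
DZ = 0.25
profile = [10.0] * 41
for _ in range(24 * 60):  # one simulated day in one-minute steps
    profile = step_ground(profile, 60.0, DZ, 1e-6, 0.0, 10.0)
flux = flux_into_tank_w_per_m2(profile, DZ, 2.0)
print(f"heat flux into the tank after one day: {flux:.1f} W/m²")
```

The check `r > 0.5` guards the stability limit of the explicit scheme; with one-minute steps and a 0.25 m grid the ratio is far below that limit.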

For validating the simulation and for fine-tuning input parameters – like the thermal properties of ground or the building – I cross-check calculated versus measured daily / monthly energies and average temperatures.

Measurements for this winter show the artificial oscillations during the melting phase because Mr. Bubble faces the cliff of ice:

Simulations show growing of ice and the evolution of the tank temperature in agreement with measurements. The melting of ice is in line with observations. The ‘plateau’ shows the oscillations that Mr. Bubble notices, but the true amplitude is smaller:

Simulated peak ice is about 0,7 m³ greater than the measured value. This can be explained by my neglecting temperature gradients within water or ice in the tank:

When there is still only a little ice (small peak in December), tank temperature is underestimated: In reality, the density anomaly of water causes a zone of 4°C at the bottom, below the ice.

When the ice block is more massive (end of January), I overestimate brine temperature, as the ice is colder than 0°C, at least intermittently when the heat pump is turned on. Thus the temperature difference between ambient air and brine is underestimated, and so is the simulated energy harvested from the collector – and more energy needs to be provided by freezing water.

However, a difference in volume of less than 10% is uncritical for system sizing, especially if you err on the side of caution. Temperature gradients in ice and convection in water should be less critical if heat exchanger tubes traverse the volume of the tank evenly – our prime design principle.
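To put the difference in peak ice into perspective, here is a quick Python conversion of ice volume to latent heat, assuming textbook values for the latent heat of fusion (334 kJ/kg) and the density of ice (917 kg/m³); the function name is hypothetical.

```python
LATENT_HEAT_FUSION_KJ_PER_KG = 334.0   # water at 0 °C, textbook value
ICE_DENSITY_KG_PER_M3 = 917.0          # ice at 0 °C, textbook value


def latent_energy_kwh(ice_volume_m3: float) -> float:
    """Latent heat released when freezing enough water to form this ice volume."""
    kilojoules = ice_volume_m3 * ICE_DENSITY_KG_PER_M3 * LATENT_HEAT_FUSION_KJ_PER_KG
    return kilojoules / 3600.0  # 1 kWh = 3600 kJ


print(f"{latent_energy_kwh(0.7):.1f} kWh")
```

Under these assumptions, 0,7 m³ of ice corresponds to roughly 60 kWh, and each cubic meter of ice to about 85 kWh.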

I have been asked about the efficiency of immersed heat exchangers in the tank – will heat transfer deteriorate if the layer of ice becomes too thick? No – also according to this very detailed research report on simulations of ‘ice storage heat pump systems’ (p. 5). We grow so-called ‘ice on coil’, which is compared to flat-plate heat exchangers:

… for the coil, the total heat transfer (UA), accounting for the growing ice surface, shows only a small decrease with growing ice thickness. The heat transfer resistance of the growing ice layer is partially compensated by the increased heat transfer area around the coil. In the case of the flat plate, on the contrary, also the UA-value decreases rapidly with growing ice thickness.

__________________________________

For the system’s configuration data, see the last chapter of this documentation.

# Earth, Air, Water, and Ice.

In my attempts at Ice Storage Heat Source popularization I have been facing one big challenge: How can you – succinctly, using pictures – answer questions like:

How much energy does the collector harvest?

or

What’s the contribution of ground?

or

Why do you need a collector if the monthly performance factor just drops a bit when you turned it off during the Ice Storage Challenge?

The short answer is that the collector (if properly sized in relation to tank and heat pump) provides about 75% of the ambient energy needed by the heat pump in an average year. Before the ‘Challenge’ in 2015, performance did not drop because the collector had filled the tank with energy up to the brim before. So the collector is not a nice add-on but an essential part of the heat source. The tank is needed to buffer energy for colder periods; otherwise the system would operate like an air heat pump without any storage.

I am calling Data Kraken for help to give me more diagrams.

There are two kinds of energy balances:

1) From the volume of ice and tank temperature the energy still stored in the tank can be calculated. Our tank ‘contains’ about 2.300 kWh of energy when ‘full’. Stored energy changes …

• … because energy is extracted from the tank or released to it via the heat exchanger pipes traversing it.
• … and because heat is exchanged with the surrounding ground through the walls and the floor of the tank.

Thus the contribution of ground can be determined by:

Change of stored energy(Ice, Water) =
Energy over ribbed pipe heat exchanger + Energy exchanged with ground
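Both parts of this balance can be sketched in Python. The material constants are textbook values, the sign convention (energy flowing into the tank counts as positive) is an assumption, and the function names are hypothetical.

```python
LATENT_HEAT_KJ_PER_KG = 334.0            # textbook value for water at 0 °C
ICE_DENSITY_KG_PER_M3 = 917.0            # textbook value
WATER_HEAT_CAPACITY_KJ_PER_KG_K = 4.19   # textbook value, cold water
WATER_DENSITY_KG_PER_M3 = 1000.0


def stored_energy_change_kwh(delta_ice_m3, delta_temp_k, water_m3):
    """Change of stored energy from a change in ice volume and in water temperature.

    Growing ice (delta_ice_m3 > 0) means latent heat has been extracted.
    """
    latent_kj = -delta_ice_m3 * ICE_DENSITY_KG_PER_M3 * LATENT_HEAT_KJ_PER_KG
    sensible_kj = (water_m3 * WATER_DENSITY_KG_PER_M3
                   * WATER_HEAT_CAPACITY_KJ_PER_KG_K * delta_temp_k)
    return (latent_kj + sensible_kj) / 3600.0


def ground_energy_kwh(delta_stored_kwh, heat_exchanger_kwh):
    """Balance rearranged: ground = change of stored energy - heat exchanger energy."""
    return delta_stored_kwh - heat_exchanger_kwh
```

For a made-up day on which 0,5 m³ of ice grew (about 42,5 kWh less stored energy) while the heat exchanger extracted 60 kWh, ground would have contributed roughly 17,5 kWh under these assumptions.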

2) On the other hand, three heat exchangers are serially connected in the brine circuit: The heat pump’s evaporator, the solar air collector, and the heat exchanger in the tank.

Both of these energy balances are shown in this diagram (The direction of arrows indicates energy > 0):

The heat pump is using a combined heat source, made up of tank and collector, so …

Ambient Energy for Heat Pump = -(Collector Energy) + Tank Energy

The following diagrams show data for the season containing the Ice Storage Challenge:

From September to January more and more ambient energy is needed – but also the contribution of the collector increases! The longer the collector is on in parallel with the heat pump, the more energy can be harvested from air (as the temperature difference between air and brine is increased).

As long as there is no ice the temperature of the tank and the brine inlet temperature follow air temperature approximately. But if air temperature drops quickly (e.g. at the end of November 2014), the tank is still rather warm in relation to air and the collector cannot harvest much. Then the energy stored in the tank drops and energy starts to flow from ground to the tank.

On Jan 10 an anomalous peak in collector energy is visible: Warm winter storm Felix gave us a record harvest exceeding the energy needed by the heat pump! In addition to high ambient temperatures and convection (wind) the tank temperature remained low while energy was used for melting ice.

On February 1, we turned off the collector – and now the stored energy started to decline. Since the collector energy in February is zero, the energy transferred via the heat exchanger is equal to the ambient energy used by the heat pump. Ground provided about 1/3 of the ambient energy. Near the end of the Ice Storage Challenge (mid-March) the contribution of ground was increasing while the contribution of latent energy became smaller and smaller: Ice hardly grew anymore, presumably after the ice cube had ‘touched ground’.

In mid-March the collector was turned on again: Again (as during the Felix episode) harvest is high because the tank remains at 0°C. The energy stored in the tank is replenished quickly. Heat transfer with ground is rather small, and thus the heat exchanger energy is about equal to the change in energy stored.

At the beginning of May, we switched to summer mode: The collector is turned off (by the control system) to keep tank temperature at 8°C as long as possible. This temperature is a trade-off between optimizing heat pump performance and keeping some energy for passive cooling. The energy available for cooling is reduced by the slow flow of heat from ground to the tank.

# On Photovoltaic Generators and Scattering Cross Sections

Subtitle: Dimensional Analysis again.

Our photovoltaic generator has about 5 kW rated ‘peak’ power – 18 panels with 265 W each.

South-east oriented part of our generator – 10 panels. The remaining 8 are oriented south-west.

Peak output power, measured in kWp (kilowatt peak), is the output power obtained under so-called standard test conditions:

• a panel temperature of 25°C (as efficiency depends on temperature)
• an incident angle of sunlight relative to zenith of about 48° – equivalent to an air mass of 1,5. This determines the spectrum of the electromagnetic radiation.
• a solar irradiance of 1 kW per square meter.
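The 48° follows from the definition of air mass: in the simple plane-parallel approximation, air mass is $1/\cos(z)$ for zenith angle z. A short Python check (function name illustrative):

```python
import math


def zenith_angle_deg(air_mass: float) -> float:
    """Zenith angle for a given air mass, in the plane-parallel approximation AM = 1 / cos(z)."""
    return math.degrees(math.acos(1.0 / air_mass))


print(f"AM 1.5 -> {zenith_angle_deg(1.5):.1f} degrees")  # about 48.2 degrees
```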

Simulated spectra for different air masses (Wikimedia, User Solar Gate). For AM 1 the path of sunlight is shortest and thus absorption is lowest.

The last condition can be rephrased as: We get 1 kW output per 1 kW/m² of input irradiance. 1 kWp is thus defined as:

$1\,\text{kWp} = \frac{1\,\text{kW}}{1\,\text{kW/m}^2}$

Canceling kW, you end up with 1 kWp being equivalent to an area of 1 m².

Why is this a useful unit?

Solar radiation generates electron-hole pairs in solar cells, which are essentially large-area photodiodes. Solar energy is only used efficiently if the incoming photon has exactly the right energy. If the photon is not energetic enough – too ‘red’ – it is lost and converted to heat. If the photon is too blue – too ‘ultraviolet’ – it generates electrical charges, but the greater part of its energy is wasted, as two photons hitting at the same time is a rare event. Thus commercial solar panels have an efficiency of less than 20% today. (This does not yet say anything about economics, as the total incoming energy is ‘free’.)

The less efficient solar panels are, the more of them you need to obtain a certain target output power. A perfect generator would deliver 1 kW output with a size of 1 m2 at standard test conditions. The kWp rating is equivalent to the area of an ideal generator that would generate the same output power, and it helps with evaluating if your rooftop area is large enough.

Our 4,77 kWp generator uses 18 panels of about 1,61 m² each – so 29 m² in total. The panels’ efficiency is then about 4,77 / 29 = 16,4% – a number you can also find in the datasheet.
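The same arithmetic as a short Python sketch, using only the numbers given above; it reads the kWp rating as the area of an ideal generator and divides it by the real panel area.

```python
PANELS = 18
PANEL_POWER_W = 265.0              # rated power per panel, from the text
PANEL_AREA_M2 = 1.61               # area per panel, from the text
STC_IRRADIANCE_W_PER_M2 = 1000.0   # standard test conditions

rated_power_w = PANELS * PANEL_POWER_W                    # 4770 W, i.e. 4.77 kWp
total_area_m2 = PANELS * PANEL_AREA_M2                    # about 29 m²
ideal_area_m2 = rated_power_w / STC_IRRADIANCE_W_PER_M2   # kWp read as an area: 4.77 m²
efficiency = ideal_area_m2 / total_area_m2                # ideal area / real area

print(f"{rated_power_w / 1000:.2f} kWp, {total_area_m2:.2f} m², efficiency {efficiency:.1%}")
```

With the unrounded area of 28,98 m² the result is about 16,5%, consistent with the estimate above within rounding.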

There is no rated power comparable to that for solar thermal collectors, so I wonder why the unit has been defined in this way. Speculating wildly: Physicists working on solar cells usually have a background in solid state physics, and the design of the kWp rating is equivalent to a familiar concept: Scattering cross section.

An atom can be modeled as a little oscillator, driven by the incident electromagnetic energy. It re-radiates absorbed energy in all directions. Although this can be fully understood only in quantum mechanical terms, simple classical models are successful in explaining some macroscopic parameters, like the index of refraction. The scattering strength of an atom is expressed as:

$\frac{\text{Power scattered}}{\text{Incident power of the beam per m}^2}$

… the same sort of ratio as discussed above! Power cancels out and the result is an area, imagined as a ‘cross-section’. The atom acts as if it were an opaque disk of a certain area that ‘cuts out’ a respective part of the incident beam and re-radiates it.

The same concept is used for describing interactions between all kinds of particles (not only photons) – the scattering cross section determines the probability that an interaction will occur:

Particles’ scattering strengths are represented by red disks (area = cross section). The probability of a scattering event going to happen is equal to the ratio of the sum of all red disk areas and the total (blue+red) area. (Wikimedia, User FerdiBf)
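The picture translates into a short Python sketch, assuming a slab of thickness L containing randomly placed, independent scatterers with number density n and cross section σ. For a thin slab the covered fraction nσL is the ratio of red to total area; the exponential form, a standard attenuation result, accounts for overlapping disks in thicker slabs. All numbers are hypothetical.

```python
import math


def scattering_probability(cross_section_m2, density_per_m3, thickness_m):
    """Probability that a particle crossing a slab of scatterers interacts.

    n * L scatterers sit behind each m² of beam area, each blocking an area
    sigma, so for a thin slab the covered fraction is n * L * sigma
    (red-disk area / total area). For thick slabs, where disks overlap,
    the standard result 1 - exp(-n * sigma * L) is used.
    """
    covered_fraction = density_per_m3 * thickness_m * cross_section_m2
    return -math.expm1(-covered_fraction)


# Hypothetical numbers: sigma = 1e-28 m², n = 1e28 per m³
print(scattering_probability(1e-28, 1e28, 1e-3))   # thin slab: close to n*L*sigma = 1e-3
print(scattering_probability(1e-28, 1e28, 10.0))   # thick slab: close to 1
```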

# Learning General Relativity

Math blogger Joseph Nebus does another A – Z series of posts, explaining technical terms in mathematics. He asked readers for their favorite pick of things to be covered in this series, and I came up with General Covariance. He laid it out in this post – in his signature style, using neither equations nor pop-science images like deformed rubber mattresses, but ‘just words’. As so often, he manages to explain things really well!

Actually, I asked for that term as I am in the middle of yet another physics (re-)learning project – in the spirit of my ventures into QFT a while back.

For a while now I have tried (on this blog) to cover only the physics related to something I have both education in and hands-on experience with. Re General Relativity I have neither: My PhD was in applied condensed-matter physics – lasers, superconductors, optics – and this article by physicist Chad Orzel about What Math Do You Need For Physics? covers well what sort of math you need in that case. Quote:

I moved into the lab, and was concerned more with technical details of vacuum pumps and lasers and electronic circuits and computer data acquisition and analysis.

So I cannot find the remotest way to justify why I would need General Relativity on a daily basis – insider jokes about very peculiarly torus-shaped underground water/ice tanks for heat pumps aside.

My motivation is what I described in this post of mine: Math-heavy physics is – for me, that means a statistical sample of 1 – the best way of bracing myself for any type of tech / IT / engineering work. This positive effect is not even directly related to math/physics aspects of that work.

But I also noticed ‘on the internet’ that there is a community of science and math enthusiasts, who indulge in self-studying theoretical physics seriously as a hobby. Often these are physics majors who ended up in very different industry sectors or in management / ‘non-tech’ jobs and who want to reconnect with what they once learned.

For those fellow learners I’d like to publish links to my favorite learning resources.

There seem to be two ways to start a course or book on GR, and sometimes authors toggle between both modes. You can start from the ‘tangible’ physics of our flat space (spacetime) plus special relativity and then gradually ‘add a bit of curvature’ and related concepts. In this way the introduction sounds familiar, and less daunting. Or you could try to introduce the mathematical concepts at a most rigorous abstract level, and return to the actual physics of our 4D spacetime and matter as late as possible.

The latter makes a lot of sense, as you had better unlearn some things you took for granted about vector and tensor calculus in flat space. A vector must no longer be visualized as an arrow that can be moved around carelessly in space, and one must be very careful in visualizing what transforming coordinates really means.

For motivation or as an ‘upper level pop-sci intro’…

Richard Feynman’s lecture on curved space might be a very good primer. Feynman explains what curved space and curved spacetime actually mean. Yes, he is using that infamous beetle on a balloon, but he also gives some numbers obtained by back-of-the-envelope calculations that explain important concepts.

For learning about the mathematical foundations …

I cannot praise these Lectures given at the Heraeus International Winter School Gravity and Light 2015 enough. Award-winning lecturer Frederic P. Schuller goes to great lengths to introduce concepts carefully and precisely. His goal is to make all implicit assumptions explicit and avoid allusions to misguided ‘intuitions’ one might have got used to when working with vector analysis, tensors, gradients, derivatives etc. in our tangible 3D world – covered by what he calls ‘undergraduate analysis’. Only in lecture 9 is the first connection made back to Newtonian gravity. Then it is back to math only for some more lectures, until finally our 4D spacetime is discussed in lecture 13.

Schuller mentions in passing that Einstein himself struggled with the advanced math of his own theory, e.g. in the sense of not yet distinguishing clearly between the mathematical structure that represents the real world (a topological manifold) and the multi-dimensional chart we project our world onto when using an atlas. It is interesting to pair these lectures with this paper on the history and philosophy of general relativity – a link Joseph Nebus has pointed to in his post on covariance.

When learning physics or math from videos, you need to be much more disciplined than when plowing through textbooks – in the sense that you absolutely have to do every single step in a derivation on your own. It is easy to delude yourself that you understood something by following a derivation passively, without calculating anything yourself. So what makes these lectures so useful is that tutorial sessions have been recorded as well: Tutorial sheets and videos can be found here.
(Edit: The YouTube channel of the event does not have all the recordings of the tutorial sessions; only this conference website has them. It seems the former domain does not work any more, but the content is preserved at gravity-and-light.herokuapp.com)

You also find brief notes for these lectures here.

For a ‘physics-only’ introduction …

… I picked a classical, ‘legendary’ resource: Landau and Lifshitz give an introduction to General Relativity in the last third of the second volume in their Course of Theoretical Physics, The Classical Theory of Fields. Landau and Lifshitz’s text is terse, perhaps similar in style to Dirac’s classical introduction to quantum mechanics. No humor, but sublime and elegant.

Landau and Lifshitz need neither manifolds nor tangent bundles, and they use the 3D curvature tensor of space a lot in addition to the metric tensor of 4D spacetime. They introduce concepts of differences in space and time right from the start, plus the notion of simultaneity. Mathematicians might be shocked by a somewhat handwaving, ‘typical physicist’s’ way to deal with differentials, the way vectors at different points in space are related, etc. – neglecting (at first sight, explore every footnote in detail!) the tower of mathematical structures you actually need to do this precisely.

But I would regard Lev Landau as a sort of Richard Feynman of the East, so it takes his genius not to make any silly mistakes by taking the seemingly intuitive notions too literally. And I recommend this book only when combined with a most rigorous introduction.