# Entropy and Dimensions (Following Landau and Lifshitz)

Some time ago I wrote about volumes of spheres in multi-dimensional phase space – as needed in integrals in statistical mechanics.

The post was primarily about the curious fact that the ‘bulk of the volume’ of such spheres is contained in a thin shell beneath their hyperspherical surfaces. The trick to calculate something reasonable is to spot expressions you can Tayler-expand in the exponent.

Large numbers ‘do not get much bigger’ if multiplied by a factor, to be demonstrated again by Taylor-expanding such a large number in the exponent; I used this example:

Assuming N is about 1025  then its natural logarithm is about 58 and $Ne^N = e^{\ln(N)+N} = e^{58+10^{25}}$, then 58 can be neglected compared to N itself.

However, in the real world numbers associated with locations and momenta of particles come with units. Calling the unit ‘length’ in phase space $R_0$ the large volume can be written as $aN{(\frac{r}{R_0})}^N = ae^{\ln{(N)} + N\ln{(\frac{r}{R_0})}}$, and the impact of an additional factor N also depends on the unit length chosen.

I did not yet mention the related issues with the definition of entropy. In this post I will follow the way Landau and Lifshitz introduce entropy in Statistical Physics, Volume 5 of their Course of Theoretical Physics.

Landau and Lifshitz introduce statistical mechanics top-down, starting from fundamental principles and from Hamiltonian classical mechanics: no applications, no definitions of ‘heat’ and ‘work’, nor historical references needed for motivation. Classical phenomenological thermodynamics is only introduced after their are done with the statistical foundations. Both entropy and temperature are defined – these are useful fundamental properties spotted in the mathematical derivations and thus deserve special names. They cover both classical and quantum statistics in small number of pages – LL’s style has been called terse or elegant.

The behaviour of a system with a large number of particles is encoded in a probability distribution function in phase space, a density. In the classical case this is a continuous function of phase-space co-ordinates. In the quantum case you consider distinct states – whose energy levels are densely packed together though. Moving from classical to quantum statistics means to count those states rather than to integrate the smooth density function over a volume. There are equivalent states created by permutations of identical particles – but factoring in that is postponed and not required for a first definition of entropy. A quasi-classical description is sufficient: using a minimum cell in phase space, whose dimensions are defined by Planck’s constant h that has a dimension of action – length times momentum.

Entropy as statistical weight

Entropy S is defined as the logarithm of the statistical weight $\Delta \Gamma$ – the number of quantum states associated with the part of phase phase used by the (sub)-system. (Landau and Lifshitz use the concept of a – still large – subsystem embedded in a larger volume most consequentially, in order to avoid reliance on the ergodic hypothesis as mentioned in the preface). In the quasi-classical view the statistical weight is the volume in phase space occupied by the system divided by the size of the minimum unit cell defined by Planck’s constant h. Denoting momenta by p, positions by q, using $\Delta p$ and $\Delta q$ as a shortcut applying multiple dimensions equivalent to s degrees of freedom…

$S = log \Delta \Gamma = log \frac {\Delta p \Delta q}{2 \pi \hbar^s}$

An example from solid state physics: if the system is considered a rectangular box in the physical world, possible quantum states related to vibrations can be visualized in terms of possible standing waves that ‘fit’ into the box. The statistical weight would then single out those bunch of states the system actually ‘has’ / ‘uses’ / ‘occupies’ in the long run.

Different sorts of statistical functions are introduced, and one reason for writing this article to emphasize the difference between them: The density function associates each point in phase space – each possible configuration of a system characterized by the momenta and locations of all particles – with a probability. These points are also called microstates. Taking into account the probabilities to find a system in any of these microstates gives you the so-called macrostate characterized by the statistical weight: How large or small a part of phase space the system will use when watched for a long time.

The canonical example is an ideal gas in a vessel: The most probable spacial distribution of particles is to find them spread out evenly, the most unlikely configuration is to have them concentrated in (nearly) the same location, like one corner of the box. The density function assigns probabilities to these configurations. As the even distribution is so much much more likely, the $\Delta q$ part of the statistical weight would cover all of the physical volume available. The statistical weight function has to obtain a maximum value in the most likely case, in equilibrium.

The significance of energies – and why there are logarithms everywhere.

Different sufficiently large subsystems of one big system are statistically independent – as their properties are defined by their bulk volume rather than their surfaces interfacing with other subsystems – and the larger the volume, the larger the ratio of volume and surface.  Thus the probability density function for the combined system – as a function of momenta and locations of all particles in the total phase phase – has to be equal to the product of the densities for each subsystem. Denoting the classical density with $\rho$ and adding a subscript for the set of momenta and positions referring to a subsystem:

$\rho(q,p) = \rho_1(q_1,p_1) \rho_2(q_2,p_2)$

(Since these are probability densities, the actual probability is always obtained by multiplying with the differential(s) $dqdp$).

This means that the logarithm of the composite density is equal to the sum of the logarithms of the individual densities. This the root cause of having logarithms show up everywhere in statistical mechanics.

A mechanical system of particles is characterized by only 7 ‘meaningful’ additive integrals: Energy, momentum and angular momentum – they add up when you combine systems, in contrast to all the other millions of integration constants that would appear when solving the equations of motions exactly. Momentum and angular momentum are not that interesting thermodynamically, as one can change to a frame moving and rotating with the system (LL also cover rotating systems). So energy remains as the integral of outstanding importance.

From counting states to energy intervals

What we want is to relate entropy to energy, so assertions about numbers of states covered need to be translated to statements about energy and energy ranges.

LL denote the probability to find a system in (micro-)state n with energy $E_n$ as $w_n$ – the quantum equivalent of density $\rho$. $w_n$ has to be a linear function of the energy of this micro-state $E_n$ as per the additivity just mentioned above, and thus LL omit the subscript n for w:

$w_n = w(E_n)$

(They omit any symbol ever if possible to keep their notation succinct ;-))

A thermodynamic system has an enormous number of (mechanical) degrees of freedom. Fluctuations are small as per the law of large numbers in statistics, and the probability to find a system with a certain energy can be approximated by a sharp delta-function-like peak at the system’s energy E. So in thermal equilibrium its energy has a very sharp peak. It occupies a very thin ‘layer’ of thickness $\Delta E$ in config space – around the hyperplane that characterizes its average energy E.

Statistical weight $\Delta \Gamma$ can be considered the width of the related function: Energy-wise broadening of the macroscopic state $\Delta E$ needs to be translated to a broadening related to the number of quantum states.

We change variables, so the connection between Γ and E is made via the derivative of Γ with respect to E. E is an integral, statistical property of the whole system, and the probability for the system to have energy E in equilibrium is $W(E)dE$. E is not discrete so this is again a  probability density. It is capital W now – in contrast to $w_n$ which says something about the ‘population’ of each quantum state with energy $E_n$.

A quasi-continuous number of states per energy Γ is related to E by the differential:

$d\Gamma = \frac{d\Gamma}{dE} dE$.

As E peaks so sharply and the energy levels are packed so densely it is reasonable to use the function (small) w but calculate it for an argument value E. Capital W(E) is a probability density as a function of total energy, small w(E) is a function of discrete energies denoting states – so it has to be multiplied by the number of states in the range in question:

$W(E)dE = w(E)d\Gamma$

Thus…

$W(E) = w(E)\frac{d\Gamma}{dE}$.

The delta-function-like functions (of energy or states) have to be normalized, and the widths ΔΓ and ΔE multiplied by the respective heights W and w taken at the average energy $E_\text{avg}$ have to be 1, respectively:

$W(E_\text{avg}) \Delta E = 1$
$w(E_\text{avg}) \Delta \Gamma = 1$

(… and the ‘average’ energy is what is simply called ‘the’ energy in classical thermodynamics).

So $\Delta \Gamma$ is inversely proportional to the probability of the most likely state (of average energy). This can also be concluded from the quasi-classical definition: If you imagine a box full of particles, the least possible state is equivalent to all particles occupying a single cell in phase space. The probability for that is (size of the unit cell) over (size of the box) times smaller than the probability to find the particles evenly distributed on the whole box … which is exactly the definition of $\Delta \Gamma$.

The statistical weight is finally:

$\Delta \Gamma = \frac{d\Gamma(E_\text{avg})}{dE} \Delta E$.

… the broadening in $\Gamma$, proportional to the broadening in $E$

The more familiar (?) definition of entropy

From that, you can recover another familiar definition of entropy, perhaps the more common one. Taking the logarithm…

$log S = log (\Delta \Gamma) = -log (w(E_\text{avg}))$.

As log w is linear in E, the averaging of E can be extended to the whole log function. Then the definition of ‘averaging over states n’ can be used: To multiply the value for each state n by probability $w_n$ and sum up:

$- \sum_{n} w_n log w_n$.

… which is the first statistical expression for entropy I had once learned.

LL do not introduce Boltzmann’s constant k here

It is effectively set to 1 – so entropy is defined without a reference to k. k is is only mentioned in passing later: In case one wishes to measure energy and temperature in different units. But there is no need to do so, if you defined entropy and temperature based on first principles.

Back to units

In a purely classical description based on the volume in phase space instead of the number of states there would be no cell of minimum size, and then instead of the statistical weight we had simply this volume: But then entropy would be calculated in a very awkward unit, the logarithm of action. Every change of the unit for measuring volumes in phase space would result in an additive constant – the deeper reason why entropy in a classical context is only defined up to such a constant.

So the natural unit called $R_0$ above should actually be Planck’s constant taken to the power defined by the number of particles.

Temperature

The first task to be solved in statistical mechanics is to find a general way of formulating a proper density function small $w_n$ as a function of energy $E_n$. You can either assume that the system has a clearly defined energy upfront – the system lives on a ‘energy-hyperplane in phase space’ – or you can consider it immersed in a larger system later identified with a ‘heat bath’ which causes the system to reach thermal equilibrium. These two concepts are called the micro-canonical and the canonical distribution (or Gibbs distribution) and the actual distribution functions don’t differ much because the energy peaks so sharply also in the canonical case. It’s that type of calculations where those hyperspheres are actually needed.

Temperature as a concept emerges from a closer look at these distributions, but LL introduce it upfront from simpler considerations: It is sufficient to know that 1) entropy only depends on energy, 2) both are additive functions of subsystems, and 3) entropy is a maximum in equilibrium. You divide one system in two subsystems. The total change in entropy has to be zero as this is a maximum (in equilibrium), and what energy $dE_1$ leaves one system has to be received as $dE_2$ by the other system. Taking a look at the total entropy S as a function of the energy of one subsystem:

$0 = \frac{dS}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_1} =$
$= \frac{dS_1}{dE_1} + \frac{dS_2}{dE_2} \frac{dE_2}{dE_1} =$
$= \frac{dS_1}{dE_1} + \frac{dS_2}{dE_2}$

So $\frac{dS_x}{dE_x}$ has to be the same for each subsystem x. Cutting one of the subsystems in two  you can use the same argument again. So there is one very interesting quantity that is the same for every subsystem – $\frac{dS}{dE}$. Let’s call it 1/T and let’s call T the temperature.

# Spheres in a Space with Trillions of Dimensions

I don’t venture into speculative science writing – this is just about classical statistical mechanics; actually about a special mathematical aspect. It was one of the things I found particularly intriguing in my first encounters with statistical mechanics and thermodynamics a long time ago – a curious feature of volumes.

I was mulling upon how to ‘briefly motivate’ the calculation below in a comprehensible way, a task I might have failed at years ago already, when I tried to use illustrations and metaphors (Here and here). When introducing the ‘kinetic theory’ in thermodynamics often the pressure of an ideal gas is calculated first, by considering averages over momenta transferred from particles hitting the wall of a container. This is rather easy to understand but still sort of an intermediate view – between phenomenological thermodynamics that does not explain the microscopic origin of properties like energy, and ‘true’ statistical mechanics. The latter makes use of a phase space with with dimensions the number of particles. One cubic meter of gas contains ~1025 molecules. Each possible state of the system is depicted as a point in so-called phase space: A point in this abstract space represents one possible system state. For each (point-like) particle 6 numbers are added to a gigantic vector – 3 for its position and 3 for its momentum (mass times velocity), so the space has ~6 x 1025 dimensions. Thermodynamic properties are averages taken over the state of one system watched for a long time or over a lot of ‘comparable’ systems starting from different initial conditions. At the heart of statistical mechanics are distributions functions that describe how a set of systems described by such gigantic vectors evolves. This function is like a density of an incompressible fluid in hydrodynamics. I resorted to using the metaphor of a jelly in hyperspace before.

Taking averages means to multiply the ‘mechanical’ property by the density function and integrate it over the space where these functions live. The volume of interest is a  generalized N-ball defined as the volume within a generalized sphere. A ‘sphere’ is the surface of all points in a certain distance (‘radius’ R) from an origin

$x_1^2 + x_2^2 + ... + x_ {N}^2 = R^2$

($x_n$ being the co-ordinates in phase space and assuming that all co-ordinates of the origin are zero). Why a sphere? Because states are ordered or defined by energy, and larger energy means a greater ‘radius’ in phase space. It’s all about rounded surfaces enclosing each other. The simplest example for this is the ellipse of the phase diagram of the harmonic oscillator – more energy means a larger amplitude and a larger maximum velocity.

And here is finally the curious fact I actually want to talk about: Nearly all the volume of an N-ball with so many dimensions is concentrated in an extremely thin shell beneath its surface. Then an integral over a thin shell can be extended over the full volume of the sphere without adding much, while making integration simpler.

This can be seen immediately from plotting the volume of a sphere over radius: The volume of an N-ball is always equal to some numerical factor, times the radius to the power of the number of dimensions. In three dimensions the volume is the traditional, honest volume proportional to r3, in two dimensions the ‘ball’ is a circle, and its ‘volume’ is its area. In a realistic thermodynamic system, the volume is then proportional to rN with a very large N.

The power function rN turn more and more into an L-shaped function with increasing exponent N. The volume increases enormously just by adding a small additional layer to the ball. In order to compare the function for different exponents, both ‘radius’ and ‘volume’ are shown in relation to the respective maximum value, R and RN.

The interesting layer ‘with all the volume’ is certainly much smaller than the radius R, but of course it must not be too small to contain something. How thick the substantial shell has to be can be found by investigating the volume in more detail – using a ‘trick’ that is needed often in statistical mechanics: Taylor expanding in the exponent.

A function can be replaced by its tangent if it is sufficiently ‘straight’ at this point. Mathematically it means: If dx is added to the argument x, then the function at the new target is f(x + dx), which can be approximated by f(x) + [the slope df/dx] * dx. The next – higher-order term would be proportional to the curvature, the second derivation – then the function is replaced by a 2nd order polynomial. Joseph Nebus has recently published a more comprehensible and detailed post about how this works.

So the first terms of this so-called Taylor expansion are:

$f(x + dx) = f(x) + dx{\frac{df}{dx}} + {\frac{dx^2}{2}}{\frac{d^2f}{dx^2}} + ...$

If dx is small higher-order terms can be neglected.

In the curious case of the ball in hyperspace we are interested in the ‘remaining volume’ V(r – dr). This should be small compared to V(r) = arN (a being the uninteresting constant numerical factor) after we remove a layer of thickness dr with the substantial ‘bulk of the volume’.

However, trying to expand the volume V(r – dr) = a(r – dr)N, we get:

$V(r - dr) = V(r) - adrNr^{N-1} + a{\frac{dr^2}{2}}N(N-1)r^{N-2} + ...$
$= ar^N(1 - N{\frac{dr}{r}} + {\frac{N(N-1)}{2}}({\frac{dr}{r}})^2) + ...$

But this is not exactly what we want: It is finally not an expansion, a polynomial, in (the small) ratio of dr/r, but in Ndr/r, and N is enormous.

So here’s the trick: 1) Apply the definition of the natural logarithm ln:

$V(r - dr) = ae^{N\ln(r - dr)} = ae^{N\ln(r(1 - {\frac{dr}{r}}))}$
$= ae^{N(\ln(r) + ln(1 - {\frac{dr}{r}}))}$
$= ar^Ne^{\ln(1 - {\frac{dr}{r}}))} = V(r)e^{N(\ln(1 - {\frac{dr}{r}}))}$

2) Spot a function that can be safely expanded in the exponent: The natural logarithm of 1 plus something small, dr/r. So we can expand near 1: The derivative of ln(x) is 1/x (thus equal to 1/1 near x=1) and ln(1) = 0. So ln(1 – x) is about -x for small x:

$V(r - dr) = V(r)e^{N(0 - 1{\frac{dr}{r})}} \simeq V(r)e^{-N{\frac{dr}{r}}}$

3) Re-arrange fractions …

$V(r - dr) = V(r)e^{-\frac{dr}{(\frac{r}{N})}}$

This is now the remaining volume, after the thin layer dr has been removed. It is small in comparison with V(r) if the exponential function is small, thus if ${\frac{dr}{(\frac{r}{N})}}$ is large or if:

$dr \gg \frac{r}{N}$

Summarizing: The volume of the N-dimensional hyperball is contained mainly in a shell dr below the surface if the following inequalities hold:

${\frac{r}{N}} \ll dr \ll r$

The second one is needed to state that the shell is thin – and allow for expansion in the exponent, the first one is needed to make the shell thick enough so that it contains something.

This might help to ‘visualize’ a closely related non-intuitive fact about large numbers, like eN: If you multiply such a number by a factor ‘it does not get that much bigger’ in a sense – even if the factor is itself a large number:

Assuming N is about 1025  then its natural logarithm is about 58 and…

$Ne^N = e^{\ln(N)+N} = e^{58+10^{25}}$

… 58 can be neglected compared to N itself. So a multiplicative factor becomes something to be neglected in a sum!

I used a plain number – base e – deliberately as I am obsessed with units. ‘r’ in phase space would be associated with a unit incorporating lots of lengths and momenta. Note that I use the term ‘dimensions’ in two slightly different, but related ways here: One is the mathematical dimension of (an abstract) space, the other is about cross-checking the physical units in case a ‘number’ is something that can be measured – like meters. The co-ordinate  numbers in the vector refer to measurable physical quantities. Applying the definition of the logarithm just to rN would result in dimensionless number N side-by-side with something that has dimensions of a logarithm of the unit.

Using r – a number with dimensions of length – as base, it has to be expressed as a plain number, a multiple of the unit length $R_0$ (like ‘1 meter’). So comparing the original volume of the ball $a{(\frac{r}{R_0})}^N$ to one a factor of N bigger …

$aN{(\frac{r}{R_0})}^N = ae^{\ln{(N)} + N\ln{(\frac{r}{R_0})}}$

… then ln(N) can be neglected as long as $\frac{r}{R_0}$ is not extreeeemely tiny. Using the same argument as for base e above, we are on the safe side (and can neglect factors) if r is of about the same order of magnitude as the ‘unit length’ $R_0$. The argument about negligible factors is an argument about plain numbers – and those ‘don’t exist’ in the real world as one could always decide to measure the ‘radius’ in a units of, say, 10-30 ‘meters’, which would make the original absolute number small and thus the additional factor non-negligible. One might save the argument by saying that we would always use units that sort of match the typical dimensions (size) of a system.

Saying everything in another way: If the volume of a hyperball ~rN is multiplied by a factor, this corresponds to multiplying the radius r by a factor very, very close to 1 – the Nth root of the factor for the volume. Only because the number of dimensions is so large, the volume is increased so much by such a small increase in radius.

As the ‘bulk of the volume’ is contained in a thin shell, the total volume is about the product of the surface area and the thickness of the shell dr. The N-ball is bounded by a ‘sphere’ with one dimension less than the ball. Increasing the volume by a factor means that the surface area and/or the thickness have to be increased by factors so that the product of these factors yield the volume increase factor. dr scales with r, and does thus not change much – the two inequalities derived above do still hold. Most of the volume factor ‘goes into’ the factor for increasing the surface. ‘The surface becomes the volume’.

This was long-winded. My excuse: Also Richard Feynman took great pleasure in explaining the same phenomenon in different ways. In his lectures you can hear him speak to himself when he says something along the lines of: Now let’s see if we really understood this – let’s try to derive it in another way…

And above all, he says (in a lecture that is more about math than about physics)

Now you may ask, “What is mathematics doing in a physics lecture?” We have several possible excuses: first, of course, mathematics is an important tool, but that would only excuse us for giving the formula in two minutes. On the other hand, in theoretical physics we discover that all our laws can be written in mathematical form; and that this has a certain simplicity and beauty about it. So, ultimately, in order to understand nature it may be necessary to have a deeper understanding of mathematical relationships. But the real reason is that the subject is enjoyable, and although we humans cut nature up in different ways, and we have different courses in different departments, such compartmentalization is really artificial, and we should take our intellectual pleasures where we find them.

___________________________________

Further reading / sources: Any theoretical physics textbook on classical thermodynamics / statistical mechanics. I am just re-reading mine.

# Hyper-Jelly – Again. Why We Need Hyperspace – Even in Politics.

All of old.
Nothing else ever.
Ever tried. Ever failed.
No matter.
Try again.
Fail again.
Fail better.

This is a quote from Worstward Ho by Samuel Beckett – a poem as impenetrable and opaque as my post on quantization. There is a version of Beckett’s poem with explanations, so I try again, too!

I stated that the description of a bunch of particles (think: gas in a box) naturally invokes the introduction of a hyperspace having twice as many dimensions as the number of those particles.

But it was obviously not obvious why we need those many dimensions. You have asked:

Why do we need additional dimensions for each particles? Can’t they live in the same space?

Why does the collection of the states of all possible systems occupy a patch in 1026 dimensional phase space. ?

These are nearly the same questions.

I start from a non-physics example this time, because I believe this might convey the motivation for introducing these dimensions better.

These dimensions are not at all related to hidden, compactified, extra large dimensions you might have read about in popular physics books on string theory and cosmology. They are not tangible dimensions in the sense we could feel them – even if we weren’t like those infamous ants living on the inflating balloon.

In Austria we recently had parliamentary elections. This is the distribution of seats in parliament now:

which is equivalent to these numbers:

SPÖ (52)
ÖVP (47)
FPÖ (40)
Grüne (24)
Team Stronach (11)
NEOS (9)

Using grand physics-inspired language I call that ordered collection of numbers: Austria’s Political State Vector.

These are six numbers thus this is a vector in a 6-dimensional Political State Space.

Before the elections websites consolidating and analyzing different polls have been more popular than ever. (There was a website run by physicist now working in finance.)

I can only display two of 6 dimensions in a plane, so the two axes represent any of those 6 dimensions. The final political state is represented by a single point in this space – the tip of an 6D arrow:

After the elections we know the political state vector with certainty – that is: a probability of 1.

Before the elections different polls constituted different possible state vectors – each associated with a probability lower than 1. I indicate probabilities by different hues of red:

Each points represents a different state the system may finally settle in. Since the polls are hopefully meaningful and voters not too irrational points are not scattered randomly in space but rather close to each other.

Now imagine millions of polls – such as citizens’ political opinions tracked every millisecond by directly wiretapping their brains. This would result in millions of points, all close to each other. Not looking too closely, this is a blurred patch or spot – a fairly confined region of space covered with points that seems to merge into a continuous distribution.

Watching the development of this red patch over time lets us speculate on the law underlying its dynamics – deriving a trend from the dynamics of voters’ opinions.

It is like figuring out the dynamics of a  moving and transforming piece of jelly:

Back to Physics

Statistical mechanics is similar, just the numbers of dimensions are much bigger.

In order to describe what each molecule of gas in a room does, we need 6 numbers per molecules – 3 for its spatial coordinates, and 3 for its velocity.

Each particle lives in the same real space where particles wiggle and bump into each other. All those additional dimensions only emerge because we want to find a mathematical representation where each potential system state shows up as a single dot – tagged with a certain probability. As in politics!

We stuff all positions and velocities of particles into an enormous state vector – one ordered collection with about 1026 different numbers corresponds to a single dot in hyperspace.

The overall goal in statistical mechanics is to calculate something we are really interested in – such as temperature of a gas. We aim at calculating probabilities for different states!

We don’t want to look to closely: We might want to compare what happens if if we start from a configuration with all molecules concentrated in a corner of the room with another one consisting of molecules everywhere in the room. But we don’t need to know where each molecule is exactly. Joseph Nebus has given an interesting example related his numerical calculation of the behavior of a car’s shock absorbers:

But what’s interesting isn’t the exact solution of the exact problem for a particular set of starting conditions. When your car goes over a bump, you’re interested in what the behavior is: is there a sudden bounce and a slide back to normal?  Does the car wobble for a short while?  Does it wobble for a long while?  What’s the behavior?

You have asked me for giving you the equations. I will try my best and keep the intro paragraphs of this post in mind.

What do we know and what do we want to calculate?

We are interested in how that patch moves and is transformed – that is probability (the hue of red) as a function of the positions and momenta of all particles. This is a function of 1026 variables, usually called a distribution function or a density.

We know anything about the system, that is the forces at play. Knowing forces is equivalent to knowing the total energy of a system as a function of any system configuration – if you know the gravitational force a planet exerts than you know gravitational energy.

You could consider the total energy of a system that infamous formula in science fiction movies that spies copy from the computer in the secret laboratories to their USB sticks: If you know how to calculate the total energy as a function of the positions and momenta of all particles – you literally rule the world for the system under consideration.

Hyper-Planes

If we know this energy ‘world function’ we could attach a number to each point in hyperspace that indicate energy, or we could draw the hyper-planes of constant energies – equivalent of isoclines in a map.

The dimension of the hyperplane is the dimension of the hyperspace minus one, just as the familiar 2D planes floating through 3D space.

If energy changes more rapidly with varying particle positions and momenta hyper-planes get closer to each other:

Incompressible Jelly

We are still in a classical world. The equations of motions of hyper-jelly are another way to restate Newton’s equations of motion. You start with writing down Force = mass x accelerating for each particle (1026 times), rearrange these equations by using those huge state vectors just introduced – and you end up with an equation describing the time evolution of the red patch.

I picked the jelly metaphor deliberately as it turns out that hyper-jelly acts as an incompressible fluid. Jelly cannot be destroyed or created. If you try to squeeze it in between two planes it will just flow faster. This really follows from Newton’s law or the conservation of energy!

It might appear complicated to turn something as (seemingly) comprehensible as Newton’s law into that formalism. But that weird way of watching the time evolution of the red patch makes it actually easier to calculate what really matters.

Anything that changes in the real world – the time evolution of any quantity we can measure – is expressed via the time evolution of hyper-jelly.

The Liouville equation puts this into math.

As Richard Feynman once noted wisely (Physics Lectures, Vol.2, Ch. 25), I could put all fundamental equations into a big matrix of equations which I then call the Unwordliness, further denoted as U. Then I can unify them again as

U = 0

What I do here is not that obscure but I use some pseudo-code to obscure the most intimidating math. I do now appreciate science writers who state We use a mathematical crank that turns X into Y – despite or because they know exactly what they are talking about.

For every point in hyperspace the Liouville equation states:

(Rate of change of some interesting physical property in time) =
(Some mathematical machinery entangling spatial variations in system’s energy and spatial variations in ‘some property’)

Spatial variations in the system’s energy can be translated to the distance of those isoclines – this is exactly what Newton’s translates into! (In addition we apply the chain rule in vector calculus).

The mathematical crank is indicated using most innocent brackets, so the right-hand side reads:

{Energy function, interesting property function}

Quantization finally seems to be deceptively simple – the quantum equivalent looks very similar, with the right-hand side proportional to

[Energy function, interesting property function]

The main difference is in the brackets – square versus curly: We consider phase space so any function and changes thereof is calculated in phase space co-ordinates – positions and momenta of particles. These cannot be measured or calculated in quantum mechanics with certainty at the same time.

In a related way the exact order of operations does matter in quantum physics – whereas the classical counterparts are commutative operations. The square bracket versus the angle bracket is where these non-commutative operations are added – as additional constraints to classical theory.

I think I have reached my – current – personal limits in explaining this, while still not turning this blog into in a vector calculus lecture. Probably this stuff is usually not popularized for a reason.

My next post will focus on quantum fields again – and I try to make each post as self-consistent anyway.

# On the Relation of Jurassic Park and Alien Jelly Flowing through Hyperspace

Yes, this is a serious physics post – no. 3 in my series on Quantum Field Theory.

I promised to explain what Quantization is. I will also argue – again – that classical mechanics is unjustly associated with pictures like this:

… although it is more like this:

This shows the timelines in Back to the Future – in case you haven’t recognized it immediately.

What I am trying to say here is – again – is so-called classical theory is as geeky, as weird, and as fascinating as quantum physics.

Experts: In case I get carried away by my metaphors – please see the bottom of this post for technical jargon and what I actually try to do here.

Get a New Perspective: Phase Space

I am using my favorite simple example: A point-shaped mass connected to an massless spring or a pendulum, oscillating forever – not subject to friction.

The speed of the mass is zero when the motion changes from ‘upward’ to ‘downward’. It is maximum when the pendulum reaches the point of minimum height. Anything oscillates: Kinetic energy is transferred to potential energy and back. Position, velocity and acceleration all follow wavy sine or cosine functions.

For purely aesthetic reasons I could also plot the velocity versus position:

From a mathematical perspective this is similar to creating those beautiful Lissajous curves:  Connecting a signal representing position to the x input of an oscillosope and the velocity signal to the y input results in a circle or an ellipse:

This picture of the spring’s or pendulum’s motion is called a phase portrait in phase space. Actually we use momentum, that is: velocity times mass, but this is a technicality.

The phase portrait is a way of depicting what a physical system does or can do – in a picture that allows for quick assessment.

Non-Dull Phase Portraits

Real-life oscillating systems do not follow simple cycles. The so-called Van der Pol oscillator is a model system subject to damping. It is also non-linear because the force of friction depends on the position squared and the velocity. Non-linearity is not uncommon; also the friction an airplane or car ‘feels’ in the air is proportional to the velocity squared.

The stronger this non-linear interaction is (the parameter mu in the figure below) the more will the phase portrait deviate from the circular shape:

Searching for this image I have learned from Wikipedia that the Van der Pol oscillator is used as a model in biology – here the physical quantity considered is not a position but the action potential of a neuron (the electrical voltage across the cell’s membrane).

Thus plotting the rate of change of in a quantity we can measure plotted versus the quantity itself makes sense for diverse kinds of systems. This is not limited to natural sciences – you could also determine the phase portrait of an economic system!

Addicts of popular culture memes might have guessed already which phase portrait needs to be depicted in this post:

Reconnecting to Popular Science

Chaos Theory has become popular via the elaborations of Dr. Ian Malcolm (Jeff Goldblum) in the movie Jurassic Park. Chaotic systems exhibit phase portraits that are called Strange Attractors. An attractor is the set of points in phase space a system ‘gravitates’ to if you leave it to itself.

There is no attractor for the simple spring: This system will trace our a specific circle in phase space forever – the larger the bigger the initial push on the spring is.

The most popular strange attractor is probably the Lorentz Attractor. It  was initially associated with physical properties characteristic of temperature and the flow of air in the earth’s atmosphere, but it can be re-interpreted as a system modeling chaotic phenomena in lasers.

It might be apocryphal but I have been told that it is not the infamous flap of the butterfly’s wing that gave the related effect its name, but rather the shape of the three-dimensional attractor:

We had Jurassic Park – here comes the jelly!

A single point-particle on a spring can move only along a line – it has a single degree of freedom. You need just a two-dimensional plane to plot its velocity over position.

Allowing for motion in three-dimensional space means we need to add additional dimensions: The motion is fully characterized by the (x,y,z) positions in 3D space plus the 3 components of velocity. Actually, this three-dimensional vector is called velocity – its size is called speed.

Thus we need already 6 dimensions in phase space to describe the motion of an idealized point-shaped particle. Now throw in an additional point-particle: We need 12 numbers to track both particles – hence 12 dimensions in phase space.

Why can’t the two particles simply use the same space?(*) Both particles still live in the same 3D space, they could also inhabit the same 6D phase space. The 12D representation has an advantage though: The whole system is represented by a single dot which make our lives easier if we contemplate different systems at once.

Now consider a system consisting of zillions of individual particles. Consider 1 cubic meter of air containing about 1025 molecules. Viewing these particles in a Newtonian, classical way means to track their individual positions and velocities. In a pre-quantum mechanical deterministic assessment of the world you know the past and the future by calculating these particles’ trajectories from their positions and velocities at a certain point of time.

Of course this is not doable and leads to practical non-determinism due to calculation errors piling up and amplifying. This is a 1025 body problem, much much much more difficult than the three-body problem.

Fortunately we don’t really need all those numbers in detail – useful properties of a gas such as the temperature constitute gross statistical averages of the individual particles’ properties. Thus we want to get a feeling how the phase portrait develops ‘on average’, not looking too meticulously at every dot.

The full-blown phase space of the system of all molecules in a cubic meter of air has about 1026 dimensions – 6 for each of the 1025 particles (Physicists don’t care about a factor of 6 versus a factor of 10). Each state of the system is sort of a snapshot what the system really does at a point of time. It is a vector in 1026 dimensional space – a looooong ordered collection of numbers, but nonetheless conceptually not different from the familiar 3D ‘arrow-vector’.

Since we are interesting in averages and probabilities we don’t watch a single point in phase space. We don’t follow a particular system.

We rather imagine an enormous number of different systems under different conditions.

Considering the gas in the cubic vessel this means: We imagine molecule 1 being at the center and very fast whereas molecule 10 is slow and in the upper right corner, and molecule 666 is in the lower left corner and has medium. Now extend this description to 1025 particles.

But we know something about all of these configurations: There is a maximum x, y and z particles can have – the phase portrait is limited by these maximum dimensions as the circle representing the spring was. The particles have all kinds of speeds in all kinds of directions, but there is a most probably speed related to temperature.

The collection of the states of all possible systems occupy a patch in 1026 dimensional phase space.

This patch gradually peters out at the edges in velocities’ directions.

Now let’s allow the vessel for growing: The patch will become bigger in spatial dimensions as particles can have any position in the larger cube. Since the temperature will decrease due to the expansion the mean velocity will decrease – assuming the cube is insulated.

The time evolution of the system (of these systems, each representing a possible system) is represented by a distribution of this hyper-dimensional patch transforming and morphing. Since we consider so many different states – otherwise probabilities don’t make sense – we don’t see the granular nature due to individual points – it’s like a piece of jelly moving and transforming:

Precisely defined initial configurations of systems configurations have a tendency to get mangled and smeared out. Note again that each point in the jelly is not equivalent to a molecule of gas but it is a point in an abstract configuration space with a huge number of dimensions. We can only make it accessible via projections into our 3D world or a 2D plane.

The analogy to jelly or honey or any fluid is more apt than it may seem

The temporal evolution in this hyperspace is indeed governed by equations that are amazingly similar to those governing an incompressible liquid – such as water. There is continuity and locality: Hyper-Jelly can’t get lost and be created. Any increase in hyper-jelly in a tiny volume of phase space can only be attributed to jelly flowing in to this volume from adjacent little volumes.

In summary: Classical mechanical systems comprising many degrees of freedom – that is: many components that have freedom to move in a different way than other parts of the system – can be best viewed in the multi-dimensional space whose dimensions are (something like) positions and (something like) the related momenta.

Can it get more geeky than that in quantum theory?

Finally: Quantization

I said in the previous post that quantization of fields or waves is like turning down intensity in order to bring out the particle-like rippled nature of that wave. In the same way you could say that you add blurry waviness to idealized point-shaped particles.

Another is to consider the loss in information via Heisenberg’s Uncertainly Principle: You cannot know both the position and the momentum of a particle or a classical wave exactly at the same time. By the way, this is why we picked momenta  and not velocities to generate phase space.

You calculate positions and momenta of small little volumes that constitute that flowing and crawling patches of jelly at a point of time from positions and momenta the point of time before. That’s the essence of Newtonian mechanics (and conservation of matter) applied to fluids.

Doing numerical calculation in hydrodynamics you think of jelly as divided into small little flexible cubes – you divide it mentally using a grid, and you apply a mathematical operation that creates the new state of this digitized jelly from the old one.

Since we are still discussing a classical world we do know positions and momenta with certainty. This translates to stating (in math) that it does not matter if you do calculations involving positions first or for momenta.

There are different ways of carrying out steps in these calculations because you could do them one way of the other – they are commutative.

Calculating something in this respect is similar to asking nature for a property or measuring that quantity.

Thus when we apply a quantum viewpoint and quantize a classical system calculating momentum first and position second or doing it the other way around will yield different results.

The quantum way of handling the system of those  1025 particles looks the same as the classical equations at first glance. The difference is in the rules for carrying out calculation involving positions and momenta – so-called conjugate variables.

Thus quantization means you take the classical equations of motion and give the mathematical symbols a new meaning and impose new, restricting rules.

I probably could just have stated that without going off those tangent.

However, any system of interest in the real world is not composed of isolated particles. We live in a world of those enormous phase spaces.

In addition, working with large abstract spaces like this is at the heart of quantum field theory: We start with something spread out in space – a field with infinite degrees in freedom. Considering different state vectors in these quantum systems is considering all possible configurations of this field at every point in space!

(*) This was a question asked on G+. I edited the post to incorporate the answer.

_______________________________________

Expert information:

I have taken a detour through statistical mechanics: Introducing Liouville equations as equation of continuity in a multi-dimensional phase space. The operations mentioned – related to positions of velocities – are the replacement of time derivatives via Hamilton’s equations. I resisted the temptation to mention the hyper-planes of constant energy. Replacing the Poisson bracket in classical mechanics with the commutator in quantum mechanics turns the Liouville equation into its quantum counterpart, also called Von Neumann equation.

I know that a discussion about the true nature of temperature is opening a can of worms. We should rather describe temperature as the width of a distribution rather than the average, as a beam of molecules all travelling in the same direction at the same speed have a temperature of zero Kelvin – not an option due to zero point energy.

The Lorenz equations have been applied to the electrical fields in lasers by Haken – here is a related paper. I did not go into the difference of the phase portrait of a system showing its time evolution and the attractor which is the system’s final state. I also didn’t stress that was is a three dimensional image of the Lorenz attractor and in this case the ‘velocities’ are not depicted. You could say it is the 3D projection of the 6D phase portrait. I basically wanted to demonstrate – using catchy images, admittedly – that representations in phase space allows for a quick assessment of a system.

I also tried to introduce the notion of a state vector in classical terms, not jumping to bras and kets in the quantum world as if a state vector does not have a classical counterpart.

I have picked an example of a system undergoing a change in temperature (non-stationary – not the example you would start with in statistical thermodynamics) and swept all considerations on ergodicity and related meaningful time evolutions of systems in phase space under the rug.