How to Introduce Special Relativity (Historical Detour)

I am just reading the volume titled Waves in my favorite series of ancient textbooks on Theoretical Physics by German physics professor Wilhelm Macke. I tried to resist the urge to write about seemingly random fields of physics, and probably weird ways of presenting them – but I can’t resist any longer.

There are different ways to introduce special relativity. Typically, the Michelson-Morley experiment is presented first, as our last attempt in a futile quest to determine the absolute speed in relation to “ether”. In order to explain these results we have to accept the fact that the speed of light is the same in any inertial frame. This is weird and non-intuitive: We probably can’t help but compare a ray of light to a bunch of bullets or a fast train – whose velocity relative to us does change with our velocity. We can outrun a train but we can’t outrun light.

Michelson–Morley experiment

The Michelson–Morley experiment: If light travels in a system – think: space ship – that moves at velocity v with respect to absolute space, the resulting speed of the light should depend on the angle between its direction and the system’s velocity. In just the same way, the observed relative velocity of a train becomes zero if we manage to ride beside it in a car driving at the same speed as the train. But this experiment shows – via the absence of the expected shift in the interference pattern of beams of allegedly different velocities – that we must not calculate relative velocities for beams of light in this way. (Wikimedia)

Yet, not accepting it would lead to even weirder consequences: After all, the theory of electromagnetism had always been relativistically invariant. The speed of light shows up as a constant in the related equations, which explain perfectly how waves of light behave.

I think the most straight-forward way to introduce special relativity is to start from its core ideas (only) – the constant speed of light and the equivalence of frames of reference. This is the simplicity and beauty of symmetry. No need to start with trains and lightning bolts, as Matthew Rave explained so well. For the more visually inclined there is an ingenious and nearly purely graphical way, called k-calculus (that is however seldom taught AFAIK – I had stumbled upon it once in a German book on relativity).

From these first principles all the weirdness of length contraction and time dilation follows naturally.

But is there a way to understand it a bit better?

Macke also starts from the Michelson-Morley experiment – and he adds the fact that it can be “explained” by Lorentz’ contraction hypothesis: Allowing for direction-dependent velocities – as in “ether theory” – but adding the odd fact that rulers contract in the direction of the unobservable absolute motion makes the differences between the travel times of the rays of light go away. It also “explains” time dilation if you consider your typical light clock and factor in the contraction of lengths:

Light clock

The classical light clock: Light travels between two mirrors. When it hits a mirror it “ticks”. If the clock moves relative to an observer, the path to be traversed between ticks appears to be longer. Thus measurement of time is tied to measurement of spatial distances.
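The contraction hypothesis and the light clock can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the usual ether-theory bookkeeping (light moves at c – v and c + v along the motion, and at √(c² – v²) across it); the function names and the chosen numbers are made up for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma(v):
    """The factor 1 / sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def round_trip_times(arm_length, v, contracted):
    """Ether-theory round-trip times along and across the direction of motion."""
    # Lorentz' hypothesis: only the arm parallel to the motion shrinks.
    parallel_length = arm_length / gamma(v) if contracted else arm_length
    # Parallel arm: c - v on the way out, c + v on the way back.
    t_parallel = parallel_length / (C - v) + parallel_length / (C + v)
    # Transverse arm: the light covers a diagonal, so its effective speed
    # across the apparatus is sqrt(c^2 - v^2). This arm is the light clock.
    t_transverse = 2.0 * arm_length / math.sqrt(C**2 - v**2)
    return t_parallel, t_transverse


v = 0.6 * C  # illustrative speed, chosen to make the effect large
arm = 1.0    # illustrative arm length in metres

print("without contraction:", round_trip_times(arm, v, contracted=False))
print("with contraction:   ", round_trip_times(arm, v, contracted=True))
# Tick period of the moving light clock relative to the clock at rest:
print("dilation factor:", round_trip_times(arm, v, True)[1] / (2.0 * arm / C))
```

Without contraction the two round-trip times differ, which is the shift the experiment failed to find; with contraction they agree, and the transverse time is longer than the rest-frame tick by exactly the factor that appears in time dilation.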

However, length contraction could be sort of justified by tracing it back to the electromagnetic underpinnings of stuff we use in the lab. And it is the theory of electromagnetism where the weird constant speed of light sneaks in.

Contraction can be visualized by noting that rulers and clocks are ultimately made from atoms, ions or molecules, whose positions are determined by electromagnetic forces. The perfect sphere of the electrostatic potential around a point charge is turned into an ellipsoid if the charge starts moving – hence the contraction. You could hypothesize that only “electromagnetic stuff” might be subject to contraction and there might be “mechanical stuff” that would allow for measuring true time and spatial dimensions.

Thus the new weird equations about contracting rulers and slowing time are introduced as statements about electromagnetic stuff only. We use them to calculate back and forth between lengths and times displayed on rulers and clocks that suffer from the shortcomings of electromagnetic matter. The true values for x,y,z,t are still there, but ultimately inaccessible as any matter is electromagnetic.

Yes, this explanation is messy as you mix underlying – but not accessible – direction-dependent velocities with the contraction postulate added on top. This approach misses the underlying simplicity of the symmetry in nature. It is a historical approach, probably trying to do justice to the mechanical thought experiments involving trains and clocks that Einstein had also used (and that could be traced back to his childhood spent basically in the electrical engineering company run by his father and uncle, according to this biography).

What I found fascinating though is that you get consistent equations assuming the following:

  • There are true co-ordinates we can never measure; for those Galilean Transformations remain valid, that is: Time is the same in all inertial frames and distances just differ by time times the speed of the frame of reference.
  • There are “apparent” or “electromagnetic” co-ordinates that follow Lorentz Transformations – of which length contraction and time dilation are consequences.

To make these sets of transformations consistent you have to take into account that you cannot synchronize clocks in different locations if you don’t know the true velocity of the frame of reference. Synchronization is done by placing an emitter of light right in the middle of the two clocks to be synchronized, sending signals to both clocks. This is correct only if the emitter is at rest with respect to both clocks. But we cannot determine when it is at rest because we never know the true velocity.

What you can do is to assume that one frame of reference is absolutely at rest, thus implying that (true) time is independent of spatial dimensions, and the other frame of reference moving in relation to it suffers from the problem of clock synchronization – thus in this frame true time depends on the spatial co-ordinates used in that frame.

The final result is the same when you eliminate the so-called true co-ordinates from the equations.
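A minimal numerical sketch of this claim, in units with c = 1 and with made-up function names: start from the true co-ordinates, apply the Galilean transformation, then the ruler contraction, the slowed clocks and a synchronization offset of v·x'/c². That offset is the standard one for clocks set by light signals; it is an assumption here and not quoted from Macke's text. The result is compared with the Lorentz transformation.

```python
import math

C = 1.0  # units in which the speed of light is 1


def apparent_coordinates(x, t, v):
    """Map true rest-frame co-ordinates (x, t) to the readings in the moving lab."""
    g = 1.0 / math.sqrt(1.0 - v**2 / C**2)
    # Step 1: Galilean transformation of the true co-ordinates.
    x_true = x - v * t
    t_true = t
    # Step 2: contracted rulers fit more often into the same true distance.
    x_apparent = g * x_true
    # Step 3: moving clocks run slow by 1/gamma, and clocks synchronized by
    # light signals are offset by v * x_apparent / c^2 (assumed offset).
    t_apparent = t_true / g - v * x_apparent / C**2
    return x_apparent, t_apparent


def lorentz(x, t, v):
    """Standard Lorentz transformation for comparison."""
    g = 1.0 / math.sqrt(1.0 - v**2 / C**2)
    return g * (x - v * t), g * (t - v * x / C**2)


for event in [(0.0, 1.0), (2.0, 0.5), (-3.0, 4.0)]:
    print(apparent_coordinates(*event, v=0.6), lorentz(*event, v=0.6))
```

For each event the two pairs agree up to rounding, so the true co-ordinates drop out of the final equations, as stated above.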

I don’t claim it’s the best way to explain special relativity – I just found it interesting, as it tries to take the merely hypothetical nature of 4D spacetime as far as possible while giving results in line with experiments.

And now explaining the really important stuff – and another historical detour in its own right

Yes, I changed the layout. My old theme, Garland, had been deprecated. I am nostalgic – here is a screenshot, as a courtesy to visitors who will read this in 200 years: this blog using theme Garland – from March 2012 to February 2014 – with minor modifications made to colors and stylesheet in 2013.

I had checked it with an iPhone simulator – and it wasn’t simply too big or just “not responsive”: the top menu bar and the boundaries of divs looked scrambled. Thus I decided the days of Garland, the three-column layout, are over.

Now you can read my 2,000-word posts on your mobile devices – something I guess everybody has eagerly anticipated.

And I have just moved another nearly 1,000 words of meta-philosophizing on the value of learning such stuff (theory of relativity, not WordPress) from this post to another draft.

Why Fat Particles Radiate Less

I am just reading Knocking on Heaven’s Door by Lisa Randall, which has a chapter on the impressive machinery of the Large Hadron Collider. The LHC has been built to smash proton beams against each other: Protons, not electrons. Why protons? I stumbled upon the following statement:

“But accelerated particles radiate, and the lighter they are, the more they do so”.

Electrons would cause higher radiation losses and less energy would be available for the creation of new particles in collisions.

But why is this so? In order to prove this, you would go through a calculation of the electromagnetic field generated by the moving particles based on Maxwell’s equations (which are relativistic by default).

I think you can understand it qualitatively from this chain of reasoning:

If a particle is forced to move on a curved path, it is accelerated – just as planets are accelerated all the time by the gravitational force exerted by the sun.

Consider a curved part of the LHC’s trajectory – the radius is given. The acceleration of particles moving in circles is equal to v²/R with R being the radius of curvature and v the speed of the particle. So acceleration increases with increasing speed.

Charged particles lose energy via electromagnetic radiation when they are accelerated. This can be understood from conservation of energy: If a particle is slowed down in free space (friction due to collisions with particles in the atmosphere not being an option), the energy has to go somewhere. If a particle is accelerated, some force does work on it (which is also true for an orbiting particle).
This argument has been used to prove the classical model of the atom as a miniature solar system wrong: If an electron orbited the nucleus, it would lose energy and finally ‘fall down’ into the nucleus. So we need quantum mechanics to explain the stability of atoms.

Particles are accelerated by electrical fields: The energy transferred to particles carrying an electrical charge of the same magnitude is the same for a proton and an electron (the charges differ only in sign). For particles with velocities close to the speed of light relativistic effects cannot be neglected, so the energy of a particle of rest mass m and velocity v is (c = speed of light)

E = mc² / √(1 – v²/c²)

For smaller velocities this reduces to the sum of the ‘rest mass energy’ mc² and the kinetic energy mv²/2.
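The low-velocity limit can be checked numerically. This is a small sketch, assuming the standard relativistic energy E = mc²/√(1 – v²/c²); the function names and the test speeds are chosen for illustration only.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def total_energy(m, v):
    """Relativistic energy of a particle with rest mass m (kg) and speed v (m/s)."""
    return m * C**2 / math.sqrt(1.0 - (v / C) ** 2)


def low_speed_energy(m, v):
    """Rest mass energy plus Newtonian kinetic energy."""
    return m * C**2 + 0.5 * m * v**2


for fraction in (0.01, 0.1, 0.5, 0.9):
    v = fraction * C
    ratio = total_energy(1.0, v) / low_speed_energy(1.0, v)
    print(f"v = {fraction:4.2f} c   exact / approximation = {ratio:.6f}")
```

At one percent of the speed of light the two expressions are practically indistinguishable; close to c the approximation falls well short of the exact energy.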

If the energy is given, a particle with higher rest mass would exhibit a smaller velocity. Thus its acceleration in a toroidal tube (~v²) would be smaller.
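To put numbers on this chain of reasoning, here is a rough sketch comparing a proton and an electron of the same total energy on the same circle. The energy (2000 MeV) and the radius (1000 m) are illustrative values, not LHC parameters, and the function names are invented. The last function uses the standard textbook result that the power radiated on a circular path scales as (γv/c)⁴ at fixed radius and charge; that formula is not derived in this post and is included only to indicate what the full calculation gives.

```python
import math

C = 299_792_458.0                   # speed of light in m/s
PROTON_REST_ENERGY_MEV = 938.272    # rounded rest energies
ELECTRON_REST_ENERGY_MEV = 0.510999


def lorentz_factor(total_energy_mev, rest_energy_mev):
    """gamma = E / (m c^2), from E = m c^2 / sqrt(1 - v^2/c^2)."""
    return total_energy_mev / rest_energy_mev


def speed_fraction(g):
    """v / c for a given gamma."""
    return math.sqrt(1.0 - 1.0 / g**2)


def centripetal_acceleration(beta, radius_m):
    """v^2 / R for a particle moving at v = beta * c on a circle of radius R."""
    return (beta * C) ** 2 / radius_m


def relative_radiated_power(g, beta):
    """Textbook scaling (assumption, not from the post): P ~ (gamma * beta)^4."""
    return (g * beta) ** 4


energy_mev = 2000.0  # illustrative total energy
radius_m = 1000.0    # illustrative radius of curvature

for name, rest in (("proton", PROTON_REST_ENERGY_MEV),
                   ("electron", ELECTRON_REST_ENERGY_MEV)):
    g = lorentz_factor(energy_mev, rest)
    beta = speed_fraction(g)
    print(name, "v/c =", beta,
          "v^2/R =", centripetal_acceleration(beta, radius_m),
          "relative power =", relative_radiated_power(g, beta))
```

At this energy the proton is visibly slower than the electron, so its v²/R is smaller. At much higher energies both speeds approach c and the difference in v²/R fades, while the factor γ = E/mc² keeps the radiated power of the lighter particle far larger.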

LHC Tunnel, CERN (Wikimedia)


Further reading and notes

– Note that I prefer to call the ‘rest mass’ just ‘mass’ – I would not introduce a so-called relativistic mass.
– As usual, I am recommending Feynman’s Physics Lectures. Even in Volume 1 he gives a concise introduction to the radiation of charges. Actually, he states that he developed an expression for the electrical field caused by a single point charge for the purpose of this lecture only, and that it had not been published elsewhere before. Volume 2 covers electrodynamics in depth.

Unification of Two Phenomena Well Known

Unification is a key word that invokes some associations: The Grand Unified Theory and Einstein’s unsuccessful quest for it, of course the detection of the Higgs boson and the confirmation of the validity of the Standard Model of Particle Physics, or Kepler’s Harmonices Mundi.

Unification might be driven by the search for elegance and simplicity in the universe. Nevertheless, in retrospect it might be presented as down-to-earth and straight-forward.

Electricity and magnetism had been considered distinct phenomena until they were “unified” by describing them by Maxwell’s equations, which are consistent with the theory of relativity. What does this mean?

I am summarizing the explanation given by Richard Feynman in chapter 13 of volume II of his Physics Lectures:

Consider a wire carrying an electric current and a small test charge near the wire. The test charge moves with constant velocity and follows a path parallel to the wire. The wire comprises positive ions and free electrons and is thus electrically neutral, so it exerts no electrical force on the test charge. However, the motion of the electrons gives rise to a magnetic field. Since the test charge is moving, the magnetic field gives rise to a force (the Lorentz force) that makes the charge move in a direction perpendicular to the wire. The sign depends on the type of charge: A negative charge would be attracted to the wire if it travels in the same direction as the electrons in the wire.

Now imagine you watch this experiment from the perspective of an observer who moves with a velocity equal to the velocity of the test charge. The charge is now at rest. If the test charge had moved with exactly the same speed as the electrons before (this is an assumption made for the sake of simplicity), from the travelling observer’s perspective the electrons in the wire would be at rest and the positive ions would be moving. So since some carriers of charge do still move, a magnetic field would also exist in that frame of reference. However, the field would not exert a magnetic force on the test charge that is now standing still.

If the charge moves towards the wire and eventually hits it in one frame of reference, the same effect needs to be observed in the other. What kind of force would be accountable for that in the second frame of reference?

It is an electrical force and it is due to the fact that the wire is electrically charged in the second system. Electrical charges of particles do not change when switching to a different inertial frame, but dimensions parallel to the relative velocity do. And thus so does the charge density – the charge per unit volume or per unit length of the wire. If a charge density ρ is measured by an observer at rest relative to the charges, an observer in motion relative to the charges measures a larger charge density, ρ/√(1 – v²/c²), because the volume has shrunk by a factor of √(1 – v²/c²). (This is the infamous factor appearing in all kinds of equations in relativity, c being the speed of light in vacuum.) If the densities of positive and negative charges change in different ways, there is a net overall charge per unit volume.

Why does the swap of the roles of positive and negative charges not compensate for that? The travelling electrons turned to static charges and the static ions turned to moving positive charges. Remember that the wire is electrically neutral in the system considered first. Thus in this system the charge density of electrons is larger than their density measured in the travelling system. Switching to the latter system, the correction factors are applied to each type of charge in a different way – starting from the densities measured in the system “at rest”: The ion density is increased as these are moving now, but the electron charge density is reduced, as we have measured the increased density in the other system.
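The bookkeeping of the two correction factors can be written down in a few lines. This is a minimal sketch with made-up names, assuming a wire that is neutral in the first system and electrons that drift at the speed v of the test charge; densities are in arbitrary units and beta stands for v/c.

```python
import math


def net_density_in_moving_frame(rho_ions, beta):
    """Net charge density of the wire in the frame where the electrons are at rest."""
    root = math.sqrt(1.0 - beta**2)
    # The wire is neutral in the first system: electron density = -ion density.
    rho_electrons = -rho_ions
    # Ions rest in the first system and move in the second: density increases.
    rho_ions_moving_frame = rho_ions / root
    # Electrons move in the first system and rest in the second: density decreases.
    rho_electrons_moving_frame = rho_electrons * root
    return rho_ions_moving_frame + rho_electrons_moving_frame


for beta in (0.001, 0.1, 0.6):
    net = net_density_in_moving_frame(1.0, beta)
    closed_form = beta**2 / math.sqrt(1.0 - beta**2)
    print(f"v/c = {beta}: net density = {net:.6g}, closed form = {closed_form:.6g}")
```

The two factors do not cancel: the wire carries a net positive charge density of ρ·(v²/c²)/√(1 – v²/c²) in the second system, which is the source of the electrical force on the resting test charge.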

Actually, the forces turn out to differ by a factor equal to the square root mentioned above; the force is smaller in system 1. But this is needed for consistency: The effect of the force is measured by the momentum it transfers. In special relativity the momentum is often illustrated by the penetration depth of a bullet (driven into some material, in a direction perpendicular to the relative velocity). Momentum is force times the interval of time the force is acting on a particle. But time is dilated according to special relativity, that is: time intervals appear longer in the system in which the particle is moving (system 1). Thus by calculating the product of force and time interval, the factors cancel out exactly.
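Here is a rough numerical sketch of this cancellation, using the textbook fields of a long straight wire (B = μ₀I/2πr and E = λ/2πε₀r). The function name and all input values are invented for illustration, and only magnitudes are compared. The net line charge in the frame of the test charge is taken as λ·(v²/c²)/√(1 – v²/c²), following the density argument above.

```python
import math

C = 299_792_458.0          # speed of light in m/s
EPS0 = 8.8541878128e-12    # vacuum permittivity in F/m
MU0 = 1.0 / (EPS0 * C**2)  # vacuum permeability, from mu0 * eps0 = 1 / c^2


def forces_on_test_charge(q, line_density, v, distance):
    """Return (force in system 1, force in system 2, gamma), magnitudes only."""
    beta = v / C
    g = 1.0 / math.sqrt(1.0 - beta**2)
    # System 1 (wire at rest): electrons drifting at v carry the current.
    current = line_density * v
    b_field = MU0 * current / (2.0 * math.pi * distance)
    force_system_1 = q * v * b_field  # magnetic force on the moving charge
    # System 2 (test charge at rest): the wire carries a net line charge.
    net_line_density = line_density * beta**2 * g
    e_field = net_line_density / (2.0 * math.pi * EPS0 * distance)
    force_system_2 = q * e_field      # electric force on the resting charge
    return force_system_1, force_system_2, g


f1, f2, g = forces_on_test_charge(q=1.6e-19, line_density=1.0e-6,
                                  v=0.6 * C, distance=0.01)
dt_system_2 = 1.0e-9           # illustrative interval in the charge's rest frame
dt_system_1 = g * dt_system_2  # the same interval, dilated in system 1

print("force ratio (system 2 / system 1):", f2 / f1)
print("momentum transferred:", f1 * dt_system_1, f2 * dt_system_2)
```

The forces differ by the square-root factor, the time intervals differ by the same factor the other way round, and the transferred momentum comes out the same in both systems.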

In summary, the forces of electricity and magnetism morph into each other – dependent on the frame of reference chosen. They are two aspects of some underlying “unified” force. On the one hand, this changed the way we think about electromagnetism.

On the other hand – technically it just means that we take the components of electrical and magnetic fields (3 numbers each – these are vectors) and stuff them into some more general mathematical structure consisting of 6 numbers (this is called a tensor). This sounds simple and there is a reason for that: Historically, Maxwell’s equations, which govern the spatial and temporal evolution of electrical and magnetic fields, had been laid down before Einstein developed the theory of special relativity. Maxwell’s equations had already been consistent with special relativity and they did not need amendment – unlike Newton’s laws. So unification did exist already – mathematically, but the consequences had not been fully understood.
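The “stuffing” of two vectors into one structure can be sketched as follows. The layout below is one common convention for the field tensor (contravariant components, metric signature +, –, –, –); signs and factors of c differ between textbooks, so treat the arrangement and the function names as assumptions made for illustration.

```python
def field_tensor(e_field, b_field, c=1.0):
    """Antisymmetric 4x4 field tensor built from E = (Ex, Ey, Ez) and B = (Bx, By, Bz)."""
    ex, ey, ez = (component / c for component in e_field)
    bx, by, bz = b_field
    return [
        [0.0, -ex, -ey, -ez],
        [ex, 0.0, -bz, by],
        [ey, bz, 0.0, -bx],
        [ez, -by, bx, 0.0],
    ]


def independent_components(tensor):
    """Entries above the diagonal; all others follow from antisymmetry."""
    return [tensor[i][j] for i in range(4) for j in range(i + 1, 4)]


tensor = field_tensor((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
for row in tensor:
    print(row)
print(len(independent_components(tensor)))  # prints 6
```

A 4x4 array has 16 entries, but antisymmetry forces the diagonal to zero and fixes the lower half by the upper half, which leaves exactly the 6 numbers of the electrical and magnetic fields.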