Analytical conduction adjoint

The cost function is described as:

$$ J(x) = \frac{(M(x)-y)^2}{\sigma_y^2} + \frac{(x-x_0)^2}{\sigma_x^2} $$

where,

  • $x$ are the unknowns, $k$ and $H$ in our case
  • $x_0$ are prior estimates of unknowns
  • $\sigma_x$ are the uncertainty of these prior estimates
  • $M$ is the model we are solving, in our case the heat equation
  • $y$ are observables on the solution, surface heat flow in our case
  • $\sigma_y$ is the uncertainty of our observables

We are solving the steady state heat equation which requires input variables $k$, $H$, and $q_0$. The second part of this equation can be written as:

$$ \sum \frac{(k-k_0)^2}{\sigma_k^2} , \frac{(H-H_0)^2}{\sigma_H^2} , \dots $$

where all of our prior estimates and uncertainties are added to the cost function.

The temperature, $T$, with respect to depth, $z$ is:

$$ T(z) = T_0 + \frac{Q_0 z}{k} + \frac{H dz^2}{k} \left( 1-e^{-z/dz} \right) $$

we can formulate this into a forward model where the cost function is returned:

def f(x):
    ''' forward model, returns the cost function J(x) '''
    k, H, Q0 = tuple(x)
    T = T0 + Q0*z/k + H*dz**2/k * (1-exp(-z/dz))
    q = k * (T[1]-T[0]) / (z[1]-z[0])
    return (q-q_obs)**2/sigma_q**2 + (k-k_prior)**2/sigma_k**2 + (H-H_prior)**2/sigma_H**2 + (Q0-Q0_prior)**2/sigma_Q0**2

q_obs and sigma_q are surface heat flow observations and uncertainty respectively. The right hand side of this equation is all our prior estimates of $k$, $H$, and $Q_0$. These are all summed to the cost function. Note that k, H, and Q0 are the only parameters that change.

Now the tangent linear model uses the derivatives of each line in the forward model with respect to k, H, and Q0.

def f_tl(x, dx):
    ''' tangent linear model '''
    k, H, Q0 = tuple(x)
    dkdx, dHdx, dQ0dx = 1, 1, 1
    dk, dH, dQ0 = tuple(dx)

    # T = f(k, H, Q0)
    T = T0 + Q0*z/k + H*dz**2/k * (1-exp(-z/dz))
    dTdk = -Q0*np.array(z)/k**2 - H*dz**2/k**2 * (1-exp(z/dz))
    dTdH = dz**2/k * (1-exp(z/dz))
    dTdQ0 = z/k
    dT = dTdk*dk + dTdH*dH + dTdQ0*dQ0 # similar to the dot product

    # q_model = f(T0, T1, k)
    q_model = k * (T[1]-T[0])/(z[1]-z[0])
    dqdT1 = k / (z[1]-z[0])
    dqdT0 = -k / (z[1]-z[0])
    dqdk = (T[1]-T[0]) / (z[1]-z[0])
    dq_model = dot([dqdT0, dqdT1, dqdk], [dT[0], dT[1], dk])

    # return f(q_model, k, H, Q0)
    d2q = (2*q_model-2*q_obs)/sigma_q**2
    d2k = (2*k-2*k_prior)/sigma_k**2
    d2H = (2*H-2*H_prior)/sigma_H**2
    d2Q0 = (2*Q0-2*Q0_prior)/sigma_Q0**2
    d2x = dot([d2q, d2k, d2H, d2Q0], [dq_model, dk, dH, dQ0])

    return d2x

This returns reasonable values of $df$ providing the change in our variables ($k, H, Q_0$) are relatively small. Now to move onto the adjoint model…

We backwards-propagate a small change in $f$ through the function, starting with the last line of the tangent linear model. For every element ($k, H, Q_0$) that appears in more than 1 line, we need to accumulate them and be careful not to overwrite them.

def f_ad(x, df):
    ''' adjoint model '''
    k, H, Q0 = tuple(x)
    T = T0 + Q0*z/k + H*dz**2/k * (1-exp(z/dz))
    q_model = k * (T[1]-T[0])/(z[1]-z[0])

    d2q = (2*q_model-2*q_obs)/sigma_q**2
    d2k = (2*k-2*k_prior)/sigma_k**2
    d2H = (2*H-2*H_prior)/sigma_H**2
    d2Q0 = (2*Q0-2*Q0_prior)/sigma_Q0**2

    dq_ad = d2q*df
    dk_ad = d2k*df
    dH_ad = d2H*df
    dQ0_ad = d2Q0*df


    dqdT0 = -k / (z[1]-z[0])
    dqdT1 = k / (z[1]-z[0])
    dqdk = (T[1]-T[0]) / (z[1]-z[0])

    dT0_ad = dqdT0*dq_ad
    dT1_ad = dqdT1*dq_ad
    dk_ad += dqdk*dq_ad


    dTdk = -Q0*np.array(z)/k**2 - H*dz**2/k**2 * (1-exp(z/dz))
    dTdH = dz**2/k * (1-exp(z/dz))
    dTdQ0 = z/k

    dk_ad += dTdk[0]*dT0 + dTdk[1]*dT1
    dH_ad += dTdH[0]*dT0 + dTdH[1]*dT1
    dQ0_ad += dTdQ0[0]*dT0 + dTdQ0[1]*dT1

    return dk_ad, dH_ad, dQ0_ad

The adjoint model returns $dk, dH, dQ_0$.

Apart from being able to calculate $dx$ from any $f(x)$, the adjoint model is much more computationally friendly than the TLM. E.g. the TLM computed T[0], T[1], T[2]T[n], but the adjoint only requires computation on elements you actually need - in this case only T[0] and T[1].

When writing the adjoint, only enough of the forward model needs to be computed so that you have the appropriate variables. For most cases this is a complete forward run.

Dr. Ben Mather
Dr. Ben Mather
Research Fellow in Geophysics

My research interests include Bayesian inversion, Python programming and Geodynamics.