Analytical conduction adjoint
The cost function is defined as:
$$ J(x) = \frac{(M(x)-y)^2}{\sigma_y^2} + \frac{(x-x_0)^2}{\sigma_x^2} $$
where:
- $x$ are the unknowns, $k$, $H$, and $Q_0$ in our case
- $x_0$ are the prior estimates of the unknowns
- $\sigma_x$ are the uncertainties of these prior estimates
- $M$ is the model we are solving, in our case the heat equation
- $y$ are observations of the solution, surface heat flow in our case
- $\sigma_y$ is the uncertainty of our observations
We are solving the steady-state heat equation, which requires the input variables $k$, $H$, and $Q_0$. The second, prior term of the cost function can be written as:
$$ \frac{(k-k_0)^2}{\sigma_k^2} + \frac{(H-H_0)^2}{\sigma_H^2} + \dots $$
where all of our prior estimates and uncertainties are added to the cost function.
The temperature, $T$, as a function of depth, $z$, is:
$$ T(z) = T_0 + \frac{Q_0 z}{k} + \frac{H dz^2}{k} \left( 1-e^{-z/dz} \right) $$
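Differentiating this profile gives the corresponding conductive heat flow, which is what the forward model below approximates with a one-sided finite difference over the top two grid points:
$$ q(z) = k \frac{\partial T}{\partial z} = Q_0 + H\,dz\, e^{-z/dz}, \qquad q(0) = Q_0 + H\,dz $$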
We can formulate this into a forward model that returns the cost function:
```python
import numpy as np

def f(x):
    ''' forward model, returns the cost function J(x) '''
    # z, T0, dz, q_obs, sigma_q and the priors are assumed to be defined at module level
    k, H, Q0 = x
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    q = k * (T[1] - T[0]) / (z[1] - z[0])        # modelled surface heat flow
    return ((q - q_obs)**2/sigma_q**2 + (k - k_prior)**2/sigma_k**2
            + (H - H_prior)**2/sigma_H**2 + (Q0 - Q0_prior)**2/sigma_Q0**2)
```
`q_obs` and `sigma_q` are the surface heat flow observation and its uncertainty, respectively. The remaining terms of the return expression are all our prior estimates of $k$, $H$, and $Q_0$, summed into the cost function. Note that `k`, `H`, and `Q0` are the only parameters that change.
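To make this concrete, here is a minimal, illustrative setup for the module-level variables the forward model relies on; all of the numbers are placeholder assumptions for demonstration, not values from a real dataset:

```python
# illustrative grid, priors and observation (placeholder values, assumed for this sketch)
z = np.linspace(0.0, 30e3, 31)        # depth grid [m]
T0 = 10.0                             # surface temperature [deg C]
dz = 10e3                             # e-folding depth of heat production [m]

k_prior, sigma_k   = 3.0, 0.5         # conductivity prior [W/m/K]
H_prior, sigma_H   = 1e-6, 0.5e-6     # heat production prior [W/m^3]
Q0_prior, sigma_Q0 = 0.03, 0.01       # basal heat flow prior [W/m^2]
q_obs, sigma_q     = 0.06, 0.005      # observed surface heat flow [W/m^2]

x = np.array([k_prior, H_prior, Q0_prior])
print(f(x))                           # cost of the priors against the observation
```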
Now the tangent linear model uses the derivatives of each line in the forward model with respect to `k`, `H`, and `Q0`:
```python
def f_tl(x, dx):
    ''' tangent linear model: propagates a perturbation dx through the forward model '''
    k, H, Q0 = x
    dk, dH, dQ0 = dx
    # T = f(k, H, Q0)
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    dTdk = -Q0*z/k**2 - H*dz**2/k**2 * (1 - np.exp(-z/dz))
    dTdH = dz**2/k * (1 - np.exp(-z/dz))
    dTdQ0 = z/k
    dT = dTdk*dk + dTdH*dH + dTdQ0*dQ0            # similar to the dot product
    # q_model = f(T0, T1, k)
    q_model = k * (T[1] - T[0]) / (z[1] - z[0])
    dqdT1 = k / (z[1] - z[0])
    dqdT0 = -k / (z[1] - z[0])
    dqdk = (T[1] - T[0]) / (z[1] - z[0])
    dq_model = np.dot([dqdT0, dqdT1, dqdk], [dT[0], dT[1], dk])
    # return f(q_model, k, H, Q0)
    d2q = 2*(q_model - q_obs)/sigma_q**2
    d2k = 2*(k - k_prior)/sigma_k**2
    d2H = 2*(H - H_prior)/sigma_H**2
    d2Q0 = 2*(Q0 - Q0_prior)/sigma_Q0**2
    d2x = np.dot([d2q, d2k, d2H, d2Q0], [dq_model, dk, dH, dQ0])
    return d2x
```
This returns reasonable values of $df$ provided the changes in our variables ($k$, $H$, $Q_0$) are relatively small.
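We can check this against a finite difference of the forward model. A small sketch, reusing the illustrative setup from above (the perturbation sizes are arbitrary assumptions):

```python
# compare the TLM against a finite difference of two forward runs
dx = np.array([0.1, 1e-8, 1e-3])   # small perturbation of (k, H, Q0)
print(f(x + dx) - f(x))            # change in J from the full model
print(f_tl(x, dx))                 # change in J predicted by the TLM, should agree to first order
```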
Now to move on to the adjoint model. We back-propagate a small change in $f$ through the function, starting with the last line of the tangent linear model. For every element ($k$, $H$, $Q_0$) that appears in more than one line, we need to accumulate its contributions and be careful not to overwrite them.
```python
def f_ad(x, df):
    ''' adjoint model: back-propagates a perturbation df of the cost function '''
    k, H, Q0 = x
    # recompute the pieces of the forward model that the derivatives need
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    q_model = k * (T[1] - T[0]) / (z[1] - z[0])
    d2q = 2*(q_model - q_obs)/sigma_q**2
    d2k = 2*(k - k_prior)/sigma_k**2
    d2H = 2*(H - H_prior)/sigma_H**2
    d2Q0 = 2*(Q0 - Q0_prior)/sigma_Q0**2
    # last line of the TLM first
    dq_ad = d2q*df
    dk_ad = d2k*df
    dH_ad = d2H*df
    dQ0_ad = d2Q0*df
    # surface heat flow line
    dqdT0 = -k / (z[1] - z[0])
    dqdT1 = k / (z[1] - z[0])
    dqdk = (T[1] - T[0]) / (z[1] - z[0])
    dT0_ad = dqdT0*dq_ad
    dT1_ad = dqdT1*dq_ad
    dk_ad += dqdk*dq_ad                     # accumulate, don't overwrite
    # temperature line
    dTdk = -Q0*z/k**2 - H*dz**2/k**2 * (1 - np.exp(-z/dz))
    dTdH = dz**2/k * (1 - np.exp(-z/dz))
    dTdQ0 = z/k
    dk_ad += dTdk[0]*dT0_ad + dTdk[1]*dT1_ad
    dH_ad += dTdH[0]*dT0_ad + dTdH[1]*dT1_ad
    dQ0_ad += dTdQ0[0]*dT0_ad + dTdQ0[1]*dT1_ad
    return dk_ad, dH_ad, dQ0_ad
```
The adjoint model returns $dk, dH, dQ_0$.
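The standard way to verify that the adjoint is consistent with the TLM is the dot-product test: for any `dx` and `df`, $\langle \mathrm{TLM}(dx),\, df \rangle$ should equal $\langle dx,\, \mathrm{ADJ}(df) \rangle$. A sketch, again reusing the names from the snippets above:

```python
# dot-product test: <TLM(dx), df> should equal <dx, ADJ(df)>
df = 1.0
print(f_tl(x, dx) * df)              # tangent linear side
print(np.dot(dx, f_ad(x, df)))       # adjoint side, should match to round-off
```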
Apart from being able to calculate $dx$ from any perturbation of $f(x)$, the adjoint model is much more computationally friendly than the TLM. For example, the TLM computed `T[0]`, `T[1]`, `T[2]`, …, `T[n]`, but the adjoint only requires computation on the elements you actually need, in this case only `T[0]` and `T[1]`.
When writing the adjoint, only enough of the forward model needs to be computed so that the appropriate variables are available; in most cases this means a complete forward run.
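Because `f_ad(x, 1.0)` is exactly the gradient of the cost function with respect to ($k$, $H$, $Q_0$), it can be handed to any gradient-based optimiser. One possible sketch, using SciPy and L-BFGS-B as my own choices rather than anything prescribed above (in practice the parameters would also want rescaling so they have comparable magnitudes):

```python
from scipy.optimize import minimize

# minimise J(x) using the adjoint gradient (reuses f, f_ad and the setup above)
res = minimize(f, x0=np.array([k_prior, H_prior, Q0_prior]),
               jac=lambda xx: np.asarray(f_ad(xx, 1.0)),
               method='L-BFGS-B')
print(res.x)    # recovered k, H, Q0
```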