Monday, October 18, 2010

Refining the Peak Oil Rosy Scenario Part 7: An improved logistic model

Inherent problems with the logistic equation

During the course of modeling my simulated data sets (especially the data set which assumed a 2%/yr decline in “a” after the peak in production), I realized that although the NLLS model made a reasonable estimate of the average “a” for the 5-year time spans considered (see Part 4; Results for non-linear least squares analysis of simulated data), the fits to the data do not look particularly good.  That is, the fits have a fairly high residual sum of squares (∑rss) between the true values of dP/dt and the predicted values from the best fit.

This is illustrated below for the simulated data set where Q∞ = 170 bbls and “a” = 0.0687 yr⁻¹ up to the peak production year 1956, and thereafter “a” is assumed to decline 2%/yr.

Figure 17 shows the best fits from NLLS analysis of the twenty year span on the decline side of the production curve (1956-1975), using different assumptions as part of the fitting process:

1) Fix Q∞ at a constant value of 170 but vary “a” to get the best fit (red line)
2) Fix “a” at a constant value of 0.0687 but vary Q∞ to get the best fit (blue line)
3) Let both “a” and Q∞ vary to get the best fit (brown line)




The best fit values and rss for each of these fits, along with the true values (i.e., the average values over the same time span), are summarized below:


Table 3: summary of best fit values to 1956-1975 time span shown in Figure 17

| Model assumptions | Best fit or fixed value of “a” | True value of “a” | Best fit or fixed value of Q∞ | True value of Q∞ | ∑rss |
| --- | --- | --- | --- | --- | --- |
| Q∞ fixed; “a” variable | 0.0592 | 0.0571 | 170 (fixed) | 170 | 1.310 |
| Q∞ variable; “a” fixed | 0.0687 (fixed) | 0.0571 | 155 | 170 | 0.299 |
| Both Q∞ and “a” variable | 0.0782 | 0.0571 | 148 | 170 | 0.028 |

It is apparent that the best fits where either Q∞ is fixed and “a” is varied, or “a” is fixed and Q∞ is varied, are not particularly good in terms of minimizing the rss, as compared to the case where both “a” and Q∞ are varied.  For the fit done with Q∞ fixed to the true value and “a” varied, the best fit value (0.0592) is pretty close to the true value of 0.0571 (the same as found in Part 4, when fitting 5-year spans), but the fit is poor.  It seemed odd to me that you could get a better fit by fixing “a” to a value you know is wrong (i.e., the pre-decline value of “a”) and varying the value of Q∞.  Although the best fit was obtained when varying both “a” and Q∞, the best fit values of these parameters are in poor agreement with the true values.  That is, “a” is overestimated by 37% (0.0782 versus 0.0571) and Q∞ is underestimated by 13% (148 versus 170).
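The “true value” of “a” and the size of the overestimate can be checked with a little arithmetic. The sketch below is only an illustration under a stated assumption: “a” is taken to be 0.0687 in 1956 and multiplied by 0.98 once per year after that, and the true value is the plain average of the 20 yearly values. The function and variable names are hypothetical.

```python
# Average "a" over a span, ASSUMING "a" keeps its starting value in the
# first year and is multiplied by a fixed factor once per year afterwards.

def average_a(a_start, yearly_factor, n_years):
    """Mean of a_start * yearly_factor**k for k = 0 .. n_years - 1."""
    values = []
    a = a_start
    for _ in range(n_years):
        values.append(a)
        a *= yearly_factor  # apply the yearly fractional change
    return sum(values) / n_years

# 1956-1975 is 20 years; "a" starts at 0.0687 and declines 2%/yr.
true_a = average_a(0.0687, 0.98, 20)
print(round(true_a, 4))

# Relative error of the two-parameter best fit (0.0782) against that average.
overestimate = 0.0782 / true_a - 1.0
print(f"{overestimate:.0%}")
```

Under that assumption the two printed values round to 0.0571 and 37%, which matches the figures quoted above.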

To gain a better understanding of what is going on here, I found it useful to make a series of plots using different values of “a” (with Q∞ fixed to 170) or of Q∞ (with “a” fixed to the growth side value of 0.0687).  These plots are shown in Figures 18 and 19, respectively. 



Now we start to see how these parameters operate to define the production (dQ/dt) versus time curve.  In general, varying “a” moves the simulated curve up and down the vertical axis, and it also changes the negative slope of the curve.  Notice that at the right-hand side of the span (1975) the effect of changing “a” on moving the curve up and down is quite small compared to the effect at the left-hand side of the span (1956).  Varying Q∞ also moves the simulated curve up and down, but with little effect on the slope of the curve.  It is now clear what happens when one of “a” or Q∞ is fixed and the other is varied: the simulated curve is simply moved up or down so that it crosses through about the mid-point of the 20-year span of the simulated data.  I can also start to see what happens when both “a” and Q∞ are allowed to vary: the NLLS analysis finds a low Q∞ to lower the simulated curve and chooses a high “a” value to increase the slope of the simulated curve so as to match the data.

This analysis shows that there are problems inherent in the logistic equation when it is used to model production data where the rate constant for production is declining, as simulated here.  A NLLS fit to the decline side data, where the value of Q∞ is fixed and “a” is allowed to vary, can give reasonably good estimates of the value of “a,” but the fit will be poor.  A NLLS fit where both Q∞ and “a” are allowed to vary will tend to overestimate “a” and underestimate Q∞.

I can imagine what would happen in the inverse case for production data where after the peak, there was an increase in the rate constant for production: a NLLS fit where both Q∞ and “a” are allowed to vary will tend to under-estimate “a” and overestimate Q∞.  That is, the NLLS analysis will find a high Q∞ to raise the simulated curve and choose a low “a” value to decrease the slope of the simulated curve to match the data. 

We need a better model to specifically account for a changing “a” on the decline side of the production curve.

An improved logistic equation to account for changes in “a” on the decline side of the production curve.

Let’s consider again the logistic equation (equation [3] from Part 4):

$$\frac{dQ}{dt} = \frac{1}{Dt} \cdot \frac{Q_\infty}{1 + N_o\, e^{-a (t - t_o)}} \qquad [3]$$

where No = (Q∞-Qo)/Qo and Dt = time increment (1 year).

Also let’s consider how I generated the simulated data for the case where “a” was assumed to decline by some percentage amount per year (say 2%/yr).  That is, in 1957, a equals a(1956) x 0.98; in 1958, a equals a(1957) x 0.98; etc....

It is easy to see that the value of “a” in any one year a(t) on the decline side will be given by:

$$a(t) = a_p \cdot \mathrm{fca}^{(t - t_d)} \qquad [4]$$

where ap equals the rate constant estimated from the growth-side of the production curve, td equals the year where the change in a is believed to start, and fca is the fractional yearly change in “a.” 

For example, in the case where a 2%/yr decrease in “a” occurs beginning in 1957, td equals 1957 and fca equals 0.98.  And, in 1958, a(1958) equals ap x 0.98² (i.e., ap x 0.98 x 0.98); in 1959, a(1959) equals ap x 0.98³, etc...
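As a quick check that the power form of equation [4] really is the same thing as multiplying by fca year after year, here is a minimal sketch. The function name is hypothetical, and n simply stands for the exponent (t – td), so the check does not depend on which calendar year is chosen for td.

```python
import math

def a_closed_form(a_p, fca, n):
    """Equation [4], with n standing for the exponent (t - td)."""
    return a_p * fca ** n

a_p, fca = 0.0687, 0.98

# Build the same sequence by repeated multiplication and compare.
a_iter = a_p
for n in range(0, 21):
    assert math.isclose(a_iter, a_closed_form(a_p, fca, n), rel_tol=1e-12)
    a_iter *= fca  # one more year of fractional change

# Two yearly reductions: ap x 0.98 x 0.98
print(a_closed_form(a_p, fca, 2))
```

Setting fca above 1 in the same function describes a yearly increase instead of a decrease.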

Now we can replace “a” in equation [3] with a(t) from equation [4] to produce the improved logistic equation:

$$\frac{dQ}{dt} = \frac{1}{Dt} \cdot \frac{Q_\infty}{1 + N_o \exp\!\left(-a_p\, \mathrm{fca}^{(t - t_d)}\, (t - t_o)\right)} \qquad [5]$$

where No = (Q∞-Qo)/Qo.

Notice that if we just fix “fca” equal to 1 then equation [5] reduces to equation [3] (i.e., “1” to the power of any whole number is still “1”), so equation [5] is useful for analyzing data with and without the assumption of a change in “a” on the decline side.

Figure 20 shows the NLLS fit to the same time span depicted in Fig. 17, with the values of Q∞ and “a” fixed to the growth side values of 170 and 0.0687, respectively and fca allowed to vary.



Table 4 compares the best fit values summarized in Table 3 with the best fit using the new logistic equation:

Table 4: summary of best fit values to 1956-1975 time span shown in Figure 17 and Figure 20

| Model assumptions | Best fit or fixed value of “a” | True value of “a” | Best fit or fixed value of Q∞ | True value of Q∞ | ∑rss |
| --- | --- | --- | --- | --- | --- |
| Q∞ fixed; “a” variable | 0.0592 | 0.0571 | 170 (fixed) | 170 | 1.310 |
| Q∞ variable; “a” fixed | 0.0687 (fixed) | 0.0571 | 155 | 170 | 0.299 |
| Both Q∞ and “a” variable | 0.0782 | 0.0571 | 148 | 170 | 0.028 |
| Q∞ and “a” fixed; fca variable | 0.0687 (fixed); fca = 0.981 | 0.0571 | 170 (fixed) | 170 | 0.013 |

As one might expect, the fit to the simulated data is excellent although not perfect; after all, we are fitting the simulated data with an equation that corresponds to how the data was created in the first place.  The fit is not perfect, as indicated by the non-zero ∑rss, because I chose 1956 as the first data point, but 1956 was actually the last year before the value of “a” was decreased by 2%.  Indeed, if I redo the fit to a 19-year span from 1957 to 1975, the best fit gives a ∑rss equal to zero and fca equal to 0.98 (actually 0.97996).
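To illustrate the idea of fitting only fca while everything else stays fixed, here is a rough sketch. It is not the procedure used for the figures: a simple golden-section search stands in for the NLLS solver, the right-hand side of equation [5] is transcribed literally, and the values of Qo (3.55) and to (1900) are hypothetical placeholders. The “data” are generated from the same equation with fca = 0.98 over 1957-1975, so the search should recover that value.

```python
import math

def eq5_rhs(t, q_inf, a_p, fca, q0, t0, td, dt=1.0):
    """Right-hand side of equation [5], transcribed term by term."""
    n0 = (q_inf - q0) / q0
    a_t = a_p * fca ** (t - td)          # equation [4]
    return q_inf / (1.0 + n0 * math.exp(-a_t * (t - t0))) / dt

def rss(fca, years, data, **fixed):
    """Residual sum of squares for a trial value of fca."""
    return sum((eq5_rhs(t, fca=fca, **fixed) - y) ** 2
               for t, y in zip(years, data))

def golden_section_min(f, lo, hi, tol=1e-10):
    """One-parameter minimizer (stand-in for an NLLS solver)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0

# Fixed growth-side values from the text; q0 and t0 are ASSUMED placeholders.
fixed = dict(q_inf=170.0, a_p=0.0687, q0=3.55, t0=1900, td=1957)
years = list(range(1957, 1976))                      # 19-year span
data = [eq5_rhs(t, fca=0.98, **fixed) for t in years]  # simulated "data"

best_fca = golden_section_min(lambda f: rss(f, years, data, **fixed), 0.90, 1.10)
print(best_fca, rss(best_fca, years, data, **fixed))
```

Because the data come from the same equation, the recovered fca lands on 0.98 and the residual sum of squares is essentially zero, which is the behavior described above for the 1957-1975 refit.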

An improved logistic equation to account for changes in Q∞ on the decline side of the production curve.

This is somewhat anticipating the re-analysis of the USA production data, but, there is the possibility that Q∞, the total recoverable oil, will increase over time.  This could occur because more oil is discovered, or, because the techniques to recover the oil have become more efficient with time.  We need to have the ability to model this, similar to modeling to detect changes in the production rate constant, “a.” 

Analogous to the procedure described above, we can modify the logistic equation to account for yearly fractional changes in Q∞ on the decline side of the production curve, by defining Q∞(t) as follows: 

$$Q_\infty(t) = Q_{\infty p} \cdot \mathrm{fcq}^{(t - t_d)} \qquad [6]$$

where Q∞p equals the total recoverable oil estimated from the growth-side of the production curve, td equals the year where the change in Q∞ is believed to start, and fcq is the fractional yearly change in Q∞.  For example, in the case where a 0.5%/yr increase in Q∞ starts to occur on the decline side of the curve in 1956, then td equals 1956 and fcq equals 1.005.

Now we can replace Q∞ in equation [3], with No = (Q∞-Qo)/Qo written out explicitly, by Q∞(t) from equation [6] to yield the improved logistic equation:

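$$\frac{dQ}{dt} = \frac{1}{Dt} \cdot \frac{Q_{\infty p}\, \mathrm{fcq}^{(t - t_d)}}{1 + \dfrac{Q_{\infty p}\, \mathrm{fcq}^{(t - t_d)} - Q_o}{Q_o} \exp\!\left(-a\, (t - t_o)\right)} \qquad [7]$$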

A further-improved logistic model to account for either or both changes in “a” and changes in Q∞ on the decline side of the production curve.

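Now, if we include the expressions for both a(t) and Q∞(t) in the logistic equation, we can start from the equation with a(t) and No written out (equation [8]) and then replace Q∞ with Q∞(t) to get the general form (equation [9]):

$$\frac{dQ}{dt} = \frac{1}{Dt} \cdot \frac{Q_\infty}{1 + \dfrac{Q_\infty - Q_o}{Q_o} \exp\!\left(-a_p\, \mathrm{fca}^{(t - t_d)}\, (t - t_o)\right)} \qquad [8]$$

$$\frac{dQ}{dt} = \frac{1}{Dt} \cdot \frac{Q_{\infty p}\, \mathrm{fcq}^{(t - t_d)}}{1 + \dfrac{Q_{\infty p}\, \mathrm{fcq}^{(t - t_d)} - Q_o}{Q_o} \exp\!\left(-a_p\, \mathrm{fca}^{(t - t_d)}\, (t - t_o)\right)} \qquad [9]$$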

If you think equation 9 is starting to look a bit complicated, you should try inputting it as an equation into an EXCEL spreadsheet—great fun, but, it can be done (hint: it helps to use simulated data so that you can verify that the equation has been correctly entered!!).

The advantage of using equation 9 is its great flexibility.   For example, by fixing fca and fcq to 1, equation 9 reduces back to the familiar logistic equation 3.  Or, we can fix fcq to 1, and make fca variable, to examine yearly fractional changes in “a” on the decline side where Q∞ is fixed and has no fractional change.  Or, we can fix fca to 1, and make fcq variable, to examine yearly fractional changes in Q∞  on the decline side where “a” is fixed and has no fractional change.  Or, we can allow both fca and fcq to be variable and fix Q∞   and “a” to their best fit growth-side values. 
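For anyone who would rather check these reductions in code than in a spreadsheet, here is a minimal sketch. It transcribes the right-hand sides of equations 3 and 9 literally; the function names and the values of Qo (3.55) and to (1900) are hypothetical placeholders, and td = 1956 is used only as an example.

```python
import math

def eq3_rhs(t, q_inf, a, q0, t0, dt=1.0):
    """Right-hand side of equation 3 (constant Q-infinity and "a")."""
    n0 = (q_inf - q0) / q0
    return q_inf / (1.0 + n0 * math.exp(-a * (t - t0))) / dt

def eq9_rhs(t, q_inf_p, a_p, fca, fcq, q0, t0, td, dt=1.0):
    """Right-hand side of equation 9, transcribed term by term."""
    q_inf_t = q_inf_p * fcq ** (t - td)   # equation 6
    a_t = a_p * fca ** (t - td)           # equation 4
    n0_t = (q_inf_t - q0) / q0
    return q_inf_t / (1.0 + n0_t * math.exp(-a_t * (t - t0))) / dt

# Growth-side values from the text; q0, t0 and td are ASSUMED for illustration.
base = dict(q_inf_p=170.0, a_p=0.0687, q0=3.55, t0=1900, td=1956)

for t in range(1956, 1976):
    plain = eq3_rhs(t, q_inf=170.0, a=0.0687, q0=3.55, t0=1900)
    # fca = fcq = 1 switches off both yearly changes.
    assert math.isclose(eq9_rhs(t, fca=1.0, fcq=1.0, **base), plain)

# The other cases are just different choices of which factor is left at 1.
only_a_changes = eq9_rhs(1975, fca=0.98, fcq=1.0, **base)
only_q_changes = eq9_rhs(1975, fca=1.0, fcq=1.005, **base)
print(only_a_changes, only_q_changes)
```

Keeping a single function with fca and fcq as arguments means the same code serves all four cases; only the choice of which parameters a fitting routine is allowed to vary changes.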

Testing the improved logistic model with simulated data where “a” increases on the decline side.

It might seem paradoxical to be modeling for increases in “a” for the decline side of a production curve.  Why in the world would this ever happen?  Think about it: if we are on the decline side, it means that production is no longer growing.  To try and mitigate the socio-economic consequences of this, a country or company might try to increase the rate of production (i.e., increase “a”) by pumping oil from existing wells at a greater rate.  Of course, in the absence of any increase in Q∞, that will just deplete whatever oil remains more rapidly in the long term.  However, this would tend to mitigate the declining production in the short term.

Let's try to model this with a new simulated data set, using equation 9 for the modeling.  Once again, I assume Q∞ equals 170 bbls and “a” equals 0.0687 yr⁻¹ on the growth side of the production curve.  Then, starting from the year of peak production, 1956, I assume that “a” increases by 2%/yr thereafter.

Figure 21 shows the simulated data and NLLS best fits to the data for the first 20 years on the decline side using equation 9, with the various parameters, “a”; Q∞; fca; and fcq either fixed to the best-fit growth side value (i.e., “a”=0.0687; Q∞=170), or to 1 (i.e., one or both of fca and fcq fixed equal to 1), or allowed to vary, as described in the figure legend.



The simulated data on the decline side show an initial boost in production; the peak year of production would actually be shifted to about 1963.  For this analysis, however, I just stuck with examining the twenty-year period from 1956-1975.

It is apparent from the best fit plots shown in Figure 21 that poor fits to this 20-year time span are obtained for the models where one of Q∞, "a" or fcq is varied and all other parameters are fixed to their respective best fit values from the growth side of the curve (or set equal to 1 in the cases of fca and fcq).

When both Q∞ and "a" are allowed to vary (with fca and fcq fixed equal to 1), the fit is better than in the above cases.  But the estimates of "a" (0.0647) and Q∞ (197) rather poorly underestimate and overestimate, respectively, the true values (i.e., the average "a" equals 0.0835 for the 20-year time span and Q∞ is held constant at 170).

The best fit, a near exact fit with the ∑rss ~ 0, is for the model where fca is varied and all the other parameters are fixed.  The best fit value of fca is 1.019947 which is about equal to a 2% per year increase, which is what was assumed to generate the simulated data in the first place.

Testing improved logistic model with simulated data where Q∞ increases on the decline side.

To further test equation 9, I generated another simulated data set.  Once again, I assume Q∞ equals 170 bbls and “a” equals 0.0687 yr⁻¹ on the growth side of the production curve.  Then, in the year of peak production, 1956, I assumed that Q∞ increases by 0.5%/yr thereafter.

Analogous to Figure 21, Figure 22 shows the simulated data and NLLS best fits to the data for the first 20 years on the decline side using equation 9, with the various parameters, “a”; Q∞; fca; and fcq either fixed to the best-fit growth side value, or allowed to vary, as described in the figure legend.


Similar to the data shown in Figure 21, the simulated data on the decline side show an initial slight increase in production, and the peak year of production is shifted to about 1960.  Again, however, for this analysis I just stuck with examining the twenty-year period from 1956-1975.

I have not shown the best fits for the cases where only "a" or only fca was varied and the other parameters were fixed to their best-fit growth side values (or to 1 for fcq), as these fits were quite poor and looked about the same as those depicted in the previous figures for the other simulated data.

I did show in Figure 22 the case where Q∞ was varied (and all other parameters fixed) to illustrate that, even though the best fit value of Q∞ (182) was close to the true value of 179 (the average value over the 20-year span), the fit to the simulated data was poor.

Again, when both Q∞ and "a" are allowed to vary (with fca and fcq fixed equal to 1), the fit is better than in the above cases.  But again the estimates of "a" (0.0604) and Q∞ (196) rather poorly underestimate and overestimate, respectively, the true values (i.e., "a" held constant at 0.0687; the average Q∞ equals 179 for the 20-year time span).

As expected, the best fit, a near exact fit with the ∑rss ~ 0, is for the model where fcq is varied and all the other parameters are fixed.  The best fit value of fcq is 1.004987 which is about equal to a 0.5% per year increase, which is what was assumed to generate the simulated data in the first place.

Conclusions

Based on this and the previous analysis done in Parts 4 & 6, I have revised my approach for modeling real oil production and consumption data as follows:

1) Obtain estimates of Q∞ and "a" from the NLLS best fits to the growth side production data, and progressive increments of the decline side data.  If the NLLS model blows up because there is no decline side data or data with a lot of scatter, use the hybrid linear logistic model to estimate Q∞ and "a."

2) Examine the most recent 10-20 year time span of the decline-side production data to see if there are discernable trends of a departure from the best fits using the logistic model where a constant Q∞ and "a" are assumed.

3) If there is a recent time span with a discernable departure detected in (2), apply NLLS and the improved logistic model to estimate the fractional change in "a" (fca) or Q∞ (fcq), or both, for this time span.

4) Apply the best fit model results from (1) or (3) to predict the production rates out to 2030.

5) Use the predicted production rates as part of the export land modeling for the USA (remember, that's what this whole series was about in the first place).  

Okay, it's time to step away from modeling simulated data and enter the world of real data: back to the USA!
