Saturday, October 30, 2010

Trends in USA Petroleum Production and Consumption

Introduction
As I hope was made clear from my previous series, “Refining the Peak Oil Rosy Scenario,” the logistic equation developed by Hubbert for the analysis of oil production data (henceforth “Hubbert’s Equation”) has a number of inherent limitations, especially when it comes to modeling the decline side of the production curve.  In brief, Hubbert’s Equation assumes that a peak in production will occur, that the decline side of the production curve will be the mirror image of the growth side, and that the whole curve can be accurately described by a single rate constant for production, “a,” and a single total recoverable amount of oil, Q∞.  The model cannot specifically account for changes (i.e., increases or decreases) in the production rate constant “a” or in the total recoverable amount of oil, Q∞, that may occur on the decline side of the curve.

As I showed in “Refining the Peak Oil Rosy Scenario,” when the assumptions of a constant “a” and Q∞ do not hold (for instance, when “a” or Q∞ has yearly fractional changes), the best fits using Hubbert’s Equation give systematically poor estimates of “a” or Q∞ as compared to the true values (in this case, known values, because the data in my analysis were simulated).  Consequently, the predicted future production rates can be radically different from the true values.  Using the linearized version of Hubbert’s Equation does not cure this problem.

To address this, I modified Hubbert’s Equation to account for fractional yearly changes in “a” (fca) or in Q∞ (fcq), as compared to the best-fit estimates for “a” and Q∞ from the growth side and plateau region of the production curve.  The testing of this approach using simulated data is described in part 7 of the “Refining the Peak Oil Rosy Scenario” series.

Some hypotheses to consider
How might society respond to the experience of hitting peak oil production?

On one hand, there may be an effort to conserve this important depleting resource by decreasing production.  In this case I would expect to see, at some point at or past the plateau in production rate, a fractional yearly decrease in “a” (i.e., fca < 1) as compared to “a” on the growth side of the production curve.  On the other hand, there may be an attempt to step up production in the hopes of continuing economic growth, maybe at least until alternative energy sources are found or developed, or because the price of oil has increased.  In this case, I would expect to see a fractional yearly increase in “a” (i.e., fca > 1).

Either an increasing or decreasing “a” could occur in the face of changing total recoverable oil, Q∞. 

In some cases, because of one or more of increasing costs of oil recovery (e.g., due to a decreasing EROEI), decaying infrastructure, or increasing regulations or outright prohibitions on pumping (e.g., consider the moratorium in the Gulf or off the coast of California), the effective total amount of oil that is recoverable could decrease as compared to the estimate of Q∞ obtained mainly from the growth side of the production curve (i.e., fcq < 1).  Alternatively, there could be a technological improvement in the recovery of oil (i.e., a greater percentage of oil in the ground can be extracted), or new oil sources could be put into production (i.e., fcq > 1).

Of course, none of the above may occur, in which case there would be no signs of a change in “a” or in Q∞ (i.e., fca = 1 and fcq = 1).  In this case, the decline side of the production curve should be a mirror image of the growth side of the curve.
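The mirror-image behavior can be checked numerically.  The following is only a sketch: it uses the standard analytic rate of a logistic curve, dQ/dt = a·Q·(1 − Q/Q∞), which may differ from the discrete yearly form used in this series, and all parameter values and function names are hypothetical, chosen purely for illustration.

```python
import math


def logistic_q(t, q_inf, a, q_o, t_o):
    """Cumulative production Q(t) for a logistic curve with Q(t_o) = q_o."""
    n_o = (q_inf - q_o) / q_o
    return q_inf / (1.0 + n_o * math.exp(-a * (t - t_o)))


def production_rate(t, q_inf, a, q_o, t_o):
    """Analytic logistic rate dQ/dt = a * Q * (1 - Q / Q_inf)."""
    q = logistic_q(t, q_inf, a, q_o, t_o)
    return a * q * (1.0 - q / q_inf)


def peak_time(q_inf, a, q_o, t_o):
    """Time at which Q reaches Q_inf / 2, where the rate is highest."""
    return t_o + math.log((q_inf - q_o) / q_o) / a


# Hypothetical parameters, not taken from any fit in this article.
q_inf, a, q_o, t_o = 200.0, 0.06, 10.0, 1950.0
t_peak = peak_time(q_inf, a, q_o, t_o)

# With constant "a" and Q_inf, the rate 15 years before the peak
# equals the rate 15 years after it.
before = production_rate(t_peak - 15.0, q_inf, a, q_o, t_o)
after = production_rate(t_peak + 15.0, q_inf, a, q_o, t_o)
```

Any yearly change in “a” or Q∞ after the peak breaks this equality, which is the asymmetry the rest of the analysis looks for.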

I suppose that this might be the kind of scenario one would expect if one were monitoring micro-organisms or insects converting a finite fixed amount of an energy source into waste products.  Thought of this way, my main hypothesis is that, when it comes to the use of their “energy source” of petroleum, humans will not behave the same as bugs, and the growth and decline sides of the production curve will not be symmetric.

The USA data set and procedures for data analysis
The sources of data describing total USA petroleum production were already described and briefly considered in Part 5 of the “Refining the Peak Oil Rosy Scenario” series.

I relied on the EIA's Table 5.1, Petroleum Overview, which provides yearly production and consumption data back to 1949 for the USA.  As I will discuss in some detail later on, Table 5.1 also gives a breakdown of the production data into three general sources.

Non-linear least squares (NLLS) analysis of total production
As in the preliminary analysis done in Part 5, I started my NLLS analysis with the whole span of the production data from 1949 to 2009 and used Hubbert’s equation (e.g., equation [3] from Part 5) to obtain the best fit to this data.

Then I repeated the NLLS analysis, again using Hubbert’s equation, to obtain best fits to progressively smaller data sets: 1949 to 1999, 1949 to 1989, 1949 to 1979 and 1949 to 1969.

Figure 1 shows the best fit result from the NLLS analysis using Hubbert’s equation for each of these time ranges.  All of the parameters “a,” Qo and Q∞ were allowed to vary to minimize the residual sum of squares (Srss).
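As a minimal sketch of the quantity being minimized, Srss can be read as the sum of squared differences between the measured and the model-predicted yearly production.  The function name and the sample numbers below are illustrative assumptions, not values from Table 1.

```python
def srss(measured, predicted):
    """Sum of squared residuals between measured and model-predicted values."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return sum((m - p) ** 2 for m, p in zip(measured, predicted))


# Hypothetical yearly production values (billion barrels per year).
measured = [3.0, 3.2, 3.5]
predicted = [3.1, 3.2, 3.3]
example_srss = srss(measured, predicted)
```

An NLLS routine repeatedly adjusts “a,” Qo and Q∞, recomputes the predicted values, and keeps the parameter set that gives the smallest value of this sum.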


Table 1 summarizes the best fit Srss, “a”, Qo and Q∞ for each of these time ranges.


Discounting the 1949-69 period, where the NLLS analysis blows up (which, as discussed previously, is likely because there was not enough plateau in the production curve to estimate Q∞), the longer the time period considered, the larger the Q∞ and the smaller the “a.”

This suggested to me that the production curve is not following a simple symmetric relationship where the growth and decline sides of the curve are identical. 

For instance, look at the fit to the 1949-79 range of data in Figure 1: the fit on the decline side of the curve departs widely from the measured production data. Every measured value from 1980-2008 is to the right of the curve predicted from the best fit parameters.  The same trend is present, though less prominent, for the best fits to the 1949-89 and 1949-99 ranges:  for instance, all of the last fifteen years of production data are to the right of the best fit curves. 

Modified analysis of the production data from 1980-2009
I used the best fit values of “a”, Qo and Q∞ obtained from fitting the 1949-79 data using Hubbert’s equation as “fixed” parameters for a subsequent fit to the 1980-2009 production data, using equation [9], derived in Part 7 of the “Refining the Peak Oil Rosy Scenario” series:

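$$\frac{dQ}{dt} = \frac{Q_{\infty p}\,\mathrm{fcq}^{(t - t_d)}}{1 + \dfrac{Q_{\infty p}\,\mathrm{fcq}^{(t - t_d)} - Q_o}{Q_o}\,\exp\!\left(-a_p\,\mathrm{fca}^{(t - t_d)}\,(t - t_o)\right)} \cdot \frac{1}{\mathrm{Dt}} \qquad [9]$$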

I put quotes around “fixed” because, of course, I have to use these best-fit parameters of “a”, Qo and Q∞ to calculate a new Qo for 1979 and then use this value as the starting input for equation [9] in the analysis of the 1980-2009 period.  But “a” and Q∞ were fixed to the values shown in the 1949-1979 column of Table 1 throughout the analysis.  The parameters fca or fcq, or both, were allowed to vary for the fit to the 1980-2009 period.
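A sketch of how the right-hand side of equation [9] could be evaluated is given below.  It follows the equation as printed; the function and argument names are my own, and the Qo value for 1979 is a placeholder, since the recalculated value is not listed here.  Setting fca = fcq = 1 recovers the traditional logistic form.

```python
import math


def eq9_rhs(t, q_inf_p, a_p, q_o, t_o, t_d, fca=1.0, fcq=1.0, dt=1.0):
    """Right-hand side of equation [9], evaluated as printed."""
    q_inf_t = q_inf_p * fcq ** (t - t_d)   # Q-infinity after yearly fractional changes
    a_t = a_p * fca ** (t - t_d)           # rate constant after yearly fractional changes
    n_o = (q_inf_t - q_o) / q_o
    return q_inf_t / (1.0 + n_o * math.exp(-a_t * (t - t_o))) / dt


# "a" and Q-infinity are fixed to the 1949-79 fit values quoted in the text.
a_p, q_inf_p = 0.0656, 233.0
# Placeholder assumptions: Qo for 1979 and the start years t_o and t_d.
q_o_1979, t_o, t_d = 120.0, 1979, 1979

baseline = eq9_rhs(2009, q_inf_p, a_p, q_o_1979, t_o, t_d)              # fca = fcq = 1
with_fcq = eq9_rhs(2009, q_inf_p, a_p, q_o_1979, t_o, t_d, fcq=1.0053)  # Q-infinity growing
```

With “a” and Q∞ held fixed, a fit to the 1980-2009 data then only has to search over fca, fcq, or both.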

Figure 2 shows the best fit to the 1980-2009 data (solid green line) when, along with “a” and Q∞, fca was also fixed equal to 1 (i.e., fca is not a parameter in this case) and only fcq was allowed to vary.  That is, the solid green line in Figure 2 shows the best fit obtained using equation [9] to the 1980-2009 data with “a” fixed to 0.0656 yr⁻¹ and Q∞ fixed to 233 bbs, fca equal to 1 and fcq equal to 1.0053.

For comparison, in Figure 2, I also included the full curves, previously shown in Figure 1, for the best fit to the full 1949-2009 data (dashed red line) and the 1949-1979 data (solid blue up to 1979, then dashed from 1980-2009) using the traditional Hubbert equation.

Discussion Points

Making long term predictions using the traditional logistic equation is dangerous
One of the most striking features of Figure 2 is how badly Hubbert’s equation underestimated the 2009 production rate when using only the 1949-1979 data.  Of course, this is due to the assumption inherent in this equation that the decline side of production will mirror the growth side.  For instance, if we extend the best fit out 30 years from 1979, the dashed blue line predicts that 2009 yearly production would be ~1.3 bbls/yr instead of the actual value of ~2.6 bbls/yr: the actual value is about double the prediction (a 100% difference relative to the prediction).  The lesson should be clear: long term estimates of production made with the traditional Hubbert’s equation are only as good as the inherent assumption of symmetry.

The fit using a variable fcq is a significantly better fit (p-value < 0.05)
Another noteworthy feature of Figure 2 is that, over the 1980-2009 time span, the fit to the 1980-2009 data subset using equation [9] appears to be better than the fit to the full 1949-2009 data set using Hubbert’s equation (i.e., the dashed red line from 1980-2009).  However, is the fit using equation [9] significantly better?

It is well known that when comparing the fits of two related or “nested” equations to a set of data, the equation with the larger number of parameters will always have an equal or better fit, as quantified by having a lower Srss.  It is also well known that a way to test whether the fit using the larger parameter equation is statistically significantly better, beyond what would be expected from simply adding the additional parameter, is the F-test.

For instance, for the case where n data points are used to estimate parameters for a larger parameter model (i.e., number of parameters = p2) and a smaller parameter model (i.e., number of parameters = p1), the F-statistic is given by:

F = ((Srss(smaller) - Srss(larger)) / (p2 - p1)) / (Srss(larger) / (n - p2))

This calculated F is compared with the critical value of F for a certain probability value (p-value) and the degrees of freedom associated with the two equations (i.e., n-p1 and n-p2 degrees of freedom for the smaller and larger models respectively).  The critical value of F can be looked up in a statistical table.  Or, using EXCEL, the critical value is given by the FINV function (specifically, FINV(p-value, n-p1, n-p2)).

In the present case, for the fit to the 1980-2009 data, the larger model, equation [9], has four parameters (p2 = 4, corresponding to “a”, Qo, Q∞ and fcq; fca was not used) and the smaller model, the traditional logistic equation, has three parameters (p1 = 3, corresponding to “a”, Qo and Q∞).  My view is that even though “a”, Qo and Q∞ are fixed to their best fit values from the NLLS analysis of the 1949-1979 data, they still play a role in determining the best fit to the 1980-2009 data and therefore should be considered as parameters in the larger model.  This tends to make the F-statistic larger, and therefore more difficult to overcome for a finding of statistical significance, than if equation [9] were considered to have only one parameter, fcq, and compared to a three parameter equation, Hubbert’s equation.

The value of Srss (larger) is equal to 0.3050 which is the best-fit value obtained for this data range using equation [9], and Srss(smaller) is equal to 0.6213 which is the best fit value obtained for this same data range using the Hubbert equation. 

Accordingly, my calculated F value equals 27, and this is greater than the critical F value (0.01, 27, 26) of 2.54.  That is, there is at least a 99% probability that the fit using equation [9] is better than the fit using the traditional logistic Hubbert equation for reasons other than random scatter in the data.  Or in other words, there is a less than 1% chance that random scatter in the data would explain the better fit of equation [9] to the data as compared to the Hubbert equation.
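The calculated F value can be reproduced with a few lines of code.  This is only a sketch: the function name is hypothetical, and n = 30 assumes one data point per year for 1980 through 2009.  Note also that the textbook form of the nested-model test compares the calculated value with the F distribution for (p2 − p1, n − p2) degrees of freedom; at (1, 26) the 1% critical value in standard tables is roughly 7.7, a stricter threshold than the 2.54 quoted above, and the calculated value still exceeds it.

```python
def nested_f_statistic(srss_smaller, srss_larger, p1, p2, n):
    """F statistic for a smaller (p1-parameter) model nested in a larger (p2-parameter) one."""
    if p2 <= p1 or n <= p2:
        raise ValueError("need p2 > p1 and n > p2")
    numerator = (srss_smaller - srss_larger) / (p2 - p1)
    denominator = srss_larger / (n - p2)
    return numerator / denominator


# Srss values quoted in the text for the fits to the 1980-2009 data.
f_1980_2009 = nested_f_statistic(0.6213, 0.3050, p1=3, p2=4, n=30)
```

Rounded to the nearest whole number, `f_1980_2009` matches the value of 27 given above.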

Similar fits and tests for significantly better fits were done to the 1980-2009 data using equation [9] where fcq was not used (i.e., fcq fixed equal to 1) and fca was allowed to vary, and where both fcq and fca were allowed to vary.  The best fit when fca was varied was not significantly better than the best fit using the traditional logistic equation (p-value > 0.1).  Similarly, the best fit when both fca and fcq were allowed to vary was not significantly better than the best fit when only fcq was varied (p-value > 0.1).

But does the best fit from this modified approach give a significantly better fit to the measured values for the data set as a whole (i.e., the full range from 1949-2009) than the Hubbert equation?  Yes, it does.

To answer this question, I needed to take the sum of the Srss for the best fit of the Hubbert equation to the 1949-1979 data (Srss=1.0293; Table 1) plus the Srss for the best fit of equation [9] to the 1980-2009 data (Srss=0.3050).  The sum of the two Srss values equals 1.3343.  In comparison, the Srss for the best fit of the Hubbert equation to the 1949-2009 data equals 1.946 (Table 1).  The Hubbert equation again has three parameters, the modified equation has four parameters, and the total number of data points equals 60.

Therefore the calculated F value (Fcal) equals:

Fcal = ((1.946 - 1.3343) / (4 - 3)) / (1.3343 / (60 - 4)) ≈ 25.7

Again, this is significant at p < 0.01.

Implications from the modified NLLS analysis
The statistically significantly better fit obtained using equation [9] estimates an annual increase in Q∞ (starting from the Q∞ = 233 bbs estimate from the fit to the 1949-79 data using Hubbert’s equation) of about +0.53 percent per year through the 1980 to 2009 period.
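To get a feel for what half a percent per year adds up to, the yearly factor can be compounded over the fit period.  The sketch below assumes the change starts in 1979 (td = 1979) and runs through 2009; the helper name is hypothetical.

```python
def compounded(base, yearly_factor, years):
    """Value of `base` after applying a constant fractional yearly change for `years` years."""
    return base * yearly_factor ** years


q_inf_1979 = 233.0      # Q-infinity from the 1949-79 Hubbert fit (bbs)
fcq = 1.0053            # best-fit fractional yearly change in Q-infinity
years = 2009 - 1979     # assumption: the change starts in 1979

q_inf_2009 = compounded(q_inf_1979, fcq, years)
percent_gain = (q_inf_2009 / q_inf_1979 - 1.0) * 100.0
```

Under these assumptions the effective Q∞ in 2009 comes out somewhat under 20 percent higher than the 1979 estimate.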

Why might there be an annual half-percent per year increase in Q∞, the total estimated recoverable oil?  Possible answers to this question were already presented at the beginning of this article: improvements in the recovery of oil or new oil sources put into production. 

Is there any evidence to support either of these scenarios?  Yes, there is.

Improved recovery
As far as improvements in recovery are concerned, according to Leonardo Maugeri, VP for corporate strategies and planning at the Italian energy company ENI, an important tertiary recovery technique, horizontal drilling, became commercially adopted in the 1980s:

One of the most important developments so far has been the horizontal well, a dramatic breakthrough compared with the traditional vertical drilling used since the inception of the oil industry. Commercially adopted in the 1980s, this technique is particularly suitable for reservoirs where oil and natural gas occupy thin, horizontal strata, or in sections where vertical drilling can no longer be useful. With their flexible “L” shapes, horizontal wells can change direction and penetrate a reservoir horizontally, thus “assaulting” virgin sections of a reservoir.  (Source: “Squeezing More Oil Out of the Ground”)

So, the improved recovery of oil associated with the widespread adoption of horizontal drilling in the 1980s might at least partially explain why there is a trend for Q∞ to increase by half a percent per year: as more and more wells use this technique, the effective amount of recoverable oil increases.

New petroleum sources
What about new oil sources put into production from 1980-2009?

It turns out that the breakdown of the production data given in the EIA’s Table 5.1 provides some interesting answers to this question.  Table 5.1 not only presents total petroleum production and consumption, it also gives a breakdown of production into the lower 48 states, Alaska and natural gas liquids.  These data are presented in Figure 3 below:


A few trends are immediately obvious here, and they help explain why Q∞ would be increasing in the 1980-2009 time frame:

1)  There was a brief upward spike in the overall downward trend in lower 48 state production running from about 1981 to 1989.  Plus, the decline side of the production curve is more gradual than the growth side.
2) There is a steady trend of increase in natural gas liquids throughout 1980 to 2009.
3) Alaskan production went from substantially nowhere in 1976 to a peak in production in 1988 and then went into decline thereafter.

Examining the components of total USA production
I thought that it would be interesting to examine the three components of production in some further detail, in the hopes of gaining some further insights into future production trends.

In particular I was interested in examining how the sum of these three individual components, when analyzed using my modified Hubbert equation [9],  would compare to the analysis I did on the total production data.

Lower 48 state production
Figure 4 shows the NLLS best fits using the Hubbert equation to the full time span of 1949-2009 production data for the lower 48 states and also to progressively shorter spans, in 10-year steps:
Once again the NLLS analysis blew up for the 1949-69 data range.  The best fits to the longer data sets show the same trend as for the total production data: for progressively longer time spans, the estimated “a” progressively decreased and the estimated Q∞ progressively increased.  Once again this suggests to me that the production rate data for the lower 48 states does not follow the symmetric behavior inherent in the Hubbert equation.  For instance, consider the best fit to the 1949-1979 data: almost every data point from 1980 on is to the right of the best-fit curve.

Figure 5 shows the best fit from the NLLS modified analysis of the lower 48 state production data, where once again I have used the best fit values of “a”, Qo and Q∞ (0.0703, 32.05, 177.9, respectively) from the analysis of the 1949-1979 data using the Hubbert equation as my fixed parameters for a subsequent fit to the 1980-2009 data using equation [9].

Once again the fit using equation [9] with fcq variable gave a significantly better fit to the 1980-2009 data range than the best fit using the Hubbert equation fit to the full data range (p < 0.01).  However, the fit using equation [9] with both fca and fcq variable gave a still better fit to this data range as compared to the best fit where only fcq was varied (p < 0.01).  The solid green line shows this best fit with both parameters varied, obtained when fca = 0.976 and fcq = 1.0097.  These best-fit results using equation [9] therefore suggest that, in addition to an annual increase of almost 1 percent in Q∞, the rate constant for production in the lower 48 states was actually declining at about 2.4 percent per year.  The 1 percent increase in Q∞ more than offsets the 2.4 percent decline in “a” to give a more gradual decline in production than predicted using the traditional Hubbert equation.

Natural gas liquids
Figure 6 shows the NLLS best fits using the Hubbert equation to the full time span of 1949-2009 production data for natural gas liquids (NGL) and for shorter time spans, analogous to that shown in Figure 4.  The same trends are present in the best fits to progressively longer time spans, again suggesting asymmetry in the production curve.  In fact this trend is much more prominent than it was with the total production or lower 48 state production.  Again, look at the best fit to the 1949-1979 time span: after a local maximum in about 1972, the NGL production dips slightly and then takes off in 1980.  Of course, the best fit to the 1949-79 data (blue line) using the Hubbert equation misses this completely.

Figure 7 shows the best fit from the NLLS modified analysis of the NGL production data, where I once more have used the best fit values of “a”, Qo and Q∞ (0.0999, 1.59, 24.36, respectively) from the analysis of the 1949-1979 data using the Hubbert equation as my fixed parameters for a subsequent fit to the 1980-2009 data using equation [9].

The fit using equation [9] with fcq variable gave a significantly better fit to the 1980-2009 data range than the best fit using the Hubbert equation fit to the full data range (p < 0.01).  Unlike the lower 48-state data, however, the fit using equation [9] with both fca and fcq variable did not give a significantly better fit as compared to the best-fit where only fcq was varied (p > 0.1). 

The solid green line shows the best fit, obtained when fcq = 1.0183 using equation [9].  The result suggests an annual increase of about 1.8 percent in Q∞.  The extension of this trend out to 2030 looks quite dramatic and suggests an increasingly important contribution of NGL to the total petroleum production, as both the lower 48 state (Figure 5) and Alaska (Figure 8, below) production are predicted to decline.  Will there be continued NG production to support this trend?  That is a question for another day.

Alaska
The solid red line in Figure 8 shows the NLLS best fit using the Hubbert equation to the 1976-2009 time span of production data for Alaska, analogous to that shown in Figures 4 and 6.  Including the set of zero or near-zero values for 1949-1975 seriously biased the NLLS fit and gave poor fits to the 1976-2009 time span.



I cannot do the same kind of modified analysis on the 1980-2009 data that I did with the total, lower 48 or NGL production data, because there is not enough data before 1980 to make reasonable estimates of “a”, Qo and Q∞ for use in equation [9].

I tried to look at a smaller data range of 1990-2009 using equation [9], using the best fit of the Hubbert equation to the 1976-1989 time span for the fixed values of “a”, Qo and Q∞.  However, varying neither fcq nor fca gave significantly better fits than the fit to the full data range using the Hubbert equation.

I think, however, that this finding is consistent with the Hubbert equation fit to the measured values presented in Figure 8.  The Hubbert equation gives a fairly good fit to data on the decline side of the production curve as compared to, say, the lower 48 states or NGL measured production values (shown in Figures 4 and 6, respectively).  There are some trends for the fit to systematically over-estimate production in the 1990s and then under-estimate production in the 2000s, but these are small trends.

Note: the dashed line shows the back-extrapolated production values using the best fit “a”, Qo and Q∞ values (0.135, 2.52, 20.13, respectively) obtained from the best fit to the 1976-2009 range.  This amounts to fixing “a” and Q∞ to these best fit values and then adjusting Qo to the value it would have to have in 1948 to give the value Q would have in 1975, as predicted by the best-fit parameters.  I used these back-extrapolated values when combining the production data from the three components so there would not be a discontinuity in the plot (Figure 9 below).
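The back-extrapolation of Qo can be sketched as follows.  This is an illustration under stated assumptions: the function name is hypothetical, and I assume that the quoted Qo of 2.52 is the cumulative value at the end of 1975, which is not stated explicitly.

```python
import math


def logistic_q(t, q_inf, a, q_o, t_o):
    """Cumulative logistic curve Q(t) anchored at Q(t_o) = q_o."""
    n_o = (q_inf - q_o) / q_o
    return q_inf / (1.0 + n_o * math.exp(-a * (t - t_o)))


# Best-fit values quoted in the text for the 1976-2009 Alaska fit.
a, q_o_1975, q_inf = 0.135, 2.52, 20.13

# Assumption: the quoted Qo belongs to 1975. Run the curve backwards to 1948.
q_o_1948 = logistic_q(1948, q_inf, a, q_o_1975, t_o=1975)

# Restarting from 1948 with the adjusted Qo reproduces the 1975 value.
q_1975_check = logistic_q(1975, q_inf, a, q_o_1948, t_o=1948)
```

Because “a” and Q∞ are unchanged, the curve that starts from the adjusted 1948 value passes through the same 1975 value and coincides with the fitted curve from 1976 on.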

Putting it together: combining predicted lower 48-states, NGL and Alaskan production.
Figure 9 shows the sum of the individual best fits to the individual production data for the lower 48-states, NGL and Alaskan production.  The NLLS best fits of the Hubbert equation to 1949-1979 data and then the modified best fit of equation [9] to 1980-2009 data and extrapolation into the future, are used as the best-fit models of the lower 48-states and NGL production data.  The NLLS best fit using the Hubbert equation to 1976-2009 data and extrapolation into the future and past are used as the best model for the Alaskan production data.



For comparison, Figure 9 also shows two other best-fit models.  The dashed green line shows the NLLS best fit of the Hubbert equation to the 1949-1979 total production data and then the modified best fit of equation [9] to the 1980-2009 total production data, with extrapolation into the future, as previously described in the context of Figure 2.  The dashed red line shows the best fit to the total production data for 1949-2009 using the Hubbert equation, as described in the context of Figure 1.

The modified NLLS analysis of the total production (dashed green) and the sum of the three component parts of the production (solid green) are in pretty good agreement with each other throughout the data range, as well as when extrapolated into the future until about 2020, where the predicted production rates start to diverge.

Does the modified fit to the combined components provide a significantly better fit than the modified fit to the total production data?  No, actually the Srss from the combined components production (1.568) is slightly higher than the Srss from the analysis of the total production.

However, I think the analysis of the combined components does provide some additional useful insights as to why the rate of decline in total production has slowed down.  The total production declined more slowly due to increases in the amount of recoverable oil, Q∞, in both the lower 48 states and in NGL (i.e., fcq > 1).

The production rates predicted for 2015 and 2020 by the Hubbert equation are 1.76 and 1.45, respectively.  The rates predicted using the modified equation (2.20-2.26 for 2015 and 2.06-2.21 for 2020, for the total and combined analysis, respectively) are about 25% and 40-50% higher.  And by 2030, while the Hubbert equation predicts a production rate of 0.95, the modified equation predicts 1.86 (analysis of total production) or 2.26 (combined three components of production).  That’s about a 100% difference or more!
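The size of these gaps is simple arithmetic on the predicted rates, sketched below.  The helper name is hypothetical, and the assignment of the lower value in each quoted range to the total analysis and the higher value to the combined analysis is my reading of the numbers above.

```python
def percent_higher(value, reference):
    """How much higher `value` is than `reference`, in percent of `reference`."""
    return (value / reference - 1.0) * 100.0


# Predicted production rates quoted in the text, keyed by year.
hubbert = {2015: 1.76, 2020: 1.45, 2030: 0.95}
modified_total = {2015: 2.20, 2020: 2.06, 2030: 1.86}
modified_combined = {2015: 2.26, 2020: 2.21, 2030: 2.26}

# For each year: (total analysis, combined analysis) relative to the Hubbert prediction.
differences = {
    year: (
        percent_higher(modified_total[year], hubbert[year]),
        percent_higher(modified_combined[year], hubbert[year]),
    )
    for year in hubbert
}
```

The gap between the two approaches widens with every extra year of extrapolation, which is the main practical point of the comparison.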

This reminds me of how far the Hubbert equation, when fit to only the 1949-1979 total production data, underestimates total production in 2009 (see Figure 2 and discussion).

Non-linear least squares (NLLS) analysis of total consumption
Figure 10 shows the best fit to USA petroleum consumption data (which I assume is equal to petroleum products supplied, in EIA Table 5.1).  I show the best-fit to the full data set 1949-2009 and to a more limited data set 1984-2009, which I consider to give a more realistic view of future trends, because it does not include the rapid increase and then decrease in consumption in the 1970s and early 1980s.

Table 2 summarizes the best fit Srss, “a”, Qo and Q∞ for these two time ranges.

The plots and values in Figure 10 and Table 2 are obviously projections based on past behavior, and I am not expecting that USA consumption can continue as it has in the past.  They are useful, however, for showing what the future demand for petroleum products might be in the absence of a disruption in supply.  The fits to USA consumption, whether we look at the last 60 years or only at the 1984-2009 period, are consistent with an increase in the rate constant of consumption (“a”) at about 4 percent per year.  If the trend in consumption were to continue unabated, the USA would ultimately consume about 700-750 bbs of petroleum products in total (Q∞).

It is interesting that about the last 6 years of data, from 2004-2009, show the signs of a plateau and rollover to the decline side of the consumption curve.  This is likely reflective of the downturn in the economy as well as demographic trends in the USA.  Time will tell if the decline in consumption shows a very sharp drop-off, as suggested by the last two years of data, or a more gradual decline, as suggested by the best-fit curve to the 1984-2009 data set.

Comparing future trends in USA petroleum production and consumption
Figure 11 compares the best fits obtained using my modified analysis of the total USA production data and the best fit to the 1984-2009 consumption data. 


Although the modified Hubbert equation predicts a slower decline in USA’s production than predicted using the traditional Hubbert equation, and consumption is predicted to peak and then decline, the discrepancy between what the USA is predicted to produce and consume for 2010 to 2030 is enormous.  The projected curves suggest that the maximum difference in consumption and production will occur in about 2016.  But, by 2018, the USA's production is predicted to be down to about the same level it was at in 1949.  If present trends continue, the USA's consumption is predicted to remain about three times higher than its internal production for the next 20 years.  Therefore for the foreseeable future, the USA will continue to have to rely on getting about two-thirds of its petroleum from international sources. 

Who will be able to supply that amount of oil in the coming decade?  Or, is the USA in for a dramatic, forced decline in consumption?  In future posts I will examine the prospects of the USA getting oil from its present international sources.





Monday, October 18, 2010

Refining the Peak Oil Rosy Scenario Part 7: An improved logistic model

Inherent problems with the logistic equation

During the course of modeling my simulated data sets (especially the data set which assumed a 2%/yr decline in “a” after the peak in production) I realized that although the NLLS model made a reasonable estimate of the average “a” for the 5-year time spans considered (see part 4; Results for non-linear least squares analysis of simulated data), the fits to the data do not look particularly good.  That is, the fits seem to have a fairly high residual sum of squares (Srss) between the true values of dP/dt and the predicted values from the best fit.

This is illustrated below for the simulated data set where Q∞ = 170 bbls and “a” = 0.0687 yr⁻¹ up to the peak production year 1956, and thereafter “a” is assumed to decline 2%/yr.

Figure 17 shows the best fits from NLLS analysis of the twenty year span on the decline side of the production curve (1956-1975), using different assumptions as part of the fitting process:

1) Fix Q∞ at a constant value of 170 but vary “a” to get the best fit (red line)
2) Fix “a” at a constant value of 0.0687 but vary Q∞ to get the best fit (blue line)
3) Let both “a” and Q∞ vary to get the best fit (brown line)




The best fit values and Srss for each of these fits, along with the true values (i.e., the average values over the same time span), are summarized below:


Table 3: summary of best fit values to 1956-1975 time span shown in Figure 17

| Model assumptions | Best fit or fixed value of “a” | True value of “a” | Best fit or fixed value of Q∞ | True value of Q∞ | Srss |
|---|---|---|---|---|---|
| Q∞ fixed; “a” variable | 0.0592 | 0.0571 | 170 (fixed) | 170 | 1.310 |
| Q∞ variable; “a” fixed | 0.0687 (fixed) | 0.0571 | 155 | 170 | 0.299 |
| Both Q∞ and “a” variable | 0.0782 | 0.0571 | 148 | 170 | 0.028 |

It is apparent that the best fits where either Q∞ is fixed and “a” is varied, or “a” is fixed and Q∞ is varied, are not particularly good in terms of minimizing the Srss, as compared to the case where both “a” and Q∞ are varied.  For the fit done with Q∞ fixed to the true value and “a” varied, the best fit value is pretty close to the true value (the same as found in Part 4, when fitting 5 year spans) but the fit is poor.  It seemed odd to me that you could get a better fit by fixing “a” to a value you know is wrong (i.e., the pre-decline value of “a”) and varying the value of Q∞.  Although the best fit was obtained when varying both “a” and Q∞, the best fit values of these parameters are in poor agreement with the true values.  That is, “a” is overestimated by 37% and Q∞ is underestimated by 13%.

To gain a better understanding of what is going on here, I found it useful to make a series of plots using different values of “a” (with Q∞ fixed to 170) or of Q∞ (with “a” fixed to the growth side value of 0.0687).  These plots are shown in Figures 18 and 19, respectively. 



Now we start to see how these parameters operate to define the production (dQ/dt) versus time curve.  In general, varying “a” causes the simulated curve to move up and down the vertical axis, plus there is an effect on the negative slope of the curve.  Notice that at the right-hand side of the span (1975) the effect of changing “a” on moving the curve up and down is quite small as compared to the left-hand side of the span (1956).  In general, varying Q∞ also moves the simulated curve up and down, but with little effect on the slope of the curve.  It is now clear what happens when one of “a” or Q∞ is fixed and the other is varied: basically the simulated curve is moved up or down so that it crosses through about the mid-point of the 20-year span of the simulated data.  I can also start to see what happens when both “a” and Q∞ are allowed to vary: the NLLS analysis finds a low Q∞ to lower the simulated curve and chooses a high “a” value to increase the slope of the simulated curve so as to match the data.

This analysis shows that there are problems inherent in the logistic equation when used to model production data where there is a declining value in the rate constant for production, as simulated here.  A NLLS fit to the decline side data, where the value of Q∞ is fixed and “a” is allowed to vary, can give reasonably good estimates of the value of “a” but the fit will be poor.  A NLLS fit where both Q∞ and “a” are allowed to vary will tend to overestimate “a” and underestimate Q∞.

I can imagine what would happen in the inverse case for production data where after the peak, there was an increase in the rate constant for production: a NLLS fit where both Q∞ and “a” are allowed to vary will tend to under-estimate “a” and overestimate Q∞.  That is, the NLLS analysis will find a high Q∞ to raise the simulated curve and choose a low “a” value to decrease the slope of the simulated curve to match the data. 

We need a better model to specifically account for a changing “a” on the decline side of the production curve.

An improved logistic equation to account for changes in “a” on the decline side of the production curve.

Let’s consider again the logistic equation (equation [3] from Part 4):

dQ/dt = (Q∞ / (1 + No∙exp(-a∙(t-to)))) / (Dt),            [3]

where No = (Q∞-Qo)/Qo,  Dt = time increment (1 year).

Also let’s consider how I generated the simulated data for the case where “a” was assumed to decline by some percentage amount per year (say 2%/yr).  That is, in 1957, a equals a(1956) x 0.98; in 1958, a equals a(1957) x 0.98; etc....

It is easy to see that the value of “a” in any one year a(t) on the decline side will be given by:

a(t) = ap∙(fca)^(t – td),              [4]

where ap equals the rate constant estimated from the growth-side of the production curve, td equals the year where the change in a is believed to start, and fca is the fractional yearly change in “a.” 

For example, in the case where a 2%/yr decrease in “a” occurs beginning in 1957, td equals 1957 and fca equals 0.98.  And, in 1958, a(1958) equals ap x 0.98^2 (i.e., ap x 0.98 x 0.98);  in 1959, a(1959) equals ap x 0.98^3, etc...
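The equivalence between the year-by-year multiplication and the closed form of equation [4] can be checked with a few lines of Python.  This is only a sketch: the function name a_of_t is made up for illustration, and I have taken td = 1956 (the last year in which “a” still equals ap), which is an assumption about the indexing, chosen so that the closed form gives ap x 0.98 x 0.98 for 1958.

```python
def a_of_t(t, ap, fca, td):
    """Equation [4]: a(t) = ap * fca**(t - td)."""
    return ap * fca ** (t - td)

ap = 0.0687   # growth-side rate constant quoted in the text
fca = 0.98    # fractional yearly change for a 2%/yr decrease
td = 1956     # ASSUMPTION: last year in which "a" still equals ap

# Year-by-year recursion: a(1957) = a(1956) * 0.98, a(1958) = a(1957) * 0.98, ...
a_recursive = {td: ap}
for year in range(td + 1, 1976):
    a_recursive[year] = a_recursive[year - 1] * fca

# Print the recursive value next to the closed-form value for a few years.
for year in (1956, 1957, 1958, 1975):
    print(year, round(a_recursive[year], 5), round(a_of_t(year, ap, fca, td), 5))
```

The two columns agree for every year, which is all that equation [4] claims.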

Now we can substitute “a” from equation [3] with a(t) from equation [4] to produce the improved logistic equation:

dQ/dt = Q∞ / (1 + No∙exp(-(ap∙(fca)^(t – td))∙(t-to))) / (Dt),     [5]

where No = (Q∞-Qo)/Qo.

Notice that if we just fix “fca” equal to 1 then equation [5] reduces to equation [3] (i.e., “1” to the power of any whole number is still “1”), so equation [5] is useful for analyzing data with and without the assumption of a change in “a” on the decline side.
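For anyone who prefers code to spreadsheets, here is a minimal Python sketch of the right-hand sides of equations [3] and [5], exactly as they are written above.  The function names eq3 and eq5 are just labels for this example, and the values used for Qo (1.0) and to (1900) are hypothetical placeholders, not the values used in my analysis.

```python
import math

def eq3(t, q_inf, a, q0, t0, dt=1.0):
    """Right-hand side of equation [3] as written in the text."""
    n0 = (q_inf - q0) / q0
    return q_inf / (1.0 + n0 * math.exp(-a * (t - t0))) / dt

def eq5(t, q_inf, ap, fca, td, q0, t0, dt=1.0):
    """Right-hand side of equation [5]: "a" replaced by ap * fca**(t - td)."""
    n0 = (q_inf - q0) / q0
    a_t = ap * fca ** (t - td)  # equation [4]
    return q_inf / (1.0 + n0 * math.exp(-a_t * (t - t0))) / dt

# Q-infinity and ap are the growth-side values from the text;
# q0 and t0 are HYPOTHETICAL placeholders chosen only for illustration.
q_inf, ap, q0, t0, td = 170.0, 0.0687, 1.0, 1900, 1956

for t in (1956, 1965, 1975):
    unchanged = eq3(t, q_inf, ap, q0, t0)
    fca_one = eq5(t, q_inf, ap, 1.0, td, q0, t0)    # fca fixed to 1
    fca_down = eq5(t, q_inf, ap, 0.98, td, q0, t0)  # 2%/yr decrease in "a"
    print(t, round(unchanged, 3), round(fca_one, 3), round(fca_down, 3))
```

The first two columns are identical, which is the reduction of equation [5] to equation [3] when fca equals 1.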

Figure 20 shows the NLLS fit to the same time span depicted in Fig. 17, with the values of Q∞ and “a” fixed to the growth side values of 170 and 0.0687, respectively and fca allowed to vary.



Table 4 compares the best fit values summarized in Table 3 with the best fit using the new logistic equation:

Table 4: summary of best fit values to 1956-1975 time span shown in Figure 17 and Figure 20

Model assumptions | best fit or fixed value of “a” | true value “a” | best fit or fixed value of Q∞ | true value Q∞ | ∑rss
Q∞ fixed; “a” variable | 0.0592 | 0.0571 | 170 (fixed) | 170 | 1.310
Q∞ variable; “a” fixed | 0.0687 (fixed) | 0.0571 | 155 | 170 | 0.299
Both Q∞ and “a” variable | 0.0782 | 0.0571 | 148 | 170 | 0.028
Q∞ and “a” fixed; fca variable | 0.0687 (fixed); fca = 0.981 | 0.0571 | 170 (fixed) | 170 | 0.013

As one might expect, the fit to the simulated data is excellent although not perfect; after all, we are fitting the simulated data with an equation that corresponds to how the data was created in the first place.  The reason why the fit is not perfect, as indicated by the non-zero ∑rss, is that I chose 1956 as the first data point, but 1956 was actually the last year before the value of “a” was decreased by 2%.  Indeed, if I redo the fit to a 19-year span from 1957 to 1975, the best fit gives a ∑rss equal to zero and fca equals 0.98 (actually, 0.97996).

An improved logistic equation to account for changes in Q∞ on the decline side of the production curve.

This is somewhat anticipating the re-analysis of the USA production data, but, there is the possibility that Q∞, the total recoverable oil, will increase over time.  This could occur because more oil is discovered, or, because the techniques to recover the oil have become more efficient with time.  We need to have the ability to model this, similar to modeling to detect changes in the production rate constant, “a.” 

Analogous to the procedure described above, we can modify the logistic equation to account for yearly fractional changes in Q∞ on the decline side of the production curve, by defining Q∞(t) as follows: 

Q∞(t) = Q∞p∙(fcq)^(t – td),              [6]

where Q∞p equals the total recoverable oil estimated from the growth-side of the production curve, td equals the year where the change in Q∞ is believed to start, and fcq is the fractional yearly change in Q∞.  For example, in the case where a 0.5%/yr increase in Q∞ starts to occur on the decline side of the curve in 1956, then td equals 1956 and fcq equals 1.005.

Now we can substitute Q∞ from equation [3], where we have explicitly written out the value of No = (Q∞-Qo)/Qo, with Q∞ (t) from equation [6] to yield the improved logistic equation:

dQ/dt = (Q∞p∙(fcq)^(t – td)) / (1 + (((Q∞p∙(fcq)^(t – td))-Qo)/Qo)∙exp(-a∙(t-to))) / (Dt),            [7]

A further-improved logistic model to account for either or both changes in “a” and changes in Q∞ on the decline side of the production curve.

Now, if we include both expressions, for Q∞(t) and a(t), in the logistic equation, we have, first with a(t) alone and then with both:

dQ/dt = Q∞ / (1 + ((Q∞-Qo)/Qo)∙exp(-(ap∙(fca)^(t – td))∙(t-to))) / (Dt),                [8]

dQ/dt = (Q∞p∙(fcq)^(t – td)) / (1 + (((Q∞p∙(fcq)^(t – td))-Qo)/Qo)∙exp(-(ap∙(fca)^(t – td))∙(t-to))) / (Dt),    [9]

If you think equation 9 is starting to look a bit complicated, you should try inputting it as an equation into an EXCEL spreadsheet—great fun, but, it can be done (hint: it helps to use simulated data so that you can verify that the equation has been correctly entered!!).

The advantage of using equation 9 is its great flexibility.   For example, by fixing fca and fcq to 1, equation 9 reduces back to the familiar logistic equation 3.  Or, we can fix fcq to 1, and make fca variable, to examine yearly fractional changes in “a” on the decline side where Q∞ is fixed and has no fractional change.  Or, we can fix fca to 1, and make fcq variable, to examine yearly fractional changes in Q∞  on the decline side where “a” is fixed and has no fractional change.  Or, we can allow both fca and fcq to be variable and fix Q∞   and “a” to their best fit growth-side values. 
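The same flexibility is easy to see in code.  The sketch below writes the right-hand side of equation 9 as a single Python function in which fca and fcq default to 1; the function name eq9 and the values of Qo (1.0) and to (1900) are hypothetical choices for illustration only.

```python
import math

def eq9(t, q_inf_p, ap, q0, t0, td, fca=1.0, fcq=1.0, dt=1.0):
    """Right-hand side of equation 9 as written in the text."""
    q_inf_t = q_inf_p * fcq ** (t - td)  # equation [6]
    a_t = ap * fca ** (t - td)           # equation [4]
    n0_t = (q_inf_t - q0) / q0
    return q_inf_t / (1.0 + n0_t * math.exp(-a_t * (t - t0))) / dt

# q0 and t0 are HYPOTHETICAL placeholders; the other values come from the text.
common = dict(q_inf_p=170.0, ap=0.0687, q0=1.0, t0=1900, td=1956)

cases = [
    ("fca = 1,    fcq = 1     (equation 3)", {}),
    ("fca = 0.98, fcq = 1", {"fca": 0.98}),
    ("fca = 1,    fcq = 1.005", {"fcq": 1.005}),
    ("fca = 0.98, fcq = 1.005", {"fca": 0.98, "fcq": 1.005}),
]
for label, extra in cases:
    values = [round(eq9(t, **common, **extra), 2) for t in (1956, 1966, 1975)]
    print(label, values)
```

Leaving a parameter at its default of 1 is the code equivalent of fixing it in the fit, so one function covers all four of the cases listed above.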

Testing the improved logistic model with simulated data where “a” increases on the decline side.

It might seem paradoxical to be modeling for increases in “a” for the decline side of a production curve.  Why in the world would this ever happen?  Think about it: if we are on the decline side, it means that production is no longer growing.  To try and mitigate the socio-economic consequences of this, a country or company might try to increase the rate of production (i.e., increase “a”) by pumping oil from existing wells at a greater rate.  Of course, in the absence of any increase in Q∞, that will just more rapidly deplete whatever remaining oil there is in the long term.  However, this would tend to mitigate the declining production in the short term.

Let’s try to model this with a new simulated data set, using equation 9 for the modeling.  Once again, I assumed Q∞ equals 170 bbls and “a” equals 0.0687 yr^-1 on the growth side of the production curve.  Then, starting from the year of peak production, 1956, I assumed that “a” increases by 2%/yr thereafter.

Figure 21 shows the simulated data and NLLS best fits to the data for the first 20 years on the decline side using equation 9, with the various parameters, “a”; Q∞; fca; and fcq either fixed to the best-fit growth side value (i.e., “a”=0.0687; Q∞=170), or to 1 (i.e., one or both of fca and fcq fixed equal to 1), or allowed to vary, as described in the figure legend.



The simulated data on the decline side shows an initial boost in production; the peak year of production would actually be shifted to about 1963.  For this analysis, however, I just stuck with examining the twenty year period from 1956-1975.

It is apparent from the best fit plots shown in Figure 21 that poor fits to this 20-year time span are obtained for the models where one of Q∞, "a" or fcq is varied and all other parameters are fixed to their respective best fit value from the growth-side of the curve (or set equal to 1 in the cases of fca and fcq).

When both Q∞ and "a" are allowed to vary (with fca and fcq fixed equal to 1) the fit is better than the above cases.  But the best fit values are poor estimates of the true values: "a" (0.0647) is underestimated and Q∞ (197) is overestimated (i.e., the average "a" equals 0.0835 for the 20 year time span and Q∞ is held constant at 170).
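The average value of "a" over the 20-year span is simple arithmetic, and a short Python check reproduces it.  This sketch assumes the average is taken over the years 1956 to 1975 with "a" equal to 0.0687 in 1956; the function name mean_a is made up for the example.  The same calculation with a 2%/yr decrease gives the true value of 0.0571 listed in Table 4.

```python
ap = 0.0687                # growth-side rate constant from the text
years = range(1956, 1976)  # the 20-year span 1956-1975

def mean_a(fca, td=1956):
    """Average of a(t) = ap * fca**(t - td) over the 20-year span."""
    return sum(ap * fca ** (year - td) for year in years) / len(years)

print(round(mean_a(1.02), 4))  # 2%/yr increase -> 0.0835
print(round(mean_a(0.98), 4))  # 2%/yr decrease -> 0.0571
```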

The best fit, a near exact fit with the ∑rss ~ 0, is for the model where fca is varied and all the other parameters are fixed.  The best fit value of fca is 1.019947 which is about equal to a 2% per year increase, which is what was assumed to generate the simulated data in the first place.
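To illustrate the idea of this test (not to reproduce my actual fit), here is a small Python sketch that generates data from the right-hand side of equation 9 as written, with fca = 1.02, and then recovers fca by minimizing the residual sum of squares.  Several things here are assumptions: Qo (1.0) and to (1900) are hypothetical placeholders, the data are generated directly from the equation rather than by the procedure I used for the simulated data set, and a simple golden-section search stands in for a full NLLS routine because only one parameter is varied.

```python
import math

def model(t, fca, q_inf=170.0, ap=0.0687, q0=1.0, t0=1900, td=1956, dt=1.0):
    """Equation 9 with fcq fixed to 1; q0 and t0 are HYPOTHETICAL values."""
    a_t = ap * fca ** (t - td)
    n0 = (q_inf - q0) / q0
    return q_inf / (1.0 + n0 * math.exp(-a_t * (t - t0))) / dt

years = range(1956, 1976)
data = [model(t, 1.02) for t in years]  # simulated data, 2%/yr increase in "a"

def rss(fca):
    """Residual sum of squares between the model and the simulated data."""
    return sum((model(t, fca) - d) ** 2 for t, d in zip(years, data))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0

best_fca = golden_section_min(rss, 0.9, 1.1)
print(round(best_fca, 4), rss(best_fca))
```

With fca as the only free parameter, the search lands back on the value used to generate the data and the rss falls to essentially zero, which is the behavior described above.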

Testing the improved logistic model with simulated data where Q∞ increases on the decline side.

To further test equation 9, I generated another simulated data set.  Once again, I assumed Q∞ equals 170 bbls and “a” equals 0.0687 yr^-1 on the growth side of the production curve.  Then, starting from the year of peak production, 1956, I assumed that Q∞ increases by 0.5%/yr thereafter.

Analogous to Figure 21, Figure 22 shows the simulated data and NLLS best fits to the data for the first 20 years on the decline side using equation 9, with the various parameters, “a”; Q∞; fca; and fcq either fixed to the best-fit growth side value, or allowed to vary, as described in the figure legend.


Similar to the data shown in Figure 21, the simulated data on the decline side shows an initial slight increase in production, and the peak year of production is shifted to about 1960.  Again, for this analysis, I just stuck with examining the twenty year period from 1956-1975.

I have not shown the best fits for the cases where only "a" or only fca was varied and the other parameters were fixed to their best-fit growth side values (or to 1 for fcq), as these fits were quite poor and looked about the same as depicted in the previous figures for other simulated data.

I did show in Figure 22 the case where Q∞ was varied (and all other parameters fixed) to illustrate that, even though the best fit value of Q∞ (182) was close to the true value of 179 (the average value over the 20-year span), the fit to the simulated data was poor.

Again, when both Q∞ and "a" are allowed to vary (with fca and fcq fixed equal to 1) the fit is better than the above cases.  But again the best fit values are poor estimates of the true values: "a" (0.0604) is underestimated and Q∞ (196) is overestimated (i.e., "a" held constant at 0.0687; the average Q∞ equals 179 for the 20 year time span).

As expected, the best fit, a near exact fit with the ∑rss ~ 0, is for the model where fcq is varied and all the other parameters are fixed.  The best fit value of fcq is 1.004987 which is about equal to a 0.5% per year increase, which is what was assumed to generate the simulated data in the first place.

Conclusions

Based on this and the previous analysis done in Parts 4 & 6, I have revised my approach for modeling real oil production and consumption data as follows:

1) Obtain estimates of Q∞ and "a" from the NLLS best fits to the growth side production data, and progressive increments of the decline side data.  If the NLLS model blows up because there is no decline side data or data with a lot of scatter, use the hybrid linear logistic model to estimate Q∞ and "a."

2) Examine the most recent 10-20 year time span of the decline-side production data to see if there are discernable trends of a departure from the best fits using the logistic model where a constant Q∞ and "a" are assumed.

3) If there is a recent time span with a discernable departure detected in (2), apply NLLS and the improved logistic model to estimate the fractional change in "a" (fca) or Q∞ (fcq), or both, for this time span.

4) Apply the best fit model results from (1) or (3) to predict the production rates out to 2030.

5) Use the predicted production rates as part of the export land modeling for the USA (remember, that's what this whole series was about in the first place).  

Okay, it’s time to step away from modeling simulated data and enter the world of real data: back to the USA!