WHAT ARE THE MAIN UNCERTAINTIES IN ESTIMATING EARTHQUAKE RISK?

INTRODUCTION
The general heading of seismic risk covers a variety of problems, of which the following at least may be distinguished: 1. risk analyses for a particular structure on a particular site; 2. the determination of large-scale zoning schemes; 3. microzoning; 4. risk analyses for earthquake insurance; 5. risk analyses for civil defence and hazard reduction programmes.
This paper is concerned chiefly with the first of these, although a few remarks concerning the others appear in the final section.
Its intention is to identify the major uncertainties which enter into the estimation of seismic risk (in the sense of 1. above) and to place an order of magnitude estimate on the errors each is likely to cause.
Some earlier papers with a similar theme are by Cornell and Vanmarcke (1969), Donovan and Bernstein (1978), McGuire and Shedlock (1981), and McGuire and Barnhard (1981), among others.
Evidently there is a considerable element of personal judgement in such an attempt; the principal aim is to draw attention to the effects of aspects which are difficult to bring into explicit consideration and as a consequence are in danger of being ignored or otherwise brushed under the carpet.

NEGLECT OF INDEPENDENT MODES OF FAILURE
Failure is frequently treated as a single phenomenon, because of lack of detailed information concerning building response or (more commonly) the earthquake loading or ground motion.
If in fact several distinct types of ground motion may occur, each of which may trigger a distinct mode of failure, predicting the ground motion as some kind of average of them may lead to a serious distortion of the risk.
Suppose, to take a hypothetical example, a structure is susceptible to damage either by a sharp jolt from a local earthquake or prolonged swaying from a distant earthquake.
The use of an average "response spectrum" for these different types of loading might suggest a 10% probability of a kind of averaged motion which would not produce failure in either mode, in place of a 5% probability of a motion producing one type of failure and a 5% probability of a motion producing the other type of failure. A similar effect has been pointed out by der Kiureghian and Ang (1977) in connection with variability in the attenuation law.
*Institute of Statistics & Operations Research, Victoria University of Wellington, New Zealand.
Allowance for this variability produces a significant upward movement in the mean risk estimates.
Variability in b-values, in directional effects, in spectral content, in the distribution of energy between vertical and horizontal components, and in combinations of these factors, could all produce similar biases.
"Penalty factors" which have a similar mathematical origin occur in Bayesian analyses, but here the variability is an artefact of our ignorance rather than a physically based reality.
In both cases the bias is caused by the fact that failure is a non-linear phenomenon, so that the expectation of an average differs from the average of the expectations.
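The hypothetical jolt-versus-sway example can be put into numbers. The Python sketch below is purely illustrative: the two scenarios, the two-component description of the motion (peak acceleration and duration) and the failure thresholds are all assumed values chosen for the example, not figures from any data set.

```python
# Hypothetical illustration: averaging two distinct ground motions can hide
# both failure modes. Every number below is an assumption for the example.

# Each scenario: (probability, peak acceleration in g, duration in seconds)
SCENARIOS = [
    (0.05, 0.40, 5.0),    # sharp jolt from a local earthquake
    (0.05, 0.10, 60.0),   # prolonged swaying from a distant earthquake
]


def fails(accel_g, duration_s):
    """Assumed failure rule: fail by jolt (high acceleration) or by sway."""
    jolt_failure = accel_g >= 0.35
    sway_failure = duration_s >= 45.0 and accel_g >= 0.05
    return jolt_failure or sway_failure


def risk_by_mode(scenarios):
    """Add up the probabilities of the scenarios that actually cause failure."""
    return sum(p for p, a, d in scenarios if fails(a, d))


def risk_from_average(scenarios):
    """Collapse all scenarios into one probability-weighted average motion."""
    total_p = sum(p for p, _, _ in scenarios)
    mean_a = sum(p * a for p, a, _ in scenarios) / total_p
    mean_d = sum(p * d for p, _, d in scenarios) / total_p
    return total_p if fails(mean_a, mean_d) else 0.0


if __name__ == "__main__":
    print("risk treating the modes separately:", risk_by_mode(SCENARIOS))
    print("risk from the averaged motion:     ", risk_from_average(SCENARIOS))
```

With these assumed thresholds the averaged motion (0.25 g for 32.5 s) triggers neither mode, so the averaged description reports no risk at all while the mode-by-mode sum gives 10%.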
The underlying procedural fault which gives rise to this difficulty is the reduction of a complex, multivariate situation to a situation where all the information is channelled through a single variable.
It is very common, for example, for all aspects of the ground motion to be predicted from the maximum acceleration, which in turn is predicted from various parameters of the earthquake source.
No matter how much care is devoted to determining the regression of (say) spectral content or duration of shaking on maximum acceleration, essential information, which can significantly affect the estimates of risk, will be lost if in fact these different features can vary independently of each other. This provokes the question: what is the minimum number of independent components needed to provide a description of the ground motion sufficiently complete to allow all relevant features of the building response to be computed? If all earthquakes displayed a fixed relation between energy content in different modes and frequency bands it would be enough to specify just one, say the energy content in one frequency band, to determine the rest.
But if these quantities vary independently, then more information, a larger vector, will be needed.
What is the most appropriate vector for such a description, and what is the smallest dimension it could have? There seems to be much scope for further research on this question.
As with other features we shall consider, it is difficult to put quantitative estimates on the errors deriving from this effect.
The disturbing feature is that it generally leads to an underestimate of the risk.
Underestimates by a factor of h or more would seem quite likely, and an underestimate by an order of magnitude not impossible.

NON-STATIONARITY IN REGIONAL SEISMICITY
Most of the calculations involved in obtaining risk estimates can be stated in terms of rates: rates of occurrence of certain classes of earthquakes; rates of occurrence of a given intensity of ground motion at a particular site; rates of failure.
For such concepts to be meaningful some degree of constancy in the tectonic processes producing the earthquakes has to be supposed, so that the rates remain reasonably constant over time, and past history can be used as a relevant guide to the future.
How strong is the evidence that such an assumption is valid, and what degree of uncertainty does this assumption inject into estimates of the rates of occurrence (or probabilities) of given ground motions?
Instrumental data over the long time scales needed to answer this question barely exist, so one is forced back to an examination of historical and possibly geological data, with all their uncertainties concerning completeness of the records, consistency from one time period to another, possible exaggerations of historical reports, etc.
Some of the most convincing historical studies so far prepared are those of Ambraseys (e.g. 1971) for regions in the Near and Middle East. The picture which emerges from his studies certainly does not preclude local changes in seismicity over time intervals of the order of centuries.
For example, his work on historical seismicity in Iran suggests the possibility of periods of high seismicity alternately affecting different parts of the region.
An account of similar alternations in the seismic history of China is given by Shi et al. (1978).
The evidence for such alternations for seismic regions around the Pacific margin appears less convincing. Even in New Zealand, however, there are some suggestions of varying levels in seismicity in the modest century and a half which constitute our total recorded seismic history.
For example, some 15 earthquakes with Richter magnitude estimated at 7 or greater have occurred within the last 130 years (mainly last century), giving an average value of 1.1 earthquakes/decade.
Instrumental data on earthquakes with M > 5 in the period since 1940 suggest a mean rate of about 0.5 earthquakes per decade for earthquakes with M ≥ 7.
The difference seems too large to be attributed to random variations within a stationary model, even after making some allowance for clustering (see §5).
Thus the evidence suggests a genuine lull over the last 30 years or so.
Clearly the main danger here is undue reliance on data gathered over too short a time period.
Recent studies (e.g., Peek (1980)) have been rather careful in this regard, however, and large errors from this source seem unlikely.
Ambraseys' studies suggest that even where such fluctuations exist they are relatively small: factors of 3 or 4 rather than 30 or 40.
There seems little evidence to suggest that the major foci of seismic energy release have shifted greatly in historical or even recent geological times.
It would seem reasonable to suggest that careful current estimates of occurrence rates of large shocks should not be in error by factors of more than 2 or so from this source.

EXTRAPOLATING THE FREQUENCY-MAGNITUDE LAW
Every worker in the area of seismic risk must be uneasily aware of the vulnerability of his or her calculations to assumptions concerning the extrapolation of the frequency/magnitude law.
The underlying difficulty here is the lack of local data on very large earthquakes. Several recent papers, notably McGuire and Shedlock (1981), illustrate the effect on the risk of varying the parameters of these laws within reasonable limits. Such "sensitivity studies" provide a constructive approach to this problem, but a few additional comments may be in order.
The use of extreme value distributions in this context is something of a red herring.
There is no information in the data which can be extracted by extreme value methods which cannot also be extracted by more conventional methods, and with lower error bounds.
The only point which can perhaps be made in favour of extreme value estimates is that they give more weight to the high magnitude end of the frequency/magnitude plot, whereas the maximum likelihood estimate of b (which is just the reciprocal of the mean value of magnitude over the threshold) tends to weight the low magnitude end of the plot.
However, the same end could easily be achieved by other means (e.g., weighted least squares). The real point is that no statistical method can be relied upon in the absence of direct data on large earthquakes, since any such method acts within the framework of an assumed model, and where there is no data, there is no method of determining the applicability of the model.
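The maximum likelihood estimate of b mentioned above can be written down in a few lines. The Python sketch below assumes an exponential magnitude law with continuous (unbinned) magnitudes above a threshold; the function names and the synthetic catalogue are illustrative assumptions, and the standard error used is the usual large-sample approximation b/sqrt(n).

```python
import math
import random


def b_value_mle(magnitudes, threshold):
    """Maximum likelihood b-value for magnitudes at or above `threshold`.

    Assumes the density beta * exp(-beta * (M - M0)) for M >= M0, so that
    beta is the reciprocal of the mean excess over the threshold and
    b = beta / ln(10).
    """
    excess = [m - threshold for m in magnitudes if m >= threshold]
    if not excess:
        raise ValueError("no magnitudes at or above the threshold")
    mean_excess = sum(excess) / len(excess)
    if mean_excess <= 0.0:
        raise ValueError("mean excess over the threshold must be positive")
    beta = 1.0 / mean_excess
    return beta / math.log(10.0)


def b_value_std_error(b, n_events):
    """Large-sample standard error of the estimate, approximately b / sqrt(n)."""
    return b / math.sqrt(n_events)


if __name__ == "__main__":
    # Synthetic catalogue drawn with b = 1.1 above an assumed threshold of 4.0.
    rng = random.Random(0)
    true_beta = 1.1 * math.log(10.0)
    catalogue = [4.0 + rng.expovariate(true_beta) for _ in range(5000)]
    b_hat = b_value_mle(catalogue, 4.0)
    print("estimated b:", round(b_hat, 3))
    print("approx. standard error:", round(b_value_std_error(b_hat, 5000), 3))
```

Because the estimate depends only on the mean excess, it is dominated by the numerous small events just above the threshold, which is the weighting towards the low-magnitude end noted in the text.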
In such circumstances it seems preferable to turn to considerations of quite a different character, such as geological evidence on rates of deformation and fault movement, or geophysical evidence concerning the existence of maximum possible magnitudes associated with a given tectonic structure.
The critical question in looking at the sensitivity of risk estimates to extrapolations of the curve, is how far the tail of the magnitude distributions influences the risk.
Here two opposing effects are involved: the decreasing probability of occurrence with increasing magnitude; and the increase, with increasing magnitude, in the area (or distance along fault in a linear model) within which the earthquake's epicentre may fall and still produce a given movement at the site.
Let us examine these two opposing effects in a typical case.
The decrease in frequency with magnitude is measured by the frequency/magnitude law, which may be written in the form
$$\mathrm{Prob}\{\text{magnitude} > M\} = 10^{-b(M - M_0)},$$
where for New Zealand data we may assume $b = 1.1$ and $M_0$ is a threshold value representing the minimum earthquake capable of causing the particular movement at the site.
It is convenient to use powers of e rather than powers of 10, which gives
$$\mathrm{Prob}\{\text{magnitude} > M\} = e^{-2.53(M - M_0)}$$
with corresponding probability density
$$f(M) = 2.53\, e^{-2.53(M - M_0)}, \qquad M \ge M_0.$$
The second factor is measured by the attenuation law, which we may take in the form (Donovan (1973))
$$a = 1.35\, e^{0.58M} (20 + r)^{-1.52},$$
where $a$ is the acceleration at the site as a fraction of g, and $r$ is the epicentral distance (km). Taking $a$ and $M$ as given, and solving for $r$, we find that the earthquake epicentre must lie within a distance
$$r_{M,a} = (1.35/a)^{1/1.52}\, e^{0.58M/1.52} - 20$$
if it is to produce the required acceleration at the site, i.e., within a circle of area $\pi r_{M,a}^2$. Integrating over possible values of $M$, the overall rate of occurrence of earthquakes producing the required acceleration at the site is given by the integral
$$\lambda \pi \int_{M_0}^{\infty} r_{M,a}^2\, f(M)\, dM,$$
where $\lambda$ is the rate of occurrence, per unit area, of earthquakes with magnitude above $M_0$. The relative contributions from earthquakes of differing magnitudes can now be ascertained.
The conditional probability that, if a damaging earthquake occurs, its magnitude will lie in the range $(M, M + dM)$ will be given by the ratio
$$\frac{r_{M,a}^2\, f(M)\, dM}{\int_{M_0}^{\infty} r_{M',a}^2\, f(M')\, dM'} = c\, r_{M,a}^2\, f(M)\, dM,$$
where $c$ is a constant.
We see from this calculation that the relative contribution drops off fairly rapidly with M.
Only 5% of the risk comes from earthquakes 2 magnitude units above the threshold $M_0$ and 1% of the risk from earthquakes 3 magnitude units above $M_0$.
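The balance between the two opposing effects can be sketched numerically. The Python fragment below makes a simplifying assumption that is not part of the text: the 20 km offset in the attenuation law is neglected, so that the area within which an epicentre produces the required acceleration grows as exp(2 x 0.58 M / 1.52). The function names are illustrative. The figures it gives are indicative only; they shift with the handling of the offset and of the threshold, and need not match the percentages quoted above.

```python
import math

B_VALUE = 1.1    # b-value assumed in the text for New Zealand data
GAMMA = 0.58     # magnitude coefficient in the attenuation law
DECAY = 1.52     # distance exponent in the attenuation law


def contribution_exponent(b=B_VALUE, gamma=GAMMA, decay=DECAY):
    """Net decay rate k such that r^2 * f(M) behaves like exp(-k * (M - M0)).

    Far-field simplification (an assumption): the distance offset in the
    attenuation law is dropped, so r^2 is proportional to exp(2*gamma*M/decay).
    """
    beta = b * math.log(10.0)            # frequency falls as exp(-beta * M)
    area_growth = 2.0 * gamma / decay    # area grows as exp(area_growth * M)
    return beta - area_growth


def share_of_risk_above(delta_m, b=B_VALUE, gamma=GAMMA, decay=DECAY):
    """Fraction of the damaging-event rate from magnitudes above M0 + delta_m."""
    k = contribution_exponent(b, gamma, decay)
    if k <= 0.0:
        raise ValueError("area grows faster than frequency decays; integral diverges")
    return math.exp(-k * delta_m)


if __name__ == "__main__":
    print("net decay rate per magnitude unit:", round(contribution_exponent(), 2))
    for delta in (1.0, 2.0, 3.0):
        print("share above M0 +", delta, ":", round(share_of_risk_above(delta), 4))
```

The sensitivity to the slope is easy to explore by changing `b`: a smaller b-value reduces the net decay rate and shifts a larger share of the risk to the poorly observed upper end of the magnitude range.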
The practical implication of these calculations depends critically on the value of $M_0$.
In designing for an office structure, one might accept $M_0 = 5$ or even less as a reasonable lower bound, in which case only a small proportion of the risk would be due to earthquakes with M ≥ 7, and uncertainties in extrapolating the curve beyond this point would be essentially immaterial.
For highly sensitive structures, unfortunately, the situation is likely to be very different.
Here the structure may be designed to withstand damage from all but the very largest earthquakes, so that the bulk of the risk comes from earthquakes outside the range for which there exists direct information.
In such a case, a wrong estimation of the slope could easily produce errors of one or even two orders of magnitude.
The uncertainties are so gross that, as mentioned earlier, it seems essential to seek supplementary information of a geological or geophysical character.
Indeed the whole question of risk estimation seems somewhat unreal in these circumstances, not so much because of the likely errors, but because the probabilities are so small they become devoid of any operational meaning.
It seems more important to give attention to minimising the unpleasant consequences of an event such as a nuclear reactor being near to the epicentre of a major earthquake. As the number of reactors escalates (10 cannot be far away), the chance of such a rare event occurring must approach unity, and then one will be more concerned about safety and back-up provisions than about recriminations over the risk calculations.

THE ROLE OF THE POISSON ASSUMPTION
It is sometimes asked whether the use of more complex models of earthquake occurrence, perhaps of the cluster process type described in Vere-Jones (1970), might substantially improve estimates of risk.
I do not think that this is the case.
Much of the discussion of earthquake risk is really a discussion of rates, for which stationarity, rather than the distributional model, is the critical assumption.
The improvements that might be achieved by fitting a more complex model would tend to be swamped by the uncertainties concerning the frequency/magnitude law, stationarity, etc., already alluded to.
Nevertheless there are some places where the Poisson assumptions do play a vital role, and it may be as well to identify these.
In the first place it enters into the confidence intervals for the estimates of rates of occurrence. If a total number N of earthquakes in some category are recorded in a certain time interval T, then under the Poisson assumption the coefficient of variation (ratio of standard deviation to mean) of the resulting estimate $\hat{\lambda} = N/T$ is approximately $1/\sqrt{N}$, whatever the value of T.
This quantity may be used as a rough estimate of the statistical precision of the estimate when the Poisson assumption is justified. In general, however, earthquakes appear in clusters, both immediately apparent (aftershock sequences or swarms) or more subtle (short periods of locally heightened activity).
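In such cases the coefficient of variation should be inflated by a factor $\sqrt{m_2/m_1}$, where $m_1$ and $m_2$ are the first and second moments of the cluster size. As mentioned earlier, the difference between the historical and instrumental rates for New Zealand seems too large to be attributed to random variations.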
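A short Python sketch of the precision calculation follows. It rests on two assumptions that should be checked against the application: cluster centres are taken to form a Poisson process, and the inflation factor is taken to be sqrt(m2/m1), the factor referred to later in connection with seismicity maps, with m1 and m2 interpreted as the mean and mean square of the cluster size. The function names are illustrative.

```python
import math


def rate_estimate(n_events, years):
    """Estimated rate of occurrence, N / T."""
    return n_events / years


def poisson_cv(n_events):
    """Coefficient of variation of N / T under the Poisson assumption."""
    return 1.0 / math.sqrt(n_events)


def clustered_cv(cluster_sizes):
    """Coefficient of variation when events arrive in clusters.

    Assumes Poisson cluster centres and independent cluster sizes, so the
    Poisson value is inflated by sqrt(m2 / m1), where m1 is the mean and
    m2 the mean square of the observed cluster sizes.
    """
    n_clusters = len(cluster_sizes)
    n_events = sum(cluster_sizes)
    m1 = n_events / n_clusters
    m2 = sum(s * s for s in cluster_sizes) / n_clusters
    return math.sqrt(m2 / m1) * poisson_cv(n_events)


if __name__ == "__main__":
    # The New Zealand figures quoted earlier: 15 events in 130 years.
    print("rate per decade:", round(10.0 * rate_estimate(15, 130.0), 2))
    print("Poisson CV:", round(poisson_cv(15), 2))
    # Hypothetical grouping of the same 15 events into 9 clusters.
    sizes = [1, 1, 1, 1, 1, 2, 2, 3, 3]
    print("clustered CV:", round(clustered_cv(sizes), 2))
```

When every cluster has size one the factor reduces to unity and the Poisson value is recovered; when all events fall into a few equal clusters, the result is the Poisson value for the number of clusters rather than the number of events.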
The Poisson assumption is also used in converting rates ($\lambda$) to probabilities, through the formula $p(T) = \mathrm{Prob}\{\text{damaging earthquakes in } (t, t + T)\} = 1 - e^{-\lambda T}$.
The use of this formula as an approximation when $\lambda T$ is very small, and no detailed information about the process is available, seems very reasonable. For intervals of moderate length, however, the formula is rather susceptible to departures both from stationarity and from the independence assumptions implicit in use of the Poisson model.
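The conversion from a rate to a probability is a one-line calculation, sketched here in Python for a constant rate. The rate of 0.01 per year used in the demonstration is a hypothetical value chosen only to show how the probability departs from the product of rate and interval as the interval lengthens.

```python
import math


def prob_at_least_one(rate_per_year, years):
    """P(at least one event in `years`) for a Poisson process of constant rate."""
    if rate_per_year < 0.0 or years < 0.0:
        raise ValueError("rate and interval must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is very small.
    return -math.expm1(-rate_per_year * years)


if __name__ == "__main__":
    rate = 0.01  # hypothetical: one damaging event per 100 years on average
    for years in (1, 10, 50, 100):
        p = prob_at_least_one(rate, years)
        print("T =", years, "yr  rate*T =", round(rate * years, 2), " p(T) =", round(p, 3))
```

For a rate that varies with time, the product of rate and interval would be replaced by the integral of the rate over the interval, which is where departures from stationarity enter.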
Both departures produce similar effects.
In the first case (non-stationarity) the rate $\lambda$ becomes a function $\lambda(t)$ which varies explicitly with time.
In the second case (failure of independence) it becomes a function of the past history of the process, the type of function depending on the type of dependence assumed.
Clustering, for example, would give a high risk immediately following an earthquake, while a periodic or relaxation model would give a high risk when the next earthquake became "due". Some examples are given by Vere-Jones (1978), Vere-Jones and Ozaki (1981). It must be stated, however, that the data by itself is rarely sufficient to distinguish between such models, while a physical theory which might distinguish between them is also lacking.
While this situation continues, allowance for such factors might modify medium-term estimates of the probability by a few fold, but hardly by more than this.
This conclusion could be quite dramatically altered by developments in the area of earthquake forecasting, which indeed may take the form of conditional risk forecasts, updated as new precursory information comes to hand (Rhoades and Evison (1979) give some graphic examples of how this might be done).
The new feature here would be the addition of effective precursory information to the past history.
With such information incorporated, it seems quite possible that risk enhancement factors of 10-20 could arise.
The engineering and design implications of such forecasts have yet to be worked out, but they would surely be considerable, and surely raise question marks about the future of traditional procedures.
Despite recent pessimism about earthquake prediction, I am confident myself that such refinements to risk forecasting will become available, and that, while they may not be sufficient to provide an explicit prediction, they will be sufficient to produce worthwhile economic savings if the updated information is properly incorporated into design and maintenance procedures.
A final area where use of the Poisson model could lead to seriously misleading results concerns the assessment of risk from repeated events.
Such problems arise particularly in the discussion of networks, where the occurrence of several earthquakes in close succession, even in different parts of the region, could constitute a major hazard.
They also arise in discussing the effects of aftershocks on structures already weakened by exposure to one major event.
In any such situation some explicit modelling of the clustering or interdependence effects would be necessary.
Analytically tractable models may be hard to come by, but elegant methods for simulating nonconstant Poisson processes with time- and history-dependent rate functions have been derived by Lewis and Shedler (1979) and Ogata (1981), and should be applicable in this context.
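A minimal Python sketch of simulation by thinning, in the spirit of the methods cited, is given below for a rate that depends on time only. Candidate events are generated from a constant-rate Poisson process at an upper bound and each is kept with probability equal to the ratio of the actual rate to the bound. The cyclic rate function and its parameters are assumptions made for illustration; the history-dependent case needs the bound to be updated as the simulation proceeds and is not covered here.

```python
import math
import random


def simulate_by_thinning(rate, rate_max, t_end, rng):
    """Simulate event times on (0, t_end] for a time-varying Poisson rate.

    `rate` is a function of time that must never exceed `rate_max`.
    Candidates arrive at the constant rate `rate_max`; each is accepted
    with probability rate(t) / rate_max.
    """
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)    # waiting time to the next candidate
        if t > t_end:
            return events
        current = rate(t)
        if current > rate_max:
            raise ValueError("rate(t) exceeds the stated upper bound")
        if rng.random() * rate_max < current:
            events.append(t)


def example_rate(t):
    """Assumed rate: a 100-year cycle about a mean of 0.1 events per year."""
    return 0.1 * (1.0 + 0.5 * math.sin(2.0 * math.pi * t / 100.0))


if __name__ == "__main__":
    rng = random.Random(1)
    times = simulate_by_thinning(example_rate, 0.15, 1000.0, rng)
    print("events simulated in 1000 years:", len(times))
```

Repeating the simulation many times gives empirical probabilities for repeated events, such as two or more damaging shocks within a short interval, which have no simple closed form once the rate is no longer constant.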

SEISMICITY MAPS
Here we move from the first to the second of the topics listed in the introduction.
Many of the concerns already expressed continue to apply in this context, particularly the problems of non-stationarity, of extrapolation (and even interpolation) of the frequency/magnitude law, and the effects of clustering and other departures from the Poisson model. In general, existing techniques for producing seismicity maps are based on ad-hoc smoothing procedures with little if any consideration given to the assessment of precision or the development of an underlying statistical model. Controversy also exists over the use of direct data on intensities as an alternative to the combination of instrumental and historical data on seismicity coupled with a generalized attenuation law.
The incorporation of geological evidence is another vexed question in this area.
Although it might be considered the longest standing of all aspects of risk estimation, the construction of seismicity maps has received relatively scant attention from the statisticians, no doubt on account of its complexity.
In essence the problem is the same as that of estimating a probability density function from repeated observations. This problem has attracted an extensive literature in the 1-dimensional case, relatively little in higher dimensions.
It is substantially complicated in the earthquake context by the phenomena of clustering and nonstationarity, which would distort any confidence intervals based on assumptions of independent, identically distributed observations.
From the practical viewpoint, a simple if rather crude way round these difficulties is to block the region into a series of subregions for which the seismicity (and perhaps the b-value as well) can be estimated on the assumption that it is uniform within each subregion. Some allowance for the effect of clustering on the individual rate estimates can be made in most applications through the factor $\sqrt{m_2/m_1}$ described earlier.
The fact that in most applications the overall risk will be obtained by integrating the contributions from different subregions around the site will tend to mitigate the crudities of the initial model.
A typical illustration is in Peek (1980). If smooth contours are required there are two basic approaches which may be used.
The non-parametric approach estimates the seismicity at any point x by way of a smoothing kernel, through the formula
$$\hat{\lambda}(x) = \int k(x - u)\, dN(u) = \sum_i k(x - x_i),$$
where $k$ is a suitable smoothing function, the $x_i$ are the earthquake epicentres, and $dN(u)$ is the counting measure which counts the number of earthquakes in the infinitesimal area $du$.
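The kernel approach can be sketched in a few lines of Python. Several choices below are assumptions made for illustration and are not prescribed by the text: an isotropic Gaussian kernel, planar coordinates in kilometres, a fixed bandwidth, and division by the length of the observation period so that the result is a rate per year per square kilometre.

```python
import math


def gaussian_kernel(dx_km, dy_km, bandwidth_km):
    """Isotropic two-dimensional Gaussian density, integrating to one."""
    r2 = dx_km * dx_km + dy_km * dy_km
    norm = 2.0 * math.pi * bandwidth_km ** 2
    return math.exp(-0.5 * r2 / bandwidth_km ** 2) / norm


def seismicity_rate(x_km, y_km, epicentres, years, bandwidth_km):
    """Kernel estimate of events per year per square km at the point (x, y).

    `epicentres` is a list of (x, y) positions in km observed over `years`.
    """
    total = sum(
        gaussian_kernel(x_km - ex, y_km - ey, bandwidth_km)
        for ex, ey in epicentres
    )
    return total / years


if __name__ == "__main__":
    # Hypothetical epicentres (km) recorded over an assumed 40-year period.
    catalogue = [(0.0, 0.0), (5.0, 2.0), (6.0, 3.0), (40.0, -10.0)]
    for point in [(0.0, 0.0), (5.5, 2.5), (100.0, 100.0)]:
        rate = seismicity_rate(point[0], point[1], catalogue, 40.0, 10.0)
        print(point, "->", rate)
```

The bandwidth plays the role of the ad-hoc smoothing parameter: a small value reproduces the individual epicentres, a large one flattens the map, and nothing in the estimate itself allows for clustering or non-stationarity.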
Alternatively one can seek a parametric form for $\lambda(u)$.
For example, we have been experimenting recently with the possibility of representing $\lambda(u)$ in exponential polynomial form, say
$$\lambda(x, y) = \exp\Big\{ \sum_{j,k} a_{jk}\, \phi_j(x)\, \phi_k(y) \Big\},$$
where the $a_{jk}$ are coefficients to be estimated, and the $\phi_j(x)$ a suitable family of orthogonal polynomials.
In principle such densities can be fitted by the log-linear models techniques incorporated into such packages as GLIM and GENSTAT, but many problems remain in this area.
In general I am not in favour of using the direct evidence on past intensities or ground accelerations as the main basis for risk estimates or contour maps.
There is a considerable danger of making repeated use of the same event and hence distorting average contours.
A more balanced picture is likely to result from taking into account the seismicity in a wide region round the site, particularly where extrapolations to high intensities have to be made. Perhaps the most useful role for direct information on site intensities is as a control or check on risk estimates obtained from seismicity studies.
Another difficult but important question relates to the incorporation of geological evidence on seismicity.
Much depends here on the knowledge of the faulting structure at the depths where the earthquakes actually occur, and the extent to which instrumentally derived seismicity patterns appear to be related to known fault traces.
Unless both features are favourable it may be difficult to find any satisfactory basis for relating the geological evidence to current seismicity.

SOME MORE GENERAL POINTS
It is desirable in any statistical analysis to check back from time to time on the overall aims of the project, to ensure that the data collected and the proposed analysis are actually capable of fulfilling those aims.
Presumably the general aim of risk analysis and aseismic design is the reduction of economic and especially human losses due to earthquakes.
Evidently, estimation of the probability of failure of a particular structure during its expected life is only one small, and rather indirect, component of such a general programme.
It would be unfortunate if undue emphasis on this question led to other aspects being neglected: I do not know, for example, whether sufficient attention is given to reducing the risks from secondary shocks following in the wake of a major earthquake.
It seems to me important that some attention be given, even at this early stage, to the possible strategies which might be adopted in connection with long-range risk forecasts.
These are a natural consequence of current work on historical seismicity and at some stage may be expected to link up with work on earthquake predictions, or risk forecasting based on the observation of precursory phenomena.
How should such information be used in a design context? I have argued elsewhere (Vere-Jones (1978)) for the desirability of incorporating many of the aspects referred to in the introduction (i.e., problems (1) - (5) listed there) within an overall government programme of risk reduction.
No amount of statistical expertise, for example, will make a zoning scheme effective, unless it is backed up by legislation and a sufficient determination to enforce the building codes.
I feel strongly that earthquake insurance should be regarded as one of the tools in a state programme of risk reduction, and should not be left solely to the selected interests and slanted viewpoint of private insurance.
The collection of data for a state insurance scheme and for private insurance raise very different questions. In a state scheme, for example, the information on existing structures needed for assessing premiums could also be used in connection with decisions concerning the deployment of civil defence units and emergency planning.
Risk estimates should be available to determine building and site priorities.
In all cases the information collected and the analysis chosen should be related back to the use of the information and the general aims of the exercise.
Such considerations point to the setting up of a planning and research facility to provide information and advice to local authorities, engineers, and architects, and to recommend appropriate legislation to government.

CONCLUSIONS
The general conclusions from this review are that the estimated probabilities of a given ground motion are likely to be in doubt by factors of 4-5 for moderate ground motions and perhaps twice these amounts for intense ground motions.
The principal sources of uncertainty lie in the interpretation and extrapolation of historical data on variations of seismicity with time, and the extrapolation of the upper end of the frequency-magnitude law.
Less clearly defined and probably more serious uncertainties arise in estimating the probabilities of different types of structural failure from given information on the ground motion, particularly when this information is restricted to maximum accelerations.
In general, the topic of risk estimation should not be viewed too narrowly, but in the context of an overall strategy of risk reduction.