Multivariate design via Copulas

Introduction Conclusions References


Introduction
The Return Period (hereinafter, RP) of a prescribed event is frequently adopted in hydrology (as well as in water resources and civil engineering, and more generally in geophysical and environmental sciences) as a criterion for the identification of critical events, and provides a means for rational decision making (for a review see Singh et al., 2007, and references therein).
The traditional definition of the RP is as "the average time elapsing between two successive realizations of a prescribed (critical) event", which clearly has a statistical base.Equally important is the related concept of design quantile, usually defined as "the value of the variable(s) characterizing the event associated with a given RP".In engineering practice, the choice of the RP depends upon the importance of the structure, and the consequences of its failure.For example, the RP of a dam design quantile is usually 1000 years or more (Midttomme et al., 2001), while for a sewer it is about 5-10 years (Briere, 1999) While in the univariate case the RP and the design quantile are usually identified without ambiguity -and widely used in the engineering practice (Chow et al., 1988) in the multivariate one this is not so, essentially due to the lack of a natural total order in multi-dimensional Euclidean spaces.In particular, the identification problem of design events in a multivariate context is of fundamental importance: however, its troublesome nature has strongly limited the application of multivariate analyses.Recently, several efforts have been spent on the issues of multivariate RP's and quantiles (see, e.g., Serfling, 2002;Belzunce et al., 2007;Chebana andOuarda, 2011a, 2009;Chaouch and Goga, 2010, and references therein; for a methodology to identify multivariate extremes by using depth functions see Chebana and Ouarda, 2011b).Here we address the following crucial question: "How is it possible to calculate, in a meaningful way, the critical RP's and the corresponding design event(s) in the multivariate case?" Below, we outline how to circumvent the problem, and provide consistent answers.
As we shall show later, the notion of multivariate RP is strictly related to the one of Copula.The use of copulas in environmental sciences is recent and rapidly growing.
Shortly, a multivariate copula C is a joint distribution on I d = [0,1] d with Uniform margins.The link between a multivariate distribution F and the associated d -dimensional copula C is given by the functional identity stated by Sklar's Theorem (Sklar, 1959): For a thorough theoretical introduction to copulas see Joe (1997); Nelsen (2006); for a practical approach see Salvadori et al. (2007); Jaworski et al. (2010).In order to avoid troublesome situations, hereinafter we shall assume that F is continuous (but not necessarily absolutely continuous), and strictly increasing in each marginal: these regularity constraints are rather weak, and satisfied by the majority of the distributions used in applications.Clearly, also pathological cases can be carried out, but they require suitable techniques that go beyond the scope of this work.
Later we shall use the Kendall's distribution (or measure) function K C : I → I (Genest andRivest, 1993, 2001) given by where t ∈ I is a probability level, W = C(U 1 ,...,U d ) is a univariate random variable (hereinafter, r.v.) taking value on I, and the U i 's are Uniform r.v.s on I with copula C. Note that Eq. ( 3) represents a multivariate quantile relationship (Genest and Rivest, 2001;Nappo and Spizzichino, 2009): practically, it measures the probability that a random event will appear in the region of I d defined by the inequality C(u) ≤ t.Thus, as we shall see, K C turns out to be a fundamental tool for introducing a suitable (copula-based) definition of RP for multivariate events.Unfortunately, at present no general analytical expressions of K C are known -except for special cases, like the one of bivariate Extreme Value copulas (Ghoudi et al., 1998), and some Archimedean copulas (McNeil and Nešlehov á, 2009) -and it is necessary to resort to simulations (see, e.g., Algorithm 1 outlined later).
The paper is organized as follows.In Sect. 2 we first illustrate the case study.In Sect. 3 we introduce a suitable notion of multivariate RP, and in Sect. 4 we show how to calculate the corresponding multivariate quantile.Then, in Sect. 5 we present two elementary strategies to calculate critical design events in a multivariate context.Finally, in Sect.6 we discuss the results outlined in the paper, and draw some conclusions.(2005), "undisturbed" flood hydrographs incoming the reservoir were fixed by using the inverse reservoir routing, the water levels in the reservoir, and the operations on the controlled outlets.Then, maximum annual flood peaks Q and volumes V were identified and selected for 49 years, from 1937 to 1994.As a result of a thorough investigation, almost all of the occurrence dates of the Q's and the V 's were the same: i.e., they happened during the same flood event.
As an improvement over De Michele et al. (2005), beyond the pair (Q,V ), also the initial water level L in the reservoir before the flood event (Q,V ) is considered in this work, in order to analyse the triple (Q,V,L) of practical interest: in fact, on the one hand L represents the starting state of the dam; on the other hand, (Q,V ) is the hydrologic "forcing" to the structure.Clearly, there are physical reasons to assume that L is independent of (Q,V ) -see also below water level, in order to get the maximum benefit concerning the production of electric energy.
Using the pair (Q,V ), it is possible to calculate the associated flood hydrograph with peak Q and volume V , once the shape of the hydrograph has been chosen.As first approximation, it is possible to consider a triangular shape, where the base time is equal to T b = 2V/Q, the time of rise equals T r = T b /2.67, and the time of recession is equal to 1.67T r (see Soil Conservation Service, 1972;Chow et al., 1988, p. 229).Consequently, the flood hydrograph q is given by Later, in Sect.5, we shall test the behavior of the dam subject to selected hydrographs.
More particularly, we shall first operate the reservoir routing of the flood hydrograph (see, e.g., Bras, 1990, p. 475-478;Zoppou, 1999) considering as outlet only the uncontrolled spillway, and then we shall check whether or not the reservoir level exceeds the crest level of the dam.
In Fig. 1 we show the trivariate plot of the available observations, as well as the fits of the marginal distributions.However, we shall not insist on this point, being of secondary importance with respect to the actual methodological target of the paper.The GEV law is used to model the statistics of both Q and V , since these are annual maxima: the estimates of the parameters are reported in Table 1.The fits are valuable, as they passed standard goodness-of-fit tests (namely, Kolmogorov-Smirnov and Anderson-Darling) at all usual levels (viz., 1 %, 5 %, and 10 %).Instead, the behavior of the variable L is quite tricky (as explained above, the water level is arbitrarily fixed by the dam manager): for this reason, its law is calculated via a non-parametric Normal Kernel estimation (Bowman and Azzalini, 1997).As a result, also in this case the Kolmogorov-Smirnov test is passed at all usual levels.the phenomenon.However, we want to stress that this type of graph only provides partial information, and should not be used to draw rough conclusions about the dependence structure of (Q,V,L) -see below, and also Genest and Favre (2007) for a thorough review.A further step concerns the investigation of the joint behavior of the variables (Q,V,L): as is typical in copula analysis, we shall use the normalized ranks to carry out a nonparametric study.The trivariate rank-plot shown in Fig. 2 provides some rough indications about the global dependence structure (i.e., the copula) linking the three variables (Q,V,L): as pointed out in Remark 1, this graph is not equivalent to the trivariate plot shown in Fig. 1.
As already mentioned above, there are physical reasons to assume that L is independent of (Q,V ): the rank-plots shown in Fig. 2 support this fact.Indeed, the sample is rather uniformly sparse in both the (Q,L) and (V,L) planes.Also, the estimates of the Kendall's τ and the Spearman's ρ are not statistically significant (as confirmed by the corresponding p-values), and formal tests of independence suggest to accept the hypothesis that L is effectively independent of (Q,V ).On the contrary, the variables (Q,V ) are significantly positively associated (i.e., concordant), and thus Q and V are not independent: the estimates of both the Kendall's τ and the Spearman's ρ are large, and the corresponding p-values are negligible (see the values reported in Fig. 2).
As in De Michele et al. (2005), a Gumbel copula was used to model the dependence between Q and V , with parameter θ ≈ 3.1378, calculated via the inversion of the Kendall's τ.The ability of this copula to model the available bivariate data is checked via robust multivariate goodness-of-fit tests (Genest et al., 2009;Berg, 2009): the resulting large p-values indicate that the Gumbel copula C QV cannot be rejected at all standard levels.Furthermore, the analysis of the (Q,V ) rank-plot in Fig. 2 shows a significant association between these two variables in the upper-right corner of the unit square: in fact, the extreme pairs practically lie on the main diagonal.Thus, it is not a surprise that the fitted Gumbel copula, having a large upper tail dependence coefficient λ Upp ≈ 0.75 (Nelsen, 2006;Salvadori et al., 2007) is suitable for modeling the Figures

Back Close
Full dependence structure of the pair (Q,V ).In passing, note that C QV is an Extreme Value copula (Nelsen, 2006): since both F Q and F V are GEV distributions, it turns out that ) is a bivariate Extreme Value law (after all, Q and V are annual maxima).
Given the previous results, since L can be assumed to be independent of (Q,V ), it is immediate to construct a suitable trivariate copula C QV L to model the dependence structure of the triple (Q,V,L): where (u,v,w) ∈ I 3 .As above, the ability of this copula to model the trivariate data is properly checked, and the resulting large p-values indicate that it cannot be rejected at all standard levels.In passing, note that also C QV L is an Extreme Value copula.In addition, since C QV is Archimedean (Nelsen, 2006), then C QV L is a particular case of a "nested" Archimedean copula (Joe, 1997;Grimaldi and Serinaldi, 2006;Hering et al., 2010;H ärdle and Okhrin, 2010).However,

Multivariate return period
In order to provide a consistent theory of multivariate RP's, it is first necessary to precisely define the abstract framework where to embed the question.Hereinafter, we shall consider as the object of our investigation a sequence X = {X 1 ,X 2 ,...} of independent and identically distributed d -dimensional random vectors, with d > 1: thus, each In applications, usually, the event of interest is of the type {X ∈ C}, where C is a nonempty Borel set in R d collecting all the values judged to be "critical" according to some suitable criterion.Note that the Borel family includes all the sets of interest in practice (like, e.g., the intervals (−∞,x 1 ),(x 1 ,x 2 ),(x 2 ,∞), as well as the corresponding multivariate versions).Let µ > 0 be the average inter-arrival time of the realizations in X (viz., µ is the average time elapsing between X i and X i +1 ).Following, e.g., Embrechts et al. (2003), it is clear that the univariate r.v.'s {B i = I C (X i )} form a Bernoulli process (where I is an indicator set function), with positive probability of "success" p C given by Then, it makes sense to calculate the first random time A C that the sequence B = {B 1 ,B 2 ,...}, generated by X, takes on the value 1 (viz., the first random time that X enters the "critical" set C): Clearly, A C follows a Geometric distribution with parameter p C , and therefore its expected value is Given the well known "memoryless property" of the Geometric distribution, and the features of the Bernoulli process (see, e.g., Feller, 1971), it is clear that µ C also corresponds to the average inter-arrival time between two successive realizations of the event {X ∈ C}.Evidently, µ C ranges in [µ,+∞): for example, if annual maxima are investigated, then µ = 1 year, and hence µ C = 1/p C ≥ µ.We are now ready to introduce a consistent notion of RP.DEFINITION 1.The RP associated with the critical set C ⊂ R d is given by µ C .REMARK 2. Definition 1 is a very general one: the critical set C may be constructed in order to satisfy broad requirements, useful in different applications.Indeed, most of Introduction

Conclusions References
Tables Figures

Back Close
Full the approaches already present in literature are particular cases of the one outlined above.
As a univariate example, let X be a r.v. with distribution F X , and let x * denote a prescribed critical design value.Then, e.g., in hydrology, if droughts are of concern, x * may represent a small value of river flow, and the critical realizations of interest are those for which X < x * (viz., C = [0,x * )).Instead, if floods are of concern, x * may indicate a large value of river flow, and the critical realizations of interest are those for which According to Definition 1 and Eq. ( 7), the corresponding RP's are µ/F X (x * ) in the former case, and µ/(1 − F X (x * )) in the latter one.
REMARK 3. The examples given above emphasize the two basic elements necessary (and sufficient) in the traditional theory of univariate return periods to define the events of interest: (i) a design threshold (here, x * ), and (ii) a total order relation (here, "≤" on R).The same approach can be also adopted in the multivariate case, provided that consistent notions of multivariate threshold and total order are properly introduced in R d .
The origin of all difficulties in introducing a coherent notion of multivariate RP is the lack of a "natural" total order in multi-dimensional Euclidean spaces.Unfortunately, this basic problem is not clear to practitioners, and needs to be fixed before proceeding, since it represents a pivotal step.We briefly recall that, if a binary order relation is a total order on a given set, then any pair of elements in the set are mutually comparable under F might be a payoff or an utility function (see, e.g., Keeney and Raiffa, 1976;Belloni and Winkler, 2011).Before discussing the practical applications and consequences of Definition 2 we need to introduce the following notion.DEFINITION 3. Given a d -dimensional distribution F = C(F 1 ,...,F d ) and t ∈ (0,1), the critical layer L F t of level t is defined as Clearly, L F t is the iso-hyper-surface (having dimension d −1) where F equals the constant value t: thus, L F t is a (iso)line for bivariate distributions, a (iso)surface for trivariate ones, and so on.Evidently, for any given x ∈ R d , there exists a unique critical layer L F t supporting x (say, using a quick-and-dirty notation, L F x ): namely, the one identified by the level t = F(x).Note that, thanks to Eq. ( 2), there exists a one-to-one correspondence between the two iso-hyper-surfaces The critical layer L F t partitions R d into three non-overlapping and exhaustive regions: 1. R < t , the sub-critical region, containing all the points that (in the F -order sense) are smaller than any point on L F t ; 2. L F t , the critical layer, where F ≡ t; 3. R > t , the super-critical region, containing all the points that (in the F -order sense) are larger than any point on L F t .
Practically, at any occurence of the phenomenon, only three mutually exclusive and exhaustive things may happen: either the realization lies in the sub-critical region, or it is critical, or it lies in the super-critical region.Note that all the three regions R Thanks to the above discussion, it is now clear how to introduce consistent notions of multivariate threshold, i.e.L F t , and total order, i.e.

F
, in R d .Obviously, many others are possible, but those given above are quite "natural" ones, since essentially they depend upon the multivariate distribution F ruling the stochastic dynamics of the phenomenon under investigation.In particular, a variant of Definition 2 can be given as follows: where r and s are, respectively, the levels of the critical layers associated with x and y.Then, the following notion of multivariate RP is analogous to the one used in the univariate framework.DEFINITION 4. Let X be a multivariate r.v. with distribution F = C(F 1 ,...,F d ).Also, let L F t be the critical layer supporting X, and R > t be the corresponding super-critical region.Then, the (super-critical) RP T X associated with X is defined as Clearly, an analogous definition can also be given by considering a sub-critical region R < t .Now, in view of the results outlined in Nelsen et al. (2001Nelsen et al. ( , 2003)), it is immediate to show that where ν F is the probability measure induced by F over R Instead, all the realizations associated with L F t share the same KRP κ X .
For the sake of convenience, we report below the algorithm explained in Salvadori and De Michele (2010) for the calculation of K C (see also Genest and Rivest, 1993), which yields a consistent maximum-likelihood estimator of K C .Here we assume that the copula model is well specified, i.e. it is available in a parametric form.ALGORITHM 1. Calculation of the Kendall's measure function K C .

For
As an illustration, in Fig. 3a we plot an estimate of the function K C associated with the copula C QV L : here Algorithm 1 is used, running a simulation of size m = 5 × 10 7 (thus, almost a "continuous" approximation is achieved).Also shown is the empirical estimate of K C calculated by using the available observations: the horizontal ties are simply due to the small sample size.
Roughly, if the super-critical region R > t is used to identify the "dangerous" occurrences (viz., involving the events "exceeding" the prescribed critical layer, as in the Figures

Back Close
Full approach outlined above), then, the sub-critical ones are safe realizations, whereas the critical ones are alert events, and the super-critical ones are potentially destructive occurrences.As a consequence, a notion of criticality (total) order can be introduced in R d via the following formula: In simple words, x is "less critical" than y if its KRP is smaller than the one of y: this exactly corresponds to the intuitive meaning of the concept of return period.Conversely, y is "more critical" than x if it is more improbable to exceed the critical layer associated with y than the one associated with x.

Multivariate quantile
Traditionally, in the univariate framework, once a RP (say, T ) is fixed (e.g., by design or regulation constraints), the corresponding critical probability level p is calculated as 1 − p = P(X > x p ) = µ/T , and by inverting F X it is then immediate to obtain a critical quantile x p = F (−1) X (p), which is usually unique.Then, x p is used in practice for design purposes and rational decision making.As shown below, the same approach can also be adopted in a multivariate setting: the next definition is fundamental, since it introduces a coherent notion of multivariate quantile (to be compared with Belzunce et al., 2007).DEFINITION 6.Given a d -dimensional distribution F = C(F 1 ,...,F d ) with d -copula C, and a critical probability level p ∈ I, the critical multivariate quantile q p ∈ I of order p is defined as where

Conclusions References
Tables Figures

Back Close
Full REMARK 6. Definition 6 provides a close analogy with the definition of univariate quantile: indeed, recall that K C is a univariate distribution function (see Eq. 3), and hence q p is simply the quantile of order p of K C .Thanks to Eq. ( 2), it is clear that the critical layer L F q p is the iso-hyper-surface in R d where F takes on the value q p , while L C q p is the corresponding one in I d where the related copula C equals q p .Now, let L F q p be fixed.Then, according to Eq. ( 3), p = K C (q p ) = P(C(F 1 (X 1 ),...,F d (X d )) ≤ q p ).Therefore, p is the probability measure induced by C on the sub-critical region R < q p , while (1 − p) is the one of R > q p .From a practical point of view this means that, in a large simulation of n independent d -dimensional vectors extracted from F, np realizations are expected to lie in R < q p , and the others in R > q p .REMARK 7. It is worth stressing that a common error is to confuse the value of the copula C with the probability induced by C on I d (and, hence, on R d ): on the critical layer L C q p it is C = q p , but the corresponding sub-critical region R < q p has probability p = K C (q p ) = q p , since K C is usually non-linear (the same rationale holds for the super-critical region R > q p ).In other words, while in the univariate case the values of F X correspond to the probabilities induced on the Real line (specifically, on the sub-critical region), this is not so in the multivariate case.As a consequence, F X cannot be freely substituted for C, otherwise inconsistencies have to be expected, and the different roles played by these two functions must not be confused and used improperly (see, e.g., Zhang, 2005;Singh et al., 2007, and below).
Since K C is a probability distribution, and q p is the corresponding quantile of order p, we could use a standard bootstrap technique (see, e.g., Davison and Hinkley, 1997) to estimate q p if it cannot be calculated analytically.The idea is simple, and stems directly from the very definition of q p : viz., to look for the value q p of C such that, in a simulation of size n, np realizations show a copula value less than q p .Then, by performing a large number of independent simulations of size n, the sample average of the estimated q p 's is expected to converge to the true value of q p by virtue of the Law of Large Numbers (see also Genest and Rivest, 1993;Barbe et al., 1996).A possible Introduction

Conclusions References
Tables Figures

Back Close
Full algorithm is given below, most suitable for vectorial software.Here we assume that the copula model is well specified, i.e. it is available in a parametric form.ALGORITHM 2. Calculation of q p .First of all, choose a sample size n, a critical probability level p, the total number of simulations N, and fix the critical index k = np .for i = 1 : N S = sim(C;n); % simulate n d -vectors from copula C C = C(S); % calculate C for simulated vectors C = sort(C); % sort-ascending simulated C values E (i ) = C(k); % store new estimate of q p into vector E q = Mean(E ); % calculate new mean estimate of q p Q(i ) = q; % store new mean estimate of q p into vector Q end Then, once the loop is completed, q provides a consistent estimate of the critical multivariate quantile q p of order p. Practically, Algorithm 2 does the "inverse" task of Algorithm 1.The bootstrap method may also yield an approximate confidence interval for q p (see DiCiccio and Efron, 1996, for more refined solutions): for instance, at a 10 % level, the random interval (q 0.05 ,q 0.95 ) can be used, where q 0.05 and q 0.95 are, respectively, the quantiles of order 5 % and 95 % extracted from the vector Q.Using Algorithm 2 (and setting n = 5 × 10 5 , p = 0.999, and N = 5 × 10 5 , for a total of 250 × 10 9 simulated triples), we estimated q 0.999 ≈ 0.946519 for the copula C QV L of interest here, and a 10 % confidence interval (0.944570,0.948446), a process that took about 4 days of CPU time on a iMac equipped with a 3,06 GHz Intel Core 2 Duo processor and 8 GB RAM.As an illustration, in Fig. 3b we show an estimate of the function K C , associated with the copula C QV L , at the critical quantile t * ≈ 0.946519 (corresponding to a millenary KRP): as expected, the value is almost exactly equal to 99.9 %.
As a further illustration, in Fig. 4 we plot the critical iso-surface L C t * of the trivariate copula C QV L for the critical level t * ≈ 0.946519, corresponding to a regulation return period of 1000 years (viz., all the realizations on L C t * have a KRP equal to 1000 years).Figures

Back Close
Full t * , the one containing the upper corner 1 = (1,1,1).On average, only 0.1% of the realizations extracted from a simulation of C QV L are expected to lie in the super-critical region.However, the level of the critical layer is t * = q 0.999 ≈ 0.946519 < p = 0.999, as indicated by the diamonds markers in the plot.
As a further example, consider that the super-critical region identified by the critical layer L F 0.999 (where the multivariate distribution F, or, equivalently, the copula C, takes on the value 0.999) has an estimated probability smaller than 10 −6 , and a corresponding KRP of about 3 × 10 6 years: practically, only one realization of C QV L out of 3 × 10 6 simulations is expected to lie in such a super-critical region (instead of 1 out of 1000).Evidently, if F (or C) were substituted for K C in Eq. ( 13) during the design phase, then the structure to be constructed would result over-sized (being expected to withstand stunning extreme events), and would yield a waste of money.

Design event
The situation outlined in the previous section is generally similar to the one found in the study of univariate phenomena, where a single r.v.X with distribution F X is used to model the stochastic dynamics.However, as already mentioned, the multivariate case generally fails to provide a natural solution to the problem of identifying a unique design realization.In fact, even if also the layer L F t acts as a (multi-dimensional) critical threshold, there is no natural criterion to select which of the ∞ d −1 points lying on L F t should be used as a design realization.In other words, in a multivariate environment, the sole concept of RP may not be sufficient to identify a design quantile, and additional considerations may be required.Mathematically speaking, while in the univariate framework a single total order over R may suffice to identify a critical design realization, in the Figures

Back Close
Full multivariate case it may be necessary to introduce suitable criteria in order to pick out a "characteristic" realization over the critical layer of interest.In the following, we outline possible ways to carry out such a selection.Clearly, several approaches can be proposed, each one possibly yielding a different solution: below, we show two possible elementary strategies to deal with the problem.
REMARK 8.The design phase should not be confused with the risk assessment one.
In fact, the target of the former one is to provide characteristic realizations useful for planning a structure before it is built (i.e., only the hazard component is taken into account), whereas the latter one aims at pointing up possible critical situations -after the structure has been built -by further introducing the impact ingredient.In simple words, the design phase should only provide the "typical size" of the realizations that the structure to be constructed is expected to withstand during its lifetime in a given meteo-climatic region.
The basic idea is simply to introduce a suitable function (say, w) that "weighs" the realizations lying on the critical layer of interest.Following this approach, the practitioner can then freely choose the criterion (i.e., the function w) that best fits the practical needs.Clearly, without loss of generality, w can be assumed to be non-negative.In turn, a "design realization" can be defined as follows.DEFINITION 7. Let w : provided that the argmax exists and is finite.REMARK 9.In general, the unicity of the maximum may not be guaranteed.When this happens, a recourse to physical/phenomenological considerations, or to additional procedures (like, e.g., maximum information/entropy schemes Jaynes, 2003), may help Figures

Back Close
Full copulas with the same Kendall's τ Ghoudi et al., 1998).However, in general, the critical layers of such copulas will have a different geometry, and, in turn, will provide different design realizations.
In passing, we note that, in the present case study, the distribution F QV L and the copula C QV L are trivariate, and hence the corresponding critical layers are simply twodimensional surfaces in R 3 and I 3 , respectively.Figure 4 shows the critical layer of level t * pertaining to C QV L , and the corresponding one pertaining to F QV L can be drawn by exploiting Eq. ( 2).Now, for the sake of graphical illustration, it is possible to parametrize L F t * in polar coordinates (say, (α,r)) via a one-to-one transformation, and thus re-map and plot any function w defined over L F t * onto the rectangle (0,π/4)×(0, r), for a suitable maximum ray r.In turn, it is rather easy to have a peek of the behavior of any weigh function w over L F t * .REMARK 10.A delicate problem may arise when adopting the approach outlined above: to make the point clear, consider the following example.Suppose that we use the duration of a storm and the storm intensity as the two variables of interest.In a fast responding system (e.g., a sewer structure), a storm having short duration but high intensity may cause a failure, whereas the same storm may not cause any problem at a catchment level.In the catchment, however, a storm with long duration and intermediate to low intensity may cause a flood event, whereas the same storm does not cause any problem to the sewer system.Now, as a matter of principle, the design realization δ w for the given return period (i.e., the "typical" critical storm calculated according to the strategy illustrated here) may not cause any problem in both systems, and therefore these would be wrongly designed.Practically, the sewer systems should be designed using critical design storms of short durations and high intensities, whereas a structure in the main river of the watershed should be designed using storms of long durations and low intensities.However, the problem is more apparent than real.In fact, there are neither theoretical nor practical limitations to restrict the search for the maxima in Eq. ( 16) over a suitable sub-region of L F t * : remember that all the realizations on the critical layer share the same prescribed KRP.Thus, when a sewer system is of concern, Introduction

Conclusions References
Tables Figures

Back Close
Full only storms having short durations and high intensities could be considered, whereas a critical design storm for a structure in the main river could be spotted by restricting the attention to storms of long durations and low intensities.Roughly speaking, in the approach outlined here, the calculation of the critical design realization can be made dependent on both the environment in which a structure should be designed, as well as on the stochastic dynamics of the phenomenon under investigation.
For the sake of illustration, below we introduce two elementary weigh functions.

Component-wise excess design realization
A realization lying on the critical layer L F t may be marked as critical when all of its marginal components are exceeded with the largest probability.In simple words, we suggest to look for the point(s) x = (x 1 ,...,x d ) ∈ L F t such that it is maximum the probability that a super-critical realization y = (y 1 ,...,y d ) satisfies all the following componentwise inequalities: or y > x using a simplified notation.The next definition is immediate.
DEFINITION 8.The Component-wise excess weigh function w CE is defined as where X has distribution F = C(F 1 ,...,F d ), and [x,∞) is the hyper-rectangle in R d whose points satisfy all the inequalities stated in Eq. ( 17).Then, by restricting our attention to the critical layer L where t ∈ (0,1).REMARK 11.Via the Probability Integral Transform and Sklar's Theorem, it is easy to show that where U has the same copula C as of X and Uniform marginals, u(x) = (F 1 (x 1 ),...,F d (x d )), and [u,1] is the hyper-rectangle in I d with "lower" corner u and "upper" corner 1.Thus, the probabilities of interest can be directly computed in the unit hyper-cube (see, e.g., Joe, 1997) by working directly on the critical layer L C t (instead of L F t ), a solution numerically more convenient.Note that, for large d -dimensional problems, the CPU time involved may become prohibitive, though clever solutions have been proposed for d 1 (see, e.g., Cherubini and Romagnoli, 2009).In some cases, δ CE can be calculated analytically; otherwise, it can be empirically estimated (e.g., by performing a suitable sampling over In Fig. 5 we show the behavior of w CE over L F t * , as well as the Component-wise Excess design realization δ CE (t * ) calculated for the case study investigated here.This point has the largest probability to be component-wise exceeded by a super-critical millenary realization, and therefore it should be regarded as a sort of (statistical) "safety lower-bound": viz., the structure under design should, at least, withstand realizations having (multivariate) size δ CE (t * ), as reported in Table 2.
As anticipated in Sect.2, using the design realization δ CE (t * ), we operated the reservoir routing of the corresponding flood hydrograph.Then, we checked whether or not the reservoir level exceeds the crest level of the dam (i.e., 784 m).The column "M.W.L." in Table 2 reports the value 782.11 m: thus, no over-topping occurs, i.e. the dam seems to be safe against Component-Wise Excess millenary realizations.Introduction

Conclusions References
Tables Figures

Back Close
Full

Most-likely design realization
A further approach to the definition of a characteristic design event consists in taking into account the density of the multivariate distribution describing the overall statistics of the phenomenon investigated: in fact, assuming that the density f of F is well defined over L F t , we may think of using it as a weigh function.
Clearly, the restriction f t of f over L F t is not a proper density, since it does not integrate to one.However, it may provide useful information, since it induces a (weak) form of likelihood over L F t : in fact, it can be used to weigh the realizations lying on L F t , and spot those that are (relatively) "more likely" than others.Indeed, f t inherits all the features of interest here directly from the true global density f.The next definition is immediate.
DEFINITION 10.The Most-Likely weigh function w ML is defined as where f is the density of F = C(F 1 ,...,F d ).
Then, by restricting our attention to the critical layer L F t , the following definition is immediate.
DEFINITION 11.The Most-Likely design realization δ ML of level t is defined as where t ∈ (0,1).REMARK 12.As a rough interpretation, δ ML plays the role as of a "characteristic critical realization", i.e. the one that has to be expected if a critical event with given KRP happens.In some cases, δ ML can be calculated analytically; otherwise, it can be empirically estimated (e.g., by performing a suitable sampling of f over L F t ).In general, provided that weak regularity conditions are satisfied, f can be calculated by using the marginal densities f i 's of X, and the density c = Since our target is to compare the "weight" of different realizations, from a computational point of view it may be better to minimize −ln(f) over L F t (since the maxima are preserved).
As an illustration, in the present (absolutely continuos) case, the expression of the trivariate density f QV L is given by where (x,y,z) ∈ R 3 , and c QV is the density of the Gumbel copula modeling the pair (Q,V ).In Fig. 6 we show the behavior of (the logarithm of) w ML (i.e., f QV L ) over L F t * , as well as the Most-Likely design realization δ ML (t * ) calculated for the case study investigated here.The actual values of the function w ML are irrelevant (since they do not represent a true density): in fact, we are only interested in spotting where f QV L is maximal.Therefore, the Most-Likely design realization could be regarded as the "typical" critical realization: viz., the structure under design should be expected to withstand critical events having (multivariate) size δ ML (t * ), as reported in Table 2. Introduction

Conclusions References
Tables Figures

Back Close
Full

Conclusions
Before concluding, another interesting test can be carried out.In fact, as a further possible strategy, suppose that a critical design realization δ 1D = (x 0.999 ,y 0.999 ,z 0.999 ) is defined in terms of the millenary univariate quantiles of the three variables of interest here (see the last row of Table 2).In turn, the layer L supporting δ 1D has a critical level t * 1D ≈ 0.997754 (see Fig. 3b), corresponding to a value of K C (t * 1D ) ≈ 0.999998, and a KRP of about 5 × 10 5 years.It is then immediate to realize that, in order to provide a true millenary multivariate design realization, it may not be enough (or necessary) to rely upon millenary univariate quantiles.Also, operating the reservoir routing using δ 1D , yields a reservoir level of about 784.80 m (see Table 2), which may cause an over-topping and a dam failure.
The example given above, as well as the illustrations presented in Sect.5, may suggest the following empirical consideration (which, however, should be taken with care).Both the millenary multivariate design realizations δ CE and δ ML yielded a maximum water level of about 782 m, whereas δ 1D (with a KRP of the order of 10 5 years) generated a maximum level over-topping the dam crest by only about 80 cm.Thus, apparently, the dam is over-sized, i.e. it could withstand events with a RP much larger than 1000 years.Clearly, from the safety point of view, this is a good news.On the other hand, a smaller structure, correctly sized for withstanding true millenary multivariate events, would probably have costed less money.
In summary, this paper is of methodological nature, and introduces original techniques for the calculation of multivariate design quantiles.Preliminary studies concerning multivariate RP's can be found in Salvadori (2004) -by providing consistent frameworks (the total order F ) and techniques (the weighing functions on the critical layers) to address the identification of the critical design events when several dependent variables are involved.In particular, the "CE" and "ML"design values may provide basic realizations with given KRP, of interest in multivariate design problems.To the best of our knowledge, this is the first time that a similar study is presented, and further studies are necessary.Overall, we believe that this paper provides a significant advancement, and should serve as a guideline for further researches and applications in all areas of water resources and civil engineering, as well as in geophysical and environmental sciences.Introduction

Conclusions References
Tables Figures

Back Close
Full  Full   2.
Discussion Paper | Discussion Paper | Discussion Paper |

F
(x 1 ,...,x d ) = C(F 1 (x 1 ),...,F d (x d )) (1) for all x ∈ R d , where the F i 's are the univariate margins of F. If all the F i 's are continuous, then C is unique.Most importantly, the F i 's in Eq. (1) only play the role of (geometrically) re-mapping the probabilities induced by C on the subsets of I d onto suitable subsets of R d , without changing their values: viz., the dependence structure modeled by C plays a central role in tuning the probabilities of joint occurrences.In fact, under weak regularity conditions, any point x ∈ R d can be uniquely re-mapped over u ∈ I d (and vice-versa) via the Probability Integral Transform: (u 1 ,...,u d ) = (F 1 (x 1 ),...,F d (x d )).Discussion Paper | Discussion Paper | Discussion Paper | . The sample mean of L is about 780.44 m a.s.l., with a sample standard deviation of about 1 m a.s.l.The small variability of L with respect to its range (here [774.75,780.75] is the regulation range), is mainly due to the management policy of the reservoir: the target of the dam manager is to keep a high Discussion Paper | Discussion Paper | Discussion Paper |

REMARK 1 .
The trivariate plot of the observations, as shown in Fig. 1, is the first step usually carried out by unskilled practitioners to investigate the multivariate behavior of 5528 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

X
i has the same multivariate distribution F as of the random vector X ∼ F = C(F 1 ,...,F d ) describing the phenomenon under investigation, with suitable marginals F i 's and dcopula C. For example, we may think of a set of flood observations given by the pairs of non-independent r.v.'s Flood Peak -Flood Volume, joined by the copula C. The case of a non-stationary sequence X is rather tricky, and will be discussed in a future work.Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

F
the relation.The following definition introduces a total order in R d .DEFINITION 2. Let x,y ∈ R d , and let F = C(F 1 ,...,F d ) be a multivariate distribution.Then, by definition, is called the total order induced by F over R d .REMARK 4. The idea behind Definition 2 can be found, for instance, in Decision Analysis, when the effects in a decision-making problem are multi-dimensional in nature, and Introduction Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | d , and K C is the Kendall's distribution function associated with C (see Eq. 3 and the ensuing discussion).Kendall's RP of the realization X belonging to L F t (hereinafter, KRP).REMARK 5. Clearly, κ X is implicitly a function of the critical level t, uniquely identified by the relation t = F(X).The KRP partitions the sample space R d into three disjoint and exhaustive regions: namely, , all the sub-critical realizations (i.e., those Y ∈ R < t ) have a KRP κ Y < κ X , whereas all the super-critical ones (i.e., those Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |Then, C QV L = t * for all points belonging to L C t * .Instead, C QV L < t * (and κ X < 1000 years) in the region R < t * "below" L C t * , the one containing the origin 0 = (0,0,0), whereas C QV L > t * (and κ X > 1000 years) in the region R > t * "above" L C Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

F
The Component-wise Excess design realization δ CE of level t is defined as δ CE (t) = argmax Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ∂ d ∂u 1 •••∂u d C(u 1 ,...,u d ) of the copula C: Discussion Paper | Discussion Paper | Discussion Paper | f Discussion Paper | Discussion Paper | Discussion Paper | ; Salvadori and De Michele (2004); Durante and Salvadori (2010); Salvadori and De Michele (2010), and some applications are presented in De Michele et al. (2007); Vandenberghe et al. (2010); Salvadori and De Michele (2010).In this work we made an effort to reduce the troublesome nature of multivariate analysis -which has always limited its practical application Introduction Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Fig. 2 .
Fig. 2. Trivariate rank-plot of the available (Q,V,L) observations, and bivariate rank-plots of the marginals -see text.Also shown are the estimates of the Kendall's τ and the Spearman's ρ, as well as the corresponding p-values (derived from non-parametric tests of independence based on rank statistics).

Fig. 3 .Fig. 4 .Fig. 5 .Fig. 6 .
Fig. 3. (a) Simulation-based estimate of the function K C (continuous line) associated with the copula C QV L ; also shown is its empirical estimate (markers) calculated by using the available observations -see text.(b) Plot of the (millenary KRP) multivariate quantile t * ≈ 0.946519 (thick-dashed line) associated with the critical probability level p = 0.999; also shown (thindashed line) is the estimate of the value K C ≈ 0.999998 associated with the critical level t * 1D ≈ 0.997754 -see text.
Michele et al. (2005) of methodological nature, we feel important to illustrate with practical examples the new concepts introduced.For this reason, we first present the case study that we shall use throughout the paper.The data are collected at the Ceppo Morelli dam, and are essentially the same as those investigated in DeMichele et al. (2005), to which we make reference for further details.The dam, completed in 1929, is located in the valley of Anza catchment, a subbasin of the Toce river (Northern Italy), and was built to produce hydroelectric energy.The dam is characterized by a small water storage of about 0.47 × 10 6 m 3 .The min- Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |2 The case study imum level of regulation is 774.75 m a.s.l., while the maximum is 780.75 m a.s.l.The maximum water level is at 782.5 m a.s.l., and the dam crest level is at 784 m a.s.l.The dam has an uncontrolled spillway (84 m long) at 780.75 m a.s.l., and also intermediate and bottom outlets (the latter ones are obstructed by river sediments).In De Michele et al.

Table 1 .
Maximum-likelihood estimates of the GEV parameters for Q and V , and corresponding 95% confidence intervals.