UvA-DARE ( Digital Academic Repository ) DREAM ( D ) : An adaptive Markov chain Monte Carlo simulation algorithm to solve discrete , noncontinuous , and combinatorial posterior parameter estimation problems

Formal and informal Bayesian approaches have found widespread implementation and use in environmental modeling to summarize parameter and predictive uncertainty. Successful implementation of these methods relies heavily on the availability of efficient sampling methods that approximate, as closely and consistently as possible the (evolving) posterior target distribution. Much of this work has focused on continuous variables that can take on any value within their prior defined ranges. Here, we introduce theory and concepts of a discrete sampling method that resolves the parameter space at fixed points. This new code, entitled DREAM(D) uses the recently developed DREAM algorithm (Vrugt et al., 2008, 2009a,b) as its main building block but implements two novel proposal distributions to help solve discrete and combinatorial optimization problems. This novel MCMC sampler maintains detailed balance and ergodicity, and is especially designed to resolve the emerging class of optimal experimental design problems. Three different case studies involving a Sudoku puzzle, soil water retention curve, and rainfall – runoff model calibration problem are used to benchmark the performance of DREAM (D). The theory and concepts developed herein can be easily integrated into other (adaptive) MCMC algorithms. Correspondence to: J. A. Vrugt (jasper@uci.edu)


Introduction
Formal and informal Bayesian methods have found widespread application and use to summarize parameter and model predictive uncertainty in hydrologic modeling. These parameters generally represent model dynamics, but could also include rainfall multipliers Kuczera et al., 2006;Vrugt et al., 2008), error model variables (Smith et al., 2008;Schoups and Vrugt, 2010), and calibration data measurement errors (Sorooshian and Dracup, 1980;Schaefli et al., 2007;Vrugt et al., 2008). Monte Carlo methods are admirably suited to generate samples from the posterior parameter distribution, but generally inefficient when confronted with complex, multimodal, and high-dimensional model-data synthesis problems. This has stimulated the development of Markov Chain Monte Carlo (MCMC) methods that generate a random walk through the search (parameter) space and iteratively visit solutions with stable frequencies stemming from an invariant probability distribution. If well designed, such MCMC methods should be more efficient than brute force Monte Carlo or importance sampling methods.
To visit configurations with a stable frequency, an MCMC algorithm generates trial moves from the current position of the Markov chain x t to a new state z. The earliest MCMC approach is the random walk Metropolis (RWM) algorithm (Metropolis et al., 1953). Assume that we have already sampled points {x 0 ,...,x t } this algorithms proceeds in the following three steps. First, a candidate point z is sampled from a proposal distribution q(·) that depends on the present location x t . Next, the candidate point is accepted with acceptance probability, α(z,x t ) (Metropolis et al., 1953;Hastings, 1970): where p(·) represents the posterior density, and q(x t → z) (q(z → x t )) denotes the conditional probability of the forward (backward) jump. This last ratio cancels out if a symmetric proposal distribution is used. Finally, if the proposal is accepted the chain moves to z, otherwise the chain remains at its current location x t . Following a so called burn-in period (of say, l steps), the chain approaches its stationary distribution and the vector {x l+1 ,...,x l+m } contains samples from π(·). The desired summary of the posterior distribution, π(x) is then obtained from this sample of m points. In Bayesian applications, π(·) is the distribution of partially unknown parameters given the data at hand, and is obtained by combining the prior distribution and the data likelihood. The dependence of π(·) on any fixed data is assumed throughout. The standard RWM algorithm has been designed to maintain detailed balance with respect to π(·) at each individual step in the chain: where π(x t ) (π(z)) denotes the probability of finding the system in state x t (z), and p(x t → z) (p(z → x t )) denotes the conditional probability to perform a trial move from x t to z (z to x t ). The detailed balance condition essentially ensures that the samples of {x l+1 ,...,x l+m } are exactly distributed according to the target distribution, π(x). Existing theory and experiments prove convergence of well-constructed MCMC schemes to the appropriate limiting distribution under a variety of different conditions. In practice, this convergence is often observed to be impractically slow. This deficiency is frequently caused by an inappropriate selection of q(·) used to generate trial moves in the Markov Chain. This inspired Vrugt et al. (2008Vrugt et al. ( , 2009a to develop a simple adaptive RWM algorithm called Differential Evolution Adaptive Metropolis (DREAM) that runs multiple chains simultaneously for global exploration, and automatically tunes the scale and orientation of the proposal distribution during the evolution to the posterior distribution. This scheme is an adaptation of the Shuffled Complex Evolution Metropolis (Vrugt et al., 2003) global optimization algorithm and has the advantage of maintaining detailed balance and ergodicity while showing excellent efficiency on complex, highly nonlinear, and multimodal target distributions (Vrugt et al., , 2009a.
In DREAM, N different Markov Chains are run simultaneously in parallel. If the state of a single chain is given by a single d-dimensional vector x, then at each generation the N chains in DREAM define a population X, which corresponds to an N x d matrix, with each chain as a row. Jumps in each chain i = {1,...,N} are generated by adding a multiple of the difference of the states of randomly chosen pairs of chains of X to the current state x i : where δ signifies the number of pairs used to generate the proposal, x r 1 (j ) and x r 2 (n) are randomly selected without replacement from the population X −i t (the population without x i t ); r 1 (j ),r 2 (n) ∈ {1,...,N} and r 1 (j ) = r 2 (n). The values of e d and d are drawn independently from U d (−b,b) and N d (0,b * ) with, typically, b = 0.1 and b * small compared to the width of the target distribution, b * = 10 −12 say. Not all dimensions of x i need to be updated in each step, so some dimensions of z i may be reset to those of x i . The value of the jump-size, γ , depends on δ and d , the number of dimensions updated jointly in the step. By comparison with RWM, a good choice for γ = 2.4/ √ 2δd (Roberts and Rosenthal, 2001;Ter Braak, 2006). This choice is expected, for Gaussian and Student target distributions, to yield an acceptance probability of 0.44 for d' = 1, 0.28 for d' = 5 and 0.23 for large d'. Every 5th generation γ = 1.0 to facilitate jumping between disconnected posterior modes .
The difference vector in Eq.
(3) contains the desired information about the scale and orientation of the target distribution, π(x). By accepting each jump with the Metropolis ratio min π(z i )/π(x i ),1 , a Markov chain is obtained, the stationary or limiting distribution of which is the posterior distribution. The proof of this is given in Ter Braak and  and Vrugt et al. (2008Vrugt et al. ( , 2009a. Because the joint pdf of the N chains factorizes to π(x 1 ) × ... × π(x N ), the states x 1 ...x N of the individual chains are independent at any generation after DREAM has become independent of its initial value. After this burn-in period, the convergence of DREAM can thus be monitored with theR-statistic of Gelman and Rubin (1992). This convergence diagnostic compares the within and in-between variances of the N different chains.
Although much progress has been made in the past decade towards the application of Bayesian methods to quantify and analyze parameter, model structural, forcing, and calibration data uncertainty, virtually all this research involved continuous parameters that can take on any value within their prior defined ranges. Much less attention has been given hitherto to posterior sampling problems involving discrete variables. Such non-continuous parameter estimation problems are not only of considerable theoretical interest, but also of practical significance. For instance, contributions are beginning to appear in the hydrologic literature that attempt to discern optimal experimental design strategies that minimize cost, parameter and model predictive uncertainty, or combinations thereof. This involves selecting one or multiple different measurement locations in time and/or space amongst a discrete set of possibilities. Various algorithms have been proposed in the applied mathematics and computer science literature that solve problems of this kind, yet their main focus is on finding the optimal solution, without recourse to estimating the underlying posterior uncertainty. This is of particular importance in experimental design problems, where one cannot convincingly claim that one set of measurements is significantly better than another plausible combination of observations.
In this paper we present a discrete implementation of DREAM, that is especially designed to efficiently retrieve the posterior distribution of noncontinuous and combinatorial search and optimization problems. This new code, hereafter referred to as DREAM (D) uses DREAM as its main building block, and implements three novel proposal distributions to explicitly recognize differences in topology between discrete and Euclidean search spaces. This sampling method maintains detailed balance and ergodicity, and provides explicit information about the posterior uncertainty of the optimal solution. Within the context of optimal experimental design problems this would provide very valuable information about which measurements are absolutely required, and which data are of less importance. The width of the marginal posterior distribution essentially conveys this information.
The remainder of this paper is organized as follows. Section 2 presents a short introduction to discrete optimization problems, followed by a detailed description of DREAM (D) in Sect. 3. Section 4 demonstrates the performance of DREAM (D) using three different case studies involving a Sudoku puzzle, water retention curve and rainfall-runoff model calibration problem. These results illustrate the ability of DREAM (D) to solve search and optimization problems involving discrete, combinatorial, and experimental design problems. Finally, in Sect. 5 we summarize the theory, concepts and applications presented herein.

Nonlinear optimization involving discrete variables
Discrete optimization problems are abundant in many fields of study, and have also started to begin to appear in the hydrologic literature (Furman et al., 2004;Harmancioglu et al., 2004;Perrin et al., 2008;Kleidorfer et al., 2008;Neuman et al., 2011). Figure 1 illustrates a simple discrete parameter estimation problem involving a tile puzzle. Each tile contains a different letter of the alphabet. The goal is to list the letters in the appropriate order. The solution to this problem is obvious to a human, but not immediately clear to a computer. A search algorithm is therefore required to solve this problem.
If we assign numbers to each letter, A = 1, B = 2 and so forth, we could measure the distance from our initial guess to the actual solution, and iteratively refine this solution by sampling from a (discrete) proposal distribution. Many algorithms have been developed in the past decades to efficiently resolve problems of this kind. Yet, the main trust of these algorithms is on finding a single optimum solution, without recourse to estimating the underlying posterior uncertainty. For example, within the context of the traveling salesman problem many different routes will exist that only deviate marginally from the optimal solution. Brute force sampling of the search space can be used to assess all these plausible routes, but this seems rather inefficient. Instead, a more intelligent search procedure is warranted that more efficiently samples the space of possible solutions. The goal of this paper therefore is to introduce theory and concepts of DREAM (D) , a MCMC simulation algorithm that is especially designed to efficiently retrieve the posterior distribution of discrete and combinatorial search problems. Figure 1a illustrates the application of DREAM (D) to the tile puzzle considered herein. The left panel depicts the initial guess, whereas the remaining three panels ( Fig. 1b-d) illustrate how DREAM (D) translates this random starting point into the final position of the letters. The next sections describe the theory and concepts of DREAM (D) and present three different case studies.

DREAM (D) ⇒ DiffeRential Evolution Adaptive Metropolis with Discrete Sampling
We now describe our new code, entitled DREAM (D) , which uses DREAM as main building block. The algorithmic parameters (with defaults in parentheses) are N (d) the number of chains, δ (1) the number of pairs used to generate the proposal, CR the crossover probability (see later), K (10) the thinning rate in storing samples and T max (10 6 ) the maximum number of generations, typically so large that DREAM (D) automatically stops after convergence has been achieved. Let X be a N × d-matrix with rows x i , i = 1,...,N, that contain the current states of the N different Markov chains and that are iteratively updated in turn in the algorithm. The initial X = [X t ;t = 0] is obtained by drawing N samples from p d (x), the prior distribution. Also, let S be an external archive that stores the elements of X at regular intervals. The initial population [X t ;t = 0] is translated into a sample from the posterior target distribution using the following pseudo code: (a) Draw the number of dimensions to be updated, d , from the binomial distribution with total d and probability CR.
where the function · d rounds each element j = 1,...,d of the jump vector to the nearest integer. The various symbols have been defined after Eq.
The goal is to list the letters in order of the alphabet. Each letter is assigned a different integer value, and the resulting inverse problem is solved using DREAM (D) . The left panel (a) presents the initial state, whereas the remaining three graphs (b-d) demonstrate how DREAM (D) iteratively solves this integer sampling problem.
and replace the corresponding proposal element z i j by x i j . The proposal z i thus updates d elements of x i only.
(d) Compute p(z i ) and accept the candidate points with Metropolis acceptance probability, α(x i ,z i ), (e) If accepted, move the chain to the candidate point, 3. Compute the Gelman-Rubin convergence diagnostic,R j (Gelman and Rubin, 1992) for each dimension j = 1,...,d using the last 50 % of the samples of S.
4. IfR j ≤ 1.2 for j = 1,...,d or t > T max , stop and go to step 5, otherwise set t ← t +K and go to POPULATION EVOLU-TION.
5. Summarize the posterior pdf using S after discarding the initial and burn-in samples.
In words, DREAM (D) runs multiple different chains in parallel, and generates jumps in each individual chain using information from the location of the other N − 1 chains. The rounding operator, · d in Eq. (4) is used to enforce integer values and accommodate discrete search problems. To speed up convergence to the target distribution, DREAM (D) estimates a distribution of CR values during burn-in that favors large jumps over smaller ones in each of the N chains. Details can be found in previous studies (Vrugt et al., , 2009a(Vrugt et al., ,b, 2011b. The transition kernel of DREAM (D) is especially designed for integer variables. This appears to be a major limitation as many discrete search problems involve non-integer variables. For instance, consider the discrete variable x 1 ∈ [0,0.2,0.4,...,5]. We cannot sample this parameter with the proposal distribution presented in Eq. (4). A simple linear transformation of the search space, x 1 = 0.2j ;j ∈ {0,...,25} suffices to facilitate inference with DREAM (D) . If appropriate, a different transformation can be used for each individual parameter. An extreme case would simultaneously involve discrete and continuous variables. This would constitute some combination of jumps generated with DREAM and DREAM (D) .

DREAM (D) ⇒ detailed balance?
We are now left with a proof that DREAM (D) yields an invariant distribution that is identical to the posterior target of interest. For this we need to demonstrate that the transition kernel of DREAM maintains detailed balance, and thus results in a reversible Markov chain. In other words, the forward p(x i → z i ) and backward p(z i → x i ) jump should have equal probability at every single step in the chain. This is easy to proof for a standard RWM algorithm that uses a fixed proposal distribution. Hence, the forward and backward jump will exhibit equal probability. Yet, the jump distribution used in DREAM (D) continuously changes scale and orientation en route to the posterior target distribution. This adaptation significantly enhances search efficiency, but does this also ensure reversibility of the sampled Markov chains?
Previous manuscripts have provided formal proofs of convergence of DREAM , DREAM (ZS) (Vrugt et al., 2011b), and MT-DREAM (ZS)  to the appropriate limiting distribution. These proofs appear rather simple, but might not be easy to understand for those that are not directly familiar with the underlying statistics and mathematics. We therefore resort to a simple hypothetical example. Lets consider a two-dimensional discrete sampling problem with N = 5 different chains, and thus Without loss of generality, lets further assume that δ = 1, and e j = j = 0; j = 1,...,2. We randomly sample each individual chain from the prior parameter distribution, and color code their initial position in Fig. 2. We now use the transition kernel of DREAM (ZS) to generate candidate points. Lets assume that we select the blue and green chain as r 1 and r 2 respectively. . The color dots denote the starting points of the N = 5 different chains. The red square signifies the candidate point of the first chain and is created by selecting the blue chain as r 1 and green chain as r 2 . One step later, after moving to the red square, the backward jump has equal probability, as there is an equal chance of drawing r 1 (green) and r 2 (blue) in reversed order.
The candidate point of the first (purple) chain thus becomes: This proposal point is indicated in Fig. 2 with the red square.
For the sake of this illustration, lets now assume that we accept this candidate point, and transition the first chain to this new state.
We now need to demonstrate that the reverse jump has equal probability. This backward jump can be obtained by selecting the blue and green chain in the opposite order from the forward jump: This point is exactly similar to the initial state of chain one prior to the forward jump. Apparently, the rounding operator used to sample integer values does not destroy detailed balance. By sampling r 1 and r 2 from the remaining N −1 chains with a uniform random number generator enforces symmetry of the proposal distribution. Indeed, the chance to select r 1 as chain two, and r 2 as chain four is equal to drawing r 2 = 4, and r 2 = 2 in the reverse order. This also holds when e d , and d are drawn from their respective symmetric probability distributions, and when δ > 1 and d > 2. We leave this up to the reader. This concludes our proof of detailed balance.
A final remark is appropriate. Theoretically, it is possible that at least one of the d arguments of the rounding function, · d has a fractional part of .5. In this case, convention dictates to round down to the nearest integer. This directed rounding introduces a possible bias in jumping direction and thus strictly speaking violates detailed balance. The chance that this happens in practice is virtually zero. If nothing else, the stochastic nature of e d ∼ U d (−b,b), and d ∼ N d (0,b * ) will eliminate this possibility. But, in the rare event that any argument of · d has a factional part of .5 we implement stochastic rounding to the nearest integer. This guarantees reversibility of the Markov chains generated with DREAM (D) .

Combinatorial search problems: proposal distribution using position swapping
The parallel direction update of Eq. (4) works well for a range of discrete problems but is not necessary optimal for combinatorial problems in which the values of the final solution are known a-priori but not their order in the parameter vector. The topology of such search problems differs substantially from the Euclidean search problems considered hitherto. We therefore introduce an alternative jump distribution that creates candidate points by randomly swapping two coordinates in each individual chain. 3. Swap the j th and k th coordinates of z i ; y i j = x i k and y i k = x i j .
4. Compute π(z i ) and accept the candidate points with Metropolis acceptance probability, α(x i ,z i ).
5. If accepted, move the chain to the candidate point, x i = z i , otherwise remain at the old location, x i .

END FOR (CHAIN EVOLUTION) END RANDOM SWAP
In words, if the current state of the ith chain is given by where j,k are discrete uniformly sampled from {1,...,d} without replacement. It is straightforward to see that this proposal distribution satisfies detailed balance as the forward and backward jump have equal probability.
A swap update is admirably suited to solve the tile problem considered previously in Fig. 1. Yet, the coordinate swaps are fully random and do not exploit any information about the topology of the solution encapsulated in the position of the other N − 1 chains. Essentially, each chain evolves independently to the posterior target distribution. This appears rather inefficient, particularly for complicated search problems. We therefore introduce a second and alternative swap rule that takes explicit information from 3706 J. A. Vrugt and C. J. F. Ter Braak: DREAM (D) → discrete MCMC simulation the dissimilarities in coordinates of the N chains running in parallel. We implement this idea as follows: PROPOSAL DISTRIBUTION: DIRECTED SWAP DO (CHAIN EVOLUTION) 1. Randomly select two chains, X r 1 and X r 2 without replacement from the population X −r 1 ,r 2 t .
2. Find those coordinates of x r 1 and x r 2 that are dissimilar and store these locations in ζ .

Swap the elements of
9. If accepted, move both chains to their respective candidate points, x r 1 = z r 1 and x r 2 = z r 2 ; otherwise remain at their old location, x r 1 and x r 2 .

END FOR (CHAIN EVOLUTION) END DIRECTED SWAP
This proposal distribution swaps location j and k within two different chains, r 1 ,r 2 ∈ {1,...,N} from the differences in their coordinates. This update needs to be carefully implemented so that all different chains are updated only once. This requires that the number of chains, N > 1 and hence is easiest to implement if N is an even number. This approach considerably improves sampling efficiency, as will be illustrated later with a Sudoku puzzle.
The swap move is fully Markovian, that is, it uses only information from the current time for proposal generation, and retains detailed balanced with respect to π(·) because the reverse move is equally probable. If the swap is not feasible (less than two dissimilar coordinates), the current chain is simply sampled again. This is necessary to avoid complications with unequal probabilities of move types (Denison et al., 2002); the same trick is applied in reversible jump MCMC (Green, 1995). The restriction of the update to the dissimilar coordinates does not destroy detailed balance in any way; it just selects a subspace to sample on the basis of the current state. Coordinate swapping is especially powerful for combinatorial problems. Based on some preliminary studies, we use a 90/10 % mix of directed and random swaps, respectively.

Case studies
We now present three different case studies with increasing complexity. The first study consists of a typical integer estimation problem, and involves a Sudoku puzzle. This puzzle has become quite popular in the past 10 yr, and many newspapers, journals, and magazines around the world publish Sudokus for entertainment. This synthetic study illustrates the ability of DREAM (D) to help solve a relatively difficult and high-dimensional combinatorial optimization problem. The second study revisits a classical site characterization problem in vadose zone hydrology and involves inference of the hydraulic properties of unsaturated soils from laboratory or in situ measured water retention data. This example demonstrates how DREAM (D) can be utilized to solve experimental design problems and guide measurement collection. The third and final study resolves discrete parameters in a parsimonious lumped watershed model using observed daily discharge data from the Guadalupe River in Texas. The posterior distribution derived with DREAM (D) is compared against its counterpart derived with DREAM assuming a continuous formulation of the model calibration problem.

The daily Sudoku
The first case study considers a Sudoku puzzle, a popular and widely practised integer estimation problem. We use this example to illustrate the ability of DREAM (D) to successfully find the optimum of a discrete search problem. The objective is to fill a 9 × 9 grid with numbers so that each column, each row, and each of the nine 3×3 sub-grids that make up the total square contains the values of 1 to 9. The same integer may only appear once in each column and row of the 9 × 9 playing board including in any of the nine 3×3 subregions. Each puzzle starts with a partially completed grid, and typically has a single (unique) solution. The puzzle was popularized in 1986 by the Japanese puzzle company Nikoli, under the name Sudoku, meaning single number (Hayes, 2006). Nowadays, Sudoku puzzles are very popular, and widely practised by many millions of people throughout the world.
We consider the Sudoku puzzle in Fig. 3, taken from Wikipedia (http://en.wikipedia.org/wiki/Sudoku). The initial grid is depicted at the left-hand side, whereas the final solution is presented at the right-hand side. In this particular puzzle, the solution at 29 different cells is known, leaving us with d = 81 − 29 = 52 parameter values to be estimated. These parameters can take on values between 1 to 9. Figure 4 illustrates the sampled values of DREAM (D) at various stages during the search. A total of N = 20 different chains were used to search the parameter space, and the initial sample was created using random permutation of the available numbers. This ensures that each value of 1 to 9 appears only nine times in the grid. The log-likelihood function measures the sum of the horizontal, vertical and subgrid constraint violation.
The first grid, on the left-hand side of Fig. 4 depicts the starting solution of one of the different chains. This initial state is randomly drawn from the prior distribution, and appears rather poor. For example, the subregion at the bottom right hand corner contains three numbers eight, a severe violation of the constraints. Figure 4b displays the state of the same chain after about 25 000 function evaluations. The solution has improved considerably, but still contains several noticeable deviations from the actual solution. The third panel in Fig. 4c, derived after about 50 000 Sudoku evaluations is a further refinement of the solution presented in Fig. 4b. A few important swaps have been introduced that further reduce the constraint violation. Finally, after about 100 000 samples, Fig. 4d illustrates that the puzzle has been successfully solved.
The results of this study inspire confidence in the ability of DREAM (D) to successfully locate the optimum solution of a discrete search problem. This constitutes a necessary benchmark to inspire confidence in the search capabilities and convergence properties of the algorithm. A branch and bound optimization approach is about 2 orders of magnitude more efficient than DREAM (D) in solving the Sudoku puzzle, but this method violates detailed balance, and hence cannot be used to derive the posterior target distribution. This is the main purpose of DREAM (D) and this ability will be highlighted in the remaining two case studies of this paper.
If our sole interest is in finding the optimum solution of a discrete search problem, then significant efficiency gains can be achieved with DREAM (D) by relaxing the assumption of detailed balance of the sampled Markov chains. For instance, if we modify the directed swap in DREAM (D) so that each candidate point is accepted/rejected based on its own Metropolis ratio independently of the associated move of the corresponding chain, then far fewer function evaluations would be needed to solve the Sudoku puzzle. The consequence of such non-Markovian adaptation, however is that the algorithm no longer adequately samples the underlying target distribution.

Optimal experimental design for soil hydraulic characterization
The second case study considers a common problem in vadose zone hydrology and involves characterization of the hydraulic properties of variably saturated soils. This serves to demonstrate the ability of DREAM (D) to guide experimental data collection and help determine which soil water retention measurements to collect. We assume the capillary pressuresaturation relationship of (van Genuchten, 1980): where θ (cm 3 cm −3 ) is the volumetric water content, h (cm) denotes the soil water pressure head, θ s (θ r ) (cm 3 cm −3 ) signifies the saturated (residual) water content, α (cm −1 ) and n (-) are coefficients that determine the shape of the water retention function, and m = 1 − 1/n. A synthetic data set of water retention observations was created for a sandy soil by evaluating Eq. (8) for a given set of pressure head values. The soil water pressure head was discretized into M = 501 equidistant points between h = 0 (saturation) and h = −1000 (cm), and the corresponding θ(h) curve is plotted in Fig. 5 with a solid black line. Then, the soil water pressure head observations are corrupted with a normally distributed error, h ← N (h,2) to represent the combined effect of measurement error and soil inhomogeneity. The final data set is depicted with the circles in Fig. 5. From this data set of M = 501 water retention measurements we now use DREAM (D) to find those four (h,θ (h)) observations that are most informative and best constrain the hydraulic properties, x = [θ s ,θ r ,α,n] of the sandy soil under consideration. For each selected combination of four different (h,θ (h)) measurements in DREAM (D) we estimate the corresponding values of x by nonlinear minimization using the Nelder-Mead Simplex algorithm. The hydraulic parameters obtained this way are then used to predict the water retention curve for the entire range of pressure head values considered in Fig. 5. A standard squared-deviation likelihood function is subsequently used to calculate π(x) in step (d) of the pseudo code of DREAM (D) and summarize the information content for each selected set of four (h,θ(h)) observations. We use the standard settings of DREAM (D) and run N = 10 different Markov chains in parallel to search the discrete measurement space. About 10 000 function evaluations were required to reach convergence to a limiting distribution. Figure 6 presents the results of our analysis and plots histograms of the DREAM (D) derived posterior measurement samples after sorting each combination in increasing order. The marginal posterior distributions appear clustered around distinctly different soil water pressure head values. This includes soil water pressure head measurements around (Fig. 6a) h = 0, (Fig. 6b) h = −40, (Fig. 6c) h = −220, and (Fig. 6d) h = −1000 cm respectively. Indeed, the most informative (h,θ (h)) observations are found close to saturation, around the air-entry value of the sandy soil, at the inflection point of the water retention function, and in the very dry moisture range. The location of these most informative θ (h) measurements matches very well with our previous findings (Vrugt et al., 2002) and are in agreement with the dynamic behavior of the marginal sensitivity coefficients, ∂θ/∂θ s , ∂θ/∂α, ∂θ/∂n, and ∂θ/∂θ r derived from the partial derivatives of Eq. (8). A similar pattern is observed for loamy and clayey soils (not shown herein), but the second, third and last observation depicted in Fig. 4b-d move towards lower soil water pressure head values, consistent with the increased water holding capacity of these more fine textured soils. An accurate soil hydraulic characterization thus requires water retention measurements that span the entire range of soil moisture values.
It is interesting to observe the differences in the posterior spread of the four different histograms. Whereas the first two pressure head observations in Fig. 4b for a redundancy in information content. This is an interesting finding, and explains why the parameters n and θ r are often found to be highly correlated in the fitting of water retention curves. The posterior spread derived with DREAM (D) is hence a useful byproduct to analyze measurement uniqueness, and determine the information content of experimental data.

Watershed model calibration using discrete parameter estimation
The third and final case study involves flood forecasting, and consists of the calibration of a mildly complex lumped watershed model using historical data from the Guadalupe River at Spring Branch, Texas. This is the driest of the 12 MOPEX basins described in the study of Duan et al. (2006). The model structure and hydrologic process representations are found in Schoups and Vrugt (2010). The model transforms rainfall into runoff at the watershed outlet using explicit process descriptions of interception, throughfall, evaporation, runoff generation, percolation, and surface and subsurface routing. Table 1 summarizes the seven different model parameters, and their prior uncertainty ranges. Each parameter is discretized equidistantly in 250 intervals with respective step size listed in the last column at the right hand side. This gridding is necessary to create a non-continuous, discrete, parameter estimation problem. Unlike the previous case study in which integer values are sampled only, this particular study (mostly) involves non-integer values. Daily discharge, mean areal precipitation, and mean areal potential evapotranspiration were derived from Duan et al. (2006) and used for model calibration. Details about the basin, experimental data, and likelihood function can be found there, and will not be discussed herein. The same model and data was used in a previous study (Schoups and Vrugt, 2010), and used to introduce a generalized likelihood function for heteroscedastic, non-Gaussian, and correlated (streamflow) prediction errors. Figure 7 presents histograms of the marginal distribution of a few selected hydrologic model parameters using five years of observed daily discharge data. The top panel presents the results of DREAM (D) , whereas the bottom panel presents the results for a continuous parameter space. These histograms were derived by separately running DREAM for the same data set and model. Notice the close agreement between the histograms derived with both MCMC methods. This is a testament to the ability of DREAM (D) to successfully solve discrete posterior parameter estimation problems. The influence of gridding is hardly noticeable, but becomes apparent if we use at least 25 bins to represent the marginal density (not shown herein).
To better illustrate the effect of discretization, please consider Fig. 8 that presents two-dimensional scatter plots of the DREAM (D) derived posterior samples for a few selected parameter pairs. The bottom panel shows similar plots but then assuming continuity of the parameter space. The effect of gridding is immediately apparent. Whereas the original bivariate scatter plots sample the parameter space in a (bloodstain) spatter pattern, two-dimensional plots of the posterior samples derived with DREAM (D) exhibit an obvious grid pattern with horizontally and vertically aligned points. The posterior samples take on discrete values with a distance between subsequent points that is similar to the intervals listed in Table 1. Despite this difference in sampling pattern, the shape of the DREAM and DREAM (D) derived bivariate scatter plots are very similar, commensurate with the covariance structure of the posterior distribution. The results presented in Fig. 7 inspire confidence in the ability of DREAM (D) to solve noncontinuous posterior sampling problems.
The excellent correspondence of the posterior parameter distributions derived with DREAM and DREAM (D) results in marginal differences of the resulting streamflow predictions. We therefore do not show any times series plots of model predictions, and corresponding 95 % uncertainty ranges. Such plots can be found in Schoups and Vrugt (2010)   and details can be found in that publication. It is interesting to observe that the maximum log-likelihood value of 543 found with DREAM (D) is somewhat larger than its counterpart estimated with DREAM (540). This difference was consistently observed for multiple different trials with both MCMC algorithms.
To provide better insights into the efficiency of DREAM (D) , Fig. 9a plots the evolution of theR-statistic of Gelman and Rubin (1992) for each of the different parameters. To benchmark these results, the bottom panel (Fig. 9b) illustrates the convergence results of DREAM assuming a continuous parameter space. The presented trace plots represent an average of 25 different MCMC trials. The  convergence speed for both algorithms is strikingly similar. Both MCMC methods require about 30 000 model evaluations to converge to a limiting distribution. Although gridding significantly reduces the size of the feasible parameter space, an approximately similar number of function evaluations remains necessary to explore the posterior target distribution. Our experience with models involving a much higher parameter dimensionality demonstrate considerable enhancements in efficiency (sometimes dramatically) when sampling in the discretized rather than continuous space. Discretization might therefore provide a practical solution to speed up search efficiency for insensitive parameters. Further research on this topic is warranted.

Conclusions
In the past decade much progress has been made in the development of sampling algorithms for statistical inference of the posterior parameter distribution. The typical assumption in this work is that the parameters are continuous and can take on any value within their upper and lower bounds. Unfortunately, such algorithms typically do not work for discrete parameter estimation problems. Such problems are abundant in many fields of study, and therefore of considerable theoretical and practical interest. Examples include selecting among different measurement locations in the design of optimal experimental strategies, finding the best members of an ensemble of predictors, and more generally discrete model calibration problems. Here, we have introduced a discrete MCMC simulation algorithm that is especially designed to solve non-continuous and combinatorial posterior sampling problems. This method, entitled DREAM (D) uses DREAM as its main building block, yet uses a modified proposal distribution to facilitate solve discrete sampling problems. The DREAM (D) algorithm maintains detailed balance and ergodicity, and receives good performance across a range of problems involving a Sudoku puzzle, soil water retention function and discrete rainfall -runoff model calibration problem.
The theory developed herein is easily implementable in DREAM (ZS) (Vrugt et al., 2011b) and MT-DREAM (ZS) , which provides a venue to further increase the efficiency of MCMC simulation. The DREAM (D) code is written in MATLAB and is available upon request (jasper@uci.edu).