Introduction

The conditional binary quantile (CBQ) models extend the simple version of binary quantile models for analyzing discrete choices (including but beyond binary choices) with varying choice alternatives. Each quantile estimation represents a local inspection of effects at the specified quantile of the choice probabilities. In the simple binary setting, the features of the choice alternatives within each choice set are assumed to be the same. However, in reality we oftentimes observe individuals who are facing different sets and numbers of choice alterantives. The CBQ models are developed to solve this problem by introducing a conditional multinomial structure for modeling varying choice alternatives. Even though the CBQ models are called “binary”, they actually belong to a more general family of dicrete choice models. I refer the readers to Lu (2019) for the details of the estimation.

This demo page is intended to introduce an R package named cbq associated with the CBQ models. Compared to the original version in Lu (2019), the package additionally enables estimation of mixed effects (including both random and fixed effects) in the CBQ models and allows for more flexible specifications that can be determiend by the users of the package. Since the package has been uploaded to the Comprehensive R Archive Network: link to the package in CRAN, the package is ready for use after download. To download the package, simply execute in the R console the following line of code:

install.packages("cbq")

Or alternatively, users can download a developing version of the package from my GitHub repositories by using the following codes:

# Make sure that the following packages have been installed in your local R environment
if(!require(rstan)) install.packages("rstan")

# Install cbq from github
if(!require(devtools)) install.packages("devtools")
devtools::install_github("xiao-lu-research/cbq")

You can always reach me by xiao.lu.research [at] gmail.com whenever you have questions about the method and the package. In what follows, I demonstrate step by step how we can use the package. It is important to note that to reduce the estimation time the datasets used in the following examples are only subsets of data used in the original studies and only one chain has been specified to sample the model. Therefore, the estimates do not necessarily reflect the results of the original studies and I will refrain from interpreting the results (see Lu (2019) for the full estimations). This, however, should not matter so much for our pedagogical purpose of showing how we can use the package. Now, let us get started.

Package loading and introduction of datasets

To load the package and get help of the pacakge, we can use the following codes.

# Load the package
library(cbq)

# Get help from R documents
?cbq

The datasets used in this demo comes from three published papers in political science:

[1] data_legislature: Rasmussen, A. (2010). Early Conclusion in Bicameral Bargaining: Evidence from the Co-decision Legislative Procedure of the European Union. European Union Politics 12 (1), 41-64.

[2] data_vote: Alvarez, R. M. and J. Nagler (1995). Economics, Issues and the Perot Candidacy: Voter Choice in the 1992 Presidential Election. American Journal of Political Science 39 (3), 714-744.

[3] data_coalition: Martin, L. W. and R. T. Stevenson (2001). Government Formation in Parliamentary Democracies. American Journal of Political Science 45 (1), 33-50.

Let us first load the data and check the dimensions of the datasets and the variables included.

# Legislative data
load("data_legislature.RData")
# US vote choice data
load("data_vote.RData")
# Coalition formation data
load("data_coalition.RData")

# Dimensions of the datasets
dim(data_legislature)
## [1] 385  13
dim(data_vote)
## [1] 2727   41
dim(data_coalition)
## [1] 33256    16
# Variables included. Note that in all three datasets, the first variable is the dependent variable.
names(data_legislature)
##  [1] "choice"      "var6"        "var3"        "var5"        "var10"      
##  [6] "var15"       "urgcy_yes"   "urgcy_noleg" "var4"        "var11"      
## [11] "var14"       "var7"        "var17"
names(data_vote)
##  [1] "choice"           "dist"             "respfinp.bush"   
##  [4] "natlec.bush"      "respgjob.bush"    "resphlth.bush"   
##  [7] "respblk.bush"     "respab.bush"      "east.bush"       
## [10] "south.bush"       "west.bush"        "newvoter.bush"   
## [13] "termlim.bush"     "deficit.bush"     "dem.bush"        
## [16] "rep.bush"         "women.bush"       "educ.bush"       
## [19] "age1829.bush"     "age3044.bush"     "age4559.bush"    
## [22] "respfinp.clinton" "natlec.clinton"   "respgjob.clinton"
## [25] "resphlth.clinton" "respblk.clinton"  "respab.clinton"  
## [28] "east.clinton"     "south.clinton"    "west.clinton"    
## [31] "newvoter.clinton" "termlim.clinton"  "deficit.clinton" 
## [34] "dem.clinton"      "rep.clinton"      "women.clinton"   
## [37] "educ.clinton"     "age1829.clinton"  "age3044.clinton" 
## [40] "age4559.clinton"  "chid"
names(data_coalition)
##  [1] "realg"    "minor"    "minwin"   "numpar"   "dompar"   "median"  
##  [7] "gdiv1"    "mgodiv1"  "prevpm"   "sq"       "mginvest" "anmax2"  
## [13] "pro"      "anti"     "country"  "case"
# Have a look at the first 5 rows of the datasets.
head(data_legislature,5)
##    choice var6 var3 var5 var10 var15 urgcy_yes urgcy_noleg var4 var11
## 1       0 0.44 0.44    1     0     1         0           0  130     3
## 2       0 0.44 0.44    1     0     1         0           0  130     2
## 3       0 0.44 0.44    1     0     1         0           0  130     2
## 9       0 0.40 0.40    1     0     1         0           0  130     1
## 10      0 0.19 0.00    0     0     1         0           0  130     0
##    var14 var7 var17
## 1      0    0     1
## 2      0    1     1
## 3      0    0     1
## 9      0    1     1
## 10     0    1     1
head(data_vote,5)
##           choice   dist respfinp.bush natlec.bush respgjob.bush
## 1.Bush      TRUE 0.1024             1           3             6
## 1.Clinton  FALSE 4.0804             0           0             0
## 1.Perot    FALSE 0.2601             0           0             0
## 2.Bush      TRUE 0.1024             3           5             6
## 2.Clinton  FALSE 4.0804             0           0             0
##           resphlth.bush respblk.bush respab.bush east.bush south.bush
## 1.Bush                7            6           2         0          0
## 1.Clinton             0            0           0         0          0
## 1.Perot               0            0           0         0          0
## 2.Bush                6            6           3         0          1
## 2.Clinton             0            0           0         0          0
##           west.bush newvoter.bush termlim.bush deficit.bush dem.bush
## 1.Bush            0             0            1            0        0
## 1.Clinton         0             0            0            0        0
## 1.Perot           0             0            0            0        0
## 2.Bush            0             0            1            1        0
## 2.Clinton         0             0            0            0        0
##           rep.bush women.bush educ.bush age1829.bush age3044.bush
## 1.Bush           1          1         3            0            1
## 1.Clinton        0          0         0            0            0
## 1.Perot          0          0         0            0            0
## 2.Bush           1          1         4            0            0
## 2.Clinton        0          0         0            0            0
##           age4559.bush respfinp.clinton natlec.clinton respgjob.clinton
## 1.Bush               0                0              0                0
## 1.Clinton            0                1              3                6
## 1.Perot              0                0              0                0
## 2.Bush               0                0              0                0
## 2.Clinton            0                3              5                6
##           resphlth.clinton respblk.clinton respab.clinton east.clinton
## 1.Bush                   0               0              0            0
## 1.Clinton                7               6              2            0
## 1.Perot                  0               0              0            0
## 2.Bush                   0               0              0            0
## 2.Clinton                6               6              3            0
##           south.clinton west.clinton newvoter.clinton termlim.clinton
## 1.Bush                0            0                0               0
## 1.Clinton             0            0                0               1
## 1.Perot               0            0                0               0
## 2.Bush                0            0                0               0
## 2.Clinton             1            0                0               1
##           deficit.clinton dem.clinton rep.clinton women.clinton
## 1.Bush                  0           0           0             0
## 1.Clinton               0           0           1             1
## 1.Perot                 0           0           0             0
## 2.Bush                  0           0           0             0
## 2.Clinton               1           0           1             1
##           educ.clinton age1829.clinton age3044.clinton age4559.clinton
## 1.Bush               0               0               0               0
## 1.Clinton            3               0               1               0
## 1.Perot              0               0               0               0
## 2.Bush               0               0               0               0
## 2.Clinton            4               0               0               0
##           chid
## 1.Bush       1
## 1.Clinton    1
## 1.Perot      1
## 2.Bush       2
## 2.Clinton    2
head(data_coalition,5)
##   realg minor minwin numpar dompar median    gdiv1   mgodiv1 prevpm sq
## 1     0     1      0      1      0      0 0.000000 0.2308820      0  0
## 2     0     1      0      1      0      0 0.000000 0.0485294      0  0
## 3     0     1      0      1      0      0 0.000000 0.0485294      0  0
## 4     0     1      0      2      0      0 0.822794 0.2308820      0  0
## 5     0     1      0      2      0      0 0.279412 0.4227940      0  0
##   mginvest anmax2 pro anti country case
## 1        0      0   0    0      11    3
## 2        0      0   0    0      11    3
## 3        0      0   0    0      11    3
## 4        0      0   0    0      11    3
## 5        0      0   0    0      11    3

The correspondence between the variable names and their meanings is shown below. In data_legislature (unit of analysis: legislative bills), the variables correspond to

  1. choice: Early agreement (dummy)
  2. var6: Policy distance rapporteur-EP median
  3. var3: Policy distance rapporteur-EP median * Big party group
  4. var5: Big party group
  5. var10: Country coherence of key negotiators
  6. var15: Closeness of working relationship (calendar year)
  7. urgcy_yes: Urgency of legislation: existing legislation expires;
  8. urgcy_noleg: No existing legislation
  9. var4: Workload (pending legislation during first reading)
  10. var11: Issue areas consulted (no. of consultative committees)
  11. var14: New act
  12. var7: Directive
  13. var17: Inter-institutional disagreement

In data_vote (unit of analysis: individual vote choice):

  1. choice: 1 Bush 2 Clinton 3 Perot
  2. educ: Respondents education
  3. east: Region (East)
  4. south: Region (South)
  5. west: Region (West)
  6. women: Gender (Female)
  7. respfinp: Felt personal finance were worse
  8. natlec: Felt national economy was worse
  9. bclibdis: Ideological Distance
  10. gblibdis: Ideological Distance
  11. rplibdis: Ideological Distance
  12. dem: Democrat
  13. rep: Republican
  14. respgjob: Oppose government jobs
  15. resphlth: Oppose government health care
  16. respblk: Oppose government minority assistance
  17. respab: Abortion
  18. termlim: Term limits
  19. age1829: Age: 18-29
  20. age3044: Age: 30-44
  21. age4559: Age: 45-59
  22. newvoter: New or returning voter
  23. deficit: Felt deficit was a major problem
  24. chid: Choice set indicator

In data_coalition (unit of analysis: potential coalitions):

  1. realg: Really formed government
  2. minor: Minority Coalition
  3. Minwin: Minimal Winning Coalition
  4. numpar: Number of Parties in the Coalition
  5. dompar: Largest Party in the Coalition
  6. median: Median Party in the Coalition
  7. gdiv1: Ideological Divisions in the Coalition
  8. mgodiv1: Ideological Divisions within Majority Opposition
  9. prevpm: Previous Prime Minister in the Coalition
  10. sq: Incumbent Coalition
  11. mginvest: Minority Coalition where Investiture Vote Required
  12. anmax2: Anti-System Presence in the Coalition
  13. pro: Pre-Electoral Pact associated with the Coalition
  14. anti: Anti-Pact associated with the Coalition
  15. case: choice set indicator

Note that all the datasets are in a long-format with the binary-coded (0 or 1) dependent variable and an additional indicator variable showing the choice sets each row of observation belongs to. Each row in the data represents a choice alternative.

Estimation of the models

Binary choice: data_legislature

In a simple binary choice situation, cbq only requres specifying the discrete choice variable on the left and a bunch of independent variables on the right, seperated by ~. We can also specify a set of controling parameters in the function as displayed below.

# Estimate the model
model1 <- cbq(choice ~ var6 + var3 + var5 + var10 + var15 + urgcy_yes + urgcy_noleg + var4 + var11 +var14 + var7 + var17, 
             data = data_legislature, 
             q = 0.5, # We first specify a 0.5 quantile (median)
             vi = FALSE, # Indicating whether to use variational inference to approximate the results. The default is FALSE, meaning using MCMC sampling procedures.
             nsim = 1000, # Running the model for 1000 iterations
             grad_samples = 1, # Passed to vb (positive integer), the number of samples for Monte Carlo estimate of gradients, defaulting to 1.
             elbo_samples = 100, # Passed to vb (positive integer), the number of samples for Monte Carlo estimate of ELBO (objective function), defaulting to 100. (ELBO stands for "the evidence lower bound".)
             tol_rel_obj = 0.01, # Passed to vb (positive double), the convergence tolerance on the relative norm of the objective, defaulting to 0.01.
             output_samples = 2000, # Passed to vb (positive integer), number of posterior samples to draw and save, defaults to 2000.
             burnin = NULL, # The number of burnin iterations. If unspecified, hald of the iterations will be burnin iterations.
             thin = 1, # Thinning parameter.
             CIsize = 0.95, # The size of confidence interval.
             nchain = 1, # The number of parallel chains.
             seeds = 12345, # Random seeds to replicate the results.
             inverse_distr = FALSE, # If FALSE, the ALD will not be reversed. The default is FALSE.
             offset = 1e-20, # Offset values to enhance sampling stability. The default value is 1e-20.
             mc_core = TRUE # Indicating whether the estimation will be run in multiple parallel chains. The default is TRUE.
             )
## 
## SAMPLING FOR MODEL 'cbqbv' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000246 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.46 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 19.8016 seconds (Warm-up)
## Chain 1:                5.7568 seconds (Sampling)
## Chain 1:                25.5584 seconds (Total)
## Chain 1:

We can check the estimated model by using the following three functions, where the first generates the summary table, the second presents the coefficient matrix and the third checks model convergence.

# Print the results
print(model1)
## Conditional binary quantile regression 
## 
## Call:
## cbq(formula = choice ~ var6 + var3 + var5 + var10 + var15 + urgcy_yes + 
##     urgcy_noleg + var4 + var11 + var14 + var7 + var17, data = data_legislature, 
##     q = 0.5, vi = FALSE, nsim = 1000, grad_samples = 1, elbo_samples = 100, 
##     tol_rel_obj = 0.01, output_samples = 2000, burnin = NULL, 
##     thin = 1, CIsize = 0.95, nchain = 1, seeds = 12345, inverse_distr = FALSE, 
##     offset = 1e-20, mc_core = TRUE)
## 
## MCMC run for 1000 iterations, with 500 used. 
## 
## Coefficients:
##             Estimate      LB     UB
## (Intercept)   -4.819  -7.789 -1.688
## var6          -0.107  -1.981  1.512
## var3          -5.301 -10.841 -1.132
## var5           3.076   0.964  5.789
## var10          1.323   0.151  2.560
## var15          0.821   0.545  1.151
## urgcy_yes      2.752   1.152  4.484
## urgcy_noleg    1.048  -0.105  2.353
## var4           0.015  -0.001  0.032
## var11          0.415   0.140  0.685
## var14         -1.803  -2.997 -0.721
## var7          -0.619  -1.376  0.191
## var17         -2.876  -4.089 -1.800
print(model1, "mcmc") # Print MCMC
## Inference for Stan model: cbqbv.
## 1 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=500.
## 
##             mean se_mean   sd    2.5%     25%     50%     75%   97.5%
## beta[1]    -4.82    0.10 1.55   -7.79   -5.77   -4.87   -3.84   -1.69
## beta[2]    -0.11    0.06 0.91   -1.98   -0.66   -0.08    0.51    1.51
## beta[3]    -5.30    0.18 2.45  -10.84   -6.62   -5.29   -3.49   -1.13
## beta[4]     3.08    0.08 1.19    0.96    2.21    3.02    3.80    5.79
## beta[5]     1.32    0.02 0.65    0.15    0.83    1.32    1.76    2.56
## beta[6]     0.82    0.01 0.15    0.54    0.72    0.81    0.93    1.15
## beta[7]     2.75    0.04 0.90    1.15    2.10    2.70    3.39    4.48
## beta[8]     1.05    0.03 0.62   -0.11    0.60    1.07    1.44    2.35
## beta[9]     0.02    0.00 0.01    0.00    0.01    0.02    0.02    0.03
## beta[10]    0.41    0.01 0.14    0.14    0.32    0.41    0.52    0.68
## beta[11]   -1.80    0.03 0.60   -3.00   -2.21   -1.79   -1.37   -0.72
## beta[12]   -0.62    0.02 0.43   -1.38   -0.93   -0.62   -0.33    0.19
## beta[13]   -2.88    0.02 0.61   -4.09   -3.26   -2.84   -2.45   -1.80
## lp__     -195.67    0.19 2.54 -201.24 -197.29 -195.45 -193.87 -191.54
##          n_eff Rhat
## beta[1]    254 1.00
## beta[2]    269 1.01
## beta[3]    177 1.02
## beta[4]    205 1.01
## beta[5]    930 1.00
## beta[6]    399 1.00
## beta[7]    556 1.00
## beta[8]    466 1.00
## beta[9]    602 1.00
## beta[10]   748 1.00
## beta[11]   450 1.00
## beta[12]   737 1.00
## beta[13]   752 1.00
## lp__       182 1.00
## 
## Samples were drawn using NUTS(diag_e) at Thu Feb 27 12:01:40 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
# Examine the coefficient estimates
coef(model1)
##                Estimate            LB          UB
## (Intercept) -4.81901733  -7.789123864 -1.68751112
## var6        -0.10674397  -1.981043295  1.51233659
## var3        -5.30076900 -10.840607199 -1.13156396
## var5         3.07629825   0.963926360  5.78875404
## var10        1.32320303   0.150698098  2.55984875
## var15        0.82061505   0.544685268  1.15065942
## urgcy_yes    2.75164526   1.151922105  4.48366812
## urgcy_noleg  1.04807005  -0.105176133  2.35284643
## var4         0.01518161  -0.001493851  0.03206085
## var11        0.41465409   0.139894025  0.68455239
## var14       -1.80290166  -2.996690275 -0.72144895
## var7        -0.61902091  -1.375919925  0.19057199
## var17       -2.87639000  -4.088896423 -1.79992731
# Check convergence (need to increase the number of chains for a better inspection)
plot(model1)
## 'pars' not specified. Showing first 10 parameters by default.