Title: | Ensures Mutually Consistent Beliefs When Using IVs |
---|---|
Description: | Uses data and researcher's beliefs on measurement error and instrumental variable (IV) endogeneity to generate the space of consistent beliefs across measurement error, instrument endogeneity, and instrumental relevance for IV regressions. Package based on DiTraglia and Garcia-Jimeno (2020) <doi:10.1080/07350015.2020.1753528>. |
Authors: | Frank DiTraglia [aut], Mallick Hossain [aut, cre] |
Maintainer: | Mallick Hossain <[email protected]> |
License: | CC0 |
Version: | 1.0.1 |
Built: | 2024-11-02 05:14:33 UTC |
Source: | https://github.com/emallickhossain/ivdoctr |
Replicates IV using controls from Table 2
afghan
A data frame with 687 rows and 17 variables:
Indicator if child is enrolled in formal school. Outcome.
Normalized test score
Indicator if village is treated. Instrument.
Indicator if child is child of head of household
Number of household members
Female indicator
Child's age
Time family has lived in village
Indicator for speaking Farsi
Indicator for speaking Tajik
Indicator for if head of household is a farmer
Number of jeribs of land owned
Head of household age
Years of education for head of household
Number of sheep and goats owned
Indicator if village is in Chagcharan district
Distance to nearest non-community based school
Provided by author.
https://www.jstor.org/stable/3083335
B function from Proposition A3
b_functionA3(obs_draws, g, psi)
obs_draws |
Row of the data.frame of observable draws |
g |
Value from g function |
psi |
Psi value |
A min and a max of the B function
Evaluates the corners given user bounds. Vectorized with respect to multiple draws of obs.
candidate1(r_TstarU_lower, r_TstarU_upper, k_lower, k_upper, obs)
r_TstarU_lower |
Vector of lower bounds of endogeneity |
r_TstarU_upper |
Vector of upper bounds of endogeneity |
k_lower |
Vector of lower bounds on measurement error |
k_upper |
Vector of upper bounds on measurement error |
obs |
Observables generated by get_observables |
List containing vector of lower bounds and vector of upper bounds of r_uz
Evaluates the edge where k is on the boundary. Vectorized with respect to multiple draws of obs.
candidate2(r_TstarU_lower, r_TstarU_upper, k_lower, k_upper, obs)
r_TstarU_lower |
Vector of lower bounds of endogeneity |
r_TstarU_upper |
Vector of upper bounds of endogeneity |
k_lower |
Vector of lower bounds on measurement error |
k_upper |
Vector of upper bounds on measurement error |
obs |
Observables generated by get_observables |
List containing vector of lower bounds and vector of upper bounds of r_uz
Evaluates the edge where r_TstarU is on the boundary.
candidate3(r_TstarU_lower, r_TstarU_upper, k_lower, k_upper, obs)
r_TstarU_lower |
Vector of lower bounds of endogeneity |
r_TstarU_upper |
Vector of upper bounds of endogeneity |
k_lower |
Vector of lower bounds on measurement error |
k_upper |
Vector of upper bounds on measurement error |
obs |
Observables generated by get_observables |
List containing vector of lower bounds and vector of upper bounds of r_uz
Collapse 3-d array to matrix
collapse_3d_array(myarray)
myarray |
A three-dimensional array. |
Matrix with the 3rd dimension appended as rows to the matrix
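The row-appending behaviour described above can be illustrated in base R. This is a minimal sketch, not the package's source: `collapse_sketch` is a hypothetical stand-in, and it assumes the slices are stacked in order along the third dimension.

```r
# Hypothetical stand-in for collapse_3d_array(); not the package's source.
collapse_sketch <- function(myarray) {
  stopifnot(length(dim(myarray)) == 3)
  d <- dim(myarray)
  # Turn each slice along the 3rd dimension into a matrix, then stack the slices.
  slices <- lapply(seq_len(d[3]), function(i) {
    matrix(myarray[, , i], nrow = d[1], ncol = d[2])
  })
  do.call(rbind, slices)
}

a <- array(1:12, dim = c(2, 3, 2))
m <- collapse_sketch(a)
dim(m)  # 4 rows, 3 columns
```

Rows 1 to 2 of `m` hold the first slice and rows 3 to 4 hold the second.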
Cross-country dataset used to construct Table 4 of Acemoglu, Johnson & Robinson (2001).
colonial
A data frame with 64 rows and 9 variables:
Three-letter country abbreviation, e.g. AUS for Australia
Dummy variable, =1 if country is in Africa
Absolute distance to equator (scaled between 0 and 1)
Dummy variable, =1 for "Neo-Europes" (AUS, CAN, NZL, USA)
Average protection against expropriation risk. Measures protection against government expropriation of foreign private investment on a scale from 0 (least protection) to 10 (most protection). Averaged over all years from 1985-1995.
Natural logarithm of per capita GDP in 1995 at purchasing power parity
Natural logarithm of European settler mortality
Dummy variable, =1 if country is in Asia
Natural logarithm of output per worker in 1988
http://economics.mit.edu/faculty/acemoglu/data/ajr2001
https://www.aeaweb.org/articles.php?doi=10.1257/aer.91.5.1369
This function takes data and user restrictions on measurement error and endogeneity and simulates data and the resulting bounds on instrument validity.
draw_bounds(y_name, T_name, z_name, data, controls = NULL, r_TstarU_restriction = NULL, k_restriction = NULL, n_draws = 5000)
y_name |
Character vector of the name of the dependent variable |
T_name |
Character vector of the names of the preferred regressors |
z_name |
Character vector of the names of the instrumental variables |
data |
Data to be analyzed |
controls |
Character vector containing the names of the exogenous regressors |
r_TstarU_restriction |
2-element vector of bounds on r_TstarU |
k_restriction |
2-element vector of bounds on kappa |
n_draws |
Integer number of simulations to draw |
List containing simulated data observables (covariances, correlations, and R-squares), indications of whether the identified set is empty, the unrestricted and restricted bounds on instrumental relevance, instrumental validity, and measurement error.
This function takes the data and simulates potential draws of data from the properties of the observed data.
draw_observables(y_name, T_name, z_name, data, controls = NULL, n_draws = 5000)
y_name |
Character vector of the name of the dependent variable |
T_name |
Character vector of the names of the preferred regressors |
z_name |
Character vector of the names of the instrumental variables |
data |
Data to be analyzed |
controls |
Character vector containing the names of the exogenous regressors |
n_draws |
Integer number of simulations to draw |
Data frame containing covariances, correlations, and R-squares for each data simulation
Draws covariance matrix using the Jeffreys prior
draw_sigma_jeffreys(y, Tobs, z, k, n_draws)
y |
Vector of dependent variable |
Tobs |
Matrix containing data for the preferred regressor |
z |
Matrix containing data for the instrumental variable |
k |
Number of covariates, including the intercept |
n_draws |
Integer number of draws to perform |
Array of covariance matrix draws
Creates LaTeX code for parameter estimates
format_est(est)
est |
Number |
LaTeX string for the number
Creates LaTeX code for the HPDI
format_HPDI(bounds)
bounds |
2-element vector of the upper and lower HPDI bounds |
LaTeX string of the HPDI
Creates LaTeX code for the standard error
format_se(se)
se |
Standard error |
LaTeX string for the standard error
G function from Proposition A.2
g_functionA2(kappa, r_TstarU, obs_draws)
kappa |
Kappa value |
r_TstarU |
r_TstarU value |
obs_draws |
a row of the data.frame of observable draws |
G value
Computes a0 and a1 bounds
get_alpha_bounds(draws, p)
draws |
data.frame of observables of simulated data |
p |
Treatment probability from binary data |
List of alpha bounds
This function solves for beta given r_TstarU and kappa. It handles 3 potential cases when beta must be evaluated:
1. Across multiple simulations, but given the same r_TstarU and k
2. For multiple simulations, each with a value of r_TstarU and k
3. For one simulation across a grid of r_TstarU and k
get_beta(r_TstarU, k, obs)
r_TstarU |
Vector of r_TstarU values |
k |
Vector of kappa values |
obs |
Observables generated by get_observables |
Vector of betas
Returns beta bounds in binary case using grid search
get_beta_bounds_binary(obs_draws, p, r_TstarU_restriction)
obs_draws |
Row of the data.frame of observable draws |
p |
Treatment probability from data |
r_TstarU_restriction |
2-element vector of restrictions on r_TstarU |
Min and max values for beta
Generates beta bounds from beta draws
get_beta_bounds_binary_post(draws, n_observables)
draws |
Posterior draws |
n_observables |
Number of observable draws |
Upper and lower bounds of beta based on posterior draws
Wrapper function that combines all unrestricted bounds. Vectorized.
get_bounds_unrest(obs)
obs |
Observables generated by get_observables |
List of unrestricted bounds for r_TstarU, r_uz, and kappa
Computes OLS and IV estimates
get_estimates(y_name, T_name, z_name, data, controls = NULL, robust = FALSE)
y_name |
Character vector of the name of the dependent variable |
T_name |
Character vector of the names of the preferred regressors |
z_name |
Character vector of the names of the instrumental variables |
data |
Data to be analyzed |
controls |
Character vector containing the names of the exogenous regressors |
robust |
Boolean of whether to compute heteroskedasticity-robust standard errors |
List of beta estimates and associated standard errors for OLS and IV estimation
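For intuition about the two point estimates, the single-regressor, single-instrument case without controls can be sketched in base R. The data below are simulated and every variable name is hypothetical. The package itself returns fitted `lm` and `ivreg` objects with standard errors, which this sketch does not attempt to reproduce.

```r
set.seed(1)
n <- 5000
z <- rnorm(n)                          # instrument
u <- rnorm(n)                          # structural error
T_obs <- 0.8 * z + 0.6 * u + rnorm(n)  # endogenous regressor (correlated with u)
y <- 1 * T_obs + u                     # true beta = 1

b_OLS <- cov(T_obs, y) / var(T_obs)    # OLS slope
b_IV  <- cov(z, y) / cov(z, T_obs)     # IV slope with a single instrument
c(OLS = b_OLS, IV = b_IV)
```

Because `T_obs` is positively correlated with `u`, the OLS slope is biased upward in this simulation, while the IV slope stays close to the true value of 1.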
Given observables from the data, generates unrestricted bounds for kappa. Vectorized
get_k_bounds_unrest(obs, tilde)
obs |
Observables generated by get_observables |
tilde |
Boolean of whether or not kappa_tilde or kappa is desired |
List of upper bounds and lower bounds for kappa
Computes L, lower bound for kappa_tilde in paper
get_L(draws)
draws |
data.frame of observables of simulated data |
Vector of L values
This function solves for the magnification factor given r_TstarU and kappa. It handles 3 potential cases when the magnification factor must be evaluated:
1. Across multiple simulations, but given the same r_TstarU and k
2. For multiple simulations, each with a value of r_TstarU and k
3. For one simulation across a grid of r_TstarU and k
get_M(r_TstarU, k, obs)
r_TstarU |
Vector of r_TstarU values |
k |
Vector of kappa values |
obs |
Observables generated by get_observables |
Vector of magnification factors
Computes beliefs that support a valid instrument
get_new_draws(obs_draws, post_draws)
obs_draws |
data.frame of draws of reduced form parameters |
post_draws |
data.frame of posterior draws |
data.frame of new draws
Given data and function specification, returns the relevant correlations and covariances with any exogenous controls projected out.
get_observables(y_name, T_name, z_name, data, controls = NULL)
y_name |
Name of the dependent variable |
T_name |
Name(s) of the preferred regressor(s) |
z_name |
Name(s) of the instrumental variable(s) |
data |
Data to be analyzed |
controls |
Exogenous regressors to be included |
List of correlations, covariances, and R^2 of first and second stage regressions after projecting out any exogenous control regressors
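Projecting out exogenous controls can be illustrated by residualizing each variable on the controls and then computing moments of the residuals. This is a sketch on simulated data with hypothetical names (`x`, `partial_out`), not the package's implementation, and it omits the first- and second-stage R^2 values.

```r
set.seed(2)
n <- 500
x <- rnorm(n)                          # exogenous control (hypothetical)
z <- 0.5 * x + rnorm(n)                # instrument
T_obs <- 0.7 * z + 0.3 * x + rnorm(n)  # preferred regressor
y <- T_obs + 0.5 * x + rnorm(n)        # outcome

# Residualize a variable on an intercept and the control.
partial_out <- function(v, x) resid(lm(v ~ x))
R <- cbind(y = partial_out(y, x),
           T = partial_out(T_obs, x),
           z = partial_out(z, x))

Sigma <- cov(R)  # covariances net of the control
Rho <- cor(R)    # correlations net of the control
```

By construction, the residuals in `R` are uncorrelated with `x`, so `Sigma` and `Rho` describe the variation left after the control is removed.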
Compute the share of draws that could contain a valid instrument.
get_p_valid(draws)
draws |
List of simulated draws |
Numeric share of valid draws, where a draw counts as valid if its restricted bounds for r_uz contain zero.
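The share is simply the fraction of draws whose r_uz interval covers zero. A minimal sketch, assuming the bounds are available as two columns (the data frame and its column names here are hypothetical, not the structure of the package's `draws` list):

```r
# Hypothetical restricted bounds on r_uz, one row per simulated draw.
r_uz_bounds <- data.frame(
  lower = c(-0.30, 0.05, -0.10, 0.20),
  upper = c( 0.10, 0.40,  0.25, 0.60)
)

# A draw is compatible with a valid instrument when its interval covers zero.
contains_zero <- r_uz_bounds$lower <= 0 & r_uz_bounds$upper >= 0
p_valid <- mean(contains_zero)
p_valid  # 0.5
```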
Computes the lower bound of psi for binary data
get_psi_lower(s2_T, p, kappa)
s2_T |
Vector of s2_T draws from observables |
p |
Treatment probability from binary data |
kappa |
Vector of kappa, NOTE: kappa_tilde in the paper |
Vector of lower bounds for psi
Computes the upper bound of psi for binary data
get_psi_upper(s2_T, p, kappa)
s2_T |
Vector of s2_T draws from observables |
p |
Treatment probability from binary data |
kappa |
Vector of kappa, NOTE: kappa_tilde in the paper |
Vector of upper bounds for psi
Given observables from the data, generates the unrestricted bounds for r_TstarU. The data do not impose any restrictions on r_TstarU. Vectorized.
get_r_TstarU_bounds_unrest(obs)
obs |
Observables generated by get_observables |
List of upper and lower bounds for r_TstarU
This function solves for r_uz given r_TstarU and kappa. It handles 3 potential cases when r_uz must be evaluated:
1. Across multiple simulations, but given the same r_TstarU and k
2. For multiple simulations, each with a value of r_TstarU and k
3. For one simulation across a grid of r_TstarU and k
get_r_uz(r_TstarU, k, obs)
r_TstarU |
Vector of r_TstarU values |
k |
Vector of kappa values |
obs |
Observables generated by get_observables |
Vector of r_uz values.
This function takes observables from the data and user beliefs over the extent of measurement error (kappa) and the direction of endogeneity (r_TstarU) to generate the implied bounds on instrument validity (r_uz)
get_r_uz_bounds(r_TstarU_lower, r_TstarU_upper, k_lower, k_upper, obs)
r_TstarU_lower |
Vector of lower bounds of endogeneity |
r_TstarU_upper |
Vector of upper bounds of endogeneity |
k_lower |
Vector of lower bounds on measurement error |
k_upper |
Vector of upper bounds on measurement error |
obs |
Observables generated by get_observables |
2-column data frame of lower and upper bounds of r_uz
Given observables from the data, generates the unrestricted bounds for rho_uz. Vectorized
get_r_uz_bounds_unrest(obs)
obs |
Observables generated by get_observables |
List of upper and lower bounds for rho_uz
This function solves for the variance of u given r_TstarU and kappa. It handles 3 potential cases when the variance of u must be evaluated:
1. Across multiple simulations, but given the same r_TstarU and k
2. For multiple simulations, each with a value of r_TstarU and k
3. For one simulation across a grid of r_TstarU and k
get_s_u(r_TstarU, k, obs)
r_TstarU |
Vector of r_TstarU values |
k |
Vector of kappa values |
obs |
Observables generated by get_observables |
Vector of variances of u
Computes coverage of list of intervals
getCoverage(data, guess)
data |
2-column data frame of confidence intervals |
guess |
2-element vector of confidence interval |
Coverage percentage
Generates smallest covering interval
getInterval(data, center, conf = 0.9, tol = 1e-06)
data |
2-column data frame of confidence intervals |
center |
2-element vector to center coverage interval |
conf |
Confidence level |
tol |
Tolerance level for convergence |
2-element vector of confidence interval
Generates parameter estimates given user restrictions and data
ivdoctr(y_name, T_name, z_name, data, example_name, controls = NULL, robust = FALSE, r_TstarU_restriction = c(-1, 1), k_restriction = c(1e-04, 1), n_draws = 5000, n_RF_draws = 1000, n_IS_draws = 1000, resample = FALSE)
y_name |
Character string with the column name of the dependent variable |
T_name |
Character string with the column name of the endogenous regressor(s) |
z_name |
Character string with the column name of the instrument(s) |
data |
Data frame |
example_name |
Character string naming estimation |
controls |
Vector of character strings specifying the exogenous variables |
robust |
Indicator for heteroskedasticity-robust standard errors |
r_TstarU_restriction |
2-element vector of min and max of r_TstarU. |
k_restriction |
2-element vector of min and max of kappa. |
n_draws |
Number of draws when generating frequentist-friendly draws of the covariance matrix |
n_RF_draws |
Number of reduced-form draws |
n_IS_draws |
Number of draws on the identified set |
resample |
Indicator of whether or not to resample using magnification factor |
List with elements:
ols: lm object of OLS estimation,
iv: ivreg object of the IV estimation
n: Number of observations
b_OLS: OLS point estimate
se_OLS: OLS standard errors
b_IV: IV point estimate
se_IV: IV standard errors
k_lower: lower bound of kappa
p_empty: fraction of parameter draws that yield an empty identified set
p_valid: fraction of parameter draws compatible with a valid instrument
r_uz_full_interval: 90% posterior credible interval for fully identified set of rho
beta_full_interval: 90% posterior credible interval for fully identified set of beta
r_uz_median: posterior median for partially identified rho
r_uz_partial_interval: 90% posterior credible interval for partially identified set of rho under a conditionally uniform reference prior
beta_median: posterior median for partially identified beta
beta_partial_interval: 90% posterior credible interval for partially identified set of beta under a conditionally uniform reference prior
a0: If treatment is binary, mis-classification probability of no-treatment case. NULL otherwise
a1: If treatment is binary, mis-classification probability of treatment case. NULL otherwise
psi_lower: lower bound for psi
binary: logical indicating if treatment is binary
k_restriction: User-specified bounds on kappa
r_TstarU_restriction: User-specified bounds on r_TstarU
library(ivdoctr)
endog <- c(0, 0.9)
meas <- c(0.6, 1)
colonial_example1 <- ivdoctr(y_name = "logpgp95", T_name = "avexpr",
                             z_name = "logem4", data = colonial,
                             controls = NULL, robust = FALSE,
                             r_TstarU_restriction = endog,
                             k_restriction = meas,
                             example_name = "Colonial Origins")
Takes the OLS and IV estimates and converts them into a row of the LaTeX table
make_full_row(stats, example_name)
stats |
List with OLS and IV estimates and the bounds on kappa and r_uz |
example_name |
Character string detailing the example |
LaTeX code passed to makeTable()
Generates LaTeX code for one row of a table, shifted over by a given number of columns if necessary
make_tex_row(char_vec, shift = 0)
char_vec |
Vector of characters to be collapsed into a LaTeX table |
shift |
Number of columns to shift over |
LaTeX string of the whole row of the table
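One way to build such a row in base R is sketched below. `tex_row_sketch` is a hypothetical stand-in. The cell separator, the way the shift is rendered (one empty leading cell per shifted column), and the row terminator are all assumptions, so the package's actual output may differ.

```r
# Hypothetical stand-in for make_tex_row(); the formatting choices are assumptions.
tex_row_sketch <- function(char_vec, shift = 0) {
  lead <- strrep("& ", shift)  # one empty leading cell per shifted column
  paste0(lead, paste(char_vec, collapse = " & "), " \\\\")
}

row0 <- tex_row_sketch(c("OLS", "0.52", "(0.06)"))
row1 <- tex_row_sketch(c("a", "b"), shift = 1)
cat(row0, row1, sep = "\n")
```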
Generates table of parameter estimates given user restrictions and data
makeTable(..., output)
... |
Arguments of TeX code for individual examples to be combined into a single table |
output |
File name to write |
LaTeX code that generates output table with regression results
library(ivdoctr)
endog <- c(0, 0.9)
meas <- c(0.6, 1)
colonial_example1 <- ivdoctr(y_name = "logpgp95", T_name = "avexpr",
                             z_name = "logem4", data = colonial,
                             controls = NULL, robust = FALSE,
                             r_TstarU_restriction = endog,
                             k_restriction = meas,
                             example_name = "Colonial Origins")
makeTable(colonial_example1, output = file.path(tempdir(), "colonial.tex"))
Generates a custom color palette given a vector of numbers
map2color(x, pal, limits = NULL)
x |
Vector of numbers |
pal |
Palette function generated by colorRampPalette |
limits |
Limits on the numeric sequence |
Hex values for colors
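A common base R idiom for this kind of mapping bins the values with `findInterval()` and indexes into the palette. The sketch below is an illustration only: `map2color_sketch` and its extra argument `n` (the number of colors requested from the palette function) are hypothetical and not part of the documented signature.

```r
# Hypothetical stand-in for map2color(); `n` is an assumed extra argument.
map2color_sketch <- function(x, pal, limits = NULL, n = 100) {
  if (is.null(limits)) limits <- range(x)
  cols <- pal(n)                                         # n hex colors
  breaks <- seq(limits[1], limits[2], length.out = n + 1)
  # all.inside = TRUE keeps the endpoints inside bins 1..n
  cols[findInterval(x, breaks, all.inside = TRUE)]
}

pal <- colorRampPalette(c("blue", "red"))
cols_out <- map2color_sketch(c(0, 0.5, 1), pal)
cols_out
```

The smallest value receives the first palette color and the largest value receives the last.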
Rounds x to two decimal places
myformat(x)
x |
Number to be rounded |
Number rounded to 2 decimal places
Plot ivdoctr Restrictions
plot_3d_beta(y_name, T_name, z_name, data, controls = NULL, r_TstarU_restriction = c(-1, 1), k_restriction = c(0, 1), n_grid = 30, n_colors = 500, fence = NULL, gray_k = NULL, gray_rTstarU = NULL, theta = 0, phi = 15)
y_name |
Character string with the column name of the dependent variable |
T_name |
Character string with the column name of the endogenous regressor(s) |
z_name |
Character string with the column name of the instrument(s) |
data |
Data frame |
controls |
Vector of character strings specifying the exogenous variables |
r_TstarU_restriction |
2-element vector of bounds for r_TstarU |
k_restriction |
2-element vector of bounds for kappa |
n_grid |
Number of points to put in grid |
n_colors |
Number of colors to use |
fence |
Vector of left, bottom, right, and top corners of rectangle |
gray_k |
2-element vector of kappa restrictions to recolor graph as gray |
gray_rTstarU |
2-element vector of rTstarU restrictions to recolor graph as gray |
theta |
Graphing parameters for orienting plot |
phi |
Graphing parameters for orienting plot |
Interactive 3d plot which can be oriented and saved using rgl.snapshot()
library(ivdoctr)
endog <- matrix(c(0, 0.9), nrow = 1)
meas <- matrix(c(0.6, 1), nrow = 1)
plot_3d_beta(y_name = "logpgp95", T_name = "avexpr", z_name = "logem4",
             data = colonial, r_TstarU_restriction = endog,
             k_restriction = meas)
Construct vectors of points that outline a rectangle.
rect_points(xleft, ybottom, xright, ytop, step_x, step_y)
xleft |
The left side of the rectangle |
ybottom |
The bottom of the rectangle |
xright |
The right side of the rectangle |
ytop |
The top of the rectangle |
step_x |
The step size of the x coordinates |
step_y |
The step size of the y coordinates |
List of x-coordinates and y-coordinates tracing the points around the rectangle
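Tracing a rectangle's outline amounts to walking its four sides in turn. The following is a sketch under assumptions: `rect_points_sketch` is hypothetical, the traversal order (bottom, right, top, left) and the repeated corner points are choices made here, and the package may order or deduplicate points differently.

```r
# Hypothetical stand-in for rect_points(); traversal order is an assumption.
rect_points_sketch <- function(xleft, ybottom, xright, ytop, step_x, step_y) {
  xs <- seq(xleft, xright, by = step_x)
  ys <- seq(ybottom, ytop, by = step_y)
  # Bottom edge, right edge, top edge (right to left), left edge (top to bottom).
  x <- c(xs, rep(xright, length(ys)), rev(xs), rep(xleft, length(ys)))
  y <- c(rep(ybottom, length(xs)), ys, rep(ytop, length(xs)), rev(ys))
  list(x = x, y = y)
}

outline <- rect_points_sketch(0, 0, 1, 2, step_x = 0.5, step_y = 0.5)
length(outline$x)  # 16 points, corners included twice
```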
Simulate draws from the inverse Wishart distribution
rinvwish(n, v, S)
n |
An integer, the number of draws. |
v |
An integer, the degrees of freedom of the distribution. |
S |
A numeric matrix, the scale matrix of the distribution. |
Employs the Bartlett Decomposition (Smith & Hocking 1972). Output exactly matches that of riwish from the MCMCpack package if the same random seed is used.
A numeric array of matrices, each of which is one simulation draw.
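The Bartlett decomposition mentioned above can be sketched in base R as follows. This is an illustrative version only: `rinvwish_sketch` is a hypothetical name, the code is not taken from the package, and it makes no attempt to reproduce the seed-for-seed agreement with `riwish`. It assumes `v >= nrow(S)` and a positive definite `S`.

```r
# Hypothetical sketch of inverse-Wishart sampling via the Bartlett decomposition.
rinvwish_sketch <- function(n, v, S) {
  p <- nrow(S)
  L <- t(chol(solve(S)))  # lower-triangular L with L %*% t(L) = S^{-1}
  out <- array(NA_real_, dim = c(p, p, n))
  for (i in seq_len(n)) {
    A <- matrix(0, p, p)
    # Diagonal: square roots of chi-square draws with v, v - 1, ..., v - p + 1 df.
    diag(A) <- sqrt(rchisq(p, df = v - seq_len(p) + 1))
    # Strictly lower triangle: independent standard normal draws.
    A[lower.tri(A)] <- rnorm(p * (p - 1) / 2)
    LA <- L %*% A
    W <- LA %*% t(LA)       # one Wishart(v, S^{-1}) draw
    out[, , i] <- solve(W)  # its inverse is an inverse-Wishart(v, S) draw
  }
  out
}

set.seed(3)
draws <- rinvwish_sketch(n = 3, v = 5, S = diag(2))
dim(draws)  # 2 2 3
```

Each slice `draws[, , i]` is a symmetric positive definite matrix.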
Convert 3-d array to list of matrices
toList(myArray)
myArray |
A three-dimensional numeric array. |
A list of numeric matrices.
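Splitting along the third dimension is a one-liner with `lapply()`. The sketch below uses a hypothetical name, `to_list_sketch`, and is not the package's source.

```r
# Hypothetical stand-in for toList(); not the package's source.
to_list_sketch <- function(myArray) {
  d <- dim(myArray)
  # matrix() keeps each slice two-dimensional even when a dimension has length 1.
  lapply(seq_len(d[3]), function(i) {
    matrix(myArray[, , i], nrow = d[1], ncol = d[2])
  })
}

arr <- array(1:12, dim = c(2, 3, 2))
lst <- to_list_sketch(arr)
length(lst)  # 2
```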
Data on Prussian counties in 1871 from Becker and Woessmann's (2009) paper "Was Weber Wrong? A Human Capital Theory of Protestant Economic History."
weber
A data frame with 452 rows and 44 variables:
kreiskey1871
County name in 1871
District key
Latitude (in rad)
Longitude (in rad)
Distance to Wittenberg (in km)
Year in which county was annexed by Prussia
Average household size
Population growth from 1867-1871 in percentage points
Percent Protestants
Percent Jews
Percent literate
Percent missing education information
Percent below the age of 10
Percent female
Percent born in municipality
Percent of Prussian origin
Percent blind
Percent deaf-mute
Percent insane
Percent of county population in urban areas
Natural logarithm of total population size
Natural logarithm of distance to Berlin (km)
Dummy variable, =1 if county is Polish-speaking
Latitude * Longitude * 100
Percent of pupils farther than 3km from school
Percent of labor force employed in mining
Income tax revenue per capita in 1877
Percentage of labor force employed in manufacturing in 1882
Percentage of labor force employed in services in 1882
Percentage of labor force employed in manufacturing and services in 1882
100 * Natural logarithm of male elementary school teachers in 1886
Dummy variable, =1 if Imperial or Hanseatic city in 1517
Income of male elementary school teachers in 1886
Total population size
Distance to Berlin (km)
Dummy variable, =1 if University in 1517
Dummy variable, =1 if Imperial city in 1517
Dummy variable, =1 if Hanseatic city in 1517
Percentage of Catholics
Share of municipalities beginning with letter A to L
Monasteries per square kilometer in 1517
Dummy variable, =1 if school in 1517
City population in 1500
https://www.ifo.de/en/iPEHD doi:10.1162/qjec.2009.124.2.531