How to fix Moretti (2021)
Advice for economists
In my comment on Moretti (2021) (M21), I show that correcting errors in the event study and IV regressions gives null results. So should we believe in agglomeration effects for innovation? Here I discuss two ways to repair the paper.
First, the problem with the mover event study lies in the data. M21 uses inventor-level data based on the COMETS dataset but doesn’t distinguish between different inventors who share the same name. Instead, the M21 code simply assigns an identifier based on inventor names. As I show in my comment, we can find examples where two people with the same name live in different cities, but are coded as one inventor moving across cities. Since no move actually occurs, these false moves add measurement error to the treatment, which creates attenuation bias and could explain the null result.
For example, we see John P. Hansen patenting in Austin, Texas from 1993-2002 and in Wadsworth, Ohio from 1997-2003. Austin-Hansen works in computer science and assigns patents to Motorola, while Wadsworth-Hansen assigns patents to J.M. Smucker, the food and beverage company. M21 assigns these observations the same inventor identifier, and transforms the patent data into an inventor-year panel by assigning the inventor’s modal city, which is Austin from 1993-2002 and Wadsworth in 2003. This is then coded as John Hansen moving from Austin to Wadsworth in 2003.
But clearly these are just different people, and this case shouldn’t be used in the mover event study. So we can improve the event study by going back to the raw data and creating a proper unique identifier for inventors.
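One simple screen for this problem, sketched below under my own assumptions, is to flag any name-based identifier that patents in two different cities over overlapping year ranges. The record layout and function names are hypothetical (they are not from the M21 code or COMETS), and only the endpoint years from the Hansen example are used. A real mover can also show a brief overlap around the move year, so this is a flag for manual review, not a full disambiguation.

```python
from collections import defaultdict

# Hypothetical record layout: (name-based id, year, city, assignee).
# Only the first and last year of each spell in the Hansen example are included.
patents = [
    ("john p hansen", 1993, "Austin, TX", "Motorola"),
    ("john p hansen", 2002, "Austin, TX", "Motorola"),
    ("john p hansen", 1997, "Wadsworth, OH", "J.M. Smucker"),
    ("john p hansen", 2003, "Wadsworth, OH", "J.M. Smucker"),
]


def city_spells(records):
    """Return {inventor_id: {city: (first_year, last_year)}}."""
    spells = defaultdict(dict)
    for inventor, year, city, _assignee in records:
        first, last = spells[inventor].get(city, (year, year))
        spells[inventor][city] = (min(first, year), max(last, year))
    return spells


def flag_overlapping_ids(records):
    """Flag ids whose spells in two different cities overlap in time."""
    flagged = set()
    for inventor, by_city in city_spells(records).items():
        cities = sorted(by_city)
        for i, a in enumerate(cities):
            for b in cities[i + 1:]:
                (a0, a1), (b0, b1) = by_city[a], by_city[b]
                # Two closed intervals overlap if each starts before the other ends.
                if a0 <= b1 and b0 <= a1:
                    flagged.add(inventor)
    return flagged


suspect_ids = flag_overlapping_ids(patents)
```

Here the Austin spell (1993-2002) and the Wadsworth spell (1997-2003) overlap, so the identifier is flagged. A stricter version could also require different assignees or technology fields before splitting an identifier in two.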
Another data issue is that the COMETS dataset does not have unique city identifiers, which also leads to misclassified moves. For example, Abbas Rafii is observed in Los Altos in 1997-1998, but is coded as moving in 1998. This occurs because his city identifier changes from 097 to 146, despite the recorded city name being Los Altos in every year. I also noticed that Fremont, California is associated with numeric codes 097, 140, and 146 with frequencies 23%, 34%, and 43%, respectively. So one city can map to several codes, and codes 097 and 146 show up for both Los Altos and Fremont. Clearly something is wrong with the COMETS data.
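A quick consistency check along these lines is to tabulate which codes appear for each city name and which city names appear for each code. The sketch below uses a hypothetical record layout of (city, state, code). The Los Altos and Fremont codes come from the examples above, and the Austin row is made up purely as a clean comparison case.

```python
from collections import defaultdict

# Hypothetical layout: (city name, state, numeric city code).
records = [
    ("Los Altos", "CA", "097"),
    ("Los Altos", "CA", "146"),
    ("Fremont", "CA", "097"),
    ("Fremont", "CA", "140"),
    ("Fremont", "CA", "146"),
    ("Austin", "TX", "001"),  # made-up code, included as a consistent case
]


def group(pairs):
    """Map each key to the set of values observed with it."""
    out = defaultdict(set)
    for key, value in pairs:
        out[key].add(value)
    return out


def multi_valued(mapping):
    """Keep only keys that are associated with more than one value."""
    return {key: values for key, values in mapping.items() if len(values) > 1}


codes_by_city = group(((city, state), code) for city, state, code in records)
cities_by_code = group((code, (city, state)) for city, state, code in records)

split_cities = multi_valued(codes_by_city)   # one city, several codes
shared_codes = multi_valued(cities_by_code)  # one code, several cities
```

If either dictionary is non-empty, the numeric code is not a usable city identifier. One possible repair is to define moves on a cleaned (city name, state) pair instead of the code.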
So the null result in the event study could be explained by attenuation bias from misclassified moves, and could be repaired by improving the data.
Second, the null result in the IV regression could be due to the choice of instrument. Recall that the endogenous variable is cluster size, which is defined at the research field by city level. M21 instruments for cluster size using variation in the number of inventors in other cities at firms that also have a presence in the focal city. One issue here is a level mismatch: the instrument is constructed at the firm level, while cluster size is at the cluster level. So inventors at different firms in the same cluster get different values of the instrument for the same cluster size, which is odd.
M21’s instrument is similar in spirit to a shift-share/Bartik IV, so why not use one directly? In the classic Bartik setup, we have cities and industries, and combine national growth in an industry (the shift) with the industry’s share of city employment (the share) to construct a proxy for employment growth. Specifically, the instrument for a city’s employment growth is the weighted average of national industry growth rates, weighting by local industry shares.
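In symbols, the classic instrument can be written as follows. The notation is mine, not M21’s: c indexes cities, k indexes industries, s_{ck} is industry k’s share of employment in city c in a base period, and g_k is national employment growth in industry k.

```latex
B_c = \sum_{k} s_{ck} \, g_k , \qquad \sum_{k} s_{ck} = 1
```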
In our context, we have clusters (i.e., city-field) and firms. So we can use a firm’s national growth in inventors as the shift (computed leave-one-out to omit the focal city) and the firm’s share of inventors in the cluster as the share. The instrument for cluster size is then the weighted average of national firm growth in inventors, weighting by each firm’s share of the local cluster.
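To make the construction concrete, here is a minimal sketch of the leave-one-out shift-share instrument for a single research field, so that a cluster is just a city. The data layout, function names, and numbers are all hypothetical. Dropping firms with no inventors outside the focal city is my own simplification, and it means the shares that are used need not sum to one.

```python
# Hypothetical inventor counts within one field, keyed by (firm, city).
base = {("A", "X"): 10, ("A", "Y"): 20, ("B", "X"): 30, ("B", "Z"): 10, ("C", "X"): 10}
later = {("A", "X"): 50, ("A", "Y"): 30, ("B", "X"): 30, ("B", "Z"): 5, ("C", "X"): 12}


def leave_one_out_growth(base, later, firm, focal_city):
    """Growth in the firm's inventors across all cities except the focal one."""
    n0 = sum(n for (f, c), n in base.items() if f == firm and c != focal_city)
    n1 = sum(n for (f, c), n in later.items() if f == firm and c != focal_city)
    if n0 == 0:
        return None  # firm has no base-period presence outside the focal city
    return (n1 - n0) / n0


def shift_share_instrument(base, later, focal_city):
    """Weighted average of leave-one-out firm growth, weighted by base-period local shares."""
    local = {f: n for (f, c), n in base.items() if c == focal_city and n > 0}
    total = sum(local.values())
    if total == 0:
        return None  # no inventors in the cluster in the base period
    z = 0.0
    for firm, n in local.items():
        growth = leave_one_out_growth(base, later, firm, focal_city)
        if growth is None:
            continue  # assumption: single-city firms contribute nothing
        z += (n / total) * growth
    return z


z_x = shift_share_instrument(base, later, "X")
```

In this toy example, firm A grows 50% outside city X and firm B shrinks 50%, with local shares of 0.2 and 0.6, so the instrument for city X is about -0.2. Firm A’s own growth inside city X (10 to 50) does not enter, which is the point of leaving out the focal city. The instrument also takes one value per cluster, so it sits at the same level as cluster size.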
So it seems like we can apply the ‘exogenous shifts’ version of shift-share IV, and maybe get a non-null result.
