Welcome Guestlogin to KGsePGregister at KGsePG email | FAQs

Kim 2003 Reshaping Panel Data Using Excel Stata

download

    1 of 4

    Kim 2003 Reshaping Panel Data Using Excel Stata



    Kim 2003 Reshaping Panel Data Using Excel Stata - Transcript


    Reshaping Panel Data Using Excel and Stata
    Moonhawk Kim Department of Political Science Stanford University June 27 2003

    Figure 1 Downloaded Panel Data

    Figure 2 Reorganized Panel Data

    Many of us frequently nd ourselves in situations of downloading panel data from having to reshape data from Figure 1 to Figure 2 That is many external databases e g World 1

    Bank s World Development Indicators download panel data in a format in which units and data series go down the rows and time periods go across columns This is not a helpful format for either data analysis or for importing into your own database table Accordingly you need to convert the format from Figure 1 to Figure 2 Before I learned this trick I used to copy and concatenate relevant columns which took me hours This memo will walk you through an example of converting the format from wide to long and then back to wide Each worksheet in the accompanying Excel le SampleData xls matches up with each of the steps below We are starting with the worksheet Initial Download This is the same as Figure 1 1 Stata requires the variables over which we perform the reshape command to be numbers rather than string Thus we start by assigning each unit here countries a unique ID number a Create column for IDs unit id Insert 1 to the rst observation of Sweden Then type and ll down the following formula to assign each unit the same ID number if b3 b2 a2 a2 1 This translates into set the ID to the same as the ID in the observation above if the country name is the same as the country name in the observation above If not increase the new ID number by one b Insert a new column Copy the column with values Click on the newly created column Go to menu Edit and to Paste Special Click on the Values option under Paste and click OK This forces the ID numbers to become as if you had manually entered the numbers and gets rid of the formulas c Delete the original unit id column with formula based cells 2 Now we need to do the same for each data series a Sort by data series ind1 desc b Create column for IDs series id Insert 1 to the rst observation of Commercial service imports Then type and ll down the following formula to assign each unit the same ID number if b3 b2 a2 a2 1 c Insert a new column Copy the column with values Click on the newly created column Go to menu Edit and to Paste Special Click on the Values option under Paste and click OK This forces the ID numbers to become as if you had manually entered the numbers and gets rid of the formulas d Delete the original series id column with formula based cells 3 Stata does not accept numbers as variable names So we need to change the years into something else The string we add will also have another function later on a Select all the years across the rst row from 1995 to 2001 Find and replace 19 with data19 2

    b Obviously this doesn t work for years 2000 and 2001 Change these manually to data2000 and data2001 4 Lastly Stata World Development Indicators and Excel use di erent symbols for null values WDI uses double periods Excel uses empty cells and Stata uses single periods There is a null value for Sweden s domestic credit to private sector in 2001 Find and replace null value indicators e g from to 5 Copy and paste data into the data editor in Stata Close the data editor 6 Issue the following command reshape long data i unit id series id j time I won t spell out the speci cs of the reshape command You can refer to the help le in Stata The command should give the following output note j 1995 1996 1997 1998 1999 2000 2001 Data wide long Number of obs 16 112 Number of variables 11 6 j variable 7 values time xij variables data1995 data1996 data2001 data Issue edit to see what Stata did 7 Now delete the variable that contains the series labels ind1 desc but keep the series ID number variable series id We have to drop it because it interferes with uniquely identifying each observation 8 We do not want all the di erent data series to go down a single column Although we still want unit time e g country years to go down the rows we want the di erent data series to go across columns We accomplish that by issuing the following reshape wide data i unit id time j series id note j 1 2 3 4 Data long wide 3

    Number of obs Number of variables j variable 4 values xij variables

    112 5 series id



    28 7 dropped

    data data1 data2 data4 Issue edit to see what Stata did 9 And we clean up a little move country name unit id drop unit id 10 Recover series name and assign new variable names Refer back to the Excel le to see which series ID number matches up with which series Assign a new short variable name

    4