Find the College Major with the Highest Starting Salary
To access a particular column of a DataFrame, we can use square bracket notation, like so:
clean_df['Starting Median Salary']
You should see the values for just this column printed out below the cell.
To find the highest starting salary, we can simply chain the .max() method:
clean_df['Starting Median Salary'].max()
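If you want to try these two steps without the course data, here is a minimal sketch. The toy_df DataFrame, its three majors, and its salary figures are all made up for illustration; only the column names are borrowed from clean_df.

```python
import pandas as pd

# Hypothetical stand-in for clean_df: the majors and salaries are invented.
toy_df = pd.DataFrame({
    "Undergraduate Major": ["Major A", "Major B", "Major C"],
    "Starting Median Salary": [40000.0, 55000.0, 47500.0],
})

# Square brackets with a column name select that single column as a Series.
starting = toy_df["Starting Median Salary"]

# .max() reduces the Series to its largest value.
highest = starting.max()

print(starting)
print(highest)
```

Running the same .max() call on the real clean_df gives the figure discussed next.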
The highest starting salary is $74,300. But which college major earns this much? To answer that, we need to know the row number or index so that we can look up the name of the major. Lucky for us, the .idxmax() method will give us the index of the row with the largest value.
clean_df['Starting Median Salary'].idxmax()
The result is 43. To see the name of the major that corresponds to that particular row, we can use the .loc (location) property:
clean_df['Undergraduate Major'].loc[43]
Here we are selecting both a column ('Undergraduate Major') and a row at index 43, so we are retrieving the value of a particular cell. You might also see people use two sets of square brackets, one after the other, to achieve the same thing:
clean_df['Undergraduate Major'][43]
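The lookup can be sketched end to end on a small, made-up DataFrame. The names demo_df, top_idx, via_loc and via_brackets are hypothetical, and the data is invented. The sketch assumes the default integer index (0, 1, 2, ...), which is what makes the two bracket styles agree.

```python
import pandas as pd

# Hypothetical data with the default integer index; values are invented.
demo_df = pd.DataFrame({
    "Undergraduate Major": ["Major A", "Major B", "Major C"],
    "Starting Median Salary": [40000.0, 55000.0, 47500.0],
})

# .idxmax() returns the index label of the row holding the largest value.
top_idx = demo_df["Starting Median Salary"].idxmax()

# Look up the major at that label, first with .loc, then with plain brackets.
via_loc = demo_df["Undergraduate Major"].loc[top_idx]
via_brackets = demo_df["Undergraduate Major"][top_idx]

print(top_idx, via_loc, via_brackets)
```

Both lookups go by index label here, so they return the same cell. Using .loc makes that intent explicit.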
If you don't specify a particular column, you can use the .loc property to retrieve an entire row:
clean_df.loc[43]
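As a rough illustration of what comes back, the sketch below retrieves a whole row from a made-up DataFrame and then a single cell from it. The names sample_df, row and cell are hypothetical and the data is invented. Passing a row label and a column name together to .loc is a standard pandas alternative to selecting the column first.

```python
import pandas as pd

# Hypothetical data; only the column names mirror the lesson's DataFrame.
sample_df = pd.DataFrame({
    "Undergraduate Major": ["Major A", "Major B", "Major C"],
    "Starting Median Salary": [40000.0, 55000.0, 47500.0],
})

# .loc with just a row label returns the whole row as a Series
# whose index is made up of the column names.
row = sample_df.loc[1]

# .loc with a row label and a column name returns a single cell.
cell = sample_df.loc[1, "Undergraduate Major"]

print(row)
print(cell)
```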
Now that we've found the major with the highest starting salary, can you write the code to find the following:
Which college major has the highest mid-career salary? How much do graduates with this major earn? (Mid-career is defined as having 10+ years of experience.)
Which college major has the lowest starting salary and how much do graduates earn after university?
Which college major has the lowest mid-career salary and how much can people expect to earn with this degree?
I'll provide the solution and the code snippets in the next lesson =)