The median distance without anything else is worthless. The median only represents the “typical American” if we’re working with some kind of bell curve, which can’t be true since a sizable chunk of people still live with their parents. For all I know 49% of people could be living with their parents and another 49% could have moved far away - the median still only shows a data point from the remaining 2% of people.
This “anything else” is usually the variance or standard deviation, but I doubt anyone without some education in statistics can grasp what they mean.
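To make the 49% / 2% / 49% scenario concrete, here is a rough sketch using Python's standard `statistics` module. The distances are made-up numbers chosen to match the hypothetical split, not real survey data.

```python
import statistics

# Made-up distances in miles (NOT real data): 49 people still at home,
# 2 people who moved a short way, 49 people who moved far away.
distances = [0] * 49 + [15, 25] + [800] * 49

# The median lands between the two "middle" people.
median = statistics.median(distances)

# Quartiles and the standard deviation are the "anything else"
# that shows how spread out the data really is.
q1, q2, q3 = statistics.quantiles(distances, n=4)
spread = statistics.pstdev(distances)

print(f"median = {median}")
print(f"quartiles = {q1}, {q2}, {q3}")
print(f"standard deviation = {spread:.1f}")
```

With these invented numbers the median sits at 20 miles, while the quartiles sit at 0 and 800. Reported alone, the median hides the two clusters; reported with the quartiles or the standard deviation, the split becomes visible.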
If the data is normally distributed (as in bell shaped), the mean and median will be the same. The problem is that the data is probably very skewed, so the median is probably a better representation of the central tendency than the mean. Also, it’s incorrect to say it only represents 2% of people; it’s more that every person is only represented by their position in the set. Quintiles, quartiles, and percentiles work the same way. I like to think of it as everyone being a single vote, unweighted by value.
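A small illustration of the "single vote, unweighted by value" idea, again with invented numbers: stretching the most extreme value drags the mean along but leaves the median where it was.

```python
import statistics

# Invented, right-skewed distances in miles (NOT real data).
distances = [0, 0, 5, 10, 18, 30, 60, 2500]

mean = statistics.mean(distances)
median = statistics.median(distances)

# Move the farthest person ten times farther away.
stretched = distances[:-1] + [25000]
stretched_mean = statistics.mean(stretched)
stretched_median = statistics.median(stretched)

print(f"original:  mean = {mean}, median = {median}")
print(f"stretched: mean = {stretched_mean}, median = {stretched_median}")
```

The farthest mover still counts as exactly one position in the ordering, so the median does not change, while the mean jumps by roughly a factor of ten.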
That said, if you wanted another acceptable alternative, you could also remove all outliers, which would likely bring the curve back toward normal. The problem there is you’d probably be removing every immigrant, so you wouldn’t be representing all Americans. There are pros and cons, but medians are almost exclusively used to describe data rather than analyze it, since the median isn’t compatible with a lot of statistical methods. A real analyst would probably just dummy code immigration in a regression and provide coefficients for both groups, anyway.
I’m not sure I would call that “acceptable.” I would be more likely to call it “destroying the interesting features of the data.”
Acceptable in the statistical sense. Normality is required by a lot of statistical tests, so it’s done a lot. There are better ways to do it without losing important insights though, hence what I said later in that paragraph.
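For anyone curious what the dummy-coding idea might look like, here is a minimal sketch. The rows and the helper `ols_one_predictor` are hypothetical and purely for illustration; a real analysis would use a proper statistics package and more predictors.

```python
import statistics

# Hypothetical rows: (is_immigrant dummy, distance in miles). NOT real data.
rows = [(0, 0), (0, 10), (0, 20), (0, 50),
        (1, 4000), (1, 5000), (1, 6000)]

def ols_one_predictor(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

dummy = [d for d, _ in rows]
miles = [m for _, m in rows]
intercept, slope = ols_one_predictor(dummy, miles)

# With a 0/1 dummy, the intercept is the mean of the 0 group and
# intercept + slope is the mean of the 1 group.
print(f"non-immigrant estimate: {intercept:.1f} miles")
print(f"immigrant estimate:     {intercept + slope:.1f} miles")
```

Nobody gets thrown out as an outlier here; the dummy variable lets each group keep its own estimate. Note that this models group means, not medians.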