TOPIC: social psychologist Statistics in the Social Sciences
I have been asked by a computer scientist who does research in machine learning why the social sciences seem to be “stuck in the '50s” in terms of the statistics they use. The fields called “machine learning” and “modern statistics” offer a variety of new tools for prediction. Such tools include decision trees, neural nets, nearest neighbor techniques, and many others. Apparently, in many applications they provide much better predictions than do the standard tools of regression analysis, logit models, ANOVA and the like. Why don't the social sciences move forward, then?

It first has to be said that the social sciences do move forward, and advanced research in econometric theory uses and contributes to the development of these “new tools”. The question is, therefore, why these advances are made relatively slowly, and why we haven't changed our curriculum in core graduate (let alone undergraduate) courses. I think that there are at least three reasons for this. The first two are sociological in nature. The last is more substantive and I’ll say a few more words on it.

1. Like any other field, the social sciences may be too conservative. We have the good old tools that we all know and know how to teach; we can run the simple tools in no time on our laptops, sometimes using no more than Excel. Moving forward necessitates a big investment in education, software, equipment and so forth. Thus, we prefer the old and familiar.

2. There is a sort of Grice's principle at work here. [An aside: Grice’s principle says that, as a rule, people tend to use the simplest statement that can convey the message they wish to transmit. Knowing this, the listener can infer from a statement more than it logically implies. For example, if I tell you that I'll be at the office as of 11am, you suspect I won't be there earlier, though what I said is logically consistent with my earlier arrival.]
If an economist or a psychologist uses a new statistical technique to analyze data, the audience infers that the same results could not have been obtained using the standard tools. And this is taken to mean that the results are not very robust. Knowing this, researchers do their best to show that their results can be obtained by known tools, and may not even bother to present results that cannot. Thus, at equilibrium, what's considered good empirical research uses standard statistics.

3. Not all of the new methods have well-developed hypothesis tests. There is no conceptual difficulty in developing such tests, but often the mathematical difficulties may be significant.

[Example: Decision Trees. This is a technique for classification or prediction, addressing the basic problem that regression analysis deals with: trying to find the value of a predicted variable y based on associated values of the predicting variables x_1,…,x_m, and past joint observations of the x’s as well as the y. At each non-terminal node of the tree there is a question about the values of the x’s, and each possible answer to the question leads to a different branch. A new data point, for which the x values are given, is analyzed by starting at the root of the tree, and finding a path down to a terminal node based on the given x values. Often the branching is according to the value of a particular variable (say, x_7 > 6 vs. x_7 <= 6). At the terminal nodes the prediction may be a constant, or a result of a function. In principle, one can imagine a statistical model in which the data generating process (DGP) is based on a decision tree. For example, consider the set of all DGPs defined by binary trees of depth 3, where, at each terminal node, there is a linear model (with an error term as usual). Define the null hypothesis H_0 to consist of a subset of these DGPs, for instance, all the trees that do not use x_7 in any of their nodes.
Rejecting H_0 in this example would mean that “x_7 is significant”, having a similar meaning to rejecting the hypothesis that beta_7=0 in standard linear regression. The problem is that the combinatorial nature of decision trees makes it very complicated to compute maximum likelihood estimators for this problem. More generally, tractability may be one reason that for many such “new” techniques there is no theory of statistical inference as we know it.]

Why do we need hypothesis tests? I was surprised when the computer scientist posed this question. Indeed, if you have a statistical model that provides very good predictions, and you can generate a prediction for every possible value of the predicting variables, why would you insist on hypothesis testing? The answer might have to do with the distinction between general, qualitative conclusions and specific, quantitative predictions.

Suppose that we have found, in three different countries, that teaching method A is more effective than method B. Suppose that we're designing the education system in a fourth country, and that we have to choose between methods A and B, without conducting a new study in this country. It makes sense to resort to the general conclusion and employ method A. This is true even if we do not have any specific model that would allow us to predict the success of individual students in the new country. Thus, hypothesis tests may allow us to distill general conclusions that might be transportable from one context to another, in a way that specific predictions aren't.

It follows that techniques for which hypothesis tests have not yet been developed can be relatively more helpful in domains where the specific prediction is more important than the general conclusion. To consider two extremes, suppose that in case 1 we wish to predict the probability of a heart attack, whereas in case 2 our goal is to find out whether negative income tax is an effective tool in getting more people to seek jobs.
In case 1 we believe that the risk factors and the particular parameters by which they affect the probability of a heart attack are almost universal. Once factors such as age, blood pressure, and the like are taken into consideration, others such as culture and religion are unlikely to change the assessment of the probability of a heart attack. Hence the very same parameters estimated in one country are likely to be applicable to another. In this case specific prediction is possible, and we might well opt for a prediction method that has a very low SSE even if it has no statistical inference theory to augment it.

By contrast, in case 2 we consider the probabilities that an individual would seek a job with and without a negative income tax. These probabilities are hard to predict if one only analyzes individuals in other cultures and other economies. In fact, even qualitative conclusions based on experience in one country should be taken with a few grains of salt when applied to another. But if such an inference is at all possible, we would trust the general conclusions much more than the specific predictions. And we will therefore not choose a statistical technique that lacks a machinery of hypothesis testing, even if its predictions in past databases were very good.

Tzachi
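For readers who have not met decision trees, the example in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Leaf`, `Split`, `predict` and `uses_variable` are hypothetical, the tree has a single split rather than depth 3, the coefficients are made up, and "uses x_7" is read as "x_7 appears in a split or in a terminal linear model". It shows how a new data point is routed from the root to a terminal node, and what membership in the H_0 family would mean; it says nothing about how to estimate or test such a model.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node holding a linear model (error term omitted)."""
    intercept: float
    coefs: Dict[int, float]  # variable index j -> slope on x_j

    def value(self, x: Dict[int, float]) -> float:
        return self.intercept + sum(b * x[j] for j, b in self.coefs.items())


@dataclass
class Split:
    """Non-terminal node asking: is x_var <= threshold?"""
    var: int
    threshold: float
    left: Union["Split", Leaf]   # branch taken when x_var <= threshold
    right: Union["Split", Leaf]  # branch taken when x_var > threshold


def predict(node: Union[Split, Leaf], x: Dict[int, float]) -> float:
    """Walk from the root to a terminal node, then apply its linear model."""
    while isinstance(node, Split):
        node = node.left if x[node.var] <= node.threshold else node.right
    return node.value(x)


def uses_variable(node: Union[Split, Leaf], var: int) -> bool:
    """True if x_var appears in any split or any terminal linear model."""
    if isinstance(node, Leaf):
        return node.coefs.get(var, 0.0) != 0.0
    return (node.var == var
            or uses_variable(node.left, var)
            or uses_variable(node.right, var))


# Hypothetical tree: branch on x_7 <= 6 vs. x_7 > 6.
tree = Split(
    var=7,
    threshold=6.0,
    left=Leaf(intercept=1.0, coefs={1: 2.0}),    # y = 1 + 2 * x_1
    right=Leaf(intercept=5.0, coefs={2: -1.0}),  # y = 5 - x_2
)

x_new = {1: 3.0, 2: 4.0, 7: 8.0}  # x_7 > 6, so the right branch is taken
y_hat = predict(tree, x_new)
in_null_family = not uses_variable(tree, 7)  # False: this tree uses x_7
```

The combinatorial difficulty mentioned in the post shows up as soon as one tries to search over every choice of `var` and `threshold` at every node, which this sketch deliberately does not attempt.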
The administrator has disabled public write access. |
Interesting exchange you have had, Tzachi! I am as surprised as you are that your CS colleague wondered why we need hypothesis testing, since it is at the heart of qualitative (and to a certain extent quantitative) policy analysis. Since we (social scientists) are mostly concerned with policy analysis, we are rightfully obsessed with endogeneity. In lieu of a true randomized experiment, spelling out the assumptions of the economic model leads to one of two strategies: either some clever choice of instruments, or a completely structural approach based on equilibrium analysis and estimation of the model's deep parameters, the latter followed by counterfactual analysis that is as good as the model's assumptions are. I know *nothing* about the methods that your CS colleague is referring to, but as a first step I would like to know if they lend themselves to estimating equilibrium models where actors best respond, either through participation or through actions, to changes in the environment. If they fail that test then game over. If they pass it, then the next step is to have the ability to generate correct standard errors, since I don't think our conclusions are of much use without them.

Cheers, Steve
In my experience problems in machine learning tend to be quite different from problems in economics, and in particular to have more objective criteria for deciding what a good solution is. A typical problem is, say, to decipher hand-written zip codes. The problem comes with a standard dataset called MNIST. MNIST contains 60,000 examples in the training set, and 10,000 examples in the test set. Each example is the image of a digit together with a classification (0-9). The goal is to use the examples in the training set to predict the correct classification in the test set. That is, you have to come up with (1) a classification model that has an image of a digit as its input, and comes up with a digit as its output, and (2) a method of using the training set to set the parameters of the model. Models are often probabilistic, i.e. rather than come up with a classification directly, they compute the probability of different digits conditional on the input, and then output the most probable digit.

There are no hypotheses as such. Rather, models are compared against each other. Note: (1) there is an objective score (the number of digits in the test set that are correctly classified), and (2) the test set is essentially a very large representative sample of the real world (in this case zip codes on US envelopes), so that doing well on the test set is strongly predictive of doing well in the real world. Given these properties, if someone comes up with a better model, then it will be immediately apparent to everyone that the model is better: the proof of the pudding is in the eating. In particular, this would be the case even if, prior to seeing the results, no leading researcher would have given the model a second look, and in principle even if it is very difficult to understand quite why the model works (as may be the case with, say, neural networks).
The same should apply, in principle, to something like the heart attack example, though perhaps because it's a matter of life and death doctors would want to understand exactly why the model works. But in economics I suspect that the nature of the problems is different, and it is much harder to judge which model is better. In, say, the negative income-tax example, the evidence from country A has limited bearing on country B. So if you want to convince people (or yourself!) of your predictions for country B you need to convince them of the merits of your model separately from its ability to explain existing data.

Guy.
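The "objective score" described in the post above can be sketched as follows. This is a toy illustration, not MNIST: the two-class test set, the models `model_a` and `model_b`, and the helper names `most_probable_class` and `held_out_accuracy` are all hypothetical. It only shows the mechanics of a probabilistic classifier that outputs its most probable class, and of comparing two models by the share of held-out examples each classifies correctly.

```python
from typing import Callable, List, Sequence, Tuple

# A model maps an input to a list of class probabilities.
Model = Callable[[float], List[float]]


def most_probable_class(probs: Sequence[float]) -> int:
    """Index of the largest probability (the model's final answer)."""
    return max(range(len(probs)), key=lambda k: probs[k])


def held_out_accuracy(model: Model, test_set: List[Tuple[float, int]]) -> float:
    """Fraction of held-out examples whose most probable class is correct."""
    correct = sum(
        1 for x, label in test_set if most_probable_class(model(x)) == label
    )
    return correct / len(test_set)


def model_a(x: float) -> List[float]:
    """Hypothetical model: treats the input itself as P(class 1)."""
    return [1.0 - x, x]


def model_b(x: float) -> List[float]:
    """Hypothetical baseline: ignores the input entirely."""
    return [0.6, 0.4]


# Toy held-out set of (input, true class) pairs, assumed for illustration.
test_set = [(0.9, 1), (0.2, 0), (0.7, 1), (0.4, 0), (0.6, 0)]

score_a = held_out_accuracy(model_a, test_set)  # 4 of 5 correct
score_b = held_out_accuracy(model_b, test_set)  # 3 of 5 correct
better = "model_a" if score_a > score_b else "model_b"
```

Nothing in the comparison requires understanding why `model_a` does better, which is the point made in the post; it does rely on the test set being representative of the cases one cares about.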
Notice that your CS colleague is interested in statistical tools for prediction. While some of our private sector colleagues are interested in prediction, many of us are interested in explanation, and explanation begins with a structural model even if it is never directly estimated. (For example, the discussion of the validity of an instrument entails structural arguments.) Our problems are often inferential in nature, and require the separation of correlation from causation. This is not central to the part of the AI and machine learning literatures I've read (which, of course, could be an odd sample