Generalised Linear Models: Binary data
I am currently working on a GLM problem.
My response variable is binary, as are some of my explanatory variables; others are categorical, e.g. 1 = 1 day, 2 = 2-3 days, 3 = 5+ days, and so forth.
I have coded these into factors.
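For reference, this is roughly how I coded it (df and days are placeholder names, and I have abbreviated the levels):

    # recode the grouped-days variable as a factor with descriptive labels
    df$days <- factor(df$days, levels = c(1, 2, 3),
                      labels = c("1 day", "2-3 days", "5+ days"))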
My question is: I have used the step function and I am left with a model containing many insignificant variables. In this case, do I simply drop these variables, and if not, what should I do?
I also tried to do the model selection manually, using the anova function to test whether the differences in deviance were significant, and this gives an answer that is somewhat different from the automatic model selection. Is this to be expected?
How should I go about model selection, and how can I test whether the functional form of my variables is correct?
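For reference, this is roughly what I am comparing (df and the x variables are placeholders for my actual data):

    # automatic selection: fit the full logistic model, then stepwise search by AIC
    full_fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = df)
    step_fit <- step(full_fit)

    # manual alternative: analysis-of-deviance (likelihood-ratio) test between nested models
    reduced_fit <- glm(y ~ x1, family = binomial, data = df)
    anova(reduced_fit, full_fit, test = "Chisq")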
Thanks for any help! :)
statistics statistical-inference binary logistic-regression
asked 2 days ago by odesinit
You might want to do a search for related terms on stats.stackexchange.com. – shadowtalker, 2 days ago
1 Answer
Model selection is something of an art, combining many statistical skills and analysis techniques. Generally, if you choose an appropriate model form and select the variables sensibly, the resulting coefficients will be interpretable and the model will predict the target variable more accurately. You can check this by splitting the data into training, validation and test sets.
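As a minimal sketch of that check (df, y and the x variables are placeholders, with y coded 0/1):

    set.seed(1)
    n <- nrow(df)
    train_idx <- sample(n, size = round(0.7 * n))            # 70% training, 30% test
    fit <- glm(y ~ x1 + x2, family = binomial, data = df[train_idx, ])
    p_hat <- predict(fit, newdata = df[-train_idx, ], type = "response")
    mean((p_hat > 0.5) == df$y[-train_idx])                  # simple held-out accuracy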
Since a GLM links the mean of the response to a linear predictor, $g(\mathbb{E}[y_i])=\beta_0+\sum_{j=1}^{p}\beta_j x_{ij}$, the two main questions are how to choose the right distribution for the random component $Y$ and how best to transform the predictors.
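For the binary response in the question, the link $g$ is the logit, so the model for the success probability $p_i=\Pr(y_i=1)$ is

$$\log\frac{p_i}{1-p_i}=\beta_0+\sum_{j=1}^{p}\beta_j x_{ij},\qquad y_i\sim\mathrm{Bernoulli}(p_i).$$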
Prediction will be sharper if you have the right distribution for the target variable. For example, you can explore the distribution by fitting a Tweedie model, whose variance-power parameter covers discrete (Poisson), continuous (normal, gamma) and mixed (compound Poisson-gamma) cases. You can then apply shrinkage (penalised) methods suited to whichever distribution you settle on.
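As a rough sketch of fitting such a model, assuming the statmod package and a non-negative placeholder response amount (for the binary response in the question, family = binomial is the natural choice instead):

    library(statmod)                  # provides the tweedie() family for glm()
    # var.power strictly between 1 and 2 gives a compound Poisson-gamma (mixed)
    # distribution; link.power = 0 requests a log link
    fit_tw <- glm(amount ~ x1 + x2, data = df,
                  family = tweedie(var.power = 1.5, link.power = 0))
    summary(fit_tw)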
For the predictors $X$, instead of simply removing insignificant features, try to improve them: detect anomalies and drop outliers. A common approach is to plot the correlation (or covariance) matrix to see how related the features are, inspect boxplots of the continuous features and adjust the outlier thresholds, and expand categorical features into a dummy (indicator) matrix.
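A rough sketch of these checks, with df and the column names as placeholders:

    num_vars <- df[sapply(df, is.numeric)]       # keep only the numeric predictors
    round(cor(num_vars), 2)                      # correlation matrix between features
    boxplot(df$x1)                               # inspect outliers of a continuous feature
    X_dummy <- model.matrix(~ days, data = df)   # expand a factor into indicator columns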
After that, fit the model and analyse the results. Run several checks of how well the features explain the target variable: (pseudo-)R-squared, coefficient p-values, analysis-of-deviance (ANOVA) tests between nested models, and likelihood-based criteria such as AIC. Use a validation set (or cross-validation) to improve the model.
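A brief sketch of those comparisons (model formulas are placeholders); cv.glm from the boot package gives a cross-validated prediction-error estimate:

    fit1 <- glm(y ~ x1,      family = binomial, data = df)
    fit2 <- glm(y ~ x1 + x2, family = binomial, data = df)
    anova(fit1, fit2, test = "Chisq")   # analysis-of-deviance (likelihood-ratio) test
    AIC(fit1, fit2)                     # lower AIC is preferred

    library(boot)
    cv.glm(df, fit2, K = 10)$delta      # 10-fold cross-validated prediction error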
Apply the resulting model to the test set, then repeat the model-selection steps until you reach a satisfactory result.
My resources: Ohlsson, Esbjörn and Johansson, Björn, Non-Life Insurance Pricing with Generalized Linear Models, plus papers on the specific topics.
Hope it is helpful.
answered 2 days ago by AnNg (edited 2 days ago)