Why does changing random seeds alter results?


























I'm running some SVMs for a seminar, and a friend of mine noted that I should set a seed so my results don't change every time I run the code. I was wondering why that is the case. If a different seed can produce different results, why should I trust SVMs at all?



Should I set a specific seed or is it ok to just set the first number that comes to my mind?




























  • 5




    "No man ever steps in the same river twice, for it's not the same river and he's not the same man." -- Heraclitus. Any time you do something, the results differ a little. Why should you demand, then, that a statistical procedure be any different? What matters isn't that the result changes, but by how much and whether it makes any difference. See stats.stackexchange.com/search?q=seed+set+random for discussions of this issue.
    – whuber
    Nov 18 '18 at 19:45


















svm






asked Nov 18 '18 at 19:38









Pedro Cavalcante Oliveira













1 Answer
































tl;dr: practically speaking, you can probably set the seed to anything you want (e.g. your birthday, your phone number [although there are obvious privacy issues there :-)], or your lucky number). With some interesting caveats, you can use the same random number seed for most of your analyses (I often use 1001). To be useful, a stochastic algorithm generally has to be insensitive to the random number seed.
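To make concrete what a seed does, here is a minimal sketch using only Python's standard `random` module. The helper name `draw` and the second seed (2024) are made up for illustration. The point is only that a seed fixes the whole pseudo-random sequence, so the particular value you pick has no special meaning.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random floats from a generator initialised with `seed`."""
    rng = random.Random(seed)  # private generator; the global state is untouched
    return [rng.random() for _ in range(n)]

same_a = draw(1001)
same_b = draw(1001)
other = draw(2024)  # arbitrary second seed, chosen only for illustration

print(same_a == same_b)  # True: the same seed reproduces the same sequence
print(same_a == other)   # a different seed gives a different sequence
```

Fixing the seed therefore buys reproducibility of one particular run; it says nothing about whether that run is typical.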



the long answer



Classical statistical methods (t-test, ANOVA, regression etc.) are deterministic algorithms, but many modern algorithmic approaches include a stochastic component. (In between are methods like k-means clustering or expectation-maximization, which are intrinsically deterministic but are usually run from multiple randomly chosen starting points to mitigate their sensitivity to starting conditions.)



SVM need not be stochastic (e.g. the implementation in the e1071 package for R appears to be deterministic), but it is often implemented using stochastic gradient descent (SGD: e.g. see here) for computational reasons.



Methods that use large ensembles of random samples from the data (e.g. bootstrapping, bagging, as well as SGD, which picks a different sample of the data at each update step) are effectively averaging across many samples, and are likely to be relatively insensitive to the random-number seed. Methods that are likely to be unstable with respect to the random-number seed (e.g. EM, k-means clustering) will generally have mechanisms built into the software that automatically run several realizations and do something sensible with the results (e.g. keep the best-fitting one, or average them), to make the method less sensitive.
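The "run several realizations" idea can be sketched as follows for k-means. This is a toy, standard-library-only illustration on synthetic 1-D data, not how any particular package implements it; the function names `kmeans` and `kmeans_restarts` and all the numbers are assumptions made for the example. Here "something sensible" means keeping the run with the lowest within-cluster sum of squares.

```python
import random

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm on 1-D data from one random start.
    Returns (within-cluster sum of squares, sorted centres)."""
    centres = rng.sample(points, k)  # random start: the only stochastic step
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centres[j]) ** 2)
            groups[nearest].append(p)
        # keep the old centre if a cluster ends up empty
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    wss = sum(min((p - c) ** 2 for c in centres) for p in points)
    return wss, sorted(centres)

def kmeans_restarts(points, k, seed, n_starts=10):
    """Run kmeans from n_starts random starts and keep the lowest-WSS fit."""
    rng = random.Random(seed)
    return min(kmeans(points, k, rng) for _ in range(n_starts))

# Synthetic data: three groups around 0, 5 and 10 (assumed for illustration).
data_rng = random.Random(0)
points = [data_rng.gauss(mu, 0.5) for mu in (0.0, 5.0, 10.0) for _ in range(30)]

for seed in (1, 2, 3):
    single = kmeans(points, 3, random.Random(seed))
    best = kmeans_restarts(points, 3, seed)
    print(seed, round(single[0], 2), round(best[0], 2))
```

Because the first restart uses the same generator state as the single run, the multi-start result can never be worse than the single-start one for the same seed; in practice it also tends to vary less from seed to seed.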



This sensitivity is part of the information that you should know about a method before using it (along with some idea of its strengths and weaknesses, what meta-parameters it has that need to be tuned, etc.).



The best thing to do in the course of learning is to try some experiments - for a particular data set and model, try the same method with a handful of different random-number seeds and see how much the results vary!
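A minimal sketch of such an experiment, assuming a linear SVM fitted by Pegasos-style stochastic gradient descent on synthetic two-class data. Everything here (the data generator, the function names, the seeds, the step count and the penalty `lam`) is made up for illustration and uses only the Python standard library; with a real package you would loop over its own seed argument in the same way.

```python
import random

def make_data(n, rng):
    """Two noisy 2-D classes labelled -1 / +1, symmetric about the origin."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        data.append(((rng.gauss(1.5 * y, 1.0), rng.gauss(1.5 * y, 1.0)), y))
    return data

def train_svm_sgd(data, seed, lam=0.01, steps=5000):
    """Linear SVM without intercept (hinge loss + L2 penalty), fitted by SGD."""
    rng = random.Random(seed)  # the seed only controls which examples are drawn
    w = [0.0, 0.0]
    for t in range(1, steps + 1):
        (x1, x2), y = rng.choice(data)  # one random example per update
        eta = 1.0 / (lam * t)           # decreasing step size
        margin = y * (w[0] * x1 + w[1] * x2)
        w = [(1 - eta * lam) * wi for wi in w]  # shrinkage from the L2 penalty
        if margin < 1:                  # hinge loss is active for this example
            w[0] += eta * y * x1
            w[1] += eta * y * x2
    return w

def accuracy(w, data):
    """Fraction of examples on the correct side of the fitted hyperplane."""
    hits = sum(1 for (x1, x2), y in data if y * (w[0] * x1 + w[1] * x2) > 0)
    return hits / len(data)

data_rng = random.Random(0)
train = make_data(200, data_rng)
test = make_data(200, data_rng)

scores = [accuracy(train_svm_sgd(train, seed), test)
          for seed in (1, 7, 42, 1001, 2018)]
print(scores)
print("spread:", max(scores) - min(scores))
```

If the spread across seeds is small compared with the differences you care about, the choice of seed is immaterial; if it is large, that is worth knowing before you report a single run.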





























  • +1. For a discussion of some of the issues associated with using arbitrary seeds, please see stats.stackexchange.com/questions/80407.
    – whuber
    Nov 18 '18 at 21:35










  • Also, even a deterministic approach may depend on a starting point or on the ordering of the data.
    – MotiN
    Nov 25 '18 at 9:40










  • I did make the point about starting point above ... and I wasn't dealing with issues related to numerical instability in my answer, seems slightly tangential ...
    – Ben Bolker
    Nov 25 '18 at 13:46




































edited Nov 18 '18 at 22:38

























answered Nov 18 '18 at 21:05









Ben Bolker


















