sgGWR.optimizers package
Submodules
sgGWR.optimizers.existings module
Optimizers that wrap other packages. Currently, optimizers from SciPy and Optax are supported.
- class sgGWR.optimizers.existings.optax_optimizer(optax_optim=(<function chain.<locals>.init_fn>, <function chain.<locals>.update_fn>))
Bases: object
- run(model, maxiter=1000, batchsize=100, PRNGkey=None, diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
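A minimal usage sketch for optax_optimizer follows. It assumes a GWR model object has already been built with the sgGWR.models package (its constructor is not documented in this section, so model is left as a placeholder); the Optax optimizer and hyperparameters shown are illustrative.

```python
# Sketch: drive bandwidth calibration with an Optax optimizer.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
import jax
import optax

from sgGWR.optimizers.existings import optax_optimizer

model = ...  # construct your sgGWR model here

opt = optax_optimizer(optax_optim=optax.adam(learning_rate=0.1))
opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(0))
```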
- class sgGWR.optimizers.existings.scipy_L_BFGS_B
Bases: object
Uses the same settings as the ‘scgwr’ package in R. See: https://github.com/cran/scgwr/blob/master/R/scgwr.R
- run(model, diff_mode='auto', tol=None, kwargs_minimize={})
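A usage sketch for scipy_L_BFGS_B is shown below; model is again a placeholder for a GWR model built with sgGWR.models, and kwargs_minimize is assumed to be passed through to scipy.optimize.minimize.

```python
# Sketch: full-batch bandwidth calibration via SciPy's L-BFGS-B.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
from sgGWR.optimizers.existings import scipy_L_BFGS_B

model = ...  # construct your sgGWR model here

opt = scipy_L_BFGS_B()
opt.run(model, diff_mode="auto", kwargs_minimize={"options": {"maxiter": 200}})
```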
sgGWR.optimizers.existings_numpy module
Optimizers that wrap other packages. Currently, optimizers from SciPy are supported.
- class sgGWR.optimizers.existings_numpy.scipy_L_BFGS_B
Bases: object
Uses the same settings as the ‘scgwr’ package in R. See: https://github.com/cran/scgwr/blob/master/R/scgwr.R
- run(model, diff_mode='manual', tol=None, kwargs_minimize={})
sgGWR.optimizers.golden module
Golden Section Search for adaptive bandwidth
sgGWR.optimizers.second module
Optimizers using stochastic gradients with second-order information. Faster convergence can be expected.
- class sgGWR.optimizers.second.SGN(learning_rate0=1.0, lam=0)
Bases: SGD
Stochastic Gauss-Newton method. A small value is added to the approximated Hessian to guarantee its positive definiteness.
references: Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. In SIAM Review (Vol. 60, Issue 2, pp. 223–311).
- run(model, maxiter=1000, batchsize=100, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
- step(t, x, f_g_J, idx)
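A usage sketch for SGN, assuming a JAX-backed GWR model built with sgGWR.models (left as a placeholder); the value of lam is illustrative.

```python
# Sketch: stochastic Gauss-Newton optimization.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
import jax

from sgGWR.optimizers.second import SGN

model = ...  # construct your sgGWR model here

opt = SGN(learning_rate0=1.0, lam=1e-6)  # lam: small value added to the approximated Hessian
opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(123))
```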
- class sgGWR.optimizers.second.SGN_BFGS(learning_rate0=1.0, lam=0)
Bases: SGN
Stochastic Gauss-Newton method. The BFGS formula is applied to guarantee the positive definiteness of the approximated Hessian matrix. Note: a small learning rate is recommended.
references: Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. In SIAM Review (Vol. 60, Issue 2, pp. 223–311).
- step(t, x, f_g_J, idx)
- class sgGWR.optimizers.second.SGN_LM(learning_rate0=1.0, lam=0, lam_LM0=1.0, boost=1.01, drop=0.99, eps=0.25, tau=0.001)
Bases: SGN
Stochastic Gauss-Newton method with Levenberg-Marquardt (LM) damping. The damping improves stability. This algorithm is referred to as “SMW-GN” in Ren and Goldfarb (2019) because they used the Sherman-Morrison-Woodbury (SMW) formula. The SMW formula is not used in this implementation because we assume that the number of parameters is not much larger than the mini-batch size.
references: Ren, Y., & Goldfarb, D. (2019). Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks. https://arxiv.org/abs/1906.02353v1
- step(t, x, f_g_J, idx)
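A sketch constructing SGN_LM with its Levenberg-Marquardt damping controls; the parameter roles noted in the comments are inferred from the LM scheme, the values are the documented defaults, and model is a placeholder as above.

```python
# Sketch: stochastic Gauss-Newton with Levenberg-Marquardt damping.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
import jax

from sgGWR.optimizers.second import SGN_LM

model = ...  # construct your sgGWR model here

opt = SGN_LM(
    learning_rate0=1.0,
    lam_LM0=1.0,  # initial LM damping (inferred role)
    boost=1.01,   # damping adjustment factors (inferred roles)
    drop=0.99,
)
opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(0))
```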
sgGWR.optimizers.sg module
Optimizers using stochastic gradients.
- class sgGWR.optimizers.sg.ASGD(learning_rate0=1.0, lam=0.0001)
Bases: SGD
Averaged Stochastic Gradient Descent algorithm
reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16
- lr_schedule(t)
- step(t, x, f, g, f_and_g, idx)
- class sgGWR.optimizers.sg.Adam(learning_rate0=1.0, b1=0.9, b2=0.999, eps=1e-08, correct_bias=True)
Bases: SGD
- step(t, x, f, g, f_and_g, idx)
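Adam has no docstring of its own; the sketch below shows its construction as a drop-in replacement for SGD and a call to run(), which is inherited from SGD. model is a placeholder as above.

```python
# Sketch: Adam as a drop-in replacement for SGD.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
import jax

from sgGWR.optimizers.sg import Adam

model = ...  # construct your sgGWR model here

opt = Adam(learning_rate0=0.1, b1=0.9, b2=0.999)  # bias correction is enabled by default
opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(0))
```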
- class sgGWR.optimizers.sg.SGD(learning_rate0=1.0, lam=0.0001)
Bases: object
Stochastic Gradient Descent Algorithm
reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16
- lr_schedule(t)
- run(model, maxiter=1000, batchsize=100, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
- step(t, x, f, g, f_and_g, idx)
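A baseline usage sketch for SGD; the hyperparameters shown are the documented defaults, and model is a placeholder for a GWR model built with sgGWR.models.

```python
# Sketch: plain stochastic gradient descent for bandwidth calibration.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
import jax

from sgGWR.optimizers.sg import SGD

model = ...  # construct your sgGWR model here

opt = SGD(learning_rate0=1.0, lam=1e-4)
opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(123), tol=1e-3)
```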
- class sgGWR.optimizers.sg.SGDarmijo(learning_rate0=1.0, c=0.5, ls_decay=0.5, reset_decay=2.0, search_from_lr0=False)
Bases: SGD
Stochastic Gradient Descent Algorithm with Armijo line search
reference: Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., & Lacoste-Julien, S. (2019). Painless stochastic gradient: Interpolation, line-search, and convergence rates. Advances in neural information processing systems, 32.
- armijo_cond(f, f0, x, grads, g2, lr)
- armijo_search(x, grads, lr, f)
- lr_schedule(t)
- step(t, x, f, g, f_and_g, idx)
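A sketch for SGDarmijo; the line-search constants shown are the documented defaults, and model is a placeholder as above.

```python
# Sketch: SGD with Armijo line search for an automatic step size.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
import jax

from sgGWR.optimizers.sg import SGDarmijo

model = ...  # construct your sgGWR model here

opt = SGDarmijo(learning_rate0=1.0, c=0.5, ls_decay=0.5, reset_decay=2.0)
opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(0))
```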
sgGWR.optimizers.sg_numpy module
Optimizers using stochastic gradients.
- class sgGWR.optimizers.sg_numpy.ASGD(learning_rate0=0.1, lam=0.0001)
Bases: SGD
Averaged Stochastic Gradient Descent algorithm
reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16
- lr_schedule(t)
- step(t, x, f, g, f_and_g, idx)
- class sgGWR.optimizers.sg_numpy.SGD(learning_rate0=0.1, lam=0.0001)
Bases: object
Stochastic Gradient Descent Algorithm
reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16
- lr_schedule(t)
- run(model, maxiter=1000, batchsize=100, rng=Generator(PCG64), tol=0.001, n_iter_no_change=100, verbose=True)
- step(t, x, f, g, f_and_g, idx)
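The NumPy backend takes a numpy.random.Generator via rng instead of a JAX PRNG key. A sketch, assuming a NumPy-backed GWR model (left as a placeholder):

```python
# Sketch: SGD with the NumPy backend; note the `rng` argument.
# `model` is a placeholder for a NumPy-backed GWR model (assumed).
import numpy as np

from sgGWR.optimizers.sg_numpy import SGD

model = ...  # construct your sgGWR model here

opt = SGD(learning_rate0=0.1, lam=1e-4)
opt.run(model, maxiter=1000, batchsize=100, rng=np.random.default_rng(0))
```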
- class sgGWR.optimizers.sg_numpy.SGDarmijo(learning_rate0=1.0, c=0.5, ls_decay=0.5, reset_decay=2.0, search_from_lr0=False)
Bases: SGD
Stochastic Gradient Descent Algorithm with Armijo line search
reference: Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., & Lacoste-Julien, S. (2019). Painless stochastic gradient: Interpolation, line-search, and convergence rates. Advances in neural information processing systems, 32.
- armijo_cond(f, f0, x, grads, g2, lr)
- armijo_search(x, grads, lr, f)
- lr_schedule(t)
- step(t, x, f, g, f_and_g, idx)
sgGWR.optimizers.vr module
Optimizers using variance-reduced stochastic gradients. These are recommended if you require high-accuracy optimization.
- class sgGWR.optimizers.vr.KatyushaXs(learning_rate0=0.1, lam=0, neg_moment=0.1)
Bases: SVRG
KatyushaXs algorithm
references: Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.
- class sgGWR.optimizers.vr.KatyushaXw(learning_rate0=0.1, lam=0)
Bases: SVRG
KatyushaXw algorithm
references: Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.
- class sgGWR.optimizers.vr.SVRG(learning_rate0=0.1, lam=0)
Bases: SGD
Stochastic Variance Reduced Gradient with mini-batches
references: Johnson, R., & Zhang, T. (2013). Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. In Advances in Neural Information Processing Systems (Vol. 26).
Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.
- batch_step(f_step, x0, idxs)
- run(model, max_epoch=100, batchsize=None, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=5, min_epoch=5, verbose=True, lax_scan=True)
- step(t, x, f, g, f_and_g, idx)
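A sketch for SVRG; note that run() iterates over epochs (max_epoch) rather than single iterations, and the same interface is inherited by KatyushaXs and KatyushaXw above. model is a placeholder as before.

```python
# Sketch: variance-reduced stochastic optimization with SVRG.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
import jax

from sgGWR.optimizers.vr import SVRG

model = ...  # construct your sgGWR model here

opt = SVRG(learning_rate0=0.1)
opt.run(model, max_epoch=100, batchsize=100, PRNGkey=jax.random.PRNGKey(0))

# KatyushaXs / KatyushaXw share the same run() interface, e.g.:
# opt = KatyushaXs(learning_rate0=0.1, neg_moment=0.1)
```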
Module contents
- class sgGWR.optimizers.ASGD(learning_rate0=1.0, lam=0.0001)
Bases: SGD
Averaged Stochastic Gradient Descent algorithm
reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16
- lr_schedule(t)
- step(t, x, f, g, f_and_g, idx)
- class sgGWR.optimizers.Adam(learning_rate0=1.0, b1=0.9, b2=0.999, eps=1e-08, correct_bias=True)
Bases: SGD
- step(t, x, f, g, f_and_g, idx)
- class sgGWR.optimizers.KatyushaXs(learning_rate0=0.1, lam=0, neg_moment=0.1)
Bases: SVRG
KatyushaXs algorithm
references: Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.
- class sgGWR.optimizers.KatyushaXw(learning_rate0=0.1, lam=0)
Bases: SVRG
KatyushaXw algorithm
references: Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.
- class sgGWR.optimizers.SGD(learning_rate0=1.0, lam=0.0001)
Bases: object
Stochastic Gradient Descent Algorithm
reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16
- lr_schedule(t)
- run(model, maxiter=1000, batchsize=100, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
- step(t, x, f, g, f_and_g, idx)
- class sgGWR.optimizers.SGDarmijo(learning_rate0=1.0, c=0.5, ls_decay=0.5, reset_decay=2.0, search_from_lr0=False)
Bases: SGD
Stochastic Gradient Descent Algorithm with Armijo line search
reference: Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., & Lacoste-Julien, S. (2019). Painless stochastic gradient: Interpolation, line-search, and convergence rates. Advances in neural information processing systems, 32.
- armijo_cond(f, f0, x, grads, g2, lr)
- armijo_search(x, grads, lr, f)
- lr_schedule(t)
- step(t, x, f, g, f_and_g, idx)
- class sgGWR.optimizers.SGN(learning_rate0=1.0, lam=0)
Bases: SGD
Stochastic Gauss-Newton method. A small value is added to the approximated Hessian to guarantee its positive definiteness.
references: Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. In SIAM Review (Vol. 60, Issue 2, pp. 223–311).
- run(model, maxiter=1000, batchsize=100, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
- step(t, x, f_g_J, idx)
- class sgGWR.optimizers.SGN_BFGS(learning_rate0=1.0, lam=0)
Bases: SGN
Stochastic Gauss-Newton method. The BFGS formula is applied to guarantee the positive definiteness of the approximated Hessian matrix. Note: a small learning rate is recommended.
references: Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. In SIAM Review (Vol. 60, Issue 2, pp. 223–311).
- step(t, x, f_g_J, idx)
- class sgGWR.optimizers.SGN_LM(learning_rate0=1.0, lam=0, lam_LM0=1.0, boost=1.01, drop=0.99, eps=0.25, tau=0.001)
Bases: SGN
Stochastic Gauss-Newton method with Levenberg-Marquardt (LM) damping. The damping improves stability. This algorithm is referred to as “SMW-GN” in Ren and Goldfarb (2019) because they used the Sherman-Morrison-Woodbury (SMW) formula. The SMW formula is not used in this implementation because we assume that the number of parameters is not much larger than the mini-batch size.
references: Ren, Y., & Goldfarb, D. (2019). Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks. https://arxiv.org/abs/1906.02353v1
- step(t, x, f_g_J, idx)
- class sgGWR.optimizers.SVRG(learning_rate0=0.1, lam=0)
Bases: SGD
Stochastic Variance Reduced Gradient with mini-batches
references: Johnson, R., & Zhang, T. (2013). Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. In Advances in Neural Information Processing Systems (Vol. 26).
Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.
- batch_step(f_step, x0, idxs)
- run(model, max_epoch=100, batchsize=None, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=5, min_epoch=5, verbose=True, lax_scan=True)
- step(t, x, f, g, f_and_g, idx)
- class sgGWR.optimizers.Yogi(learning_rate0=1.0, b1=0.9, b2=0.999, eps=1e-08, correct_bias=True)
Bases: Adam
- step(t, x, f, g, f_and_g, idx)
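Yogi is an Adam variant and has no docstring of its own; the sketch below shows its construction (run() is inherited via Adam from SGD, and model is a placeholder as before).

```python
# Sketch: Yogi, an adaptive variant of Adam.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
import jax

from sgGWR.optimizers import Yogi

model = ...  # construct your sgGWR model here

opt = Yogi(learning_rate0=0.1, b1=0.9, b2=0.999)
opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(0))
```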
- class sgGWR.optimizers.golden_section
Bases: object
- run(model, maxiter=1000, bracket=None, tol=1e-05, aicc=False, verbose=True)
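A sketch for golden_section, which searches a scalar bandwidth without gradients; model is a placeholder for a GWR model built with sgGWR.models, and bracket is left at its default.

```python
# Sketch: golden-section search over a single bandwidth parameter.
# `model` is a placeholder for a GWR model built with sgGWR.models (assumed).
from sgGWR.optimizers import golden_section

model = ...  # construct your sgGWR model here

opt = golden_section()
opt.run(model, maxiter=1000, tol=1e-5, aicc=False)
```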
- class sgGWR.optimizers.optax_optimizer(optax_optim=(<function chain.<locals>.init_fn>, <function chain.<locals>.update_fn>))
Bases: object
- run(model, maxiter=1000, batchsize=100, PRNGkey=None, diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
- class sgGWR.optimizers.scipy_L_BFGS_B
Bases: object
Uses the same settings as the ‘scgwr’ package in R. See: https://github.com/cran/scgwr/blob/master/R/scgwr.R
- run(model, diff_mode='auto', tol=None, kwargs_minimize={})