sgGWR.optimizers package

Submodules

sgGWR.optimizers.existings module

Optimizers provided by other packages. Currently, optimizers from SciPy and Optax are supported.

class sgGWR.optimizers.existings.optax_optimizer(optax_optim=(<function chain.<locals>.init_fn>, <function chain.<locals>.update_fn>))

Bases: object

run(model, maxiter=1000, batchsize=100, PRNGkey=None, diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
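
Example (a minimal sketch): any Optax gradient transformation is assumed to be acceptable as optax_optim; optax.adam and the model object below are purely illustrative, and the construction of the sgGWR model itself is not covered in this section.

>>> import optax
>>> from sgGWR.optimizers import optax_optimizer
>>> # `model` is an sgGWR model prepared beforehand (not shown here)
>>> # optax.adam(...) returns an (init, update) gradient transformation
>>> opt = optax_optimizer(optax_optim=optax.adam(learning_rate=0.1))
>>> opt.run(model, maxiter=1000, batchsize=100, tol=1e-3, verbose=True)
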
class sgGWR.optimizers.existings.scipy_L_BFGS_B

Bases: object

Uses the same settings as the ‘scgwr’ package in R. See: https://github.com/cran/scgwr/blob/master/R/scgwr.R

run(model, diff_mode='auto', tol=None, kwargs_minimize={})
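
Example (sketch): kwargs_minimize is assumed to be forwarded to scipy.optimize.minimize, so standard L-BFGS-B options can be supplied there. `model` is an sgGWR model prepared beforehand.

>>> from sgGWR.optimizers import scipy_L_BFGS_B
>>> opt = scipy_L_BFGS_B()
>>> # the options dict is assumed to reach scipy.optimize.minimize(method="L-BFGS-B")
>>> opt.run(model, diff_mode="auto", kwargs_minimize={"options": {"maxiter": 200}})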

sgGWR.optimizers.existings_numpy module

Optimizers provided by other packages. Currently, optimizers from SciPy are supported.

class sgGWR.optimizers.existings_numpy.scipy_L_BFGS_B

Bases: object

Uses the same settings as the ‘scgwr’ package in R. See: https://github.com/cran/scgwr/blob/master/R/scgwr.R

run(model, diff_mode='manual', tol=None, kwargs_minimize={})
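
The NumPy backend exposes the same interface as the JAX version above; a brief sketch, assuming only that no JAX installation is needed and that gradients default to the manual implementation (diff_mode='manual'):

>>> from sgGWR.optimizers.existings_numpy import scipy_L_BFGS_B
>>> scipy_L_BFGS_B().run(model, diff_mode="manual")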

sgGWR.optimizers.golden module

Golden Section Search for adaptive bandwidth

class sgGWR.optimizers.golden.golden_section

Bases: object

run(model, maxiter=1000, bracket=None, tol=1e-05, aicc=False, verbose=True)
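
Example (sketch): golden-section search over an adaptive bandwidth. `model` is assumed to be an sgGWR model with an adaptive-bandwidth kernel, prepared beforehand; bracket, if given, presumably bounds the search interval.

>>> from sgGWR.optimizers import golden_section
>>> opt = golden_section()
>>> # aicc=True switches the bandwidth-selection criterion to AICc
>>> opt.run(model, maxiter=1000, bracket=None, tol=1e-5, aicc=False, verbose=True)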

sgGWR.optimizers.second module

Optimizers using stochastic gradients with second-order information. Faster convergence can be expected.

class sgGWR.optimizers.second.SGN(learning_rate0=1.0, lam=0)

Bases: SGD

Stochastic Gauss-Newton method. A small value is added to the approximated Hessian to guarantee its positive definiteness.

references: Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. In SIAM Review (Vol. 60, Issue 2, pp. 223–311).

run(model, maxiter=1000, batchsize=100, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
step(t, x, f_g_J, idx)
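
Example (sketch): the second-order optimizers share the run interface of SGD; the default PRNGkey shown above corresponds to jax.random.PRNGKey(123) and can be replaced explicitly. `model` is an sgGWR model prepared beforehand.

>>> import jax
>>> from sgGWR.optimizers import SGN
>>> opt = SGN(learning_rate0=1.0, lam=0)
>>> opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(0), tol=1e-3)
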
class sgGWR.optimizers.second.SGN_BFGS(learning_rate0=1.0, lam=0)

Bases: SGN

Stochastic Gauss-Newton method. The BFGS formula is applied to guarantee the positive definiteness of the approximated Hessian matrix. Note: a small learning rate is recommended.

references: Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. In SIAM Review (Vol. 60, Issue 2, pp. 223–311).

step(t, x, f_g_J, idx)
class sgGWR.optimizers.second.SGN_LM(learning_rate0=1.0, lam=0, lam_LM0=1.0, boost=1.01, drop=0.99, eps=0.25, tau=0.001)

Bases: SGN

Stochastic Gauss-Newton method with Levenberg-Marquardt (LM) damping. The damping improves stability. This algorithm is referred to as “SMW-GN” in Ren and Goldfarb (2019) because they used the Sherman-Morrison-Woodbury (SMW) formula. The SMW formula is not used in this implementation because we assume that the number of parameters is not much larger than the mini-batch size.

references: Ren, Y., & Goldfarb, D. (2019). Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks. https://arxiv.org/abs/1906.02353v1

step(t, x, f_g_J, idx)
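
Example (sketch): lam_LM0 sets the initial LM damping, while boost/drop control how it is adapted between iterations (the adaptation rule follows Ren and Goldfarb, 2019, per the reference above); in line with the note on SGN_BFGS, a small learning rate is a reasonable starting point.

>>> from sgGWR.optimizers import SGN_LM
>>> opt = SGN_LM(learning_rate0=0.1, lam_LM0=1.0, boost=1.01, drop=0.99)
>>> opt.run(model, maxiter=1000, batchsize=100)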

sgGWR.optimizers.sg module

Optimizers using stochastic gradients.

class sgGWR.optimizers.sg.ASGD(learning_rate0=1.0, lam=0.0001)

Bases: SGD

Averaged Stochastic Gradient Descent Algorithm

reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16

lr_schedule(t)
step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.sg.Adam(learning_rate0=1.0, b1=0.9, b2=0.999, eps=1e-08, correct_bias=True)

Bases: SGD

step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.sg.SGD(learning_rate0=1.0, lam=0.0001)

Bases: object

Stochastic Gradient Descent Algorithm

reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16

lr_schedule(t)
run(model, maxiter=1000, batchsize=100, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
step(t, x, f, g, f_and_g, idx)
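
Example (sketch): a plain stochastic-gradient run. `model` is an sgGWR model prepared beforehand; lam is assumed to enter the learning-rate decay schedule of Bottou (2010).

>>> import jax
>>> from sgGWR.optimizers import SGD
>>> opt = SGD(learning_rate0=1.0, lam=1e-4)
>>> opt.run(model, maxiter=1000, batchsize=100, PRNGkey=jax.random.PRNGKey(0), diff_mode="manual")
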
class sgGWR.optimizers.sg.SGDarmijo(learning_rate0=1.0, c=0.5, ls_decay=0.5, reset_decay=2.0, search_from_lr0=False)

Bases: SGD

Stochastic Gradient Descent Algorithm with Armijo Line-search

reference: Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., & Lacoste-Julien, S. (2019). Painless stochastic gradient: Interpolation, line-search, and convergence rates. Advances in neural information processing systems, 32.

armijo_cond(f, f0, x, grads, g2, lr)
lr_schedule(t)
step(t, x, f, g, f_and_g, idx)
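
Example (sketch): the Armijo line search adapts the step size on each mini-batch, so learning_rate0 only sets the starting step; c and ls_decay are presumably the sufficient-decrease constant and the backtracking factor of Vaswani et al. (2019).

>>> from sgGWR.optimizers import SGDarmijo
>>> opt = SGDarmijo(learning_rate0=1.0, c=0.5, ls_decay=0.5)
>>> opt.run(model, maxiter=1000, batchsize=100)
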
class sgGWR.optimizers.sg.Yogi(learning_rate0=1.0, b1=0.9, b2=0.999, eps=1e-08, correct_bias=True)

Bases: Adam

step(t, x, f, g, f_and_g, idx)

sgGWR.optimizers.sg_numpy module

Optimizers using stochastic gradients.

class sgGWR.optimizers.sg_numpy.ASGD(learning_rate0=0.1, lam=0.0001)

Bases: SGD

Averaged Stochastic Gradient Descent Algorithm

reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16

lr_schedule(t)
step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.sg_numpy.SGD(learning_rate0=0.1, lam=0.0001)

Bases: object

Stochastic Gradient Descent Algorithm

reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16

lr_schedule(t)
run(model, maxiter=1000, batchsize=100, rng=Generator(PCG64), tol=0.001, n_iter_no_change=100, verbose=True)
step(t, x, f, g, f_and_g, idx)
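
Example (sketch): the NumPy backend takes a numpy.random.Generator instead of a JAX PRNG key and does not require JAX. `model` is an sgGWR model prepared beforehand.

>>> import numpy as np
>>> from sgGWR.optimizers import sg_numpy
>>> opt = sg_numpy.SGD(learning_rate0=0.1)
>>> opt.run(model, maxiter=1000, batchsize=100, rng=np.random.default_rng(0))
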
class sgGWR.optimizers.sg_numpy.SGDarmijo(learning_rate0=1.0, c=0.5, ls_decay=0.5, reset_decay=2.0, search_from_lr0=False)

Bases: SGD

Stochastic Gradient Descent Algorithm with Armijo Line-search

reference: Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., & Lacoste-Julien, S. (2019). Painless stochastic gradient: Interpolation, line-search, and convergence rates. Advances in neural information processing systems, 32.

armijo_cond(f, f0, x, grads, g2, lr)
lr_schedule(t)
step(t, x, f, g, f_and_g, idx)

sgGWR.optimizers.vr module

Optimizers using variance-reduced stochastic gradients. They are recommended when high-accuracy optimization is required.

class sgGWR.optimizers.vr.KatyushaXs(learning_rate0=0.1, lam=0, neg_moment=0.1)

Bases: SVRG

KatyushaXs algorithm

references: Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.

class sgGWR.optimizers.vr.KatyushaXw(learning_rate0=0.1, lam=0)

Bases: SVRG

KatyushaXw algorithm

references: Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.

class sgGWR.optimizers.vr.SVRG(learning_rate0=0.1, lam=0)

Bases: SGD

Stochastic Variance Reduced Gradient with mini-batch

references: Johnson, R., & Zhang, T. (2013). Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. In Advances in Neural Information Processing Systems (Vol. 26).

Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.

batch_step(f_step, x0, idxs)
run(model, max_epoch=100, batchsize=None, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=5, min_epoch=5, verbose=True, lax_scan=True)
step(t, x, f, g, f_and_g, idx)
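
Example (sketch): SVRG works epoch-wise, recomputing a full-batch reference gradient once per epoch, hence the max_epoch/min_epoch arguments; KatyushaXs and KatyushaXw inherit the same run interface. `model` is an sgGWR model prepared beforehand, and batchsize=None is assumed to let the optimizer choose a default mini-batch size.

>>> import jax
>>> from sgGWR.optimizers import SVRG
>>> opt = SVRG(learning_rate0=0.1)
>>> opt.run(model, max_epoch=100, batchsize=None, PRNGkey=jax.random.PRNGKey(0), lax_scan=True)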

Module contents

class sgGWR.optimizers.ASGD(learning_rate0=1.0, lam=0.0001)

Bases: SGD

Averaged Stochastic Gradient Descent Algorithm

reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16

lr_schedule(t)
step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.Adam(learning_rate0=1.0, b1=0.9, b2=0.999, eps=1e-08, correct_bias=True)

Bases: SGD

step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.KatyushaXs(learning_rate0=0.1, lam=0, neg_moment=0.1)

Bases: SVRG

KatyushaXs algorithm

references: Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.

class sgGWR.optimizers.KatyushaXw(learning_rate0=0.1, lam=0)

Bases: SVRG

KatyushaXw algorithm

references: Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.

class sgGWR.optimizers.SGD(learning_rate0=1.0, lam=0.0001)

Bases: object

Stochastic Gradient Descent Algorithm

reference: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers, 177–186. https://doi.org/10.1007/978-3-7908-2604-3_16

lr_schedule(t)
run(model, maxiter=1000, batchsize=100, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.SGDarmijo(learning_rate0=1.0, c=0.5, ls_decay=0.5, reset_decay=2.0, search_from_lr0=False)

Bases: SGD

Stochastic Gradient Descent Algorithm with Armijo Line-search

reference: Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., & Lacoste-Julien, S. (2019). Painless stochastic gradient: Interpolation, line-search, and convergence rates. Advances in neural information processing systems, 32.

armijo_cond(f, f0, x, grads, g2, lr)
lr_schedule(t)
step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.SGN(learning_rate0=1.0, lam=0)

Bases: SGD

Stochastic Gauss-Newton method. A small value is added to the approximated Hessian to guarantee its positive definiteness.

references: Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. In SIAM Review (Vol. 60, Issue 2, pp. 223–311).

run(model, maxiter=1000, batchsize=100, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
step(t, x, f_g_J, idx)
class sgGWR.optimizers.SGN_BFGS(learning_rate0=1.0, lam=0)

Bases: SGN

Stochastic Gauss-Newton method. The BFGS formula is applied to guarantee the positive definiteness of the approximated Hessian matrix. Note: a small learning rate is recommended.

references: Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. In SIAM Review (Vol. 60, Issue 2, pp. 223–311).

step(t, x, f_g_J, idx)
class sgGWR.optimizers.SGN_LM(learning_rate0=1.0, lam=0, lam_LM0=1.0, boost=1.01, drop=0.99, eps=0.25, tau=0.001)

Bases: SGN

Stochastic Gauss-Newton method with Levenberg-Marquardt (LM) damping. The damping improves stability. This algorithm is referred to as “SMW-GN” in Ren and Goldfarb (2019) because they used the Sherman-Morrison-Woodbury (SMW) formula. The SMW formula is not used in this implementation because we assume that the number of parameters is not much larger than the mini-batch size.

references: Ren, Y., & Goldfarb, D. (2019). Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks. https://arxiv.org/abs/1906.02353v1

step(t, x, f_g_J, idx)
class sgGWR.optimizers.SVRG(learning_rate0=0.1, lam=0)

Bases: SGD

Stochastic Variance Reduced Gradient with mini-batch

references: Johnson, R., & Zhang, T. (2013). Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. In Advances in Neural Information Processing Systems (Vol. 26).

Allen-Zhu, Z. (2018). Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. 35th International Conference on Machine Learning, ICML 2018, 1, 284–290.

batch_step(f_step, x0, idxs)
run(model, max_epoch=100, batchsize=None, PRNGkey=Array([0, 123], dtype=uint32), diff_mode='manual', tol=0.001, n_iter_no_change=5, min_epoch=5, verbose=True, lax_scan=True)
step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.Yogi(learning_rate0=1.0, b1=0.9, b2=0.999, eps=1e-08, correct_bias=True)

Bases: Adam

step(t, x, f, g, f_and_g, idx)
class sgGWR.optimizers.golden_section

Bases: object

run(model, maxiter=1000, bracket=None, tol=1e-05, aicc=False, verbose=True)
class sgGWR.optimizers.optax_optimizer(optax_optim=(<function chain.<locals>.init_fn>, <function chain.<locals>.update_fn>))

Bases: object

run(model, maxiter=1000, batchsize=100, PRNGkey=None, diff_mode='manual', tol=0.001, n_iter_no_change=100, verbose=True)
class sgGWR.optimizers.scipy_L_BFGS_B

Bases: object

Uses the same settings as the ‘scgwr’ package in R. See: https://github.com/cran/scgwr/blob/master/R/scgwr.R

run(model, diff_mode='auto', tol=None, kwargs_minimize={})