Journal of Hebei University (Natural Science Edition) ›› 2019, Vol. 39 ›› Issue (5): 536-546. DOI: 10.3969/j.issn.1000-1565.2019.05.015


Weighted zeroth-order stochastic gradient descent algorithm with variance reduction

LU Shuxia1, ZHANG Luohuan1, CAI Lianxiang1, SUN Lili2

  1. Key Laboratory of Machine Learning and Computational Intelligence of Hebei Province, College of Mathematics and Information Science, Hebei University, Baoding 071002, China; 2. Hebei Education Examinations Authority, Shijiazhuang 050091, China

  • Received: 2019-01-06  Online: 2019-09-25  Published: 2019-09-25
  • First author: LU Shuxia (1966—), female, from Baoding, Hebei Province; professor and Ph.D. at Hebei University; her main research interest is machine learning. E-mail: cmclusx@126.com
  • Funding: Natural Science Foundation of Hebei Province (F2015201185)


Abstract: Stochastic gradient descent (SGD) is one of the most efficient methods for solving machine learning problems. However, on imbalanced data the traditional SGD algorithm samples majority-class points with much higher probability than minority-class points during training, which easily leads to unbalanced computation. When the objective function is non-differentiable or difficult to differentiate, the computation is too expensive or impossible. Moreover, approximating the full gradient by a single sample gradient in each iteration inevitably introduces variance, which seriously degrades the classification performance of the algorithm. To address these problems, a weighted zeroth-order stochastic gradient descent algorithm with variance reduction is proposed. Taking the margin distribution of the data into account, a margin-mean term is introduced into the objective function, and smaller weights are assigned to majority-class samples while larger weights are assigned to minority-class samples. In solving the optimization problem, a zeroth-order optimization method is used to estimate the gradient, and a variance-reduction strategy is introduced. Experiments on several imbalanced datasets demonstrate the effectiveness of the proposed algorithm in solving the above problems.

CLC number: TP181  Document code: A  Article ID: 1000-1565(2019)05-0536-11
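The abstract combines three ingredients: class-dependent weights on a hinge loss with a margin-mean term, a two-point zeroth-order gradient estimate, and an SVRG-style variance-reduction correction. The sketch below illustrates how these pieces fit together; it is a minimal illustration, not the paper's exact formulation — the per-sample loss `f_i`, the weighting scheme `c`, and all hyperparameters (`mu`, `eta`, `theta`, `lam`) are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_dir(d):
    # Random unit direction used by the two-point estimate.
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def zo_grad(f, w, u, mu=1e-4):
    # Two-point zeroth-order estimate along u: d * (f(w + mu*u) - f(w)) / mu * u.
    # Only function values of f are needed, never its derivative.
    return w.size * (f(w + mu * u) - f(w)) / mu * u

def make_loss(X, y, c, lam=1e-3, theta=0.1):
    # Hypothetical per-sample objective: L2 regularizer + weighted hinge loss
    # - theta * margin (the margin-mean term, rewarding a large mean margin).
    def f_i(i):
        def f(w):
            m = y[i] * (X[i] @ w)
            return 0.5 * lam * (w @ w) + c[i] * max(0.0, 1.0 - m) - theta * m
        return f
    return f_i

def wzo_svrg(X, y, c, epochs=10, eta=0.05):
    n, d = X.shape
    f_i = make_loss(X, y, c)
    w_snap = np.zeros(d)
    for _ in range(epochs):
        # Full zeroth-order gradient at the snapshot: the variance-reduction anchor.
        g_snap = np.mean([zo_grad(f_i(i), w_snap, rand_dir(d)) for i in range(n)],
                         axis=0)
        w = w_snap.copy()
        for _ in range(n):
            i = rng.integers(n)
            u = rand_dir(d)  # reuse one direction for both evaluations
            g = zo_grad(f_i(i), w, u) - zo_grad(f_i(i), w_snap, u) + g_snap
            w -= eta * g
        w_snap = w
    return w_snap
```

In this sketch the class weights `c` would typically be chosen inversely proportional to class frequency, so that minority-class samples receive the larger weights the abstract describes; reusing the same random direction `u` for both snapshot and current iterate is what lets the correction term cancel estimation noise.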

Key words: zeroth-order optimization, stochastic gradient descent, variance reduction, imbalanced data, support vector machine
