L1 norm (optimization principles of sparse coding)
The principles of the L1 norm and the L2 norm, how they differ, and what each implies for optimization problems.

Logistic Regression Problem
* Will be explained by Evan
* All we need to know for now is that we are again trying to find a vector x which will minimize a loss function

Problem with LSP/LRP?
(LSP: least squares problem; LRP: logistic regression problem)
* When the number of observations or training examples m is not large enough compared to the number of feature variables n, over-fitting may occur
* Tends to occur when large weights are found in x
* What can we do to prevent over-fitting?
  * Use L2-regularization

Regularization
* Minimize: (loss function) + (regularization term)

L2-Regularization
* Regularization term: $\lambda \|x\|_2^2$, where $\lambda > 0$ is the regularization parameter
* For LSP, this becomes
  * Minimize $\|Ax - y\|_2^2 + \lambda \|x\|_2^2$
  * Regularization term restricts large value components
  * Special case of Tikhonov regularization
  * Can be computed directly ($O(n^3)$) or by iterative methods (conjugate gradients method)
* For LRP, this becomes
  * Minimize $l_{avg}(v, x) + \lambda \|x\|_2^2$
  * Smooth and convex, can be solved using gradient descent, steepest descent, Newton, quasi-Newton, truncated Newton, or CG methods

L1-Regularization
* Regularization term: $\lambda \|x\|_1$
  * LSP: $\|Ax - y\|_2^2 + \lambda \|x\|_1$
  * LRP: $l_{avg}(v, x) + \lambda \|x\|_1$
* The regularization term penalizes all components at the same rate, whatever their magnitude
  * This makes x SPARSE
  * A sparse x means reduced complexity
  * Can be viewed as a selection of relevant/important features
* Non-differentiable -> harder problem
  * Can transform into a convex quadratic problem:
    minimize $\|Ax - y\|_2^2 + \lambda \sum_{i=1}^{n} u_i$
    subject to $-u_i \le x_i \le u_i, \quad i = 1, \dots, n$
  * and use standard convex optimization methods to solve, but these usually cannot handle large practical problems

Effects of L1-Regularization
[Figure: ground truth; unregularized solution ($\lambda = 0$); $\lambda = 0.05$; $\lambda = 0.20$]

L1-Regularization
* Usage
  * Signal processing
    * basis pursuit
    * compressed sensing
    * signal recovery
    * wavelet thresholding
  * Statistics
    * Lasso algorithm
    * fused lasso
  * Others
    * decoding linear codes
    * geophysics problems
    * maximum likelihood estimation

L1-Regularization
* Regularization path
  * Family of solutions over $\lambda \in (0, \infty)$
  * Piecewise linear
  * Path-following methods slow for large-scale problems

Truncated Newton Interior-Point Method (for L1-regularized LSPs)
* Initialize. $t := 1/\lambda$, $x := 0$, $u := 1$
* Repeat
  * Compute search direction $(\Delta x, \Delta u)$ using truncated Newton method
  * Compute step size s by backtracking line search
  * Update the iterate by $(x, u) := (x, u) + s(\Delta x, \Delta u)$
  * Set x = x
  * Construct dual feasible point (v, u)
  * Evaluate duality gap $\eta$
  * Quit if $\eta / G(v) \le \epsilon$
  * Update t

Efficiency of TNIPM
[Figure 3: Runtime of the truncated Newton interior-point method (TNIPM), MOSEK, PDCO-CHOL (abbreviated as CHOL), PDCO-LSQR (abbreviated as LSQR), and l1-magic, for 10 randomly generated sparse problems, with the regularization parameter $\lambda = 0.1 \lambda_{max}$. Axes: number of features, runtime.]
* TNIPM: $O(n^{1.2})$
* The entries for MOSEK, PDCO (Cholesky), PDCO (LSQR), l1-magic, and a homotopy method are illegible in the source

Summary
* L2-regularization suppresses over-fitting
* L2-regularization does not add too much complexity to existing problems -> easy to calculate
* L1-regularization creates sparse answers, and better approximations in relevant cases
* L1-regularized problems are not differentiable -> need other ways of solving the problem (using convex optimization techniques, iterative approaches, etc.)
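The contrast drawn in the summary can be tried numerically. What follows is a minimal sketch, assuming NumPy and a small random problem; the function names (`ridge`, `lasso_ista`, `lambda_max`, `lasso_objective`) and the toy data are illustrative and do not come from the slides. The L1 problem is solved here with plain iterative soft-thresholding (ISTA, a simple proximal-gradient method), not with the truncated Newton interior-point method listed above, which is considerably more involved. The value of `lambda_max` follows from the optimality condition of the L1-regularized LSP: x = 0 is optimal exactly when every entry of 2 A^T y has magnitude at most lambda.

```python
import numpy as np


def ridge(A, y, lam):
    """Minimize ||Ax - y||_2^2 + lam * ||x||_2^2 in closed form.

    Setting the gradient to zero gives (A^T A + lam * I) x = A^T y.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)


def soft_threshold(z, tau):
    """Shrink every entry of z toward zero by tau; small entries become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def lambda_max(A, y):
    """Smallest lam for which x = 0 minimizes ||Ax - y||_2^2 + lam * ||x||_1."""
    return np.max(np.abs(2.0 * A.T @ y))


def lasso_objective(A, y, x, lam):
    """Value of ||Ax - y||_2^2 + lam * ||x||_1."""
    return np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))


def lasso_ista(A, y, lam, n_iter=2000):
    """Minimize ||Ax - y||_2^2 + lam * ||x||_1 by ISTA (proximal gradient)."""
    # Lipschitz constant of the smooth part's gradient, 2 A^T (Ax - y).
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y)
        # Gradient step on the loss, then the proximal step for the L1 term.
        x = soft_threshold(x - grad / L, lam / L)
    return x


# Toy data (hypothetical): fewer observations (m) than features (n).
rng = np.random.default_rng(0)
m, n = 20, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = rng.standard_normal(5)
y = A @ x_true + 0.01 * rng.standard_normal(m)

lam = 0.1 * lambda_max(A, y)  # same ratio as quoted for the runtime figure
x_l2 = ridge(A, y, lam)
x_l1 = lasso_ista(A, y, lam)
print("exact zeros with L2:", int(np.sum(x_l2 == 0.0)))
print("exact zeros with L1:", int(np.sum(x_l1 == 0.0)))
```

In a run like this, the L1 solution would be expected to leave many coordinates at exactly zero, while the ridge solution shrinks coordinates without zeroing them. ISTA is chosen only because it fits in a few lines; for large problems it converges slowly, which is part of the motivation for specialized methods such as TNIPM.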