Today's Academic Horizons (2015.9.3)

September 3, 2015, 06:51
• [cs.AI]Value function approximation via low-rank models

• [cs.CV]Fast Randomized Singular Value Thresholding for Nuclear Norm Minimization
• [cs.CV]Iterative hypothesis testing for multi-object tracking in presence of features with variable reliability
• [cs.CV]Robust Face Recognition via Multimodal Deep Face Representation
• [cs.CY]Echo chambers in the age of misinformation
• [cs.DC]Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices
• [cs.LG]Differentially Private Online Learning for Video Recommendation with Social Big Data over Media Cloud
• [cs.LG]Fingerprinting-Based Positioning in Distributed Massive MIMO Systems
• [cs.NE]A Telescopic Binary Learning Machine for Training Neural Networks
• [cs.NE]Continuous parameter working memory in a balanced chaotic neural network
• [cs.NE]Evolving Unipolar Memristor Spiking Neural Networks
• [cs.NE]Pure and Hybrid Evolutionary Computing in Global Optimization of Chemical Structures: from Atoms and Molecules to Clusters and Crystals
• [cs.SI]Community detection in networks with unequal groups
• [cs.SI]Strong Localization in Personalized PageRank Vectors
• [cs.SI]Visualizing signatures of human activity in cities across the globe
• [math.ST]A breakpoint detection error function for segmentation model selection and evaluation
• [math.ST]Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods
• [math.ST]Estimation of matrices with row sparsity
• [math.ST]Scalable Computation of Regularized Precision Matrices via Stochastic Optimization
• [math.ST]Segmentation of functional-biased series by a Bayesian approach
• [stat.AP]Band Depth Clustering for Nonstationary Time Series and Wind Speed Behavior
• [stat.AP]Bayesian Models for Heterogeneous Personalized Health Data
• [stat.ML]Adaptive Smoothing Algorithms for Nonsmooth Composite Convex Minimization
• [stat.ML]Learning A Task-Specific Deep Architecture For Clustering
• [stat.ML]Learning Deep ℓ0 Encoders
• [stat.ML]Multi-Sensor Slope Change Detection
• [stat.ML]Online Supervised Subspace Tracking
• [stat.ML]Sequential Information Guided Sensing
• [stat.ML]Tumor Motion Tracking in Liver Ultrasound Images Using Mean Shift and Active Contour 

·····································

• [cs.AI]Value function approximation via low-rank models
Hao Yi Ong
//arxiv.org/abs/1509.00061v1 

We propose a novel value function approximation technique for Markov decision processes. We consider the problem of compactly representing the state-action value function using a low-rank and sparse matrix model. The problem is to decompose a matrix that encodes the true value function into low-rank and sparse components, and we achieve this using Robust Principal Component Analysis (PCA). Under minimal assumptions, this Robust PCA problem can be solved exactly via the Principal Component Pursuit convex optimization problem. We test the procedure on several examples and demonstrate that our method yields approximations essentially identical to the true function. 
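
The low-rank-plus-sparse decomposition can be sketched with a standard inexact augmented-Lagrangian solver for Principal Component Pursuit. This is an illustrative numpy sketch, not the paper's implementation; the lambda and mu heuristics are common defaults from the Robust PCA literature:

```python
import numpy as np

def svt(X, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    # Soft thresholding: proximal operator of the l1 norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def principal_component_pursuit(M, n_iter=500, tol=1e-7):
    """Split M into low-rank L plus sparse S by inexact ALM on
    min ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))          # standard PCP weight
    mu = m * n / (4.0 * np.abs(M).sum())    # common step-size heuristic
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = soft(M - L + Y / mu, lam / mu)
        Y += mu * (M - L - S)
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S
```

Applied to a matrix that is genuinely a low-rank component plus a few large sparse corruptions, the iteration recovers both parts to high accuracy.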

• [cs.CV]Fast Randomized Singular Value Thresholding for Nuclear Norm Minimization
Tae-Hyun Oh, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
//arxiv.org/abs/1509.00296v1 

Rank minimization can be boiled down to tractable surrogate problems, such as Nuclear Norm Minimization (NNM) and Weighted NNM (WNNM). The problems related to NNM (or WNNM) can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT) (or Weighted SVT), but they suffer from the high computational cost of computing the Singular Value Decomposition (SVD) at each iteration. We propose a fast and accurate approximation method for SVT, which we call fast randomized SVT (FRSVT), that avoids direct computation of the SVD. The key idea is to extract an approximate basis for the range of a matrix from its compressed matrix. Given the basis, we compute the partial singular values of the original matrix from a small factored matrix. In addition, by adopting a range propagation technique, our method further speeds up the extraction of the approximate basis at each iteration. Our theoretical analysis shows the relationship between the approximation bound of the SVD and its effect on NNM via SVT. Along with the analysis, our empirical results quantitatively and qualitatively show that our approximation rarely harms the convergence of the host algorithms. We assess the efficiency and accuracy of our method on various vision problems, e.g., subspace clustering, weather artifact removal, and simultaneous multi-image alignment and rectification. 
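
A minimal sketch of the randomized idea (extract an approximate range basis from a compressed matrix, then take the SVD of a small factored matrix) might look as follows. This is a simplified Halko-style illustration, not the authors' FRSVT code, and it omits the range-propagation speedup:

```python
import numpy as np

def randomized_svt(A, tau, k, p=5, seed=0):
    """Approximate singular value thresholding of A without a full SVD.

    k is the target rank and p a small oversampling parameter.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + p))  # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)       # approximate basis for range(A)
    B = Q.T @ A                          # small (k+p) x n factored matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    s = np.maximum(s - tau, 0.0)         # threshold the singular values
    return (Q @ Ub) @ np.diag(s) @ Vt
```

When A is (numerically) low rank, the sketch captures its range almost exactly, so the result matches the exact SVT closely at a fraction of the cost.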

• [cs.CV]Iterative hypothesis testing for multi-object tracking in presence of features with variable reliability
Amit Kumar K. C., Damien Delannay, Christophe De Vleeschouwer
//arxiv.org/abs/1509.00313v1 

This paper assumes prior detections of multiple targets at each time instant, and uses a graph-based approach to connect those detections across time, based on their position and appearance estimates. In contrast to most earlier works in the field, our framework is designed to exploit appearance features even when they are only sporadically available, or affected by non-stationary noise, along the sequence of detections. This is done by implementing an iterative hypothesis testing strategy to progressively aggregate the detections into short trajectories, named tracklets. Specifically, each iteration considers a node, named the key-node, and investigates how to link this key-node with other nodes in its neighborhood, under the assumption that the target appearance is defined by the key-node appearance estimate. This is done through shortest path computation in a temporal neighborhood of the key-node. The approach is conservative in that it only aggregates the shortest paths that are sufficiently better than alternative paths. It is also multi-scale in that the size of the investigated neighborhood is increased proportionally to the number of detections already aggregated into the key-node. The multi-scale nature of the process and the progressive relaxation of its conservativeness make it both computationally efficient and effective. Experimental validations are performed extensively on a toy example, a 15-minute-long multi-view basketball dataset, and other monocular pedestrian datasets. 

• [cs.CV]Robust Face Recognition via Multimodal Deep Face Representation
Changxing Ding, Dacheng Tao
//arxiv.org/abs/1509.00244v1 

Face images that appear in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representations using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. The extracted features are then concatenated to form a high-dimensional feature vector, whose dimension is compressed by the SAE. All the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, a 98.43% verification rate is achieved on the LFW database. Benefiting from the complementary information contained in multimodal data, our small ensemble system achieves a recognition rate higher than 99.0% on LFW using a publicly available training set. 

• [cs.CY]Echo chambers in the age of misinformation
Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, Walter Quattrociocchi
//arxiv.org/abs/1509.00189v1 

The wide availability of user-provided content in online social media facilitates the aggregation of people around common interests, worldviews, and narratives. Despite the enthusiastic rhetoric on the part of some that this process generates “collective intelligence”, the WWW also allows the rapid dissemination of unsubstantiated conspiracy theories that often elicit rapid, large, but naive social responses, such as the recent case of Jade Helm 15 – where a simple military exercise came to be perceived as the beginning of a civil war in the US. We study how Facebook users consume information related to two different kinds of narrative: scientific and conspiracy news. We find that although consumers of scientific and conspiracy stories present similar consumption patterns with respect to content, the sizes of the spreading cascades differ. Homogeneity appears to be the primary driver for the diffusion of content, but each echo chamber has its own cascade dynamics. To mimic these dynamics, we introduce a data-driven percolation model on signed networks. 

• [cs.DC]Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices
Justus A. Calvin, Cannada A. Lewis, Edward F. Valeev
//arxiv.org/abs/1509.00309v1 

A task-based formulation of the Scalable Universal Matrix Multiplication Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC). The novel features of our formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and (2) fine-grained task-based composition. These features make it tolerant of the load imbalance due to the irregular matrix structure and eliminate all artifactual sources of global synchronization. Scalability of iterative computation of the square-root inverse of block-rank-sparse QC matrices is demonstrated; for full-rank (dense) matrices the performance of our SUMMA formulation usually exceeds that of the state-of-the-art dense MM implementations (ScaLAPACK and Cyclops Tensor Framework). 
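
The panel structure that SUMMA exploits can be seen in a toy serial skeleton: C accumulates outer products of column panels of A with row panels of B. In the distributed algorithm each panel pair is broadcast along the process grid and each local update becomes an independent task; this sketch ignores the process grid and scheduling entirely:

```python
import numpy as np

def summa_serial(A, B, panel=8):
    """Serial skeleton of SUMMA: C = sum over panels s of A[:, s] @ B[s, :]."""
    m, k = A.shape
    n = B.shape[1]
    C = np.zeros((m, n))
    for s in range(0, k, panel):
        # In distributed SUMMA this panel pair is broadcast along the rows
        # and columns of the process grid; each local update is one task.
        C += A[:, s:s + panel] @ B[s:s + panel, :]
    return C
```

Because the panel updates commute, they can be scheduled concurrently, which is what makes the fine-grained task-based composition in the paper possible.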

• [cs.LG]Differentially Private Online Learning for Video Recommendation with Social Big Data over Media Cloud
Pan Zhou, Yingxue Zhou, Dapeng Wu, Hai Jin
//arxiv.org/abs/1509.00181v1 

With the rapid growth in multimedia services and the enormous offering of video content in online social networks, users have difficulty in finding content of interest. Therefore, various personalized recommendation systems have been proposed. However, they ignore that the accelerated proliferation of social media data has led to the big data era, which has greatly impeded the process of video recommendation. In addition, none of them has considered both the privacy of users' contexts (e.g., social status, ages, and hobbies) and video service vendors' repositories, both of which are extremely sensitive and of significant commercial value. To handle these problems, we propose a cloud-assisted differentially private video recommendation system based on distributed online learning. In our framework, service vendors are modeled as distributed cooperative learners, recommending videos according to the user's context, while simultaneously adapting the video-selection strategy based on user-click feedback to maximize total user clicks (reward). Considering the sparsity and heterogeneity of big social media data, we also propose a novel \emph{geometric differentially private} model, which can greatly reduce the performance (recommendation accuracy) loss. Our simulations show that the proposed algorithms outperform existing methods and strike a delicate balance between computational accuracy and the level of privacy preservation. 

• [cs.LG]Fingerprinting-Based Positioning in Distributed Massive MIMO Systems
Vladimir Savic, Erik G. Larsson
//arxiv.org/abs/1509.00202v1 

Location awareness in wireless networks may enable many applications such as emergency services, autonomous driving, and geographic routing. Although there are many available positioning techniques, none of them is adapted to work with massive multiple-input multiple-output (MIMO) systems, which represent a leading 5G technology candidate. In this paper, we discuss possible solutions for positioning of mobile stations using a vector of signals at the base station, which is equipped with many antennas distributed over the deployment area. Our main proposal is to use fingerprinting techniques based on a vector of received signal strengths. Methods of this kind are able to work in highly cluttered multipath environments, and require just one base station, in contrast to standard range-based and angle-based techniques. We also provide a solution for fingerprinting-based positioning based on Gaussian process regression, and discuss the main applications and challenges. 
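
A toy version of fingerprinting with Gaussian process regression can be sketched as below: one GP per coordinate maps a vector of received signal strengths to a 2-D position. The kernel, length-scale, and the synthetic log-distance path-loss model in the usage example are illustrative assumptions, not the authors' setup:

```python
import numpy as np

def rbf(A, B, length):
    # Squared-exponential kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_fit(rss, pos, length=3.0, noise=1e-4):
    """Fit one GP per coordinate, mapping RSS fingerprints to positions."""
    K = rbf(rss, rss, length) + noise * np.eye(len(rss))
    return rss, np.linalg.solve(K, pos), length   # store training data + weights

def gp_predict(model, rss_new):
    rss, alpha, length = model
    return rbf(rss_new, rss, length) @ alpha
```

Training fingerprints would be collected on a survey grid; a new terminal is then located from its RSS vector alone, with no ranging or angle estimation.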

• [cs.NE]A Telescopic Binary Learning Machine for Training Neural Networks
Mauro Brunato, Roberto Battiti
//arxiv.org/abs/1509.00174v1 

This paper proposes a new algorithm based on multi-scale stochastic local search with a binary representation for training neural networks. In particular, we study the effects of neighborhood evaluation strategies, of the number of bits per weight, and of the maximum weight range used for mapping binary strings to real values. Following this preliminary investigation, we propose a telescopic multi-scale version of local search where the number of bits is increased in an adaptive manner, leading to a faster search and to local minima of better quality. An analysis related to adapting the number of bits in a dynamic way is also presented. The control over the number of bits, which happens in a natural manner in the proposed method, is effective in increasing the generalization performance. Benchmark tasks include a highly non-linear artificial problem, a control problem requiring either feed-forward or recurrent architectures for feedback control, and challenging real-world tasks in different application domains. The results demonstrate the effectiveness of the proposed method. 
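
The role of the two parameters studied above (bits per weight and maximum weight range) can be illustrated with a simple uniform binary-to-real mapping; the paper's exact encoding may differ, so treat this as a sketch:

```python
def bits_to_weight(bits, w_max):
    """Map a bit string to a real weight on a uniform grid in [-w_max, w_max].

    With b bits the grid has 2**b levels; adding a bit (the 'telescopic'
    refinement) doubles the resolution of the weight search space.
    """
    level = int("".join(str(b) for b in bits), 2)
    return -w_max + 2.0 * w_max * level / (2 ** len(bits) - 1)
```

Local search then flips bits in this representation; coarse grids (few bits) give large, exploratory weight moves, and the adaptive increase in bit count progressively refines the search.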

• [cs.NE]Continuous parameter working memory in a balanced chaotic neural network
Nimrod Shaham, Yoram Burak
//arxiv.org/abs/1508.06944v2 

Working memory, the ability to maintain and use information for several seconds, is central to many functions of the brain. In the context of continuous variables, an important theoretical model of working memory is based on neural networks whose dynamics possess a continuum of marginally stable steady states. It has been unclear whether this theoretical idea is compatible with one of the main proposals for the architecture of cortical circuits, the balanced network. Here we study a network with random connectivity which generates a balanced state. We find an architecture for which the network has a continuum of balanced states, in the limit of many neurons and many synapses per neuron. Finite networks can sustain slow dynamics in a certain direction in the space of mean activities, but the chaotic dynamics drive diffusive motion along the line attractor, which gradually degrades the stored memory. We analyse the coefficient of diffusion along the attractor, and show that it scales inversely with the system size. For a large enough (but realistic) network size, and with suitable tuning of the network connections, it is possible to obtain persistence over time intervals that are several orders of magnitude longer than the single-neuron time scale. 

• [cs.NE]Evolving Unipolar Memristor Spiking Neural Networks
David Howard, Larry Bull, Ben De Lacy Costello
//arxiv.org/abs/1509.00105v1 

Neuromorphic computing — brainlike computing in hardware — typically requires myriad CMOS spiking neurons interconnected by a dense mesh of nanoscale plastic synapses. Memristors are frequently cited as strong synapse candidates due to their statefulness and potential for low-power implementations. To date, plentiful research has focused on the bipolar memristor synapse, which is capable of incremental weight alterations and can provide adaptive self-organisation under a Hebbian learning scheme. In this paper we consider the unipolar memristor synapse — a device capable of non-Hebbian switching between only two states (conductive and resistive) through application of a suitable input voltage — and discuss its suitability for neuromorphic systems. A self-adaptive evolutionary process is used to autonomously find highly fit network configurations. Experimentation on two robotics tasks shows that unipolar memristor networks evolve task-solving controllers faster than both bipolar memristor networks and networks containing constant nonplastic connections, whilst performing at least comparably. 

• [cs.NE]Pure and Hybrid Evolutionary Computing in Global Optimization of Chemical Structures: from Atoms and Molecules to Clusters and Crystals
Kanchan Sarkar, S. P. Bhattacharyya
//arxiv.org/abs/1509.00028v1 

The growth of evolutionary computing (EC) methods in the exploration of complex potential energy landscapes of atomic and molecular clusters, as well as crystals, over the last decade or so is reviewed. The trend of growth indicates that pure as well as hybrid evolutionary computing techniques in conjunction with DFT have been emerging as a powerful tool, although work on molecular clusters has been rather limited so far. Some attempts to solve the atomic/molecular Schrödinger Equation (SE) directly by genetic algorithms (GA) are available in the literature. At the Born-Oppenheimer level of approximation, GA-density methods appear to be a viable tool which could be more extensively explored in the coming years, especially in the context of designing molecules and materials with targeted properties. 

• [cs.SI]Community detection in networks with unequal groups
Pan Zhang, Cristopher Moore, M. E. J. Newman
//arxiv.org/abs/1509.00107v1 

Recently, a phase transition has been discovered in the network community detection problem, below which no algorithm can tell which nodes belong to which communities with success any better than a random guess. This result has, however, so far been limited to the case where the communities have the same size or the same average degree. Here we consider the case where the sizes or average degrees are different. This asymmetry allows us to assign nodes to communities with better-than-random success by examining their local neighborhoods. Using the cavity method, we show that this removes the detectability transition completely for networks with four groups or fewer, while for more than four groups the transition persists up to a critical amount of asymmetry but not beyond. The critical point in the latter case coincides with the point at which local information percolates, causing a global transition from a less-accurate solution to a more-accurate one. 

• [cs.SI]Strong Localization in Personalized PageRank Vectors
Huda Nassar, Kyle Kloster, David F. Gleich
//arxiv.org/abs/1509.00016v1 

The personalized PageRank diffusion is a fundamental tool in network analysis tasks like community detection and link prediction. This tool models the spread of a quantity from a small, initial set of seed nodes, and has long been observed to stay localized near this seed set. We derive a sublinear upper-bound on the number of nonzeros necessary to approximate a personalized PageRank vector on a power-law graph. Our experimental results on power-law graphs with a wide variety of parameter settings demonstrate that the bound is loose, and instead supports a new conjectured bound. 

• [cs.SI]Visualizing signatures of human activity in cities across the globe
Dániel Kondor, Pierrick Thebault, Sebastian Grauwin, István Gódor, Simon Moritz, Stanislav Sobolevsky, Carlo Ratti
//arxiv.org/abs/1509.00459v1 

The availability of big data on human activity is currently changing the way we look at our surroundings. With the high penetration of mobile phones, nearly everyone already carries a high-precision sensor, providing an opportunity to monitor and analyze the dynamics of human movement on unprecedented scales. In this article, we present a technique and visualization tool which uses aggregated activity measures of mobile networks to gain information about the human activity shaping the structure of cities. Based on ten months of mobile network data, activity patterns can be compared through time and space to unravel the “city’s pulse” as seen through the specific signatures of different locations. Furthermore, the tool allows classifying neighborhoods into functional clusters based on the timeline of human activity, providing valuable insights into the actual land use patterns within the city. In this way, the approach and the tool provide new ways of looking at the city structure from a historical perspective and potentially also in real time, based on dynamic up-to-date records of human behavior. The online tool presents results for four global cities: New York, London, Hong Kong, and Los Angeles. 

• [math.ST]A breakpoint detection error function for segmentation model selection and evaluation
Toby Dylan Hocking
//arxiv.org/abs/1509.00368v1 

We consider the multiple breakpoint detection problem, which is concerned with detecting the locations of several distinct changes in a one-dimensional noisy data series. We propose the breakpointError, a function that can be used to evaluate estimated breakpoint locations, given the known locations of true breakpoints. We discuss an application of the breakpointError for finding optimal penalties for breakpoint detection in simulated data. Finally, we show how to relax the breakpointError to obtain an annotation error function which can be used more readily in practice on real data. A fast C implementation of an algorithm that computes the breakpointError is available in an R package on R-Forge. 

• [math.ST]Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods
Chris Sherlock, Andrew Golightly, Daniel A. Henderson
//arxiv.org/abs/1509.00172v1 

When conducting Bayesian inference, delayed acceptance (DA) Metropolis-Hastings (MH) algorithms and DA pseudo-marginal MH algorithms can be applied when it is computationally expensive to calculate the true posterior or an unbiased estimate thereof, but a computationally cheap approximation is available. A first accept-reject stage is applied, with the cheap approximation substituted for the true posterior in the MH acceptance ratio. Only for those proposals which pass through the first stage is the computationally expensive true posterior (or unbiased estimate thereof) evaluated, with a second accept-reject stage ensuring that detailed balance is satisfied with respect to the intended true posterior. In some scenarios there is no obvious computationally cheap approximation. A weighted average of previous evaluations of the computationally expensive posterior provides a generic approximation to the posterior. If only the $k$-nearest neighbours have non-zero weights then evaluation of the approximate posterior can be made computationally cheap provided that the points at which the posterior has been evaluated are stored in a multi-dimensional binary tree, known as a KD-tree. The contents of the KD-tree are potentially updated after every computationally intensive evaluation. The resulting adaptive, delayed-acceptance [pseudo-marginal] Metropolis-Hastings algorithm is justified both theoretically and empirically. Guidance on tuning parameters is provided and the methodology is applied to a discretely observed Markov jump process characterising predator-prey interactions and an ODE system describing the dynamics of an autoregulatory gene network. 
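
The two-stage accept/reject with a nearest-neighbour surrogate can be sketched in one dimension as below. This is an illustrative simplification: the neighbour search here is brute force (the paper's KD-tree is what makes the surrogate cheap at scale), and the careful theoretical treatment of the adaptation is exactly what the paper supplies.

```python
import numpy as np

def knn_surrogate(x, xs, ys, k=3):
    # Inverse-distance-weighted average of the k nearest stored
    # log-posterior evaluations (a KD-tree would make this lookup fast).
    d = np.abs(xs - x)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-12)
    return float(np.sum(w * ys[idx]) / np.sum(w))

def da_mh(log_post, x0, n_samples, step=0.8, seed=0):
    """Adaptive delayed-acceptance Metropolis-Hastings (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    xs = np.array([x0]); ys = np.array([log_post(x0)])
    x, lx, chain = x0, ys[0], []
    for _ in range(n_samples):
        prop = x + step * rng.standard_normal()
        cheap_prop = knn_surrogate(prop, xs, ys)
        cheap_cur = knn_surrogate(x, xs, ys)
        # Stage 1: screen the proposal with the cheap surrogate.
        if np.log(rng.random()) < cheap_prop - cheap_cur:
            lp = log_post(prop)                          # expensive evaluation
            xs = np.append(xs, prop); ys = np.append(ys, lp)  # grow the cache
            # Stage 2: correct with the true (expensive) posterior.
            if np.log(rng.random()) < (lp - lx) - (cheap_prop - cheap_cur):
                x, lx = prop, lp
        chain.append(x)
    return np.array(chain)
```

Proposals that look poor under the surrogate are rejected without touching the expensive posterior, and every expensive evaluation that does happen improves the surrogate for later iterations.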

• [math.ST]Estimation of matrices with row sparsity
O. Klopp, A. B. Tsybakov
//arxiv.org/abs/1509.00319v1 

An increasing number of applications is concerned with recovering a sparse matrix from noisy observations. In this paper, we consider the setting where each row of the unknown matrix is sparse. We establish minimax optimal rates of convergence for estimating matrices with row sparsity. A major focus in the present paper is on the derivation of lower bounds. 

• [math.ST]Scalable Computation of Regularized Precision Matrices via Stochastic Optimization
Yves F. Atchadé, Rahul Mazumder, Jie Chen
//arxiv.org/abs/1509.00426v1 

We consider the problem of computing a positive definite $p \times p$ inverse covariance matrix, a.k.a. precision matrix, $\theta=(\theta_{ij})$ which optimizes a regularized Gaussian maximum likelihood problem with the elastic-net regularizer $\sum_{i,j=1}^{p} \lambda (\alpha|\theta_{ij}| + \frac{1}{2}(1-\alpha) \theta_{ij}^2),$ with regularization parameters $\alpha \in [0,1]$ and $\lambda>0$. The associated convex semidefinite optimization problem is notoriously difficult to scale to large problems and has demanded significant attention over the past several years. We propose a new algorithmic framework based on stochastic proximal optimization (on the primal problem) that can be used to obtain near-optimal solutions with substantial computational savings over deterministic algorithms. A key challenge of our work stems from the fact that the optimization problem being investigated does not satisfy the usual assumptions required by stochastic gradient methods. Our proposal has (a) computational guarantees and (b) scales well to large problems, even if the solution is not too sparse, thereby enhancing the scope of regularized maximum likelihood problems to many large-scale problems of contemporary interest. An important aspect of our proposal is to bypass the \emph{deterministic} computation of a matrix inverse by drawing random samples from a suitable multivariate Gaussian distribution. 
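
The entrywise effect of this elastic-net regularizer is captured by its proximal operator, which is what proximal iterations (deterministic or stochastic) apply between gradient steps. The closed form follows directly from the stated penalty: soft-threshold by lam*alpha, then shrink by 1/(1 + lam*(1-alpha)). A minimal sketch:

```python
import numpy as np

def prox_elastic_net(theta, lam, alpha):
    """Elementwise prox of t -> lam*(alpha*|t| + 0.5*(1-alpha)*t**2)."""
    soft = np.sign(theta) * np.maximum(np.abs(theta) - lam * alpha, 0.0)
    return soft / (1.0 + lam * (1.0 - alpha))
```

The l1 part zeroes out small entries (sparsity), while the quadratic part shrinks the survivors, which is why the regularizer interpolates between lasso-like and ridge-like behavior as alpha varies.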

• [math.ST]Segmentation of functional-biased series by a Bayesian approach
Meili Baragatti, Karine Bertin, Emilie Lebarbier, Cristian Meza
//arxiv.org/abs/1509.00049v1 

In this paper, we propose a Bayesian method to segment one-dimensional piecewise constant signals corrupted by a functional part. In our model, the piecewise constant part is expressed as the product of a lower triangular matrix and a sparse vector, and the functional bias is represented as a sparse linear combination of functions from a dictionary. A Stochastic Search Variable Selection approach is used to estimate the sparse vectors, which allows us to obtain estimators for the breakpoint locations, the means over the segments, and the functional bias. The performance of our method is assessed using simulated data and real geodetic data in which GPS coordinate series are affected by undocumented artificial abrupt changes and additionally show prominent periodic variations. 

• [stat.AP]Band Depth Clustering for Nonstationary Time Series and Wind Speed Behavior
Laura L. Tupper, David S. Matteson, C. Lindsay Anderson
//arxiv.org/abs/1509.00051v1 

We explore the behavior of wind speed over time, using the Eastern Wind Dataset published by the National Renewable Energy Laboratory. This dataset gives wind speeds over three years at hundreds of potential wind farm sites. Wind speed analysis is necessary for the integration of wind energy into the power grid; short-term variability in wind speed affects decisions about usage of other power sources, so that the shape of the wind speed curve becomes as important as the overall level. To assess differences in intra-day time series, we propose a functional distance measure, the band distance, which extends the band depth of López-Pintado and Romo (2009). This measure emphasizes the shape of time series or functional observations relative to other members of a dataset, and allows clustering of observations without reliance on pointwise Euclidean distance. To emphasize short-term variability, we examine the short-time Fourier transform of the nonstationary speed time series; we can also adjust for seasonal effects, and use these standardizations as input for the band distance. We show that these approaches to characterizing the data go beyond mean-dependent standard clustering methods, such as k-means, to provide more shape-influenced cluster representatives useful for power grid decisions. 
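
The band depth that the band distance builds on can be computed, for the simplest case of bands defined by pairs of curves (J = 2 in the López-Pintado and Romo construction), as the fraction of curve pairs whose pointwise envelope fully contains a given curve; a sketch:

```python
import numpy as np
from itertools import combinations

def band_depth(curves):
    """Band depth (J=2) of each row of `curves` (n_curves x n_times)."""
    n = len(curves)
    depth = np.zeros(n)
    for j, k in combinations(range(n), 2):
        lo = np.minimum(curves[j], curves[k])     # lower envelope of the pair
        hi = np.maximum(curves[j], curves[k])     # upper envelope of the pair
        depth += np.all((curves >= lo) & (curves <= hi), axis=1)
    return depth / (n * (n - 1) / 2)
```

Curves near the center of the sample sit inside many bands and get high depth; outlying curves get low depth, which is the shape-based ordering that the clustering exploits.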

• [stat.AP]Bayesian Models for Heterogeneous Personalized Health Data
Kai Fan, Allison E. Aiello, Katherine A. Heller
//arxiv.org/abs/1509.00110v1 

The purpose of this study is to leverage modern technology (such as mobile or web apps in Beckman et al. (2014)) to enrich epidemiology data and infer the transmission of disease. Homogeneity-related research at the population level has been studied intensively in previous work. In contrast, we develop hierarchical Graph-Coupled Hidden Markov Models (hGCHMMs) to simultaneously track the spread of infection in a small cell phone community and capture person-specific infection parameters by leveraging a link prior that incorporates additional covariates. We also reexamine the evolution of the hGCHMM from simple HMMs and LDA, elucidating additional flexibility and interpretability. Due to the non-conjugacy of sparsely coupled HMMs, we design a new approximate distribution, allowing our approach to be more applicable to other application areas. Additionally, we investigate two common link functions, the beta-exponential prior and the sigmoid function, both of which allow the development of a principled Bayesian hierarchical framework for disease transmission. The results of our model allow us to predict the probability of infection for each person on each day, and also to infer personal physical vulnerability and the relevant association with covariates. We demonstrate our approach experimentally on both simulated data and real epidemiological records. 

• [stat.ML]Adaptive Smoothing Algorithms for Nonsmooth Composite Convex Minimization
Quoc Tran-Dinh
//arxiv.org/abs/1509.00106v1 

We propose a novel adaptive smoothing algorithm based on Nesterov’s smoothing technique in \cite{Nesterov2005c} for solving nonsmooth composite convex optimization problems. Our method combines Nesterov’s accelerated proximal gradient scheme with a new homotopy strategy for the smoothness parameter. By an appropriate choice of smoothing functions, we develop a new algorithm that has the $\mathcal{O}\left(\frac{1}{\varepsilon}\right)$ optimal worst-case iteration complexity while allowing the smoothness parameter to be updated automatically at each iteration. We then further exploit the structure of the problems to select smoothing functions and develop suitable algorithmic variants that reduce the per-iteration complexity while preserving the optimal worst-case iteration complexity. We also specialize our algorithm to solve constrained convex optimization problems and show its convergence guarantee on the primal sequence of iterates. Our preliminary numerical tests verify the efficiency of our algorithms. 
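
The core smoothing idea can be illustrated on the absolute value: Nesterov's technique replaces |x| = max over |u| <= 1 of u*x with max over |u| <= 1 of (u*x - (mu/2)*u**2), which yields the differentiable Huber-like function below. The homotopy strategy then shrinks mu across iterations. This is the generic textbook construction, not the authors' full algorithm:

```python
import numpy as np

def smooth_abs(x, mu):
    """Nesterov smoothing of |x| with parameter mu > 0 (a Huber function)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= mu, x * x / (2.0 * mu), np.abs(x) - mu / 2.0)

def smooth_abs_grad(x, mu):
    # The gradient is the maximizing u, i.e. x/mu clipped to [-1, 1];
    # it is (1/mu)-Lipschitz, which is what enables accelerated methods.
    return np.clip(np.asarray(x, dtype=float) / mu, -1.0, 1.0)
```

Smaller mu means a tighter approximation of |x| but a larger gradient Lipschitz constant; the adaptive update of mu trades these off automatically at each iteration.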

• [stat.ML]Learning A Task-Specific Deep Architecture For Clustering
Zhangyang Wang, Shiyu Chang, Jiayu Zhou, Thomas S. Huang
//arxiv.org/abs/1509.00151v1 

While deep networks have proven highly effective in a wide range of applications, little effort has been spent on studying their potential in clustering. In this paper, we argue that the successful domain expertise of sparse coding in clustering is still valuable, and can be combined with the key ingredients of deep learning. A novel feed-forward architecture, named TAG-LISTA, is constructed from graph-regularized sparse coding. It is then trained end to end with task-specific loss functions. The inner connections of the proposed network to sparse coding lead to more effective training. Moreover, by introducing auxiliary clustering tasks into the hierarchy of intermediate features, we present DTAG-LISTA and obtain a further performance boost. We conduct extensive experiments on several benchmark datasets, under a wide variety of settings. The results verify that the proposed model significantly outperforms generic architectures of the same parameter capacity, and also gains remarkable margins over several state-of-the-art methods. 

• [stat.ML]Learning Deep ℓ0 Encoders
Zhangyang Wang, Qing Ling, Thomas S. Huang
//arxiv.org/abs/1509.00153v1 

Despite its nonconvex, intractable nature, $\ell_0$ sparse approximation is desirable in many theoretical and application settings. We study the $\ell_0$ sparse approximation problem with the tools of deep learning, by proposing Deep $\ell_0$ Encoders. Two typical forms, the $\ell_0$-regularized problem and the $M$-sparse problem, are investigated. Starting from well-established iterative algorithms, we model them as feed-forward neural networks by introducing novel neurons and pooling functions. The deep encoders enjoy faster inference, larger learning capacity, and better scalability than conventional sparse coding solutions. Furthermore, when applied to classification and clustering, the models can be conveniently optimized end to end using task-driven losses. Numerical results demonstrate the impressive performance of the proposed encoders. 
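For the $M$-sparse form, the natural building block is iterative hard thresholding, whose "keep the $M$ largest magnitudes" step becomes a pooling-like unit when unrolled into a network. A hedged sketch of the untrained iteration, with illustrative names:

```python
import numpy as np

def hard_threshold_M(v, M):
    # Keep the M largest-magnitude entries, zero the rest --
    # the projection onto the M-sparse set.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-M:]
    out[idx] = v[idx]
    return out

def deep_l0_forward(x, D, M, n_layers=5):
    # Unrolled iterative hard thresholding (IHT) for the M-sparse
    # problem: z_{k+1} = H_M(z_k + D^T (x - D z_k) / L).
    # In a Deep l0 Encoder the linear maps would be learned.
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_layers):
        z = hard_threshold_M(z + D.T @ (x - D @ z) / L, M)
    return z
```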

• [stat.ML]Multi-Sensor Slope Change Detection
Yang Cao, Yao Xie, Nagi Gebraeel
//arxiv.org/abs/1509.00114v1 

We develop a mixture procedure for multi-sensor systems to monitor parallel streams of data for a change-point that causes a gradual degradation in a subset of the streams. Observations are initially assumed to be normal random variables with known constant means and variances. After a change-point, the observations in a subset of the streams have increasing or decreasing mean values; the subset and the slopes are unknown. Our procedure uses a mixture statistic that assumes each sensor is affected with probability $p_0$. Analytic expressions are obtained for the average run length (ARL) and the expected detection delay (EDD) of the mixture procedure, and these are shown numerically to be quite accurate. We establish asymptotic optimality of the mixture procedure, and numerical examples demonstrate its good performance. We also discuss an adaptive mixture procedure using empirical Bayes. This paper extends our earlier work on detecting an abrupt change-point that causes a mean shift, tackling the challenges posed by the non-stationarity of the slope-change problem. 
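The per-sensor combination rule can be illustrated as follows: assuming each stream contributes a log-likelihood ratio for the post-change hypothesis, a mixture statistic of the kind used in the authors' earlier mean-shift work down-weights streams according to the prior probability $p_0$ of being affected (a sketch only; the slope-change LLRs themselves are not reproduced here):

```python
import numpy as np

def mixture_statistic(sensor_llrs, p0):
    # Combine per-sensor log-likelihood ratios under the assumption
    # that each sensor is affected independently with probability p0:
    #   sum_i log(1 - p0 + p0 * exp(llr_i)).
    # log1p/expm1 keep the computation numerically stable.
    sensor_llrs = np.asarray(sensor_llrs, dtype=float)
    return np.sum(np.log1p(p0 * np.expm1(sensor_llrs)))
```

With $p_0 = 1$ this reduces to summing all LLRs (every sensor affected), while small $p_0$ shrinks the contribution of any single sensor, which is what makes the procedure robust to an unknown affected subset.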

• [stat.ML]Online Supervised Subspace Tracking
Yao Xie, Ruiyang Song, Hanjun Dai, Qingbin Li, Le Song
//arxiv.org/abs/1509.00137v1 

We present a framework for supervised subspace tracking with two time series $x_t$ and $y_t$, one being the high-dimensional predictors and the other the response variables, where the subspace tracking must take both sequences into consideration. It extends classic online subspace tracking, which can be viewed as tracking $x_t$ only. Our online sufficient dimensionality reduction (OSDR) is a meta-algorithm that can be applied to various cases including linear regression, logistic regression, multiple linear regression, multinomial logistic regression, support vector machines, the random dot product model, and the multi-scale union-of-subspace model. OSDR reduces data dimensionality on the fly with low computational complexity, and it can also handle missing and dynamic data. OSDR uses an alternating minimization scheme and updates the subspace via gradient descent on the Grassmannian manifold. The subspace update can be performed efficiently by exploiting the fact that, in many settings, the Grassmannian gradient with respect to the subspace is rank-one (or low-rank in certain cases). The optimization problem for OSDR is non-convex and hard to analyze in general; we provide convergence analysis of OSDR in a simple linear regression setting. The good performance of OSDR compared with conventional unsupervised subspace tracking is demonstrated via numerical examples on simulated and real data. 
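The rank-one Grassmannian update mentioned above resembles the GROUSE-style step sketched below (unsupervised case only; OSDR additionally folds in the supervised loss on $y_t$, and all names here are illustrative):

```python
import numpy as np

def grassmann_step(U, x, eta=0.1):
    # One rank-one gradient step on the Grassmannian for subspace
    # tracking (GROUSE-style). U: d x k orthonormal basis; x: new
    # observation. The geodesic update keeps U exactly orthonormal.
    w = U.T @ x                  # coefficients of x in span(U)
    r = x - U @ w                # residual, orthogonal to span(U)
    rn, wn = np.linalg.norm(r), np.linalg.norm(w)
    if rn < 1e-12 or wn < 1e-12:
        return U                 # x already (almost) in the subspace
    t = eta * rn * wn            # step length along the geodesic
    # Rank-one correction: rotate span(U) toward the residual.
    U = U + ((np.cos(t) - 1.0) * (U @ w[:, None]) / wn
             + np.sin(t) * r[:, None] / rn) @ (w[None, :] / wn)
    return U
```

The whole update touches $U$ only through the rank-one outer product, which is what makes the per-step cost linear in the ambient dimension.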

• [stat.ML]Sequential Information Guided Sensing
Ruiyang Song, Yao Xie, Sebastian Pokutta
//arxiv.org/abs/1509.00130v1 

We study the value of information in sequential compressed sensing by characterizing the performance of sequential information-guided sensing in practical scenarios where the information is inaccurate. In particular, we assume the signal distribution is parameterized as a Gaussian or Gaussian mixture with estimated mean and covariance matrices, and we can measure compressively through a noisy linear projection or using one-sparse vectors, i.e., observing one entry of the signal at a time. We establish a set of performance bounds for the bias and variance of the signal estimator via the posterior mean, by capturing the conditional entropy (which is related to the size of the uncertainty) and the additional power required, due to inaccurate information, to reach a desired precision. Based on this, we further study how to estimate the covariance from direct samples or by covariance sketching. Numerical examples demonstrate the superior performance of Info-Greedy Sensing algorithms compared with their random and non-adaptive counterparts. 
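For the Gaussian case, the posterior update after one linear measurement is a standard Kalman-style conditioning step, and the information-greedy direction is the top eigenvector of the current posterior covariance. A hedged sketch with illustrative names (the paper's contribution concerns the bounds under mismatched mean/covariance, not this update itself):

```python
import numpy as np

def gaussian_posterior(mu, Sigma, a, y, noise_var):
    # Posterior mean/covariance of x ~ N(mu, Sigma) after observing
    # y = a^T x + noise, noise ~ N(0, noise_var): one sequential
    # compressive measurement.
    s = a @ Sigma @ a + noise_var
    k = (Sigma @ a) / s                    # Kalman-style gain
    mu_new = mu + k * (y - a @ mu)
    Sigma_new = Sigma - np.outer(k, a @ Sigma)
    return mu_new, Sigma_new

def info_greedy_direction(Sigma):
    # The measurement maximizing information gain for a Gaussian
    # signal: the top eigenvector of the current covariance.
    vals, vecs = np.linalg.eigh(Sigma)     # ascending eigenvalues
    return vecs[:, -1]
```

Each measurement collapses the uncertainty along the measured direction, so repeating the two steps greedily peels off the largest remaining variance components.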

• [stat.ML]Tumor Motion Tracking in Liver Ultrasound Images Using Mean Shift and Active Contour
Jalil Rasekhi
//arxiv.org/abs/1509.00154v1 

In this paper we present a new method for motion tracking of tumors in liver ultrasound image sequences. Our algorithm has two main steps. In the first step, we apply the mean shift algorithm with multiple features to estimate the center of the target in each frame. The target in the first frame is defined by an ellipse. Edge, texture, and intensity features are extracted from the first frame, and the mean shift algorithm is then applied to each feature separately to find the center of the ellipse associated with that feature in the next frame. The center of the ellipse is taken as the weighted average of these centers; mean shift thus estimates the target movement between two consecutive frames. Once the correct ellipse in each frame is known, in the second step we apply the Dynamic Directional Gradient Vector Flow (DDGVF) version of active contour models to find the correct boundary of the tumor. We sample a few points on the boundary of the active contour and translate them by the displacement of the ellipse center between two consecutive frames, using the translated points as an initial guess for the active contour in the next frame. Our experimental results show that the suggested method provides reliable performance for liver tumor tracking in ultrasound image sequences.
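The first step's core computation can be sketched as a weighted Gaussian-kernel mean shift, where in the tracking setting the weights would come from the per-feature likelihood of each pixel (a simplified, illustrative version; names and parameters are assumptions):

```python
import numpy as np

def mean_shift_center(points, weights, center, bandwidth=1.0, iters=20):
    # Gaussian-kernel mean shift: repeatedly move the center to the
    # weighted mean of nearby points. "weights" stands in for a
    # per-feature likelihood map (edge, texture, or intensity).
    for _ in range(iters):
        d2 = np.sum((points - center) ** 2, axis=1)
        k = weights * np.exp(-d2 / (2.0 * bandwidth ** 2))
        center = (k[:, None] * points).sum(axis=0) / k.sum()
    return center
```

In the paper's setting one such run per feature yields three candidate centers, and the final ellipse center is their weighted average.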

Prof. Chen, PRIS Pattern Recognition Lab, Beijing University of Posts and Telecommunications (BUPT). Business cooperation: QQ: 1289468869, Email: 1289468869@qq.com