Today's Academic Horizons (2015.10.8)

October 8, 2015, 05:54

cs.CL - Computation and Language
cs.CV - Computer Vision and Pattern Recognition
cs.GR - Computer Graphics
cs.HC - Human-Computer Interaction
cs.IR - Information Retrieval
cs.IT - Information Theory
cs.LG - Machine Learning
cs.SI - Social and Information Networks
math.OC - Optimization and Control
math.ST - Statistics Theory
stat.AP - Applied Statistics
stat.ME - Statistical Methodology
stat.ML - (Statistical) Machine Learning

• [cs.CL]Analyzer and generator for Pali
• [cs.CL]Language Segmentation
• [cs.CL]Rank-frequency relations of phonemes uncover an author-dependency of their usage
• [cs.CV]A Latent Source Model for Patch-Based Image Segmentation
• [cs.CV]Active Transfer Learning with Zero-Shot Priors: Reusing Past Datasets for Future Tasks
• [cs.CV]Directional Global Three-part Image Decomposition
• [cs.CV]Euclidean Auto Calibration of Camera Networks: Baseline Constraint Removes Scale Ambiguity
• [cs.CV]Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification
• [cs.CV]Learning Deep Representations of Appearance and Motion for Anomalous Event Detection
• [cs.CV]On the Existence of Epipolar Matrices
• [cs.CV]Predicting Daily Activities From Egocentric Images Using Deep Learning
• [cs.CV]SentiCap: Generating Image Descriptions with Sentiments
• [cs.CV]Unsupervised Extraction of Video Highlights Via Robust Recurrent Auto-encoders
• [cs.CV]Within-Brain Classification for Brain Tumor Segmentation
• [cs.GR]RAID: A Relation-Augmented Image Descriptor
• [cs.HC]Thousands of Positive Reviews: Distributed Mentoring in Online Fan Communities
• [cs.IR]Parameterized Neural Network Language Models for Information Retrieval
• [cs.IT]Quantifying Emergent Behavior of Autonomous Robots
• [cs.IT]Sketching for Simultaneously Sparse and Low-Rank Covariance Matrices
• [cs.LG]A Stochastic Gradient Method with Linear Convergence Rate for a Class of Non-smooth Non-strongly Convex Optimizations
• [cs.LG]Large-scale subspace clustering using sketching and validation
• [cs.LG]Population-Contrastive-Divergence: Does Consistency help with RBM training?
• [cs.SI]Model of Multilayer Knowledge Diffusion for Competence Development in an Organization
• [cs.SI]On The Network You Keep: Analyzing Persons of Interest using Cliqster
• [cs.SI]On the evaluation potential of quality functions in community detection for different contexts
• [math.OC]DC Decomposition of Nonconvex Polynomials with Algebraic Techniques
• [math.ST]Change-point detection using the conditional entropy of ordinal patterns
• [math.ST]Inverse Problems for a Class of Conditional Probability Measure-Dependent Evolution Equations
• [stat.AP]Improving Ice Sheet Model Calibration Using Paleoclimate and Modern Data
• [stat.AP]The Problem with Assessing Statistical Methods
• [stat.ME]Four-Point, 2D, Free-Ranging, IMSPE-Optimal, Twin-Point Designs
• [stat.ML]Batch Normalized Recurrent Neural Networks
• [stat.ML]Bayesian Markov Blanket Estimation
• [stat.ML]Improved Estimation of Class Prior Probabilities through Unlabeled Data
• [stat.ML]Structured Transforms for Small-Footprint Deep Learning 

·····································

• [cs.CL]Analyzer and generator for Pali
David Alfter
https://arxiv.org/abs/1510.01570v1

This work describes a system that performs morphological analysis and generation of Pali words. The system works with regular inflectional paradigms and a lexical database. The generator is used to build a collection of inflected and derived words, which in turn is used by the analyzer. Generating and storing morphological forms along with the corresponding morphological information allows for efficient and simple look up by the analyzer. Indeed, by looking up a word and extracting the attached morphological information, the analyzer does not have to compute this information. As we must, however, assume the lexical database to be incomplete, the system can also work without the dictionary component, using a rule-based approach. 
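
The generate-then-store design lends itself to a very small sketch. The following is a minimal illustration of the idea, assuming a toy paradigm table with hypothetical suffixes and stems; the actual system's rules and lexical database are of course far richer.

```python
# A minimal sketch of the generate-then-look-up idea, with a toy paradigm.
from collections import defaultdict

# Hypothetical inflectional paradigm: (case, number) -> suffix for a-stems.
A_STEM_PARADIGM = {
    ("nom", "sg"): "o",
    ("acc", "sg"): "am",
    ("nom", "pl"): "a",
}

def generate(stem, paradigm):
    """Generator: produce all inflected forms with their analyses."""
    for (case, number), suffix in paradigm.items():
        yield stem + suffix, {"stem": stem, "case": case, "number": number}

# Build the lookup table once; the analyzer is then a constant-time lookup.
forms = defaultdict(list)
for surface, analysis in generate("buddh", A_STEM_PARADIGM):
    forms[surface].append(analysis)

def analyze(word):
    """Analyzer: retrieve stored analyses instead of recomputing them."""
    return forms.get(word, [])

print(analyze("buddho"))  # [{'stem': 'buddh', 'case': 'nom', 'number': 'sg'}]
```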

• [cs.CL]Language Segmentation
David Alfter
https://arxiv.org/abs/1510.01717v1

Language segmentation consists in finding the boundaries where one language ends and another language begins in a text written in more than one language. This is important for all natural language processing tasks. The problem can be solved by training language models on language data. However, in the case of low- or no-resource languages, this is problematic. I therefore investigate whether unsupervised methods perform better than supervised methods when it is difficult or impossible to train supervised approaches. A special focus is given to difficult texts, i.e. texts that are rather short (one sentence), contain abbreviations, are written in low-resource languages, or use non-standard language. I compare three approaches: supervised n-gram language models, unsupervised clustering and weakly supervised n-gram language model induction. I devised the weakly supervised approach in order to deal with difficult text specifically. In order to test the approach, I compiled a small corpus of different text types, ranging from one-sentence texts to texts of about 300 words. The weakly supervised language model induction approach works well on short and difficult texts, outperforming the clustering algorithm and reaching scores in the vicinity of the supervised approach. The results look promising, but there is room for improvement and a more thorough investigation should be undertaken.
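
As a rough illustration of the supervised n-gram baseline mentioned above, the sketch below labels each token with the language whose character-bigram model scores it highest; the two training snippets and add-one smoothing are purely illustrative stand-ins.

```python
# A minimal sketch of per-language character-bigram scoring for segmentation.
import math
from collections import Counter

def bigram_model(text):
    """Return an add-one-smoothed log-probability function for char bigrams."""
    pairs = Counter(zip(text, text[1:]))
    total = sum(pairs.values())
    vocab = len(pairs) + 1
    return lambda a, b: math.log((pairs[(a, b)] + 1) / (total + vocab))

models = {
    "en": bigram_model("the quick brown fox jumps over the lazy dog"),
    "de": bigram_model("der schnelle braune fuchs springt ueber den hund"),
}

def label(token):
    """Assign the language whose bigram model scores the token highest."""
    scores = {lang: sum(m(a, b) for a, b in zip(token, token[1:]))
              for lang, m in models.items()}
    return max(scores, key=scores.get)

# Boundaries fall wherever the best-scoring language changes between tokens.
print([(w, label(w)) for w in "the fuchs jumps ueber the dog".split()])
```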

• [cs.CL]Rank-frequency relations of phonemes uncover an author-dependency of their usage
Weibing Deng, Armen E. Allahverdyan
https://arxiv.org/abs/1510.01315v1

We study rank-frequency relations for phonemes in texts written by different authors. We show that they can be described by generating phonemes via random probabilities governed by the (one-parameter) Dirichlet density, the simplest density for random probabilities. This description allows us to demonstrate that the rank-frequency relations for phonemes of a text do depend on the author. The author-dependency effect is not caused by common words used in different texts. This suggests that it is directly related to phonemes and/or syllables. These features contrast with rank-frequency relations for words, which are both author- and text-independent and are governed by Zipf’s law.
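
A minimal sketch of this generative description, assuming a symmetric one-parameter Dirichlet: phoneme probabilities are drawn once, and the rank-frequency relation is read off the sorted draw. The alpha values here are hypothetical stand-ins for the author-dependent parameter.

```python
# Rank-frequency relations from a symmetric Dirichlet draw (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_phonemes = 40
for alpha in (0.3, 1.0):          # author-dependent concentration parameter
    p = rng.dirichlet([alpha] * n_phonemes)
    ranked = np.sort(p)[::-1]     # the rank-frequency relation
    print(f"alpha={alpha}: top-3 probabilities {ranked[:3].round(3)}")
```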

• [cs.CV]A Latent Source Model for Patch-Based Image Segmentation
George Chen, Devavrat Shah, Polina Golland
https://arxiv.org/abs/1510.01648v1

Despite the popularity and empirical success of patch-based nearest-neighbor and weighted majority voting approaches to medical image segmentation, there has been no theoretical development on when, why, and how well these nonparametric methods work. We bridge this gap by providing a theoretical performance guarantee for nearest-neighbor and weighted majority voting segmentation under a new probabilistic model for patch-based image segmentation. Our analysis relies on a new local property for how similar nearby patches are, and fuses existing lines of work on modeling natural imagery patches and theory for nonparametric classification. We use the model to derive a new patch-based segmentation algorithm that iterates between inferring local label patches and merging these local segmentations to produce a globally consistent image segmentation. Many existing patch-based algorithms arise as special cases of the new algorithm. 
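
For concreteness, the sketch below implements the weighted majority voting baseline the analysis covers: each training patch votes for the center-pixel label with a Gaussian kernel weight. The patches, labels, and bandwidth h are toy choices, not the paper's setup.

```python
# Weighted majority voting over training patches (a minimal sketch).
import numpy as np

def weighted_majority_vote(query_patch, train_patches, train_labels, h=1.0):
    """Vote for the center-pixel label, weighting each training patch by a
    Gaussian kernel on its squared distance to the query patch."""
    d2 = ((train_patches - query_patch) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * h ** 2))
    scores = np.bincount(train_labels, weights=w)
    return scores.argmax()

rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0, 1, (50, 9)), rng.normal(3, 1, (50, 9))])
labels = np.array([0] * 50 + [1] * 50)
print(weighted_majority_vote(rng.normal(3, 1, 9), train, labels))  # likely 1
```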

• [cs.CV]Active Transfer Learning with Zero-Shot Priors: Reusing Past Datasets for Future Tasks
Efstratios Gavves, Thomas Mensink, Tatiana Tommasi, Cees G. M. Snoek, Tinne Tuytelaars
https://arxiv.org/abs/1510.01544v1

How can we reuse existing knowledge, in the form of available datasets, when solving a new and apparently unrelated target task from a set of unlabeled data? In this work we make a first contribution to answer this question in the context of image classification. We frame this quest as an active learning problem and use zero-shot classifiers to guide the learning process by linking the new task to the existing classifiers. By revisiting the dual formulation of adaptive SVM, we reveal two basic conditions to choose greedily only the most relevant samples to be annotated. On this basis we propose an effective active learning algorithm which learns the best possible target classification model with minimum human labeling effort. Extensive experiments on two challenging datasets show the value of our approach compared to the state-of-the-art active learning methodologies, as well as its potential to reuse past datasets with minimal effort for future tasks. 

• [cs.CV]Directional Global Three-part Image Decomposition
Duy Hoang Thai, Carsten Gottschlich
https://arxiv.org/abs/1510.01490v1

We consider the task of image decomposition and we introduce a new model coined directional global three-part decomposition (DG3PD) for solving it. As key ingredients of the DG3PD model, we introduce a discrete multi-directional total variation norm and a discrete multi-directional G-norm. Using these novel norms, the proposed discrete DG3PD model can decompose an image into two parts or into three parts. Existing models for image decomposition by Vese and Osher, by Aujol and Chambolle, by Starck et al., and by Thai and Gottschlich are included as special cases in the new model. Decomposition of an image by DG3PD results in a cartoon image, a texture image and a residual image. Advantages of the DG3PD model over existing ones lie in the properties enforced on the cartoon and texture images. The geometric objects in the cartoon image have a very smooth surface and sharp edges. The texture image yields oscillating patterns on a defined scale which is both smooth and sparse. Moreover, the DG3PD method achieves the goal of perfect reconstruction by summation of all components better than the other considered methods. Relevant applications of DG3PD are a novel way of image compression as well as feature extraction for applications such as latent fingerprint processing and optical character recognition. 

• [cs.CV]Euclidean Auto Calibration of Camera Networks: Baseline Constraint Removes Scale Ambiguity
Kiran Kumar Vupparaboina, Kamala Raghavan, Soumya Jana
https://arxiv.org/abs/1510.01663v1

Metric auto calibration of a camera network from multiple views has been reported by several authors. The resulting 3D reconstruction recovers shape faithfully, but not scale. However, preservation of scale becomes critical in applications, such as multi-party telepresence, where multiple 3D scenes need to be fused into a single coordinate system. In this context, we propose a camera network configuration that includes a stereo pair with known baseline separation, and analytically demonstrate Euclidean auto calibration of such a network under mild conditions. Further, we experimentally validate our theory using a four-camera network. Importantly, our method not only recovers scale, but also compares favorably with the well-known Zhang and Pollefeys methods in terms of shape recovery.

• [cs.CV]Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification
Ruobing Wu, Baoyuan Wang, Wenping Wang, Yizhou Yu
https://arxiv.org/abs/1510.01440v1

Recent work on scene classification still makes use of generic CNN features in a rudimentary manner. In this ICCV 2015 paper, we present a novel pipeline built upon deep CNN features to harvest discriminative visual objects and parts for scene classification. We first use a region proposal technique to generate a set of high-quality patches potentially containing objects, and apply a pre-trained CNN to extract generic deep features from these patches. Then we perform both unsupervised and weakly supervised learning to screen these patches and discover discriminative ones representing category-specific objects and parts. We further apply discriminative clustering enhanced with local CNN fine-tuning to aggregate similar objects and parts into groups, called meta objects. A scene image representation is constructed by pooling the feature response maps of all the learned meta objects at multiple spatial scales. We have confirmed that the scene image representation obtained using this new pipeline is capable of delivering state-of-the-art performance on two popular scene benchmark datasets, MIT Indoor 67 and Sun397.

• [cs.CV]Learning Deep Representations of Appearance and Motion for Anomalous Event Detection
Dan Xu, Elisa Ricci, Yan Yan, Jingkuan Song, Nicu Sebe
https://arxiv.org/abs/1510.01553v1

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works merely use hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN) which utilizes deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework, combining both the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn both appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models are used to predict the anomaly scores of each input, which are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state of the art approaches. 
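
A minimal sketch of the late-fusion step, assuming the learned representations are already available: one one-class SVM per stream, with the decision scores summed. The synthetic features and the equal fusion weights are illustrative assumptions, not the paper's configuration.

```python
# Late fusion of per-stream one-class SVM scores (a minimal sketch).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
streams = {name: rng.normal(0, 1, (200, 16))           # stand-in features
           for name in ("appearance", "motion", "joint")}
models = {name: OneClassSVM(nu=0.1).fit(x) for name, x in streams.items()}

test = {name: rng.normal(0, 1, (5, 16)) for name in streams}
# Late fusion: sum the per-stream decision scores (weights are illustrative).
fused = sum(models[name].decision_function(test[name]) for name in streams)
print("anomalous" if fused.min() < 0 else "normal", fused.round(2))
```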

• [cs.CV]On the Existence of Epipolar Matrices
Sameer Agarwal, Hon-Leung Lee, Bernd Sturmfels, Rekha R. Thomas
https://arxiv.org/abs/1510.01401v1

This paper considers the foundational question of the existence of a fundamental (resp. essential) matrix given $m$ point correspondences in two views. We present a complete answer for the existence of fundamental matrices for any value of $m$. Using examples we disprove the widely held belief that fundamental matrices always exist whenever $m \leq 7$. At the same time, we prove that they exist unconditionally when $m \leq 5$. Under a mild genericity condition, we show that an essential matrix always exists when $m \leq 4$. We also characterize the six and seven point configurations in two views for which all matrices satisfying the epipolar constraint have rank at most one.

• [cs.CV]Predicting Daily Activities From Egocentric Images Using Deep Learning
Daniel Castro, Steven Hickson, Vinay Bettadapura, Edison Thomaz, Gregory Abowd, Henrik Christensen, Irfan Essa
https://arxiv.org/abs/1510.01576v1

We present a method to analyze images taken from a passive egocentric wearable camera along with the contextual information, such as time and day of week, to learn and predict everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6 month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person’s activity across the 19 activity classes. We also demonstrate some promising results from two additional users by fine-tuning the classifier with one day of training data. 

• [cs.CV]SentiCap: Generating Image Descriptions with Sentiments
Alexander Mathews, Lexing Xie, Xuming He
https://arxiv.org/abs/1510.01431v1

The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from current systems. One such style is descriptions with emotions, which are commonplace in everyday communication and influence decision-making and interpersonal relationships. We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with different automatic and crowd-sourcing metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged as being at least as descriptive as the factual captions; of these positive captions, 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment.

• [cs.CV]Unsupervised Extraction of Video Highlights Via Robust Recurrent Auto-encoders
Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo
https://arxiv.org/abs/1510.01442v1

With the growing popularity of short-form video sharing platforms such as Instagram and Vine, there has been an increasing need for techniques that automatically extract highlights from video. Whereas prior works have approached this problem with heuristic rules or supervised learning, we present an unsupervised learning approach that takes advantage of the abundance of user-edited videos on social media websites such as YouTube. Based on the idea that the most significant sub-events within a video class are commonly present among edited videos while less interesting ones appear less frequently, we identify the significant sub-events via a robust recurrent auto-encoder trained on a collection of user-edited videos queried for each particular class of interest. The auto-encoder is trained using a proposed shrinking exponential loss function that makes it robust to noise in the web-crawled training data, and is configured with bidirectional long short term memory (LSTM) cells to better model the temporal structure of highlight segments. Different from supervised techniques, our method can infer highlights using only a set of downloaded edited videos, without also needing their pre-edited counterparts which are rarely available online. Extensive experiments indicate the promise of our proposed solution in this challenging unsupervised setting.

• [cs.CV]Within-Brain Classification for Brain Tumor Segmentation
Mohammad Havaei, Hugo Larochelle, Philippe Poulin, Pierre-Marc Jodoin
https://arxiv.org/abs/1510.01344v1

Purpose: In this paper, we investigate a framework for interactive brain tumor segmentation which, at its core, treats the problem of interactive brain tumor segmentation as a machine learning problem. Methods: This method has an advantage over typical machine learning methods for this task where generalization is made across brains. The problem with these methods is that they need to deal with intensity bias correction and other MRI-specific noise. In this paper, we avoid these issues by approaching the problem as one of within brain generalization. Specifically, we propose a semi-automatic method that segments a brain tumor by training and generalizing within that brain only, based on some minimum user interaction. Conclusion: We investigate how adding spatial feature coordinates (i.e. $i$, $j$, $k$) to the intensity features can significantly improve the performance of different classification methods such as SVM, kNN and random forests. This would only be possible within an interactive framework. We also investigate the use of a more appropriate kernel and the adaptation of hyper-parameters specifically for each brain. Results: As a result of these experiments, we obtain an interactive method whose results reported on the MICCAI-BRATS 2013 dataset are the second most accurate compared to published methods, while using significantly less memory and processing power than most state-of-the-art methods. 
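
The core feature trick is easy to show in a few lines: append the voxel coordinates (i, j, k) to the intensity features before fitting the per-brain classifier. Everything below is synthetic stand-in data, not the BRATS setup.

```python
# Appending spatial coordinates to intensity features (a minimal sketch).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
n = 500
coords = rng.integers(0, 64, (n, 3)).astype(float)    # voxel (i, j, k)
intensity = rng.normal(0, 1, (n, 4))                  # multi-modal intensities
# Toy ground truth: a "tumor" blob near one corner of the volume.
y = (np.linalg.norm(coords - 10, axis=1) < 15).astype(int)

X = np.hstack([intensity, coords])                    # intensity + space
clf = KNeighborsClassifier(n_neighbors=5).fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```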

• [cs.GR]RAID: A Relation-Augmented Image Descriptor
Paul Guerrero, Niloy J. Mitra, Peter Wonka
https://arxiv.org/abs/1510.01113v2

As humans, we regularly interpret images based on the relations between image regions. For example, a person riding object X, or a plank bridging two objects. Current methods provide limited support to search for images based on such relations. We present RAID, a relation-augmented image descriptor that supports queries based on inter-region relations. The key idea of our descriptor is to capture the spatial distribution of simple point-to-region relationships to describe more complex relationships between two image regions. We evaluate the proposed descriptor by querying into a large subset of the Microsoft COCO database and successfully extract nontrivial images demonstrating complex inter-region relations, which are easily missed or erroneously classified by existing methods. 

• [cs.HC]Thousands of Positive Reviews: Distributed Mentoring in Online Fan Communities
Julie Ann Campbell, Cecilia Aragon, Katie Davis, Sarah Evans, Abigail Evans, David P. Randall
https://arxiv.org/abs/1510.01425v1

Young people worldwide are participating in ever-increasing numbers in online fan communities. Far from mere shallow repositories of pop culture, these sites are accumulating significant evidence that sophisticated informal learning is taking place online in novel and unexpected ways. In order to understand and analyze in more detail how learning might be occurring, we conducted an in-depth nine-month ethnographic investigation of online fanfiction communities, including participant observation and fanfiction author interviews. Our observations led to the development of a theory we term distributed mentoring, which we present in detail in this paper. Distributed mentoring exemplifies one instance of how networked technology affords new extensions of behaviors that were previously bounded by time and space. Distributed mentoring holds potential for application beyond the spontaneous mentoring observed in this investigation and may help students receive diverse, thoughtful feedback in formal learning environments as well. 

• [cs.IR]Parameterized Neural Network Language Models for Information Retrieval
Benjamin Piwowarski, Sylvain Lamprier, Nicolas Despres
https://arxiv.org/abs/1510.01562v1

Information Retrieval (IR) models need to deal with two difficult issues, vocabulary mismatch and term dependencies. Vocabulary mismatch corresponds to the difficulty of retrieving relevant documents that do not contain exact query terms but semantically related terms. Term dependencies refer to the need to consider the relationships between the words of the query when estimating the relevance of a document. A multitude of solutions has been proposed to solve each of these two problems, but no principled model solves both. In parallel, in the last few years, language models based on neural networks have been used to cope with complex natural language processing tasks like emotion and paraphrase detection. Although they present good abilities to cope with both term dependencies and vocabulary mismatch problems, thanks to the distributed representation of words they are based upon, such models could not be used readily in IR, where the estimation of one language model per document (or query) is required. This is both computationally unfeasible and prone to over-fitting. Based on a recent work that proposed to learn a generic language model that can be modified through a set of document-specific parameters, we explore the use of new neural network models that are adapted to ad-hoc IR tasks. Within the language model IR framework, we propose and study the use of a generic language model as well as a document-specific language model. Both can be used as a smoothing component, but the latter is more adapted to the document at hand and has the potential of being used as a full document language model. We experiment with such models and analyze their results on TREC-1 to 8 datasets.

• [cs.IT]Quantifying Emergent Behavior of Autonomous Robots
Georg Martius, Eckehard Olbrich
https://arxiv.org/abs/1510.01495v1

Quantifying behaviors of robots which were generated autonomously from task-independent objective functions is an important prerequisite for objective comparisons of algorithms and movements of animals. The temporal sequence of such a behavior can be considered as a time series and hence complexity measures developed for time series are natural candidates for its quantification. The predictive information and the excess entropy are such complexity measures. They measure the amount of information the past contains about the future and thus quantify the nonrandom structure in the temporal sequence. However, when using these measures for systems with continuous states one has to deal with the fact that their values will depend on the resolution with which the system's states are observed. For deterministic systems both measures will diverge with increasing resolution. We therefore propose a new decomposition of the excess entropy into resolution-dependent and resolution-independent parts and discuss how they depend on the dimensionality of the dynamics, correlations and the noise level. For the practical estimation we propose to use estimates based on the correlation integral instead of the direct estimation of the mutual information with the algorithm by Kraskov et al. (2004), which is based on nearest-neighbor statistics, because the latter allows less control of the scale dependencies. Using our algorithm we are able to show how autonomous learning generates behavior of increasing complexity with increasing learning duration.

• [cs.IT]Sketching for Simultaneously Sparse and Low-Rank Covariance Matrices
Sohail Bahmani, Justin Romberg
https://arxiv.org/abs/1510.01670v1

We introduce a technique for estimating a structured covariance matrix from observations of a random vector which have been sketched. Each observed random vector $\boldsymbol{x}_t$ is reduced to a single number by taking its inner product against one of a number of pre-selected vectors $\boldsymbol{a}_\ell$. These observations are used to form estimates of linear observations of the covariance matrix $\boldsymbol{\varSigma}$, which is assumed to be simultaneously sparse and low-rank. We show that if the sketching vectors $\boldsymbol{a}_\ell$ have a special structure, then we can use a straightforward two-stage algorithm that exploits this structure. We show that the estimate is accurate when the number of sketches is proportional to the maximum of the rank times the number of significant rows/columns of $\boldsymbol{\varSigma}$. Moreover, our algorithm takes direct advantage of the low-rank structure of $\boldsymbol{\varSigma}$ by only manipulating matrices that are far smaller than the original covariance matrix.
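
The measurement model is easy to simulate. In the sketch below, each y_l = <a_l, x_l> gives y_l^2 as an unbiased estimate of a_l' Sigma a_l, a linear measurement of Sigma; a naive least-squares solve stands in for the paper's structured two-stage algorithm, and the rank-2 Sigma and sign sketches are illustrative choices.

```python
# Simulating sketched covariance measurements (a minimal sketch).
import numpy as np

rng = np.random.default_rng(4)
d, m = 8, 2000
L = rng.normal(0, 1, (d, 2))
Sigma = L @ L.T                                  # a rank-2 covariance
A = rng.choice([-1.0, 1.0], (m, d))              # sketching vectors a_l

x = rng.multivariate_normal(np.zeros(d), Sigma, m)
y = np.einsum("ld,ld->l", A, x)                  # y_l = <a_l, x_l>

# Each y_l^2 estimates a_l' Sigma a_l; solve the linear system for Sigma.
M = np.einsum("li,lj->lij", A, A).reshape(m, d * d)
Sigma_hat = np.linalg.lstsq(M, y ** 2, rcond=None)[0].reshape(d, d)
print("relative error:",
      np.linalg.norm(Sigma_hat - Sigma) / np.linalg.norm(Sigma))
```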

• [cs.LG]A Stochastic Gradient Method with Linear Convergence Rate for a Class of Non-smooth Non-strongly Convex Optimizations
Tianbao Yang, Qihang Lin
https://arxiv.org/abs/1510.01444v1

In this paper, we show that a (stochastic) gradient descent method with multiple restarts, named Restarted (S)GD, can achieve a linear convergence rate for a class of non-smooth and non-strongly convex optimization problems where the epigraph of the objective function is a polyhedron. Its applications in machine learning include minimizing $\ell_1$ constrained or regularized piecewise linear loss. To the best of our knowledge, this is the first result on the linear convergence rate of a stochastic gradient method for non-smooth and non-strongly convex optimization.
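
A minimal sketch of the restarting scheme on an l1-regularized hinge (piecewise linear) loss: run subgradient descent for a fixed budget, then restart from the last iterate with a geometrically decreased step size. The schedule, constants, and toy data are illustrative, not the tuned choices from the paper.

```python
# Restarted subgradient descent on an l1-regularized hinge loss (sketch).
import numpy as np

rng = np.random.default_rng(5)
n, d = 200, 10
X = rng.normal(0, 1, (n, d))
w_true = rng.normal(0, 1, d)
y = np.sign(X @ w_true)

def subgrad(w, lam=0.1):
    """Subgradient of mean hinge loss plus l1 regularization."""
    margins = 1 - y * (X @ w)
    g = -(X * (y * (margins > 0))[:, None]).mean(axis=0)
    return g + lam * np.sign(w)

def objective(w, lam=0.1):
    return np.maximum(1 - y * (X @ w), 0).mean() + lam * np.abs(w).sum()

w, eta = np.zeros(d), 1.0
for epoch in range(8):               # each epoch is one restart
    for _ in range(200):
        w -= eta * subgrad(w)
    eta /= 2                         # geometrically decrease the step size
    print(f"restart {epoch}: objective {objective(w):.4f}")
```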

• [cs.LG]Large-scale subspace clustering using sketching and validation
Panagiotis A. Traganitis, Konstantinos Slavakis, Georgios B. Giannakis
https://arxiv.org/abs/1510.01628v1

The massive amounts of data generated and communicated nowadays present major challenges for their processing. While capable of successfully classifying nonlinearly separable objects in various settings, subspace clustering (SC) methods incur prohibitively high computational complexity when processing large-scale data. Inspired by the random sampling and consensus (RANSAC) approach to robust regression, the present paper introduces a randomized scheme for SC, termed sketching and validation (SkeVa-)SC, tailored for large-scale data. At the heart of SkeVa-SC lies a randomized scheme for approximating the underlying probability density function of the observed data by kernel smoothing arguments. Sparsity in data representations is also exploited to reduce the computational burden of SC, while achieving high clustering accuracy. Performance analysis as well as extensive numerical tests on synthetic and real data corroborate the potential of SkeVa-SC and its competitive performance relative to state-of-the-art scalable SC approaches. Keywords: subspace clustering, big data, kernel smoothing, randomization, sketching, validation, sparsity.

• [cs.LG]Population-Contrastive-Divergence: Does Consistency help with RBM training?
Oswin Krause, Asja Fischer, Christian Igel
https://arxiv.org/abs/1510.01624v1

Estimating the log-likelihood gradient with respect to the parameters of a Restricted Boltzmann Machine (RBM) typically requires sampling using Markov Chain Monte Carlo (MCMC) techniques. To save computation time, the Markov chains are only run for a small number of steps, which leads to a biased estimate. This bias can cause RBM training algorithms such as Contrastive Divergence (CD) learning to deteriorate. We adopt the idea behind Population Monte Carlo (PMC) methods to devise a new RBM training algorithm termed Population-Contrastive-Divergence (pop-CD). Compared to CD, it leads to a consistent estimate and may have a significantly lower bias. Its computational overhead is negligible compared to CD. However, the variance of the gradient estimate increases. We experimentally show that pop-CD can significantly outperform CD. In many cases, we observed a smaller bias and achieved higher log-likelihood values. However, when the RBM distribution has many hidden neurons, the consistent estimate of pop-CD may still have a considerable bias and the variance of the gradient estimate requires a smaller learning rate. Thus, despite its superior theoretical properties, it is not advisable to use pop-CD in its current form on large problems. 

• [cs.SI]Model of Multilayer Knowledge Diffusion for Competence Development in an Organization
Przemyslaw Rozewski, Jaroslaw Jankowski
https://arxiv.org/abs/1510.01577v1

The growing role of intellectual capital within organizations is shaping new strategies for knowledge management and competence development. Among the different aspects of this field, knowledge diffusion has become an area of interest from both the practitioner's and the researcher's perspective. Several models have been proposed with the main goal of simulating diffusion and explaining the nature of these processes. Existing models focus on knowledge diffusion and assume diffusion within a single layer of knowledge representation. From the organizational perspective, connecting several types of knowledge and modelling changes of competence can bring additional value. In this article we extend existing approaches with a multilayer diffusion model and focus on analyzing the competence development process. The proposed model describes the competence development process in a new way, through horizontal and vertical knowledge diffusion in a multilayer network. In the network, agents collaborate and exchange various kinds of knowledge through different layers, and these mutual activities affect competences positively or negatively. Taking into consideration workers' cognitive and social abilities and their previous level of competence, the new competence level can be estimated. The model is developed to support competence management in different organizations.

• [cs.SI]On The Network You Keep: Analyzing Persons of Interest using Cliqster
Saber Shokat Fadaee, Mehrdad Farajtabar, Ravi Sundaram, Javed A. Aslam, Nikos Passas
https://arxiv.org/abs/1510.01374v1

Our goal is to determine the structural differences between different categories of networks and to use these differences to predict the network category. Existing work on this topic has looked at social networks such as Facebook, Twitter, co-author networks etc. We, instead, focus on a novel data set that we have assembled from a variety of sources, including law-enforcement agencies, financial institutions, commercial database providers and other similar organizations. The data set comprises networks of “persons of interest” with each network belonging to different categories such as suspected terrorists, convicted individuals etc. We demonstrate that such “anti-social” networks are qualitatively different from the usual social networks and that new techniques are required to identify and learn features of such networks for the purposes of prediction and classification. We propose Cliqster, a new generative Bernoulli process-based model for unweighted networks. The generating probabilities are the result of a decomposition which reflects a network’s community structure. Using a maximum likelihood solution for the network inference leads to a least-squares problem. By solving this problem, we are able to present an efficient algorithm for transforming the network to a new space which is both concise and discriminative. This new space preserves the identity of the network as much as possible. Our algorithm is interpretable and intuitive. Finally, by comparing our research against the baseline method (SVD) and against a state-of-the-art Graphlet algorithm, we show the strength of our algorithm in discriminating between different categories of networks. 

• [cs.SI]On the evaluation potential of quality functions in community detection for different contexts
Jean Creusefond, Thomas Largillier, Sylvain Peyronnet
https://arxiv.org/abs/1510.01714v1

Due to the size of today's networks, the evaluation of a community detection algorithm can only be done using quality functions. These functions measure different structural properties of networks/graphs, each of them corresponding to a different definition of a community. Since there exist many definitions of a community, choosing a quality function may be a difficult task, even if the networks' statistics/origins can give some clues about which one to choose. In this paper, we apply a general methodology to identify different contexts, i.e. groups of graphs where the quality functions behave similarly. In these contexts we identify the best quality functions, i.e. quality functions whose results are consistent with expectations from real life applications.

• [math.OC]DC Decomposition of Nonconvex Polynomials with Algebraic Techniques
Amir Ali Ahmadi, Georgina Hall
https://arxiv.org/abs/1510.01518v1

We consider the problem of decomposing a multivariate polynomial as the difference of two convex polynomials. We introduce algebraic techniques which reduce this task to linear, second order cone, and semidefinite programming. This allows us to optimize over subsets of valid difference of convex decompositions (dcds) and find ones that speed up the convex-concave procedure (CCP). We prove, however, that optimizing over the entire set of dcds is NP-hard. 
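
A one-variable worked instance of such a decomposition (our own illustrative example, not one from the paper):

```latex
% f(x) = x^4 - x^2 is nonconvex, but it admits the dc decomposition
% f = g - h with both g and h convex:
f(x) = x^4 - x^2
     = \underbrace{\left(x^4 + x^2\right)}_{g(x)} - \underbrace{2x^2}_{h(x)},
\qquad g''(x) = 12x^2 + 2 > 0, \quad h''(x) = 4 > 0 .
```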

• [math.ST]Change-point detection using the conditional entropy of ordinal patterns
Anton M. Unakafov, Karsten Keller
https://arxiv.org/abs/1510.01457v1

This paper is devoted to change-point detection using only the ordinal structure of a time series. A statistic based on the conditional entropy of ordinal patterns characterizing the local ups and downs in a time series is introduced and investigated. The statistic requires only minimal a priori information on the given data and shows good performance in numerical experiments.
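
A minimal sketch of the statistic's two ingredients, order-3 ordinal patterns and the conditional entropy of successive patterns; the sliding-window change-point test built on top of them is omitted, and the two toy regimes are illustrative.

```python
# Ordinal patterns and their conditional entropy (a minimal sketch).
import math
from collections import Counter
import numpy as np

def ordinal_patterns(x, d=3):
    """Map each window of d values to the permutation that sorts it."""
    return [tuple(np.argsort(x[i:i + d])) for i in range(len(x) - d + 1)]

def conditional_entropy(patterns):
    """H(next pattern | current pattern), in bits."""
    pairs = Counter(zip(patterns, patterns[1:]))
    singles = Counter(patterns[:-1])
    n = sum(pairs.values())
    return -sum(c / n * math.log2(c / singles[a])
                for (a, b), c in pairs.items())

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0, 1, 500),          # regime 1: noise
                    np.sin(np.arange(500) / 3)])    # regime 2: oscillation
for name, seg in [("noise", x[:500]), ("sine", x[500:])]:
    print(name, round(conditional_entropy(ordinal_patterns(seg)), 3))
```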

• [math.ST]Inverse Problems for a Class of Conditional Probability Measure-Dependent Evolution Equations
David M. Bortz, Erin C. Byrne, Inom Mirzaev
https://arxiv.org/abs/1510.01355v1

We investigate the inverse problem of identifying a conditional probability measure in a measure-dependent dynamical system. We provide existence and well-posedness results and outline a discretization scheme for approximating a measure. For this scheme, we prove general method stability. The work is motivated by Partial Differential Equation (PDE) models of flocculation for which the shape of the post-fragmentation conditional probability measure greatly impacts the solution dynamics. To illustrate our methodology, we apply the theory to a particular PDE model that arises in the study of population dynamics for flocculating bacterial aggregates in suspension, and provide numerical evidence for the utility of the approach. 

• [stat.AP]Improving Ice Sheet Model Calibration Using Paleoclimate and Modern Data
Won Chang, Murali Haran, Patrick Applegate, David Pollard
https://arxiv.org/abs/1510.01676v1

Human-induced climate change may cause significant ice volume loss from the West Antarctic Ice Sheet (WAIS). Projections of ice volume change from ice-sheet models and corresponding future sea-level rise have large uncertainties due to poorly constrained input parameters. In most future applications to date, model calibration has utilized only modern or recent (decadal) observations, leaving input parameters that control the long-term behavior of WAIS largely unconstrained. Many paleo-observations are in the form of localized time series, while modern observations are non-Gaussian spatial data; combining information across these types poses non-trivial statistical challenges. Here we introduce a computationally efficient calibration approach that utilizes both modern and paleo-observations to generate better-constrained ice volume projections. Using fast emulators built upon principal component analysis and a reduced dimension calibration model, we can efficiently handle high-dimensional and non-Gaussian data. We apply our calibration approach to the PSU3D-ICE model which can realistically simulate long-term behavior of WAIS. Our results show that using paleo observations in calibration significantly reduces parametric uncertainty, resulting in sharper projections about the future state of WAIS. One benefit of using paleo observations is found to be that unrealistic simulations with overshoots in past ice retreat and projected future regrowth are eliminated. 

• [stat.AP]The Problem with Assessing Statistical Methods
Abigail Arnold, Jason Loeppky
https://arxiv.org/abs/1510.01417v1

In this paper, we investigate the problem of assessing statistical methods and effectively summarizing results from simulations. Specifically, we consider problems of the type where multiple methods are compared on a reasonably large test set of problems. These simulation studies are typically used to provide advice on an effective method for analyzing future untested problems. Most of these simulation studies never apply statistical methods to find which method(s) are expected to perform best. Instead, conclusions are based on a qualitative assessment of poorly chosen graphical and numerical summaries of the results. We illustrate that the Empirical Cumulative Distribution Function, when used appropriately, is an extremely effective tool for assessing what matters in large-scale statistical simulations.
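
A minimal sketch of the advocated summary: for each method, plot the empirical CDF of its shortfall from the per-problem best across the test set. The simulated errors and method names are purely illustrative.

```python
# ECDF of per-problem performance shortfall across methods (a sketch).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n_problems = 100
results = {                        # error of each method on each problem
    "method A": rng.gamma(2.0, 1.0, n_problems),
    "method B": rng.gamma(2.5, 1.0, n_problems),
}
best = np.minimum(*results.values())
for name, err in results.items():
    rel = np.sort(err - best)      # shortfall from the per-problem best
    ecdf = np.arange(1, n_problems + 1) / n_problems
    plt.step(rel, ecdf, label=name)
plt.xlabel("error above the best method"); plt.ylabel("ECDF"); plt.legend()
plt.show()
```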

• [stat.ME]Four-Point, 2D, Free-Ranging, IMSPE-Optimal, Twin-Point Designs
Selden Crary, Jan Stormann
https://arxiv.org/abs/1510.01685v1

We report the discovery of a set of four-point, two-factor, free-ranging, putatively IMSPE-optimal designs with a pair of twin points, in the statistical design of computer experiments, under Gaussian-process, fixed-Gaussian-covariance parameter, and zero-nugget assumptions. We conjecture this is the set of free-ranging, twin-point designs with the smallest number of degrees of freedom. 

• [stat.ML]Batch Normalized Recurrent Neural Networks
César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio
https://arxiv.org/abs/1510.01378v1

Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies. However, they are computationally expensive to train and difficult to parallelize. Recent work has shown that normalizing intermediate representations of neural networks can significantly improve convergence rates in feedforward neural networks. In particular, batch normalization, which uses mini-batch statistics to standardize features, was shown to significantly reduce training time. In this paper, we show that applying batch normalization to the hidden-to-hidden transitions of our RNNs doesn’t help the training procedure. We also show that when applied to the input-to-hidden transitions, batch normalization can lead to a faster convergence of the training criterion but doesn’t seem to improve the generalization performance on either our language modelling or speech recognition tasks. All in all, applying batch normalization to RNNs turns out to be more challenging than applying it to feedforward networks, but certain variants of it can still be beneficial.
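
A minimal numpy sketch of the variant reported to speed up training: batch normalization on the input-to-hidden transition only, leaving the hidden-to-hidden path untouched. The vanilla-RNN cell, fixed gamma/beta, and random weights are simplifications of the paper's setup.

```python
# Batch normalization on the input-to-hidden transition of a vanilla RNN.
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize over the batch dimension, then scale and shift."""
    mu, var = z.mean(axis=0), z.var(axis=0)
    return gamma * (z - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(8)
batch, d_in, d_h = 32, 10, 20
W_xh = rng.normal(0, 0.1, (d_in, d_h))
W_hh = rng.normal(0, 0.1, (d_h, d_h))
h = np.zeros((batch, d_h))
for t in range(5):                  # unrolled over 5 time steps
    x_t = rng.normal(0, 1, (batch, d_in))
    # BN is applied to the input-to-hidden pre-activation only.
    h = np.tanh(batch_norm(x_t @ W_xh) + h @ W_hh)
print(h.mean(), h.std())
```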

• [stat.ML]Bayesian Markov Blanket Estimation
Dinu Kaufmann, Sonali Parbhoo, Aleksander Wieczorek, Sebastian Keller, David Adametz, Volker Roth
https://arxiv.org/abs/1510.01485v1

This paper considers a Bayesian view for estimating a sub-network in a Markov random field. The sub-network corresponds to the Markov blanket of a set of query variables, where the set of potential neighbours is large. We factorize the posterior such that the Markov blanket is conditionally independent of the network of the potential neighbours. By exploiting this blockwise decoupling, we derive analytic expressions for posterior conditionals. Subsequently, we develop an inference scheme which makes use of the factorization. As a result, estimation of a sub-network is possible without inferring an entire network. Since the resulting Gibbs sampler scales linearly with the number of variables, it can handle relatively large neighbourhoods. The proposed scheme results in faster convergence and superior mixing of the Markov chain than existing Bayesian network estimation techniques.

• [stat.ML]Improved Estimation of Class Prior Probabilities through Unlabeled Data
Norman Matloff
https://arxiv.org/abs/1510.01422v1

Work in the classification literature has shown that in computing a classification function, one need not know the class membership of all observations in the training set; the unlabeled observations still provide information on the marginal distribution of the feature set, and can thus contribute to increased classification accuracy for future observations. The present paper will show that this scheme can also be used for the estimation of class prior probabilities, which would be very useful in applications in which it is difficult or expensive to determine class membership. Both parametric and nonparametric estimators are developed. Asymptotic distributions of the estimators are derived, and it is proven that the use of the unlabeled observations does reduce asymptotic variance. This methodology is also extended to the estimation of subclass probabilities. 
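
A minimal sketch of the idea in a parametric toy setting: with the class-conditional densities held fixed, EM over the unlabeled pool re-estimates only the mixing proportions, i.e. the class priors. The Gaussian densities and the true priors (0.3, 0.7) are illustrative assumptions, not the paper's estimators.

```python
# Estimating class priors from unlabeled data via EM (a minimal sketch).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
# Unlabeled pool drawn with true priors (0.3, 0.7).
unlabeled = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 700)])
dens = [norm(0, 1).pdf, norm(3, 1).pdf]   # class-conditionals held fixed

pi = np.array([0.5, 0.5])
for _ in range(50):                        # EM over mixing proportions only
    lik = np.stack([p * f(unlabeled) for p, f in zip(pi, dens)])
    resp = lik / lik.sum(axis=0)           # E-step: responsibilities
    pi = resp.mean(axis=1)                 # M-step: update the priors
print("estimated priors:", pi.round(3))
```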

• [stat.ML]Structured Transforms for Small-Footprint Deep Learning
Vikas Sindhwani, Tara N. Sainath, Sanjiv Kumar
https://arxiv.org/abs/1510.01722v1

We consider the task of building compact deep learning pipelines suitable for deployment on storage and power constrained mobile devices. We propose a unified framework to learn a broad family of structured parameter matrices that are characterized by the notion of low displacement rank. Our structured transforms admit fast function and gradient evaluation, and span a rich range of parameter sharing configurations whose statistical modeling capacity can be explicitly tuned along a continuum from structured to unstructured. Experimental results show that these transforms can significantly accelerate inference and forward/backward passes during training, and offer superior accuracy-compactness-speed tradeoffs in comparison to a number of existing techniques. In keyword spotting applications in mobile speech recognition, our methods are much more effective than standard linear low-rank bottleneck layers and nearly retain the performance of state of the art models, while providing more than 3.5-fold compression.
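
The speed argument is easiest to see for the circulant family, the simplest low-displacement-rank structure: a circulant layer has n parameters instead of n^2, and its matrix-vector product costs O(n log n) via the FFT. The sketch below checks the fast path against a dense reference; the paper's transforms generalize this family, which is used here only as an illustration.

```python
# Circulant matrix-vector product via the FFT (a minimal sketch).
import numpy as np

rng = np.random.default_rng(10)
n = 512
c = rng.normal(0, 1, n)        # first column parameterizes the whole layer
x = rng.normal(0, 1, n)

# Dense reference: build the full circulant matrix and multiply, O(n^2).
C = np.stack([np.roll(c, i) for i in range(n)], axis=1)
dense = C @ x

# Fast path: circular convolution in the Fourier domain, O(n log n).
fast = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
print("max abs difference:", np.abs(dense - fast).max())
```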
