Paper Group ANR 330
Discover and Learn New Objects from Documentaries. Data-Dependent Stability of Stochastic Gradient Descent. Minimax Rates and Efficient Algorithms for Noisy Sorting. On the Consistency of $k$-means++ algorithm. Role of Deep LSTM Neural Networks And WiFi Networks in Support of Occupancy Prediction in Smart Buildings. Enabling Multi-Source Neural Mac …
Discover and Learn New Objects from Documentaries
Title | Discover and Learn New Objects from Documentaries |
Authors | Kai Chen, Hang Song, Chen Change Loy, Dahua Lin |
Abstract | Despite the remarkable progress in recent years, detecting objects in a new context remains a challenging task. Detectors learned from a public dataset can only work with a fixed list of categories, while training from scratch usually requires a large amount of training data with detailed annotations. This work aims to explore a novel approach – learning object detectors from documentary films in a weakly supervised manner. This is inspired by the observation that documentaries often provide dedicated exposition of certain object categories, where visual presentations are aligned with subtitles. We believe that object detectors can be learned from such a rich source of information. Towards this goal, we develop a joint probabilistic framework, where individual pieces of information, including video frames and subtitles, are brought together via both visual and linguistic links. On top of this formulation, we further derive a weakly supervised learning algorithm, where object model learning and training set mining are unified in an optimization procedure. Experimental results on a real world dataset demonstrate that this is an effective approach to learning new object detectors. |
Tasks | |
Published | 2017-07-30 |
URL | http://arxiv.org/abs/1707.09593v1 |
http://arxiv.org/pdf/1707.09593v1.pdf | |
PWC | https://paperswithcode.com/paper/discover-and-learn-new-objects-from |
Repo | |
Framework | |
Data-Dependent Stability of Stochastic Gradient Descent
Title | Data-Dependent Stability of Stochastic Gradient Descent |
Authors | Ilja Kuzborskij, Christoph H. Lampert |
Abstract | We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD which depend on the worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient. |
Tasks | |
Published | 2017-03-05 |
URL | http://arxiv.org/abs/1703.01678v4 |
http://arxiv.org/pdf/1703.01678v4.pdf | |
PWC | https://paperswithcode.com/paper/data-dependent-stability-of-stochastic |
Repo | |
Framework | |
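The pre-screening idea in the abstract can be illustrated with a toy sketch. It only reflects the convex-case statement that the bound depends on the risk at the initialization point. The one-dimensional least-squares model, the candidate list, and the function names `empirical_risk`, `prescreen_init`, and `sgd` are assumptions made for illustration and are not taken from the paper.

```python
import random

def empirical_risk(w, data):
    """Mean squared error of the scalar linear model y ~ w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def prescreen_init(candidates, data):
    """Pick the candidate initial point with the lowest empirical risk (illustrative rule)."""
    return min(candidates, key=lambda w: empirical_risk(w, data))

def sgd(w, data, lr=0.01, epochs=5, seed=0):
    """Plain SGD on the squared loss, one example per step."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Gradient of (w * x - y)^2 with respect to w.
            w -= lr * 2 * (w * x - y) * x
    return w

data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w0 = prescreen_init([-5.0, 0.0, 2.5, 10.0], data)
w_final = sgd(w0, data)
```

In this toy setting the screened starting point is simply the candidate closest to the true slope; the non-convex case in the abstract concerns curvature around the initialization, which this sketch does not model.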
Minimax Rates and Efficient Algorithms for Noisy Sorting
Title | Minimax Rates and Efficient Algorithms for Noisy Sorting |
Authors | Cheng Mao, Jonathan Weed, Philippe Rigollet |
Abstract | There has been a recent surge of interest in studying permutation-based models for ranking from pairwise comparison data. Despite being structurally richer and more robust than parametric ranking models, permutation-based models are less well understood statistically and generally lack efficient learning algorithms. In this work, we study a prototype of permutation-based ranking models, namely, the noisy sorting model. We establish the optimal rates of learning the model under two sampling procedures. Furthermore, we provide a fast algorithm to achieve near-optimal rates if the observations are sampled independently. Along the way, we discover properties of the symmetric group which are of theoretical interest. |
Tasks | |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10388v1 |
http://arxiv.org/pdf/1710.10388v1.pdf | |
PWC | https://paperswithcode.com/paper/minimax-rates-and-efficient-algorithms-for |
Repo | |
Framework | |
On the Consistency of $k$-means++ algorithm
Title | On the Consistency of $k$-means++ algorithm |
Authors | Mieczysław A. Kłopotek |
Abstract | We prove in this paper that the expected value of the objective function of the $k$-means++ algorithm for samples converges to the population expected value. Since $k$-means++ provides a constant-factor approximation of the $k$-means objective on samples, such an approximation can be achieved for the population as the sample size increases. This result is of potential practical relevance when one is considering subsampling to cluster large data sets (large databases). |
Tasks | |
Published | 2017-02-20 |
URL | http://arxiv.org/abs/1702.06120v1 |
http://arxiv.org/pdf/1702.06120v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-consistency-of-k-means-algorithm |
Repo | |
Framework | |
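For readers unfamiliar with $k$-means++, the following is a minimal sketch of its standard seeding step ($D^2$ sampling) together with the $k$-means objective evaluated on a sample. The function names and the toy points are assumptions for illustration; the paper's proof does not depend on this code.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seed(points, k, rng):
    """Choose k initial centers by D^2 sampling.

    Assumes the points contain at least k distinct values; otherwise all
    weights become zero and random.choices raises ValueError.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def cost(points, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
centers = kmeanspp_seed(points, 3, random.Random(0))
```

Evaluating `cost` on a random subsample and on the full data set is one way to observe the kind of sample-versus-population behavior the abstract discusses.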
Role of Deep LSTM Neural Networks And WiFi Networks in Support of Occupancy Prediction in Smart Buildings
Title | Role of Deep LSTM Neural Networks And WiFi Networks in Support of Occupancy Prediction in Smart Buildings |
Authors | Basheer Qolomany, Ala Al-Fuqaha, Driss Benhaddou, Ajay Gupta |
Abstract | Knowing how many people occupy a building, and where they are located, is a key component of smart building services. Commercial, industrial and residential buildings often incorporate systems used to determine occupancy. However, relatively simple sensor technology and control algorithms limit the effectiveness of smart building services. In this paper we propose to replace sensor technology with time series models that can predict the number of occupants at a given location and time. We use Wi-Fi data sets readily available in abundance for smart building services and train Autoregressive Integrated Moving Average (ARIMA) models and Long Short-Term Memory (LSTM) time series models. As a use case scenario of smart building services, these models allow forecasting of the number of people at a given time and location at 15-, 30- and 60-minute intervals at the building level as well as the Access Point (AP) level. For LSTM, we build our models in two ways: a separate model for every time scale, and a combined model for the three time scales. Our experiments show that the combined LSTM model reduced the computational resources with respect to the number of neurons by 74.48% for the AP level, and by 67.13% for the building level. Further, the root mean square error (RMSE) was reduced by 88.2% to 93.4% for LSTM in comparison to ARIMA for the building-level models and by 80.9% to 87% for the AP-level models. |
Tasks | Time Series |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10355v1 |
http://arxiv.org/pdf/1711.10355v1.pdf | |
PWC | https://paperswithcode.com/paper/role-of-deep-lstm-neural-networks-and-wifi |
Repo | |
Framework | |
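The abstract reports its comparison as a percentage reduction in RMSE. As a small sketch of how such a figure is computed, the helper names `rmse` and `relative_reduction` below are assumptions, and the numbers in the usage lines are made up for illustration, not taken from the paper.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Illustrative occupancy counts and two hypothetical forecasts.
actual = [10, 12, 15, 11]
baseline_forecast = [14, 8, 19, 7]    # errors of 4 at every step
improved_forecast = [11, 11, 16, 10]  # errors of 1 at every step
reduction = relative_reduction(rmse(actual, baseline_forecast),
                               rmse(actual, improved_forecast))
```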
Enabling Multi-Source Neural Machine Translation By Concatenating Source Sentences In Multiple Languages
Title | Enabling Multi-Source Neural Machine Translation By Concatenating Source Sentences In Multiple Languages |
Authors | Raj Dabre, Fabien Cromieres, Sadao Kurohashi |
Abstract | In this paper, we explore a simple solution to “Multi-Source Neural Machine Translation” (MSNMT) which only relies on preprocessing an N-way multilingual corpus without modifying the Neural Machine Translation (NMT) architecture or training procedure. We simply concatenate the source sentences to form a single long multi-source input sentence while keeping the target side sentence as it is and train an NMT system using this preprocessed corpus. We evaluate our method in resource poor as well as resource rich settings and show its effectiveness (up to 4 BLEU using 2 source languages and up to 6 BLEU using 5 source languages). We also compare against existing methods for MSNMT and show that our solution gives competitive results despite its simplicity. We also provide some insights on how the NMT system leverages multilingual information in such a scenario by visualizing attention. |
Tasks | Machine Translation |
Published | 2017-02-20 |
URL | http://arxiv.org/abs/1702.06135v4 |
http://arxiv.org/pdf/1702.06135v4.pdf | |
PWC | https://paperswithcode.com/paper/enabling-multi-source-neural-machine |
Repo | |
Framework | |
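The preprocessing step described in the abstract can be sketched in a few lines. The corpus layout (one dict per N-way aligned sentence, keyed by language code), the plain space used as separator, and the function name `concat_sources` are assumptions for illustration; the paper may tokenize or delimit sources differently.

```python
def concat_sources(corpus, source_langs, target_lang):
    """Build (source, target) training pairs by concatenating source sentences.

    corpus: list of dicts mapping a language code to the sentence in that language.
    source_langs: language codes to concatenate, in a fixed order.
    target_lang: language code of the unchanged target side.
    """
    pairs = []
    for row in corpus:
        # Join the source sentences into one long multi-source input.
        source = " ".join(row[lang] for lang in source_langs)
        pairs.append((source, row[target_lang]))
    return pairs

corpus = [
    {"fr": "Bonjour le monde", "de": "Hallo Welt", "en": "Hello world"},
    {"fr": "Merci", "de": "Danke", "en": "Thank you"},
]
pairs = concat_sources(corpus, ["fr", "de"], "en")
```

The resulting pairs can be fed to any standard NMT toolkit unchanged, which is the point the abstract makes about not modifying the architecture or training procedure.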
Scaling Properties of Human Brain Functional Networks
Title | Scaling Properties of Human Brain Functional Networks |
Authors | Riccardo Zucca, Xerxes D. Arsiwalla, Hoang Le, Mikail Rubinov, Paul Verschure |
Abstract | We investigate scaling properties of human brain functional networks in the resting-state. Analyzing network degree distributions, we statistically test whether their tails scale as power-law or not. Initial studies, based on least-squares fitting, were shown to be inadequate for precise estimation of power-law distributions. Subsequently, methods based on maximum-likelihood estimators have been proposed and applied to address this question. Nevertheless, no clear consensus has emerged, mainly because results have shown substantial variability depending on the data-set used or its resolution. In this study, we work with high-resolution data (10K nodes) from the Human Connectome Project and take into account network weights. We test for the power-law, exponential, log-normal and generalized Pareto distributions. Our results show that the statistics generally do not support a power-law, but instead these degree distributions tend towards the thin-tail limit of the generalized Pareto model. This may have implications for the number of hubs in human brain functional networks. |
Tasks | |
Published | 2017-02-02 |
URL | http://arxiv.org/abs/1702.00768v1 |
http://arxiv.org/pdf/1702.00768v1.pdf | |
PWC | https://paperswithcode.com/paper/scaling-properties-of-human-brain-functional |
Repo | |
Framework | |
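The maximum-likelihood approach mentioned in the abstract can be illustrated with the standard closed-form estimator for the exponent of a continuous power law, $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$. This is a generic sketch, not the authors' pipeline: the function name `powerlaw_alpha_mle` is an assumption, `x_min` is taken as given rather than estimated, and no goodness-of-fit test or comparison against the other candidate distributions is performed.

```python
import math

def powerlaw_alpha_mle(values, x_min):
    """Maximum-likelihood exponent of a continuous power law on the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        # Happens when the tail is empty or every value equals x_min.
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum

# Illustrative degrees; values below x_min are ignored.
alpha = powerlaw_alpha_mle([0.5, math.e, math.e, math.e], x_min=1.0)
```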
Irregular Convolutional Neural Networks
Title | Irregular Convolutional Neural Networks |
Authors | Jiabin Ma, Wei Wang, Liang Wang |
Abstract | Convolutional kernels are basic and vital components of deep Convolutional Neural Networks (CNN). In this paper, we equip convolutional kernels with shape attributes to generate the deep Irregular Convolutional Neural Networks (ICNN). Compared to traditional CNN applying regular convolutional kernels like ${3\times3}$, our approach trains irregular kernel shapes to better fit the geometric variations of input features. In other words, shapes are learnable parameters in addition to weights. The kernel shapes and weights are learned simultaneously during end-to-end training with the standard back-propagation algorithm. Experiments for semantic segmentation are implemented to validate the effectiveness of our proposed ICNN. |
Tasks | Semantic Segmentation |
Published | 2017-06-24 |
URL | http://arxiv.org/abs/1706.07966v1 |
http://arxiv.org/pdf/1706.07966v1.pdf | |
PWC | https://paperswithcode.com/paper/irregular-convolutional-neural-networks |
Repo | |
Framework | |
Integration of Japanese Papers Into the DBLP Data Set
Title | Integration of Japanese Papers Into the DBLP Data Set |
Authors | Paul Christian Sommerhoff |
Abstract | If someone is looking for a certain publication in the field of computer science, the searching person is likely to use the DBLP to find the desired publication. The DBLP data set is continuously extended with new publications, or rather their metadata, for example the names of involved authors, the title and the publication date. While the size of the data set is already remarkable, specific areas can still be improved. The DBLP offers a huge collection of English papers because most papers concerning computer science are published in English. Nevertheless, there are official publications in other languages which are supposed to be added to the data set. One kind of these are Japanese papers. This diploma thesis will show a way to automatically process publication lists of Japanese papers and to make them ready for an import into the DBLP data set. Especially important are the problems along the way of processing, such as transcription handling and Personal Name Matching with Japanese names. |
Tasks | |
Published | 2017-09-26 |
URL | http://arxiv.org/abs/1709.09119v1 |
http://arxiv.org/pdf/1709.09119v1.pdf | |
PWC | https://paperswithcode.com/paper/integration-of-japanese-papers-into-the-dblp |
Repo | |
Framework | |
Survey on Models and Techniques for Root-Cause Analysis
Title | Survey on Models and Techniques for Root-Cause Analysis |
Authors | Marc Solé, Victor Muntés-Mulero, Annie Ibrahim Rana, Giovani Estrada |
Abstract | Automation and computer intelligence to support complex human decisions become essential to manage large and distributed systems in the Cloud and IoT era. Understanding the root cause of an observed symptom in a complex system has been a major problem for decades. As industry dives into the IoT world and the amount of data generated per year grows at an amazing speed, an important question is how to find appropriate mechanisms to determine root causes that can handle huge amounts of data or may provide valuable feedback in real-time. While many survey papers aim at summarizing the landscape of techniques for modelling system behavior and inferring the root cause of a problem based on the resulting models, none of those focuses on analyzing how the different techniques in the literature fit growing requirements in terms of performance and scalability. In this survey, we provide a review of root-cause analysis, focusing on these particular aspects. We also provide guidance to choose the best root-cause analysis strategy depending on the requirements of a particular system and application. |
Tasks | |
Published | 2017-01-30 |
URL | http://arxiv.org/abs/1701.08546v2 |
http://arxiv.org/pdf/1701.08546v2.pdf | |
PWC | https://paperswithcode.com/paper/survey-on-models-and-techniques-for-root |
Repo | |
Framework | |
Generic LSH Families for the Angular Distance Based on Johnson-Lindenstrauss Projections and Feature Hashing LSH
Title | Generic LSH Families for the Angular Distance Based on Johnson-Lindenstrauss Projections and Feature Hashing LSH |
Authors | Luis Argerich, Natalia Golmar |
Abstract | In this paper we propose the creation of generic LSH families for the angular distance based on Johnson-Lindenstrauss projections. We show that feature hashing is a valid J-L projection and propose two new LSH families based on feature hashing. These new LSH families are tested on both synthetic and real datasets with very good results and a considerable performance improvement over other LSH families. While the theoretical analysis is done for the angular distance, these families can also be used in practice for the Euclidean distance with excellent results [2]. Our tests using real datasets show that the proposed LSH functions work well for the Euclidean distance. |
Tasks | |
Published | 2017-04-15 |
URL | http://arxiv.org/abs/1704.04684v1 |
http://arxiv.org/pdf/1704.04684v1.pdf | |
PWC | https://paperswithcode.com/paper/generic-lsh-families-for-the-angular-distance |
Repo | |
Framework | |
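To make the combination of feature hashing and angular LSH concrete, here is a rough sketch that projects a sparse vector with the hashing trick and keeps only the sign of each projected coordinate as a bit signature. This is an assumption about what such a family could look like, not the paper's two proposed families; the MD5-based helper `_h` and the names `feature_hash` and `sign_signature` are hypothetical.

```python
import hashlib

def _h(key, salt):
    """Deterministic integer hash of a feature name (illustrative choice of MD5)."""
    digest = hashlib.md5(f"{salt}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def feature_hash(vec, dim):
    """Project a sparse vector (dict feature -> value) to `dim` dimensions."""
    out = [0.0] * dim
    for key, value in vec.items():
        idx = _h(key, "index") % dim                       # bucket for this feature
        sign = 1.0 if _h(key, "sign") % 2 == 0 else -1.0   # random-looking sign
        out[idx] += sign * value
    return out

def sign_signature(vec, dim):
    """Bit signature from the signs of the hashed coordinates."""
    return tuple(1 if x >= 0 else 0 for x in feature_hash(vec, dim))

a = {"cat": 1.0, "dog": 2.0, "fish": 0.5}
b = {k: 2.0 * v for k, v in a.items()}  # same direction, different length
sig_a = sign_signature(a, 16)
sig_b = sign_signature(b, 16)
```

Because the signature depends only on signs, positively rescaling a vector leaves it unchanged, which matches the intent of hashing for the angular distance.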
Model-Based Clustering of Nonparametric Weighted Networks with Application to Water Pollution Analysis
Title | Model-Based Clustering of Nonparametric Weighted Networks with Application to Water Pollution Analysis |
Authors | Amal Agarwal, Lingzhou Xue |
Abstract | Water pollution is a major global environmental problem, and it poses a great environmental risk to public health and biological diversity. This work is motivated by assessing the potential environmental threat of coal mining through increased sulfate concentrations in river networks, which do not belong to any simple parametric distribution. However, existing network models mainly focus on binary or discrete networks and weighted networks with known parametric weight distributions. We propose a principled nonparametric weighted network model based on exponential-family random graph models and local likelihood estimation and study its model-based clustering with application to large-scale water pollution network analysis. We do not require any parametric distribution assumption on network weights. The proposed method greatly extends the methodology and applicability of statistical network models. Furthermore, it is scalable to large and complex networks in large-scale environmental studies. The power of our proposed methods is demonstrated in simulation studies and a real application to sulfate pollution network analysis in Ohio watershed located in Pennsylvania, United States. |
Tasks | |
Published | 2017-12-21 |
URL | https://arxiv.org/abs/1712.07800v2 |
https://arxiv.org/pdf/1712.07800v2.pdf | |
PWC | https://paperswithcode.com/paper/model-based-clustering-of-nonparametric |
Repo | |
Framework | |
Learning of Colors from Color Names: Distribution and Point Estimation
Title | Learning of Colors from Color Names: Distribution and Point Estimation |
Authors | Lyndon White, Roberto Togneri, Wei Liu, Mohammed Bennamoun |
Abstract | Color names are often made up of multiple words. As a task in natural language understanding we investigate in depth the capacity of neural networks based on sums of word embeddings (SOWE), recurrence (LSTM and GRU based RNNs) and convolution (CNN), to estimate colors from sequences of terms. We consider both point and distribution estimates of color. We argue that the latter has a particular value as there is no clear agreement between people as to what a particular color describes – different people have a different idea of what it means to be “very dark orange”, for example. Surprisingly, despite its simplicity, the sum of word embeddings generally performs the best on almost all evaluations. |
Tasks | Word Embeddings |
Published | 2017-09-27 |
URL | https://arxiv.org/abs/1709.09360v3 |
https://arxiv.org/pdf/1709.09360v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-distributions-of-meant-color |
Repo | |
Framework | |
Counterfactual Control for Free from Generative Models
Title | Counterfactual Control for Free from Generative Models |
Authors | Nicholas Guttenberg, Yen Yu, Ryota Kanai |
Abstract | We introduce a method by which a generative model learning the joint distribution between actions and future states can be used to automatically infer a control scheme for any desired reward function, which may be altered on the fly without retraining the model. In this method, the problem of action selection is reduced to one of gradient descent on the latent space of the generative model, with the model itself providing the means of evaluating outcomes and finding the gradient, much like how the reward network in Deep Q-Networks (DQN) provides gradient information for the action generator. Unlike DQN or Actor-Critic, which are conditional models for a specific reward, using a generative model of the full joint distribution permits the reward to be changed on the fly. In addition, the generated futures can be inspected to gain insight into what the network ‘thinks’ will happen, and into what went wrong when the outcomes deviate from prediction. |
Tasks | |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06676v2 |
http://arxiv.org/pdf/1702.06676v2.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-control-for-free-from |
Repo | |
Framework | |
On the ERM Principle with Networked Data
Title | On the ERM Principle with Networked Data |
Authors | Yuanhong Wang, Yuyi Wang, Xingwu Liu, Juhua Pu |
Abstract | Networked data, in which every training example involves two objects and may share some common objects with others, is used in many machine learning tasks such as learning to rank and link prediction. A challenge of learning from networked examples is that target values are not known for some pairs of objects. In this case, neither the classical i.i.d. assumption nor techniques based on complete U-statistics can be used. Most existing theoretical results on this problem only deal with the classical empirical risk minimization (ERM) principle that always weights every example equally, but this strategy leads to unsatisfactory bounds. We consider general weighted ERM and show new universal risk bounds for this problem. These new bounds naturally define an optimization problem which leads to appropriate weights for networked examples. Though this optimization problem is not convex in general, we devise a new fully polynomial-time approximation scheme (FPTAS) to solve it. |
Tasks | Learning-To-Rank, Link Prediction |
Published | 2017-11-12 |
URL | http://arxiv.org/abs/1711.04297v2 |
http://arxiv.org/pdf/1711.04297v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-erm-principle-with-networked-data |
Repo | |
Framework | |