Paper Group ANR 152
High-Resolution Semantic Labeling with Convolutional Neural Networks. Dense CNN Learning with Equivalent Mappings. Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change. A Focused Dynamic Attention Model for Visual Question Answering. Exploiting Temporal Information for DCNN-based Fine-Grained Object Classifica …
High-Resolution Semantic Labeling with Convolutional Neural Networks
Title | High-Resolution Semantic Labeling with Convolutional Neural Networks |
Authors | Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, Pierre Alliez |
Abstract | Convolutional neural networks (CNNs) have received increasing attention over the last few years. They were initially conceived for image categorization, i.e., the problem of assigning a semantic label to an entire input image. In this paper we address the problem of dense semantic labeling, which consists in assigning a semantic label to every pixel in an image. Since this requires a high spatial accuracy to determine where labels are assigned, categorization CNNs, intended to be highly robust to local deformations, are not directly applicable. By adapting categorization networks, many semantic labeling CNNs have been recently proposed. Our first contribution is an in-depth analysis of these architectures. We establish the desired properties of an ideal semantic labeling CNN, and assess how those methods stand with regard to these properties. We observe that even though they provide competitive results, these CNNs often underexploit properties of semantic labeling that could lead to more effective and efficient architectures. Out of these observations, we then derive a CNN framework specifically adapted to the semantic labeling problem. In addition to learning features at different resolutions, it learns how to combine these features. By integrating local and global information in an efficient and flexible manner, it outperforms previous techniques. We evaluate the proposed framework and compare it with state-of-the-art architectures on public benchmarks of high-resolution aerial image labeling. |
Tasks | Image Categorization |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.01962v1 |
http://arxiv.org/pdf/1611.01962v1.pdf | |
PWC | https://paperswithcode.com/paper/high-resolution-semantic-labeling-with |
Repo | |
Framework | |
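The central idea above, learning features at several resolutions together with a learned rule for combining them, can be illustrated with a short sketch. The following PyTorch snippet is a minimal toy version, not the authors' architecture; the channel widths, number of scales, and the 1x1-convolution combiner are assumptions made only for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionLabeler(nn.Module):
    """Toy sketch: extract features at several resolutions, upsample them
    to full resolution, and let a learned 1x1 convolution combine them."""
    def __init__(self, in_channels=3, num_classes=6, widths=(16, 32, 64)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_channels
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, w, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),          # each stage halves the resolution
            ))
            prev = w
        # learned combination of the upsampled multi-resolution features
        self.combine = nn.Conv2d(sum(widths), num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = []
        out = x
        for stage in self.stages:
            out = stage(out)
            feats.append(F.interpolate(out, size=(h, w), mode="bilinear",
                                       align_corners=False))
        return self.combine(torch.cat(feats, dim=1))  # per-pixel class scores

scores = MultiResolutionLabeler()(torch.randn(1, 3, 128, 128))
print(scores.shape)  # torch.Size([1, 6, 128, 128])
```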
Dense CNN Learning with Equivalent Mappings
Title | Dense CNN Learning with Equivalent Mappings |
Authors | Jianxin Wu, Chen-Wei Xie, Jian-Hao Luo |
Abstract | Large receptive field and dense prediction are both important for achieving high accuracy in pixel labeling tasks such as semantic segmentation. These two properties, however, contradict each other. A pooling layer (with stride 2) quadruples the receptive field size but reduces the number of predictions to 25%. Some existing methods lead to dense predictions using computations that are not equivalent to the original model. In this paper, we propose the equivalent convolution (eConv) and equivalent pooling (ePool) layers, leading to predictions that are both dense and equivalent to the baseline CNN model. Dense prediction models learned using eConv and ePool can transfer the baseline CNN’s parameters as a starting point, and can inverse-transfer the learned parameters of a dense model back to the original one, which has both fast testing speed and high accuracy. The proposed eConv and ePool layers have achieved higher accuracy than the baseline CNN in various tasks, including semantic segmentation, object localization, image categorization and apparent age estimation, not only in tasks that require dense pixel labeling. |
Tasks | Age Estimation, Image Categorization, Object Localization, Semantic Segmentation |
Published | 2016-05-24 |
URL | http://arxiv.org/abs/1605.07251v1 |
http://arxiv.org/pdf/1605.07251v1.pdf | |
PWC | https://paperswithcode.com/paper/dense-cnn-learning-with-equivalent-mappings |
Repo | |
Framework | |
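eConv/ePool address the tension between receptive-field growth and prediction density. The sketch below does not reproduce those layers; it uses the standard dilated ("a trous") convolution trick purely to make the dense-versus-strided trade-off concrete, under the assumption that a small PyTorch toy suffices for intuition.

```python
import torch
import torch.nn as nn

# Illustration of the trade-off that eConv/ePool address: striding grows the
# receptive field but thins out the predictions, while dilation keeps them dense.
# This is NOT the paper's eConv/ePool construction, only a related standard trick.

x = torch.randn(1, 8, 64, 64)

strided = nn.Sequential(                        # large receptive field, sparse output
    nn.Conv2d(8, 8, 3, padding=1),
    nn.MaxPool2d(2),                            # output is 32x32: 25% of the predictions
    nn.Conv2d(8, 8, 3, padding=1),
)

dense = nn.Sequential(                          # comparable receptive field, dense output
    nn.Conv2d(8, 8, 3, padding=1),
    nn.Conv2d(8, 8, 3, padding=2, dilation=2),  # dilation replaces the stride
)

print(strided(x).shape)  # torch.Size([1, 8, 32, 32])
print(dense(x).shape)    # torch.Size([1, 8, 64, 64])
```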
Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change
Title | Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change |
Authors | William L. Hamilton, Jure Leskovec, Dan Jurafsky |
Abstract | Words shift in meaning for many reasons, including cultural factors like new technologies and regular linguistic processes like subjectification. Understanding the evolution of language and culture requires disentangling these underlying causes. Here we show how two different distributional measures can be used to detect two different types of semantic change. The first measure, which has been used in many previous works, analyzes global shifts in a word’s distributional semantics; it is sensitive to changes due to regular processes of linguistic drift, such as the semantic generalization of promise (“I promise.” -> “It promised to be exciting.”). The second measure, which we develop here, focuses on local changes to a word’s nearest semantic neighbors; it is more sensitive to cultural shifts, such as the change in the meaning of cell (“prison cell” -> “cell phone”). Comparing measurements made by these two methods allows researchers to determine whether changes are more cultural or linguistic in nature, a distinction that is essential for work in the digital humanities and historical linguistics. |
Tasks | |
Published | 2016-06-09 |
URL | http://arxiv.org/abs/1606.02821v2 |
http://arxiv.org/pdf/1606.02821v2.pdf | |
PWC | https://paperswithcode.com/paper/cultural-shift-or-linguistic-drift-comparing |
Repo | |
Framework | |
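The two measures can be sketched directly. Below is a simplified NumPy version: the global measure compares a word's (pre-aligned) vectors across two periods, and the local measure compares the word's similarity profile over the union of its nearest neighbors. The toy vocabulary and random vectors are placeholders, and the cross-period alignment step is assumed to have been done already.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(vecs, word, k):
    others = [w for w in vecs if w != word]
    sims = {w: cosine(vecs[word], vecs[w]) for w in others}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def global_change(vec_t1, vec_t2, word):
    """Global measure: cosine distance between a word's aligned vectors in two
    time periods -- sensitive to gradual linguistic drift."""
    return 1.0 - cosine(vec_t1[word], vec_t2[word])

def local_change(vec_t1, vec_t2, word, k=10):
    """Local measure: change in the word's similarity profile over the union of
    its nearest neighbors -- more sensitive to cultural shifts."""
    union = sorted(set(nearest(vec_t1, word, k)) | set(nearest(vec_t2, word, k)))
    s1 = np.array([cosine(vec_t1[word], vec_t1[w]) for w in union])
    s2 = np.array([cosine(vec_t2[word], vec_t2[w]) for w in union])
    return 1.0 - cosine(s1, s2)

# toy embeddings for two decades (hypothetical data)
rng = np.random.default_rng(0)
vocab = ["cell", "phone", "prison", "promise", "battery"]
t1 = {w: rng.normal(size=50) for w in vocab}
t2 = {w: rng.normal(size=50) for w in vocab}
print(global_change(t1, t2, "cell"), local_change(t1, t2, "cell", k=3))
```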
A Focused Dynamic Attention Model for Visual Question Answering
Title | A Focused Dynamic Attention Model for Visual Question Answering |
Authors | Ilija Ilievski, Shuicheng Yan, Jiashi Feng |
Abstract | Visual Question Answering (VQA) problems are attracting increasing interest from multiple research disciplines. Solving VQA problems requires techniques from computer vision for understanding the visual contents of a presented image or video, as well as techniques from natural language processing for understanding the semantics of the question and generating the answers. Regarding visual content modeling, most existing VQA methods adopt the strategy of extracting global features from the image or video, which inevitably fails to capture fine-grained information such as the spatial configuration of multiple objects. Extracting features from auto-generated regions – as some region-based image recognition methods do – cannot essentially address this problem and may introduce an overwhelming number of features irrelevant to the question. In this work, we propose a novel Focused Dynamic Attention (FDA) model to provide image content representations better aligned with the posed questions. Being aware of the key words in the question, FDA employs an off-the-shelf object detector to identify important regions and fuses the information from these regions and global features via an LSTM unit. Such question-driven representations are then combined with the question representation and fed into a reasoning unit for generating the answers. Extensive evaluation on a large-scale benchmark dataset, VQA, clearly demonstrates the superior performance of FDA over well-established baselines. |
Tasks | Question Answering, Visual Question Answering |
Published | 2016-04-06 |
URL | http://arxiv.org/abs/1604.01485v1 |
http://arxiv.org/pdf/1604.01485v1.pdf | |
PWC | https://paperswithcode.com/paper/a-focused-dynamic-attention-model-for-visual |
Repo | |
Framework | |
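A rough sketch of the fusion step described above: an LSTM reads the features of the question-relevant regions followed by the global image feature, and the result is combined with a question representation in a small reasoning module. Feature dimensions, the answer vocabulary size, and the concatenation-plus-MLP reasoning unit are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Toy sketch of the fusion idea: an LSTM reads features of the
    question-relevant regions followed by the global image feature, and the
    resulting visual code is combined with the question representation."""
    def __init__(self, feat_dim=512, q_dim=512, hidden=512, num_answers=1000):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.reason = nn.Sequential(
            nn.Linear(hidden + q_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, region_feats, global_feat, question_feat):
        seq = torch.cat([region_feats, global_feat.unsqueeze(1)], dim=1)
        _, (h, _) = self.lstm(seq)             # h: (1, batch, hidden)
        visual = h.squeeze(0)
        return self.reason(torch.cat([visual, question_feat], dim=1))

# batch of 2 images, 5 detected regions each (all features are dummies)
model = FusionSketch()
logits = model(torch.randn(2, 5, 512), torch.randn(2, 512), torch.randn(2, 512))
print(logits.shape)  # torch.Size([2, 1000])
```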
Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification
Title | Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification |
Authors | ZongYuan Ge, Chris McCool, Conrad Sanderson, Peng Wang, Lingqiao Liu, Ian Reid, Peter Corke |
Abstract | Fine-grained classification is a relatively new field that has concentrated on using information from a single image, while ignoring the enormous potential of using video data to improve classification. In this work we present the novel task of video-based fine-grained object classification, propose a corresponding new video dataset, and perform a systematic study of several recent deep convolutional neural network (DCNN) based approaches, which we specifically adapt to the task. We evaluate three-dimensional DCNNs, two-stream DCNNs, and bilinear DCNNs. Two forms of the two-stream approach are used, where spatial and temporal data from two independent DCNNs are fused either via early fusion (combination of the fully-connected layers) or late fusion (concatenation of the softmax outputs of the DCNNs). For bilinear DCNNs, information from the convolutional layers of the spatial and temporal DCNNs is combined via local co-occurrences. We then fuse the bilinear DCNN and early fusion of the two-stream approach to combine the spatial and temporal information at the local and global level (Spatio-Temporal Co-occurrence). Using the new and challenging video dataset of birds, classification performance is improved from 23.1% (using single images) to 41.1% when using the Spatio-Temporal Co-occurrence system. Incorporating automatically detected bounding box location further improves the classification accuracy to 53.6%. |
Tasks | Object Classification |
Published | 2016-08-01 |
URL | http://arxiv.org/abs/1608.00486v3 |
http://arxiv.org/pdf/1608.00486v3.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-temporal-information-for-dcnn |
Repo | |
Framework | |
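Two of the fusion mechanisms above can be sketched compactly: bilinear pooling via local co-occurrences (an outer product of spatial and temporal features summed over locations) and early fusion (concatenation of fully-connected features). The NumPy snippet below is a simplified illustration with made-up tensor sizes, not the paper's trained networks.

```python
import numpy as np

def bilinear_cooccurrence(spatial_maps, temporal_maps):
    """Combine spatial and temporal conv features via local co-occurrences:
    outer product of the two feature vectors at each location, summed over
    all locations (the usual bilinear-pooling recipe, simplified)."""
    c1, h, w = spatial_maps.shape
    c2 = temporal_maps.shape[0]
    s = spatial_maps.reshape(c1, h * w)
    t = temporal_maps.reshape(c2, h * w)
    pooled = (s @ t.T).flatten()                        # (c1 * c2,) co-occurrence descriptor
    return pooled / (np.linalg.norm(pooled) + 1e-12)    # normalized descriptor

def early_fusion(fc_spatial, fc_temporal):
    """Early fusion of the two streams: concatenate fully-connected features."""
    return np.concatenate([fc_spatial, fc_temporal])

# dummy activations for one spatial/temporal frame pair
spatial = np.random.rand(64, 7, 7)
temporal = np.random.rand(64, 7, 7)
print(bilinear_cooccurrence(spatial, temporal).shape)                   # (4096,)
print(early_fusion(np.random.rand(4096), np.random.rand(4096)).shape)   # (8192,)
```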
Harassment detection: a benchmark on the #HackHarassment dataset
Title | Harassment detection: a benchmark on the #HackHarassment dataset |
Authors | Alexei Bastidas, Edward Dixon, Chris Loo, John Ryan |
Abstract | Online harassment has been a problem to a greater or lesser extent since the early days of the internet. Previous work has applied anti-spam techniques like machine-learning based text classification (Reynolds, 2011) to detecting harassing messages. However, existing public datasets are limited in size, with labels of varying quality. The #HackHarassment initiative (an alliance of tech companies and NGOs devoted to fighting bullying on the internet) has begun to address this issue by creating a new dataset superior to its predecessors in terms of both size and quality. As we (#HackHarassment) complete further rounds of labelling, later iterations of this dataset will increase the available samples by at least an order of magnitude, enabling corresponding improvements in the quality of machine learning models for harassment detection. In this paper, we introduce the first models built on the #HackHarassment dataset v1.0 (a new open dataset, which we are delighted to share with any interested researchers) as a benchmark for future research. |
Tasks | Text Classification |
Published | 2016-09-09 |
URL | http://arxiv.org/abs/1609.02809v1 |
http://arxiv.org/pdf/1609.02809v1.pdf | |
PWC | https://paperswithcode.com/paper/harassment-detection-a-benchmark-on-the |
Repo | |
Framework | |
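As a rough idea of the kind of text-classification approach the abstract alludes to, here is a minimal bag-of-words pipeline with scikit-learn. The example messages and labels are invented, and this is not one of the paper's benchmark models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus (invented examples, not from the #HackHarassment dataset).
texts = ["you are brilliant, thanks for sharing",
         "nobody wants you here, just leave",
         "great point, I learned a lot",
         "you are worthless and everyone hates you"]
labels = [0, 1, 0, 1]   # 1 = harassing, 0 = non-harassing

# TF-IDF unigrams/bigrams feeding a linear classifier: a standard anti-spam-style baseline.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["thanks, that was really helpful"]))  # expect [0]
```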
Encoding Temporal Markov Dynamics in Graph for Visualizing and Mining Time Series
Title | Encoding Temporal Markov Dynamics in Graph for Visualizing and Mining Time Series |
Authors | Lu Liu, Zhiguang Wang |
Abstract | Time series and signals are attracting more attention across statistics, machine learning and pattern recognition, as they appear widely in industry, especially in sensor- and IoT-related research and applications; yet few advances have been achieved in effective time series visual analytics and interaction, due to their temporal dimensionality and complex dynamics. Inspired by recent efforts to use network metrics to characterize time series for classification, we present an approach to visualize time series as complex networks based on a first-order Markov process over their temporal ordering. In contrast to classical bar charts, line plots and other statistics-based graphs, our approach delivers a more intuitive visualization that better preserves both the temporal dependency and frequency structures. It provides a natural inverse operation to map the graph back to raw signals, making it possible to use graph statistics to characterize time series for better visual exploration and statistical analysis. Our experimental results suggest the effectiveness of the approach on various tasks such as pattern discovery and classification, on both synthetic and real time series and sensor data. |
Tasks | Time Series |
Published | 2016-10-24 |
URL | http://arxiv.org/abs/1610.07273v4 |
http://arxiv.org/pdf/1610.07273v4.pdf | |
PWC | https://paperswithcode.com/paper/encoding-temporal-markov-dynamics-in-graph |
Repo | |
Framework | |
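The encoding itself is straightforward to sketch: quantile-bin the series, count first-order transitions between consecutive bins, and treat the normalized transition-count matrix as a weighted adjacency matrix. The NumPy sketch below assumes quantile binning and a toy sinusoidal signal; the bin sequence it returns is what makes an approximate inverse mapping back to the raw signal possible.

```python
import numpy as np

def series_to_markov_graph(series, n_bins=8):
    """Encode a time series as a graph: quantile-bin the values, then count
    first-order transitions between consecutive bins. The row-normalized count
    matrix is the weighted adjacency matrix of the resulting network."""
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1))
    states = np.clip(np.searchsorted(edges, series, side="right") - 1, 0, n_bins - 1)
    adj = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        adj[a, b] += 1
    row_sums = adj.sum(axis=1, keepdims=True)
    adj = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    return adj, states

# toy signal
t = np.linspace(0, 4 * np.pi, 500)
adjacency, states = series_to_markov_graph(np.sin(t) + 0.1 * np.random.randn(500))
print(adjacency.shape)   # (8, 8) weighted adjacency / transition matrix
print(states[:10])       # bin sequence, the basis for an approximate inverse mapping
```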
Effective Multi-Robot Spatial Task Allocation using Model Approximations
Title | Effective Multi-Robot Spatial Task Allocation using Model Approximations |
Authors | Okan Aşık, H. Levent Akın |
Abstract | Real-world multi-agent planning problems cannot be solved using decision-theoretic planning methods due to their exponential complexity. We approximate firefighting in rescue simulation as a spatially distributed task and model it as a multi-agent Markov decision process. We use recent approximation methods for spatial task problems to reduce the model complexity. Our approximations are single-agent, static task, shortest-path pruning, dynamic planning horizon, and task clustering. We create scenarios from RoboCup Rescue Simulation maps and evaluate our methods on these graph worlds. The results show that our approach is faster and better than comparable methods and has negligible performance loss compared to the optimal policy. We also show that our method performs similarly to DCOP methods on example RCRS scenarios. |
Tasks | |
Published | 2016-06-04 |
URL | http://arxiv.org/abs/1606.01380v1 |
http://arxiv.org/pdf/1606.01380v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-multi-robot-spatial-task-allocation |
Repo | |
Framework | |
What Can Be Predicted from Six Seconds of Driver Glances?
Title | What Can Be Predicted from Six Seconds of Driver Glances? |
Authors | Lex Fridman, Heishiro Toyoda, Sean Seaman, Bobbie Seppelt, Linda Angell, Joonbum Lee, Bruce Mehler, Bryan Reimer |
Abstract | We consider a large dataset of real-world, on-road driving from a 100-car naturalistic study to explore the predictive power of driver glances and, specifically, to answer the following question: what can be predicted about the state of the driver and the state of the driving environment from a 6-second sequence of macro-glances? The context-based nature of such glances allows supervised learning to be applied to the problem of vision-based gaze estimation, making it robust, accurate, and reliable in messy, real-world conditions. It is therefore natural to ask whether such macro-glances can be used to infer behavioral, environmental, and demographic variables. We analyze 27 binary classification problems based on these variables. The takeaway is that glances can be used as part of a multi-sensor real-time system to predict radio-tuning, fatigue state, failure to signal, talking, and several environment variables. |
Tasks | Gaze Estimation |
Published | 2016-11-26 |
URL | http://arxiv.org/abs/1611.08754v1 |
http://arxiv.org/pdf/1611.08754v1.pdf | |
PWC | https://paperswithcode.com/paper/what-can-be-predicted-from-six-seconds-of |
Repo | |
Framework | |
Neural Machine Translation with Pivot Languages
Title | Neural Machine Translation with Pivot Languages |
Authors | Yong Cheng, Yang Liu, Qian Yang, Maosong Sun, Wei Xu |
Abstract | While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages. |
Tasks | Machine Translation |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04928v2 |
http://arxiv.org/pdf/1611.04928v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-with-pivot |
Repo | |
Framework | |
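A minimal sketch of what a joint objective for pivot-based translation could look like: the two directions' translation losses plus a connection term that couples the models, illustrated here as agreement between their pivot-language word embeddings. The specific connection term, its weighting, and the dummy tensors are assumptions for illustration; the paper's three concrete connection methods are not reproduced.

```python
import torch

def joint_pivot_loss(loss_src2piv, loss_piv2tgt, piv_emb_in_src2piv,
                     piv_emb_in_piv2tgt, lam=0.1):
    """Sketch of a joint training objective for pivot-based NMT: the two
    translation losses plus a connection term that encourages the two models
    to agree on their pivot-language word embeddings. The exact connection
    term is an illustrative assumption, not the paper's formulation."""
    connection = torch.mean((piv_emb_in_src2piv - piv_emb_in_piv2tgt) ** 2)
    return loss_src2piv + loss_piv2tgt + lam * connection

# dummy values standing in for the two models' losses and pivot-side embeddings
loss = joint_pivot_loss(torch.tensor(2.3), torch.tensor(1.9),
                        torch.randn(1000, 256), torch.randn(1000, 256))
print(float(loss))
```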
A case study of algorithm selection for the traveling thief problem
Title | A case study of algorithm selection for the traveling thief problem |
Authors | Markus Wagner, Marius Lindauer, Mustafa Misir, Samadhi Nallaperuma, Frank Hutter |
Abstract | Many real-world problems are composed of several interacting components. In order to facilitate research on such interactions, the Traveling Thief Problem (TTP) was created in 2013 as the combination of two well-understood combinatorial optimization problems. With this article, we contribute in four ways. First, we create a comprehensive dataset that comprises the performance data of 21 TTP algorithms on the full original set of 9720 TTP instances. Second, we define 55 characteristics for all TTP instances that can be used to select the best algorithm on a per-instance basis. Third, we use these algorithms and features to construct the first algorithm portfolios for TTP, clearly outperforming the single best algorithm. Finally, we study which algorithms contribute most to this portfolio. |
Tasks | Combinatorial Optimization |
Published | 2016-09-02 |
URL | http://arxiv.org/abs/1609.00462v1 |
http://arxiv.org/pdf/1609.00462v1.pdf | |
PWC | https://paperswithcode.com/paper/a-case-study-of-algorithm-selection-for-the |
Repo | |
Framework | |
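The portfolio idea, predicting each algorithm's performance from instance features and picking the best predicted one, can be sketched as follows. The feature and algorithm counts mirror the numbers quoted above, but the random data and the random-forest regressor are placeholders rather than the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Minimal sketch of per-instance algorithm selection: for each algorithm,
# learn to predict its performance from instance features, then pick the
# algorithm with the best predicted score for a new instance.
rng = np.random.default_rng(0)
n_instances, n_features, n_algorithms = 200, 55, 21
X = rng.normal(size=(n_instances, n_features))          # instance characteristics
perf = rng.normal(size=(n_instances, n_algorithms))     # observed objective values (dummy)

models = []
for a in range(n_algorithms):
    m = RandomForestRegressor(n_estimators=50, random_state=0)
    m.fit(X, perf[:, a])
    models.append(m)

def select_algorithm(instance_features):
    predicted = [m.predict(instance_features.reshape(1, -1))[0] for m in models]
    return int(np.argmax(predicted))    # assuming a larger objective value is better

print(select_algorithm(rng.normal(size=n_features)))
```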
Parameter Learning for Log-supermodular Distributions
Title | Parameter Learning for Log-supermodular Distributions |
Authors | Tatiana Shpakova, Francis Bach |
Abstract | We consider log-supermodular models on binary variables, which are probabilistic models whose negative log-densities are submodular. These models provide probabilistic interpretations of common combinatorial optimization tasks such as image segmentation. In this paper, we focus primarily on parameter estimation in these models from known upper bounds on the intractable log-partition function. We show that the bound based on separable optimization on the base polytope of the submodular function is always inferior to a bound based on “perturb-and-MAP” ideas. Then, to learn parameters, given that our approximation of the log-partition function is an expectation (over our own randomization), we use a stochastic subgradient technique to maximize a lower bound on the log-likelihood. This can also be extended to conditional maximum likelihood. We illustrate our new results in a set of experiments in binary image denoising, where we highlight the flexibility of a probabilistic model to learn with missing data. |
Tasks | Combinatorial Optimization, Denoising, Image Denoising, Semantic Segmentation |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05258v1 |
http://arxiv.org/pdf/1608.05258v1.pdf | |
PWC | https://paperswithcode.com/paper/parameter-learning-for-log-supermodular |
Repo | |
Framework | |
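The “perturb-and-MAP” bound mentioned above can be demonstrated on a tiny model: add independent zero-mean Gumbel noise to each variable's potentials, solve a MAP problem, and average the perturbed maxima to obtain an upper bound on the log-partition function. The sketch below enumerates all configurations so that it stays self-contained; a real log-supermodular model would use a submodular MAP solver instead of brute force, and the attractive pairwise potentials here are chosen only to make the toy model log-supermodular.

```python
import itertools
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
n = 10                                        # small enough to enumerate all 2^n states
theta = rng.normal(size=n)                    # unary potentials
W = 0.2 * np.triu(np.abs(rng.normal(size=(n, n))), 1)   # attractive pairwise potentials
                                              # (supermodular score => log-supermodular model)

configs = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
scores = configs @ theta + np.einsum("ki,ij,kj->k", configs, W, configs)
log_Z = logsumexp(scores)                     # exact log-partition function

# Perturb-and-MAP bound: add i.i.d. zero-mean Gumbel noise per variable/state,
# solve a MAP problem (brute force here), and average the perturbed maxima.
samples = 2000
estimates = np.empty(samples)
for s in range(samples):
    g = rng.gumbel(loc=-np.euler_gamma, size=(n, 2))     # zero-mean Gumbel noise
    perturbed = scores + configs @ (g[:, 1] - g[:, 0]) + g[:, 0].sum()
    estimates[s] = perturbed.max()
print(log_Z, estimates.mean())                # the average upper-bounds log_Z in expectation
```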
How to calculate partition functions using convex programming hierarchies: provable bounds for variational methods
Title | How to calculate partition functions using convex programming hierarchies: provable bounds for variational methods |
Authors | Andrej Risteski |
Abstract | We consider the problem of approximating partition functions for Ising models. We make use of recent tools in combinatorial optimization: the Sherali-Adams and Lasserre convex programming hierarchies, in combination with variational methods, to get algorithms for calculating partition functions in these families. These techniques give new, non-trivial approximation guarantees for the partition function beyond the regime of correlation decay. They also generalize some classical results from statistical physics about the Curie-Weiss ferromagnetic Ising model, as well as provide a partition function counterpart of classical results about max-cut on dense graphs \cite{arora1995polynomial}. With this, we connect techniques from two apparently disparate research areas – optimization and counting/partition function approximations (i.e., #P-type problems). Furthermore, we design, to the best of our knowledge, the first provable convex variational methods. Though the literature contains a host of convex versions of variational methods \cite{wainwright2003tree, wainwright2005new, heskes2006convexity, meshi2009convexifying}, they come with no guarantees (apart from some extremely special cases, e.g. when the graph has a single cycle \cite{weiss2000correctness}). We consider dense and low threshold rank graphs, and interestingly, the reason our approach works on these types of graphs is that local correlations propagate to global correlations – completely the opposite of algorithms based on correlation decay. In the process we design novel entropy approximations based on the low-order moments of a distribution. Our proof techniques are very simple and generic, and likely to be applicable to many settings other than Ising models. |
Tasks | Combinatorial Optimization |
Published | 2016-07-11 |
URL | http://arxiv.org/abs/1607.03183v1 |
http://arxiv.org/pdf/1607.03183v1.pdf | |
PWC | https://paperswithcode.com/paper/how-to-calculate-partition-functions-using |
Repo | |
Framework | |
Labeling of Query Words using Conditional Random Field
Title | Labeling of Query Words using Conditional Random Field |
Authors | Satanu Ghosh, Souvick Ghosh, Dipankar Das |
Abstract | This paper describes our approach to Query Word Labeling, an attempt at the shared task on Mixed Script Information Retrieval at the Forum for Information Retrieval Evaluation (FIRE) 2015. The queries were written in Roman script and the words were in English or transliterated from Indian regional languages. A total of eight Indian languages were present in addition to English. We also identified Named Entities and special symbols as part of our task. A CRF-based machine learning framework was used for labeling the individual words with their corresponding language labels. We used a dictionary-based approach for language identification, and also took into account the context of the word while identifying its language. Our system demonstrated an overall accuracy of 75.5% for token-level language identification. The strict F-measure scores for the identification of token-level language labels for Bengali, English and Hindi are 0.7486, 0.892 and 0.7972 respectively. The overall weighted F-measure of our system was 0.7498. |
Tasks | Information Retrieval, Language Identification |
Published | 2016-07-29 |
URL | http://arxiv.org/abs/1607.08883v1 |
http://arxiv.org/pdf/1607.08883v1.pdf | |
PWC | https://paperswithcode.com/paper/labeling-of-query-words-using-conditional |
Repo | |
Framework | |
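A minimal sketch of CRF-based token-level language labeling, assuming the sklearn-crfsuite package is available. The feature template and the toy code-mixed queries below are invented for illustration and are not the FIRE 2015 data or the paper's exact feature set.

```python
import sklearn_crfsuite   # assumes the sklearn-crfsuite package is installed

def word_features(sentence, i):
    """Simple per-token features: the word itself, a suffix, shape cues,
    and the neighbouring words for context."""
    word = sentence[i]
    feats = {"word.lower": word.lower(), "suffix3": word[-3:],
             "is_digit": word.isdigit(), "is_title": word.istitle()}
    feats["prev"] = sentence[i - 1].lower() if i > 0 else "<BOS>"
    feats["next"] = sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>"
    return feats

# Invented toy code-mixed queries with token-level labels (EN / HI / NE).
queries = [["train", "from", "delhi", "to", "mumbai"],
           ["mujhe", "naya", "phone", "chahiye"],
           ["sachin", "tendulkar", "ke", "records"]]
labels = [["EN", "EN", "NE", "EN", "NE"],
          ["HI", "HI", "EN", "HI"],
          ["NE", "NE", "HI", "EN"]]

X = [[word_features(q, i) for i in range(len(q))] for q in queries]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)

test = ["mumbai", "ka", "weather"]
print(crf.predict([[word_features(test, i) for i in range(len(test))]]))
```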
Approximated Robust Principal Component Analysis for Improved General Scene Background Subtraction
Title | Approximated Robust Principal Component Analysis for Improved General Scene Background Subtraction |
Authors | Salehe Erfanian Ebadi, Valia Guerra Ones, Ebroul Izquierdo |
Abstract | The research reported in this paper addresses the fundamental task of separation of locally moving or deforming image areas from a static or globally moving background. It builds on the latest developments in the field of robust principal component analysis, specifically, the recently reported practical solutions for the long-standing problem of recovering the low-rank and sparse parts of a large matrix made up of the sum of these two components. This article addresses a few critical issues including: embedding global motion parameters in the matrix decomposition model, i.e., estimation of global motion parameters simultaneously with the foreground/background separation task, considering matrix block-sparsity rather than generic matrix sparsity as a natural feature in video processing applications, attenuating background ghosting effects when foreground is subtracted, and more critically providing an extremely efficient algorithm to solve the low-rank/sparse matrix decomposition task. The first aspect is important for background/foreground separation in generic video sequences where the background usually obeys global displacements originated by the camera motion in the capturing process. The second aspect exploits the fact that in video processing applications the sparse matrix has a very particular structure, where the non-zero matrix entries are not randomly distributed but form small blocks within the sparse matrix. The next feature of the proposed approach addresses removal of ghosting effects originated from foreground silhouettes and the lack of information in the occluded background regions of the image. Finally, the proposed model also tackles algorithmic complexity by introducing an extremely efficient “SVD-free” technique that can be applied in most background/foreground separation tasks for conventional video processing. |
Tasks | |
Published | 2016-03-18 |
URL | http://arxiv.org/abs/1603.05875v1 |
http://arxiv.org/pdf/1603.05875v1.pdf | |
PWC | https://paperswithcode.com/paper/approximated-robust-principal-component |
Repo | |
Framework | |
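The low-rank/sparse split at the heart of this line of work can be sketched with a generic robust PCA routine: alternate singular-value thresholding (for the low-rank background) with soft thresholding (for the sparse foreground). The snippet below is a simplified inexact-ALM-style sketch on a synthetic frame matrix; it is not the paper's SVD-free, block-sparse, motion-compensated method.

```python
import numpy as np

def soft_threshold(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(D, lam=None, mu=None, iters=100):
    """Generic robust PCA sketch: split D into a low-rank part L (background)
    and a sparse part S (moving foreground) by alternating singular-value
    thresholding and soft thresholding (a simplified inexact-ALM scheme)."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = 0.25 * m * n / (np.abs(D).sum() + 1e-12)
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                       # Lagrange multipliers
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        S = soft_threshold(D - L + Y / mu, lam / mu)
        Y = Y + mu * (D - L - S)
    return L, S

# toy "video": each column is a vectorized frame = static background + sparse blobs
rng = np.random.default_rng(0)
background = np.outer(rng.random(400), np.ones(50))            # rank-1 static background
foreground = (rng.random((400, 50)) < 0.02) * rng.random((400, 50))
L, S = rpca(background + foreground)
print(np.linalg.matrix_rank(L, tol=1e-3), np.count_nonzero(np.abs(S) > 1e-3))
```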