January 27, 2020


Paper Group ANR 1270


Character 3-gram Mover’s Distance: An Effective Method for Detecting Near-duplicate Japanese-language Recipes


Title	Character 3-gram Mover’s Distance: An Effective Method for Detecting Near-duplicate Japanese-language Recipes
Authors	Masaki Oguni, Yohei Seki, Yu Hirate
Abstract	In user-generated recipe websites, users post their original recipes. Some recipes, however, are very similar to other recipes in major components such as the cooking instructions. We refer to such recipes as “near-duplicate recipes”. In this study, we propose a method that extends the “Word Mover’s Distance”, which calculates distances between texts based on word embedding, to character 3-gram embedding. Using a corpus of over 1.21 million recipes, we learned the word embedding and the character 3-gram embedding by using a Skip-Gram model with negative sampling and fastText to extract candidate pairs of near-duplicate recipes. We then annotated these candidates and evaluated the proposed method against a comparison method. Our results demonstrated that near-duplicate recipes that were not detected by the comparison method were successfully detected by the proposed method.
Tasks
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05171v2
PDF	https://arxiv.org/pdf/1912.05171v2.pdf
PWC	https://paperswithcode.com/paper/character-3-gram-movers-distance-an-effective
Repo
Framework
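
The idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_embed` is a hypothetical stand-in for a learned character 3-gram embedding, and the distance is the relaxed variant of a mover's distance (each source n-gram sends all of its mass to its nearest target n-gram) rather than the full optimal-transport problem.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def toy_embed(gram):
    # Hypothetical embedding: code points as coordinates.
    # A real system would look up a learned vector instead.
    return [float(ord(ch)) for ch in gram]


def relaxed_movers_distance(src, dst, embed=toy_embed):
    """Relaxed mover's distance from n-gram list `src` to `dst`.

    Each distinct source n-gram carries its relative frequency as mass and
    moves all of it to the closest target n-gram (Euclidean distance).
    Assumes `src` and `dst` are both non-empty.
    """
    counts = Counter(src)
    total = sum(counts.values())
    targets = [embed(g) for g in set(dst)]
    cost = 0.0
    for gram, count in counts.items():
        vec = embed(gram)
        nearest = min(math.dist(vec, t) for t in targets)
        cost += (count / total) * nearest
    return cost
```

Identical texts get distance 0, and texts that share most of their 3-grams get a small distance. The relaxed form is asymmetric, so a symmetric score would need something like the maximum of the two directions.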

Histographs: Graphs in Histopathology


Title	Histographs: Graphs in Histopathology
Authors	Shrey Gadiya, Deepak Anand, Amit Sethi
Abstract	The spatial arrangement of cells of various types, such as tumor infiltrating lymphocytes and the advancing edge of a tumor, is an important feature for detecting and characterizing cancers. However, convolutional neural networks (CNNs) do not explicitly extract intricate features of the spatial arrangements of the cells from histopathology images. In this work, we propose to classify cancers using graph convolutional networks (GCNs) by modeling a tissue section as a multi-attributed spatial graph of its constituent cells. Cells are detected using their nuclei in H&E-stained tissue images, and each cell’s appearance is captured as a multi-attributed high-dimensional vertex feature. The spatial relations between neighboring cells are captured as edge features based on their distances in a graph. We demonstrate the utility of this approach by obtaining classification accuracy that is competitive with CNNs, specifically Inception-v3, on two tasks (cancerous versus non-cancerous, and in situ versus invasive) on the BACH breast cancer dataset.
Tasks
Published	2019-08-14
URL	https://arxiv.org/abs/1908.05020v1
PDF	https://arxiv.org/pdf/1908.05020v1.pdf
PWC	https://paperswithcode.com/paper/histographs-graphs-in-histopathology
Repo
Framework

Weakly Supervised Object Localization with Inter-Intra Regulated CAMs


Title	Weakly Supervised Object Localization with Inter-Intra Regulated CAMs
Authors	Guofeng Cui, Ziyi Kou, Shaojie Wang, Wentian Zhao, Chenliang Xu
Abstract	Weakly supervised object localization (WSOL) aims to locate objects in images by learning only from image-level labels. Current methods try to obtain localization results by relying on Class Activation Maps (CAMs). Usually, they propose additional CAMs or feature maps generated from internal layers of deep networks to encourage different CAMs to be either adversarial or cooperative with each other. In this work, instead of following one of these two main approaches, we analyze their internal relationship and propose a novel intra-sample strategy which regulates two CAMs of the same sample, generated from different classifiers, to dynamically adapt each of their pixels involved in the adversarial or cooperative process based on their own values. We mathematically demonstrate that our approach is a more general version of the current state-of-the-art method with fewer hyper-parameters. In addition, we develop an inter-sample criterion module for our WSOL task, originally proposed for co-segmentation problems, to refine the generated CAMs of each sample. The module considers a subgroup of samples under the same category and regulates their object regions. With experiments on two widely used datasets, we show that our proposed method significantly outperforms the existing state of the art, setting a new record for weakly supervised object localization.
Tasks	Object Localization, Weakly-Supervised Object Localization
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07160v2
PDF	https://arxiv.org/pdf/1911.07160v2.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-object-localization-with-2
Repo
Framework

A Short Note on the Kinetics-700 Human Action Dataset


Title	A Short Note on the Kinetics-700 Human Action Dataset
Authors	Joao Carreira, Eric Noland, Chloe Hillier, Andrew Zisserman
Abstract	We describe an extension of the DeepMind Kinetics human action dataset from 600 classes to 700 classes, where for each class there are at least 600 video clips from different YouTube videos. This paper details the changes introduced for this new release of the dataset, and includes a comprehensive set of statistics as well as baseline results using the I3D neural network architecture.
Tasks
Published	2019-07-15
URL	https://arxiv.org/abs/1907.06987v1
PDF	https://arxiv.org/pdf/1907.06987v1.pdf
PWC	https://paperswithcode.com/paper/a-short-note-on-the-kinetics-700-human-action
Repo
Framework

Benefit of Interpolation in Nearest Neighbor Algorithms


Title	Benefit of Interpolation in Nearest Neighbor Algorithms
Authors	Yue Xing, Qifan Song, Guang Cheng
Abstract	The over-parameterized models attract much attention in the era of data science and deep learning. It is empirically observed that although these models, e.g. deep neural networks, over-fit the training data, they can still achieve small testing error, and sometimes even outperform traditional algorithms which are designed to avoid over-fitting. The major goal of this work is to sharply quantify the benefit of data interpolation in the context of nearest neighbors (NN) algorithm. Specifically, we consider a class of interpolated weighting schemes and then carefully characterize their asymptotic performances. Our analysis reveals a U-shaped performance curve with respect to the level of data interpolation, and proves that a mild degree of data interpolation strictly improves the prediction accuracy and statistical stability over those of the (un-interpolated) optimal $k$NN algorithm. This theoretically justifies (predicts) the existence of the second U-shaped curve in the recently discovered double descent phenomenon. Note that our goal in this study is not to promote the use of interpolated-NN method, but to obtain theoretical insights on data interpolation inspired by the aforementioned phenomenon.
Tasks
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11720v1
PDF	https://arxiv.org/pdf/1909.11720v1.pdf
PWC	https://paperswithcode.com/paper/benefit-of-interpolation-in-nearest-neighbor
Repo
Framework
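
To make "interpolated weighting scheme" concrete, here is a minimal sketch for one-dimensional regression. It assumes inverse-distance power weights, `d ** -gamma`, as one example of a scheme whose weights blow up near a training point; the function name, the exponent `gamma`, and the 1-D setting are illustrative choices, not details taken from the abstract.

```python
def interpolated_knn_predict(x, train_x, train_y, k=3, gamma=1.0):
    """Predict y at x from the k nearest training points.

    Weights are d ** -gamma, so a neighbor at distance 0 dominates and the
    predictor reproduces (interpolates) the training labels.
    gamma = 0 gives the ordinary unweighted k-NN average.
    """
    pairs = sorted(
        ((abs(x - xi), yi) for xi, yi in zip(train_x, train_y)),
        key=lambda p: p[0],
    )[:k]
    # Exact hits: return the training label itself (average if duplicated).
    exact = [y for d, y in pairs if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [d ** -gamma for d, _ in pairs]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, pairs))
    return weighted_sum / sum(weights)
```

Raising `gamma` moves the predictor from plain averaging toward stronger interpolation, which is the "level of data interpolation" axis that the abstract's U-shaped curve refers to.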

Document-level Neural Machine Translation with Inter-Sentence Attention


Title	Document-level Neural Machine Translation with Inter-Sentence Attention
Authors	Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, Hai Zhao, Bao-liang Lu
Abstract	Standard neural machine translation (NMT) assumes that sentences are independent of document-level context. Most existing document-level NMT methods focus only on briefly introducing document-level information and fail to select the most relevant part of the document context. The capacity of a memory network to detect the part of the memory most relevant to the current sentence provides a natural solution for modeling document-level context in document-level NMT. In this work, we propose a Transformer NMT system with an associated memory network (AMN) to both capture the document-level context and select the most salient part related to the concerned translation from the memory. Experiments on several tasks show that the proposed method significantly improves the NMT performance over strong Transformer baselines and other related studies.
Tasks	Machine Translation
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14528v1
PDF	https://arxiv.org/pdf/1910.14528v1.pdf
PWC	https://paperswithcode.com/paper/document-level-neural-machine-translation-2
Repo
Framework

Pseudo-task Regularization for ConvNet Transfer Learning


Title	Pseudo-task Regularization for ConvNet Transfer Learning
Authors	Yang Zhong, Atsuto Maki
Abstract	This paper is about regularizing deep convolutional networks (ConvNets) based on an adaptive multi-objective framework for transfer learning with limited training data in the target domain. Recent advances in ConvNet regularization in this context are commonly due to the use of additional regularization objectives. They guide the training away from the target task using some concrete tasks. Unlike those related approaches, we report that an objective without a concrete goal can serve surprisingly well as a regularizer. In particular, we demonstrate Pseudo-task Regularization (PtR), which dynamically regularizes a network by simply attempting to regress image representations to a pseudo-target during fine-tuning. Through numerous experiments, the improvements in classification accuracy by PtR are shown to be greater than or on a par with the recent state-of-the-art methods. These results also indicate room for rethinking the requirements for a regularizer, i.e., whether a specifically designed task for regularization is indeed a key ingredient. The contributions of this paper are: a) PtR provides an effective and efficient alternative for regularization without dependence on concrete tasks or extra data; b) the desired strength of the regularization effect in PtR is dynamically adjusted and maintained based on the gradient norms of the target objective and the pseudo-task.
Tasks	Transfer Learning
Published	2019-08-16
URL	https://arxiv.org/abs/1908.05997v1
PDF	https://arxiv.org/pdf/1908.05997v1.pdf
PWC	https://paperswithcode.com/paper/pseudo-task-regularization-for-convnet
Repo
Framework

Combinational Class Activation Maps for Weakly Supervised Object Localization


Title	Combinational Class Activation Maps for Weakly Supervised Object Localization
Authors	Seunghan Yang, Yoonhyung Kim, Youngeun Kim, Changick Kim
Abstract	Weakly supervised object localization has recently attracted attention since it aims to identify both class labels and locations of objects by using image-level labels. Most previous methods utilize the activation map corresponding to the highest activation source. Exploiting only one activation map of the highest probability class is often biased toward limited regions or sometimes even highlights background regions. To resolve these limitations, we propose to use activation maps, named combinational class activation maps (CCAM), which are linear combinations of activation maps from the highest to the lowest probability class. By using CCAM for localization, we suppress background regions to help highlight foreground objects more accurately. In addition, we design the network architecture to consider spatial relationships for localizing relevant object regions. Specifically, we integrate non-local modules into an existing base network at both low- and high-level layers. Our final model, named non-local combinational class activation maps (NL-CCAM), obtains superior performance compared to previous methods on representative object localization benchmarks including ILSVRC 2016 and CUB-200-2011. Furthermore, we show that the proposed method has a great capability of generalization by visualizing other datasets.
Tasks	Object Localization, Weakly-Supervised Object Localization
Published	2019-10-12
URL	https://arxiv.org/abs/1910.05518v2
PDF	https://arxiv.org/pdf/1910.05518v2.pdf
PWC	https://paperswithcode.com/paper/combinational-class-activation-maps-for
Repo
Framework
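
The "linear combination of activation maps from the highest to the lowest probability class" can be sketched as below. This is an illustration under assumptions: maps are plain nested lists, and `rank_weights` (one coefficient per rank position) is a hypothetical input, since the abstract does not say how the combination coefficients are obtained.

```python
def combine_activation_maps(maps, probs, rank_weights):
    """Linearly combine per-class activation maps ordered by class probability.

    maps:         list of H x W maps (nested lists), one per class
    probs:        predicted probability of each class
    rank_weights: hypothetical coefficient for each rank position, where
                  rank 0 is the most probable class
    """
    # Class indices sorted from highest to lowest probability.
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    height, width = len(maps[0]), len(maps[0][0])
    combined = [[0.0] * width for _ in range(height)]
    for rank, cls in enumerate(order):
        for i in range(height):
            for j in range(width):
                combined[i][j] += rank_weights[rank] * maps[cls][i][j]
    return combined
```

With positive weights on top-ranked classes and negative weights on low-ranked ones, regions that are active only for unlikely classes are pushed down, which matches the background-suppression effect the abstract describes.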

n-MeRCI: A new Metric to Evaluate the Correlation Between Predictive Uncertainty and True Error


Title	n-MeRCI: A new Metric to Evaluate the Correlation Between Predictive Uncertainty and True Error
Authors	Michel Moukari, Loïc Simon, Sylvaine Picard, Frédéric Jurie
Abstract	As deep learning applications are becoming more and more pervasive in robotics, the question of evaluating the reliability of inferences becomes a central question in the robotics community. This domain, known as predictive uncertainty, has come under the scrutiny of research groups developing Bayesian approaches adapted to deep learning such as Monte Carlo Dropout. Unfortunately, for the time being, the real goal of predictive uncertainty has been swept under the rug. Indeed, these approaches are solely evaluated in terms of raw performance of the network prediction, while the quality of their estimated uncertainty is not assessed. Evaluating such uncertainty prediction quality is especially important in robotics, as actions shall depend on the confidence in perceived information. In this context, the main contribution of this article is to propose a novel metric that is adapted to the evaluation of relative uncertainty assessment and directly applicable to regression with deep neural networks. To experimentally validate this metric, we evaluate it on a toy dataset and then apply it to the task of monocular depth estimation.
Tasks	Depth Estimation, Monocular Depth Estimation
Published	2019-08-20
URL	https://arxiv.org/abs/1908.07253v1
PDF	https://arxiv.org/pdf/1908.07253v1.pdf
PWC	https://paperswithcode.com/paper/n-merci-a-new-metric-to-evaluate-the
Repo
Framework

Semi-Supervised Adversarial Monocular Depth Estimation


Title	Semi-Supervised Adversarial Monocular Depth Estimation
Authors	Rongrong Ji, Ke Li, Yan Wang, Xiaoshuai Sun, Feng Guo, Xiaowei Guo, Yongjian Wu, Feiyue Huang, Jiebo Luo
Abstract	In this paper, we address the problem of monocular depth estimation when only a limited number of training image-depth pairs are available. To achieve a high regression accuracy, the state-of-the-art estimation methods rely on CNNs trained with a large number of image-depth pairs, which are prohibitively costly or even infeasible to acquire. Aiming to break the curse of such expensive data collections, we propose a semi-supervised adversarial learning framework that only utilizes a small number of image-depth pairs in conjunction with a large number of easily available monocular images to achieve high performance. In particular, we use one generator to regress the depth and two discriminators to evaluate the predicted depth, i.e., one inspects the image-depth pair while the other inspects the depth channel alone. These two discriminators provide their feedback to the generator as the loss to generate more realistic and accurate depth predictions. Experiments show that the proposed approach can (1) improve most state-of-the-art models on the NYUD v2 dataset by effectively leveraging additional unlabeled data sources; (2) reach state-of-the-art accuracy when the training set is small, e.g., on the Make3D dataset; (3) adapt well to an unseen new dataset (Make3D in our case) after training on an annotated dataset (KITTI in our case).
Tasks	Depth Estimation, Monocular Depth Estimation
Published	2019-08-06
URL	https://arxiv.org/abs/1908.02126v1
PDF	https://arxiv.org/pdf/1908.02126v1.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-adversarial-monocular-depth
Repo
Framework

Adaptively Denoising Proposal Collection for Weakly Supervised Object Localization


Title	Adaptively Denoising Proposal Collection for Weakly Supervised Object Localization
Authors	Wenju Xu, Yuanwei Wu, Wenchi Ma, Guanghui Wang
Abstract	In this paper, we address the problem of weakly supervised object localization (WSL), which trains a detection network on the dataset with only image-level annotations. The proposed approach is built on the observation that the proposal set from the training dataset is a collection of background, object parts, and objects. Several strategies are taken to adaptively eliminate the noisy proposals and generate pseudo object-level annotations for the weakly labeled dataset. A multiple instance learning (MIL) algorithm enhanced by mask-out strategy is adopted to collect the class-specific object proposals, which are then utilized to adapt a pre-trained classification network to a detection network. In addition, the detection results from the detection network are re-weighted by jointly considering the detection scores and the overlap ratio of proposals in a proposal subset optimization framework. The optimal proposals work as object-level labels that enable a pseudo-strongly supervised dataset for training the detection network. Consequently, we establish a fully adaptive detection network. Extensive evaluations on the PASCAL VOC 2007 and 2012 datasets demonstrate a significant improvement compared with the state-of-the-art methods.
Tasks	Denoising, Multiple Instance Learning, Object Localization, Weakly-Supervised Object Localization
Published	2019-10-04
URL	https://arxiv.org/abs/1910.02101v2
PDF	https://arxiv.org/pdf/1910.02101v2.pdf
PWC	https://paperswithcode.com/paper/adaptively-denoising-proposal-collection-for
Repo
Framework

A Rate-Distortion Framework for Explaining Neural Network Decisions


Title	A Rate-Distortion Framework for Explaining Neural Network Decisions
Authors	Jan Macdonald, Stephan Wäldchen, Sascha Hauch, Gitta Kutyniok
Abstract	We formalise the widespread idea of interpreting neural network decisions as an explicit optimisation problem in a rate-distortion framework. A set of input features is deemed relevant for a classification decision if the expected classifier score remains nearly constant when randomising the remaining features. We discuss the computational complexity of finding small sets of relevant features and show that the problem is complete for $\mathsf{NP}^\mathsf{PP}$, an important class of computational problems frequently arising in AI tasks. Furthermore, we show that it even remains $\mathsf{NP}$-hard to only approximate the optimal solution to within any non-trivial approximation factor. Finally, we consider a continuous problem relaxation and develop a heuristic solution strategy based on assumed density filtering for deep ReLU neural networks. We present numerical experiments for two image classification data sets where we outperform established methods in particular for sparse explanations of neural network decisions.
Tasks	Image Classification
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11092v1
PDF	https://arxiv.org/pdf/1905.11092v1.pdf
PWC	https://paperswithcode.com/paper/a-rate-distortion-framework-for-explaining
Repo
Framework

Unique Sharp Local Minimum in $\ell_1$-minimization Complete Dictionary Learning


Title	Unique Sharp Local Minimum in $\ell_1$-minimization Complete Dictionary Learning
Authors	Yu Wang, Siqi Wu, Bin Yu
Abstract	We study the problem of globally recovering a dictionary from a set of signals via $\ell_1$-minimization. We assume that the signals are generated as i.i.d. random linear combinations of the $K$ atoms from a complete reference dictionary $D^* \in \mathbb{R}^{K\times K}$, where the linear combination coefficients are from either a Bernoulli type model or exact sparse model. First, we obtain a necessary and sufficient norm condition for the reference dictionary $D^*$ to be a sharp local minimum of the expected $\ell_1$ objective function. Our result substantially extends that of Wu and Yu (2015) and allows the combination coefficient to be non-negative. Secondly, we obtain an explicit bound on the region within which the objective value of the reference dictionary is minimal. Thirdly, we show that the reference dictionary is the unique sharp local minimum, thus establishing the first known global property of $\ell_1$-minimization dictionary learning. Motivated by the theoretical results, we introduce a perturbation-based test to determine whether a dictionary is a sharp local minimum of the objective function. In addition, we also propose a new dictionary learning algorithm based on Block Coordinate Descent, called DL-BCD, which is guaranteed to have monotonic convergence. Simulation studies show that DL-BCD has competitive performance in terms of recovery rate compared to many state-of-the-art dictionary learning algorithms.
Tasks	Dictionary Learning
Published	2019-02-22
URL	http://arxiv.org/abs/1902.08380v1
PDF	http://arxiv.org/pdf/1902.08380v1.pdf
PWC	https://paperswithcode.com/paper/unique-sharp-local-minimum-in-ell_1
Repo
Framework

Learning to Track Any Object


Title	Learning to Track Any Object
Authors	Achal Dave, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan
Abstract	Object tracking can be formulated as “finding the right object in a video”. We observe that recent approaches for class-agnostic tracking tend to focus on the “finding” part, but largely overlook the “object” part of the task, essentially doing a template matching over a frame in a sliding-window. In contrast, class-specific trackers heavily rely on object priors in the form of category-specific object detectors. In this work, we re-purpose category-specific appearance models into a generic objectness prior. Our approach converts a category-specific object detector into a category-agnostic, object-specific detector (i.e. a tracker) efficiently, on the fly. Moreover, at test time the same network can be applied to detection and tracking, resulting in a unified approach for the two tasks. We achieve state-of-the-art results on two recent large-scale tracking benchmarks (OxUvA and GOT, using external data). By simply adding a mask prediction branch, our approach is able to produce instance segmentation masks for the tracked object. Despite only using box-level information on the first frame, our method outputs high-quality masks, as evaluated on the DAVIS ‘17 video object segmentation benchmark.
Tasks	Instance Segmentation, Object Tracking, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2019-10-25
URL	https://arxiv.org/abs/1910.11844v1
PDF	https://arxiv.org/pdf/1910.11844v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-track-any-object
Repo
Framework

Deep Learning of Subsurface Flow via Theory-guided Neural Network


Title	Deep Learning of Subsurface Flow via Theory-guided Neural Network
Authors	Nanzhe Wang, Dongxiao Zhang, Haibin Chang, Heng Li
Abstract	Active research is currently under way to incorporate the wealth of scientific knowledge into data-driven approaches (e.g., neural networks) in order to improve the latter’s effectiveness. In this study, the Theory-guided Neural Network (TgNN) is proposed for deep learning of subsurface flow. In the TgNN, as in supervised learning, the neural network is trained with available observations or simulation data while being simultaneously guided by theory (e.g., governing equations, other physical constraints, engineering controls, and expert knowledge) of the underlying problem. The TgNN can achieve higher accuracy than the ordinary Artificial Neural Network (ANN) because the former provides physically feasible predictions and can be more readily generalized beyond the regimes covered by the training data. Furthermore, the TgNN model is proposed for subsurface flow with heterogeneous model parameters. Several numerical cases of two-dimensional transient saturated flow are introduced to test the performance of the TgNN. In the learning process, the loss function contains data mismatch, as well as PDE constraint, engineering control, and expert knowledge. After obtaining the parameters of the neural network by minimizing the loss function, a TgNN model is built that not only fits the data, but also adheres to physical/engineering constraints. Predicting the future response can be easily realized by the TgNN model. In addition, the TgNN model is tested in more complicated scenarios, such as prediction with changed boundary conditions, learning from noisy data or outliers, transfer learning, and engineering controls. Numerical results demonstrate that the TgNN model achieves much better predictability, reliability, and generalizability than ANN models due to the physical/engineering constraints in the former.
Tasks	Transfer Learning
Published	2019-10-24
URL	https://arxiv.org/abs/1911.00103v1
PDF	https://arxiv.org/pdf/1911.00103v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-of-subsurface-flow-via-theory
Repo
Framework
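
The abstract above describes a loss that adds PDE-constraint and engineering-control terms to the usual data mismatch. A minimal sketch of such a composite loss follows. The function names, the mean-square form of each term, and the weights `w_pde` and `w_ctrl` are assumptions made for illustration; the abstract does not specify how the terms are computed or balanced, and the expert-knowledge term is omitted here for brevity.

```python
def mean_square(values):
    """Mean of squared entries of a non-empty list."""
    return sum(v * v for v in values) / len(values)


def theory_guided_loss(pred, obs, pde_residuals, control_violations,
                       w_pde=1.0, w_ctrl=1.0):
    """Composite loss: data mismatch plus weighted constraint penalties.

    pred, obs:          network predictions and observations at data points
    pde_residuals:      governing-equation residuals at collocation points
    control_violations: amounts by which engineering controls are violated
                        (0 where the control is satisfied)
    """
    data_term = mean_square([p - o for p, o in zip(pred, obs)])
    pde_term = mean_square(pde_residuals)
    ctrl_term = mean_square(control_violations)
    return data_term + w_pde * pde_term + w_ctrl * ctrl_term
```

A network that fits the data but violates the governing equation still incurs a positive loss through `pde_term`, which is how the constraint terms steer training toward physically feasible predictions.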