January 27, 2020


Paper Group ANR 1110

6D Object Pose Estimation Based on 2D Bounding Box. Distributed Submodular Minimization via Block-Wise Updates and Communications. Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets. A Multimodal Deep Network for the Reconstruction of T2W MR Images. Towards a New Understanding of the Training of Neural Networks with Mislabe …

6D Object Pose Estimation Based on 2D Bounding Box


Title	6D Object Pose Estimation Based on 2D Bounding Box
Authors	Jin Liu, Sheng He
Abstract	In this paper, we present a simple but powerful method to tackle the problem of estimating the 6D pose of objects from a single RGB image. Our system trains a novel convolutional neural network to regress the unit quaternion, which represents the 3D rotation, from the partial image inside the bounding box returned by 2D detection systems. Then we propose an algorithm we call Bounding Box Equation to efficiently and accurately obtain the 3D translation, using the 3D rotation and the 2D bounding box. Since the sum of squares of the quaternion’s four elements equals one, we add a normalization layer to keep the network’s output on the unit sphere and put forward a special loss function for unit quaternion regression. We evaluate our method on the LineMod dataset, and experiments show that our approach outperforms the baseline and some state-of-the-art methods.
Tasks	6D Pose Estimation using RGB, Pose Estimation
Published	2019-01-27
URL	http://arxiv.org/abs/1901.09366v1
PDF	http://arxiv.org/pdf/1901.09366v1.pdf
PWC	https://paperswithcode.com/paper/6d-object-pose-estimation-based-on-2d
Repo
Framework
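
The normalization step described in the abstract above can be illustrated with a minimal sketch. The function names and the particular loss are hypothetical: the abstract only says that a normalization layer keeps the output on the unit sphere and that a "special loss function" is used, so the inner-product loss below is an assumption for illustration, not the authors' loss.

```python
import numpy as np

def normalize_quaternion(raw, eps=1e-12):
    """Project a raw 4-vector onto the unit sphere (the role of a normalization layer)."""
    raw = np.asarray(raw, dtype=float)
    norm = np.linalg.norm(raw)
    if norm < eps:
        raise ValueError("cannot normalize a near-zero vector")
    return raw / norm

def quaternion_loss(q_pred, q_true):
    """Assumed illustrative loss: 1 - |<q_pred, q_true>|.

    It is zero when both unit quaternions encode the same rotation,
    including the case q_pred == -q_true (q and -q are the same rotation).
    """
    return 1.0 - abs(float(np.dot(q_pred, q_true)))

# Raw network output (made-up numbers) mapped to a unit quaternion.
q = normalize_quaternion([1.0, 2.0, 2.0, 4.0])
```

After normalization the squares of the four components sum to one, which is the constraint the abstract refers to.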

Distributed Submodular Minimization via Block-Wise Updates and Communications


Title	Distributed Submodular Minimization via Block-Wise Updates and Communications
Authors	Francesco Farina, Andrea Testa, Giuseppe Notarstefano
Abstract	In this paper we deal with a network of computing agents with local processing and neighboring communication capabilities that aim at solving (without any central unit) a submodular optimization problem. The cost function is the sum of many local submodular functions and each agent in the network has access to one function in the sum only. In this distributed set-up, in order to preserve their own privacy, agents communicate with neighbors but do not share their local cost functions. We propose a distributed algorithm in which agents resort to the Lovász extension of their local submodular functions and perform local updates and communications in terms of single blocks of the entire optimization variable. Updates are performed by means of a greedy algorithm which is run only until the selected block is computed, thus resulting in a reduced computational burden. The proposed algorithm is shown to converge in expected value to the optimal cost of the problem, and an approximate solution to the submodular problem is retrieved by a thresholding operation. As an application, we consider a distributed image segmentation problem in which each agent has access only to a portion of the entire image. While agents cannot segment the entire image on their own, they correctly complete the task by cooperating through the proposed distributed algorithm.
Tasks	Semantic Segmentation
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13682v1
PDF	https://arxiv.org/pdf/1905.13682v1.pdf
PWC	https://paperswithcode.com/paper/distributed-submodular-minimization-via-block
Repo
Framework
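
As background for the abstract above, here is a minimal, centralized sketch of the two standard building blocks it mentions: evaluating the Lovász extension with the greedy ordering, and recovering a set by thresholding. It is not the distributed block-wise algorithm of the paper; the example cut function, the edge list, and the threshold value 0.5 are assumptions chosen for illustration.

```python
def lovasz_extension(F, x):
    """Evaluate the Lovász extension of a set function F (with F(empty) = 0) at x.

    Greedy rule: visit coordinates in decreasing order of x and weight each
    coordinate by the marginal gain of adding it to the growing prefix set.
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])
    value = 0.0
    prefix = set()
    previous = F(frozenset())
    for i in order:
        prefix.add(i)
        current = F(frozenset(prefix))
        value += x[i] * (current - previous)
        previous = current
    return value

def threshold(x, tau=0.5):
    """Recover a candidate set from a relaxed point by thresholding."""
    return {i for i, xi in enumerate(x) if xi > tau}

# Example submodular function: the cut function of the path graph 0 - 1 - 2.
EDGES = [(0, 1), (1, 2)]

def cut(S):
    return float(sum((u in S) != (v in S) for u, v in EDGES))

value = lovasz_extension(cut, [0.9, 0.2, 0.7])
selected = threshold([0.9, 0.2, 0.7])
```

For a cut function the Lovász extension reduces to the sum of absolute differences across edges, which gives a quick sanity check on the greedy evaluation.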

Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets


Title	Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
Authors	Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge
Abstract	Mode connectivity is a surprising phenomenon in the loss landscape of deep nets. Optima – at least those discovered by gradient-based optimization – turn out to be connected by simple paths on which the loss function is almost constant. Often, these paths can be chosen to be piece-wise linear, with as few as two segments. We give mathematical explanations for this phenomenon, assuming generic properties (such as dropout stability and noise stability) of well-trained deep nets, which have previously been identified as part of understanding the generalization properties of deep nets. Our explanation holds for realistic multilayer nets, and experiments are presented to verify the theory.
Tasks
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06247v2
PDF	https://arxiv.org/pdf/1906.06247v2.pdf
PWC	https://paperswithcode.com/paper/explaining-landscape-connectivity-of-low-cost
Repo
Framework
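
The idea of a two-segment piecewise-linear path between solutions, mentioned in the abstract above, can be illustrated on a toy problem. Everything here is an assumption for illustration: the loss is a made-up 2D function whose minima lie on the unit circle, not a neural network loss, and the midpoint is picked by hand.

```python
import numpy as np

def piecewise_linear_path(theta_a, theta_mid, theta_b, t):
    """Point at position t in [0, 1] on the path theta_a -> theta_mid -> theta_b."""
    if t <= 0.5:
        s = 2.0 * t
        return (1.0 - s) * theta_a + s * theta_mid
    s = 2.0 * t - 1.0
    return (1.0 - s) * theta_mid + s * theta_b

def max_loss_on_path(loss, theta_a, theta_mid, theta_b, steps=101):
    """Largest loss value observed at evenly spaced points along the path."""
    return max(
        loss(piecewise_linear_path(theta_a, theta_mid, theta_b, t))
        for t in np.linspace(0.0, 1.0, steps)
    )

def toy_loss(theta):
    """Toy loss (assumed): zero exactly on the unit circle."""
    return float((np.dot(theta, theta) - 1.0) ** 2)

a = np.array([1.0, 0.0])
b = np.array([-1.0, 0.0])
straight = max_loss_on_path(toy_loss, a, (a + b) / 2.0, b)   # passes through the origin
bent = max_loss_on_path(toy_loss, a, np.array([0.0, 1.0]), b)  # detours via another minimum
```

On this toy loss the bent path has a lower barrier than the straight segment, which is the qualitative behavior the abstract describes for trained networks.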

A Multimodal Deep Network for the Reconstruction of T2W MR Images


Title	A Multimodal Deep Network for the Reconstruction of T2W MR Images
Authors	Antonio Falvo, Danilo Comminiello, Simone Scardapane, Michele Scarpiniti, Aurelio Uncini
Abstract	Multiple sclerosis (MS) is one of the most common chronic neurological diseases affecting the central nervous system. Lesions produced by MS can be observed through two modalities of magnetic resonance (MR), known as T2W and FLAIR sequences, both providing useful information for formulating a diagnosis. However, long acquisition time makes the acquired MR image vulnerable to motion artifacts. This leads to the need to accelerate the execution of the MR analysis. In this paper, we present a deep learning method that is able to reconstruct subsampled MR images obtained by reducing the k-space data, while maintaining a high image quality that can be used to observe brain lesions. The proposed method exploits the multimodal approach of neural networks and it also focuses on the data acquisition and processing stages to reduce the execution time of the MR analysis. Results demonstrate the effectiveness of the proposed method in reconstructing subsampled MR images while saving execution time.
Tasks
Published	2019-08-08
URL	https://arxiv.org/abs/1908.03009v2
PDF	https://arxiv.org/pdf/1908.03009v2.pdf
PWC	https://paperswithcode.com/paper/a-multimodal-deep-network-for-the
Repo
Framework
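
To make "subsampled MR images obtained by reducing the k-space data" concrete, the sketch below simulates Cartesian undersampling of a 2D image and the zero-filled reconstruction that a learned model would then try to improve. The sampling pattern (every n-th row plus a few central rows) and the random test image are assumptions; the abstract does not specify the mask used.

```python
import numpy as np

def subsample_kspace(image, keep_every=2):
    """Zero out k-space rows and return the zero-filled reconstruction and the mask."""
    # Centered 2D Fourier transform of the image (its k-space).
    kspace = np.fft.fftshift(np.fft.fft2(image))
    mask = np.zeros(image.shape, dtype=bool)
    mask[::keep_every, :] = True              # keep every n-th row
    center = image.shape[0] // 2
    mask[center - 2:center + 2, :] = True     # always keep the low frequencies
    zero_filled = np.fft.ifft2(np.fft.ifftshift(kspace * mask))
    return np.abs(zero_filled), mask

rng = np.random.default_rng(0)
image = rng.random((32, 32))                  # stand-in for an MR slice
recon, mask = subsample_kspace(image, keep_every=2)
```

With `keep_every=1` every row is kept and the reconstruction matches the input, so the difference between `recon` and `image` is entirely due to the discarded k-space rows.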

Towards a New Understanding of the Training of Neural Networks with Mislabeled Training Data


Title	Towards a New Understanding of the Training of Neural Networks with Mislabeled Training Data
Authors	Herbert Gish, Jan Silovsky, Man-Ling Sung, Man-Hung Siu, William Hartmann, Zhuolin Jiang
Abstract	We investigate the problem of machine learning with mislabeled training data. We try to make the effects of mislabeled training better understood through analysis of the basic model and equations that characterize the problem. This includes results about the ability of the noisy model to make the same decisions as the clean model and the effects of noise on model performance. In addition to providing better insights, we are also able to show that the Maximum Likelihood (ML) estimate of the parameters of the noisy model determines those of the clean model. This property is obtained through the use of the ML invariance property and leads to an approach to developing a classifier when training has been mislabeled: namely, train the classifier on noisy data and adjust the decision threshold based on the noise levels and/or class priors. We show how our approach to mislabeled training works with multi-layered perceptrons (MLPs).
Tasks
Published	2019-09-18
URL	https://arxiv.org/abs/1909.09136v1
PDF	https://arxiv.org/pdf/1909.09136v1.pdf
PWC	https://paperswithcode.com/paper/towards-a-new-understanding-of-the-training
Repo
Framework
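
The threshold adjustment mentioned in the abstract above can be illustrated under a common assumption that the abstract does not spell out: binary labels with class-conditional noise, where `rho0` is the probability that a clean 0 is flipped to 1 and `rho1` the probability that a clean 1 is flipped to 0. Under that assumed model the noisy posterior is an affine function of the clean one, so a clean threshold maps to a shifted noisy threshold. The function names are hypothetical.

```python
def noisy_posterior(p_clean, rho0, rho1):
    """P(noisy label = 1 | x) given the clean posterior, under class-conditional noise."""
    return rho0 + (1.0 - rho0 - rho1) * p_clean

def adjusted_threshold(rho0, rho1, clean_threshold=0.5):
    """Threshold on the noisy posterior that reproduces the clean decision rule."""
    if rho0 + rho1 >= 1.0:
        raise ValueError("noise rates must satisfy rho0 + rho1 < 1")
    return noisy_posterior(clean_threshold, rho0, rho1)

t = adjusted_threshold(rho0=0.1, rho1=0.3)

# Decisions made on the noisy posterior with threshold t agree with
# decisions made on the clean posterior with threshold 0.5.
agreement = all(
    (p > 0.5) == (noisy_posterior(p, 0.1, 0.3) > t)
    for p in [0.05, 0.3, 0.49, 0.51, 0.7, 0.95]
)
```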

Instance-Based Model Adaptation For Direct Speech Translation


Title	Instance-Based Model Adaptation For Direct Speech Translation
Authors	Mattia Antonino Di Gangi, Viet-Nhat Nguyen, Matteo Negri, Marco Turchi
Abstract	Despite recent technology advancements, the effectiveness of neural approaches to end-to-end speech-to-text translation is still limited by the paucity of publicly available training corpora. We tackle this limitation with a method to improve data exploitation and boost the system’s performance at inference time. Our approach allows us to customize “on the fly” an existing model to each incoming translation request. At its core, it exploits an instance selection procedure to retrieve, from a given pool of data, a small set of samples similar to the input query in terms of latent properties of its audio signal. The retrieved samples are then used for an instance-specific fine-tuning of the model. We evaluate our approach in three different scenarios. In all data conditions (different languages, in/out-of-domain adaptation), our instance-based adaptation yields coherent performance gains over static models.
Tasks	Domain Adaptation
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10663v1
PDF	https://arxiv.org/pdf/1910.10663v1.pdf
PWC	https://paperswithcode.com/paper/instance-based-model-adaptation-for-direct
Repo
Framework

Blind Hyperspectral-Multispectral Image Fusion via Graph Laplacian Regularization


Title	Blind Hyperspectral-Multispectral Image Fusion via Graph Laplacian Regularization
Authors	Chandrajit Bajaj, Tianming Wang
Abstract	Fusing a low-resolution hyperspectral image (HSI) and a high-resolution multispectral image (MSI) of the same scene leads to a super-resolution image (SRI), which is information rich spatially and spectrally. In this paper, we super-resolve the HSI using the graph Laplacian defined on the MSI. Unlike many existing works, we don’t assume prior knowledge about the spatial degradation from SRI to HSI, nor a perfectly aligned HSI and MSI pair. Our algorithm progressively alternates between finding the blur kernel and fusing HSI with MSI, generating accurate estimations of the blur kernel and the SRI at convergence. Experiments on various datasets demonstrate the advantages of the proposed algorithm in the quality of fusion and its capability in dealing with unknown spatial degradation.
Tasks	Super-Resolution
Published	2019-02-21
URL	http://arxiv.org/abs/1902.08224v1
PDF	http://arxiv.org/pdf/1902.08224v1.pdf
PWC	https://paperswithcode.com/paper/blind-hyperspectral-multispectral-image
Repo
Framework

Homotopic Convex Transformation: A New Landscape Smoothing Method for the Traveling Salesman Problem


Title	Homotopic Convex Transformation: A New Landscape Smoothing Method for the Traveling Salesman Problem
Authors	Jialong Shi, Jianyong Sun, Qingfu Zhang, Kai Ye
Abstract	This paper proposes a novel landscape smoothing method for the symmetric Traveling Salesman Problem (TSP). We first define the Homotopic Convex (HC) transformation of a TSP as a convex combination of a well-constructed simple TSP and the original TSP. The simple TSP, called the convex-hull TSP, is constructed by transforming a known local or global optimum. We observe that, as controlled by the coefficient of the convex combination and with either a local or a global optimum, (i) the landscape of the HC-transformed TSP is smoothed in the sense that it has fewer local optima than the original TSP; and (ii) the fitness distance correlation of the HC-transformed TSP is increased. Further, we observe that the smoothing effect of the HC transformation depends highly on the quality of the optimum used: a high-quality optimum leads to a better smoothing effect than a low-quality one. We then propose an iterative algorithmic framework in which the proposed HC transformation is combined with a heuristic TSP solver. It works as a scheme for escaping from local optima, aiming to improve the global search ability of the combined heuristic. Case studies using 3-Opt and Lin-Kernighan local search as the heuristic solver show that the resultant algorithms significantly outperform their counterparts and two other smoothing-based TSP heuristic solvers on most of the test instances with up to 20,000 cities.
Tasks
Published	2019-05-14
URL	https://arxiv.org/abs/1906.03223v3
PDF	https://arxiv.org/pdf/1906.03223v3.pdf
PWC	https://paperswithcode.com/paper/homotopic-convex-transformation-a-new-method
Repo
Framework
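
The convex combination at the heart of the HC transformation described above can be sketched on distance matrices. The abstract does not say how the convex-hull TSP is built from the known optimum, so the construction below (placing the cities evenly on a unit circle in the order of the known tour) is an assumption for illustration, as are the function names and the sample data.

```python
import math

def circle_distance_matrix(tour):
    """Assumed 'simple' TSP: cities placed on a unit circle in the order of a known tour."""
    n = len(tour)
    pos = {}
    for rank, city in enumerate(tour):
        angle = 2.0 * math.pi * rank / n
        pos[city] = (math.cos(angle), math.sin(angle))
    return [[math.dist(pos[i], pos[j]) for j in range(n)] for i in range(n)]

def hc_transform(d_orig, d_simple, lam):
    """Convex combination (1 - lam) * original + lam * simple of two distance matrices."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    n = len(d_orig)
    return [
        [(1.0 - lam) * d_orig[i][j] + lam * d_simple[i][j] for j in range(n)]
        for i in range(n)
    ]

# Made-up symmetric 4-city instance and a known tour over cities 0..3.
d_orig = [
    [0.0, 2.0, 9.0, 4.0],
    [2.0, 0.0, 6.0, 3.0],
    [9.0, 6.0, 0.0, 5.0],
    [4.0, 3.0, 5.0, 0.0],
]
d_simple = circle_distance_matrix([0, 1, 3, 2])
d_half = hc_transform(d_orig, d_simple, lam=0.5)
```

Setting `lam=0` returns the original instance and `lam=1` the simple one, so the coefficient interpolates between the two landscapes.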

Bandwidth Embeddings for Mixed-bandwidth Speech Recognition


Title	Bandwidth Embeddings for Mixed-bandwidth Speech Recognition
Authors	Gautam Mantena, Ozlem Kalinli, Ossama Abdel-Hamid, Don McAllaster
Abstract	In this paper, we tackle the problem of handling narrowband and wideband speech by building a single acoustic model (AM), also called mixed bandwidth AM. In the proposed approach, an auxiliary input feature is used to provide the bandwidth information to the model, and bandwidth embeddings are jointly learned as part of acoustic model training. Experimental evaluations show that using bandwidth embeddings helps the model to handle the variability of the narrow and wideband speech, and makes it possible to train a mixed-bandwidth AM. Furthermore, we propose to use parallel convolutional layers to handle the mismatch between the narrow and wideband speech better, where separate convolution layers are used for each type of input speech signal. Our best system achieves 13% relative improvement on narrowband speech, while not degrading on wideband speech.
Tasks	Speech Recognition
Published	2019-09-05
URL	https://arxiv.org/abs/1909.02667v1
PDF	https://arxiv.org/pdf/1909.02667v1.pdf
PWC	https://paperswithcode.com/paper/bandwidth-embeddings-for-mixed-bandwidth
Repo
Framework

Reinforcement Learning for Integer Programming: Learning to Cut


Title	Reinforcement Learning for Integer Programming: Learning to Cut
Authors	Yunhao Tang, Shipra Agrawal, Yuri Faenza
Abstract	Integer programming (IP) is a general optimization framework widely applicable to a variety of unstructured and structured problems arising in, e.g., scheduling, production planning, and graph optimization. As IP models many provably hard to solve problems, modern IP solvers rely on many heuristics. These heuristics are usually human-designed, and naturally prone to suboptimality. The goal of this work is to show that the performance of those solvers can be greatly enhanced using reinforcement learning (RL). In particular, we investigate a specific methodology for solving IPs, known as the Cutting Plane Method. This method is employed as a subroutine by all modern IP solvers. We present a deep RL formulation, network architecture, and algorithms for intelligent adaptive selection of cutting planes (aka cuts). Across a wide range of IP tasks, we show that the trained RL agent significantly outperforms human-designed heuristics, and effectively generalizes to 10X larger instances and across IP problem classes. The trained agent is also demonstrated to benefit the popular downstream application of cutting plane methods in Branch-and-Cut algorithm, which is the backbone of state-of-the-art commercial IP solvers.
Tasks
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04859v1
PDF	https://arxiv.org/pdf/1906.04859v1.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-for-integer
Repo
Framework

Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition


Title	Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition
Authors	Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny
Abstract	In automatic speech recognition (ASR), wideband (WB) and narrowband (NB) speech signals with different sampling rates typically use separate acoustic models. Therefore mixed-bandwidth (MB) acoustic modeling has important practical value for ASR system deployment. In this paper, we extensively investigate large-scale MB deep neural network acoustic modeling for ASR using 1,150 hours of WB data and 2,300 hours of NB data. We study various MB strategies including downsampling, upsampling and bandwidth extension for MB acoustic modeling and evaluate their performance on 8 diverse WB and NB test sets from various application domains. To deal with the large amounts of training data, distributed training is carried out on multiple GPUs using synchronous data parallelism.
Tasks	Speech Recognition
Published	2019-07-10
URL	https://arxiv.org/abs/1907.04887v1
PDF	https://arxiv.org/pdf/1907.04887v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-mixed-bandwidth-deep-neural
Repo
Framework

PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module


Title	PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module
Authors	Liang Xie, Chao Xiang, Zhengxu Yu, Guodong Xu, Zheng Yang, Deng Cai, Xiaofei He
Abstract	LIDAR point clouds and RGB images are both essential for 3D object detection, so many state-of-the-art 3D detection algorithms are dedicated to fusing these two types of data effectively. However, their fusion methods based on Bird's Eye View (BEV) or voxel format are not accurate. In this paper, we propose a novel fusion approach named Point-based Attentive Cont-conv Fusion (PACF) module, which fuses multi-sensor features directly on 3D points. In addition to continuous convolution, we add a Point-Pooling and an Attentive Aggregation to make the fused features more expressive. Moreover, based on the PACF module, we propose a 3D multi-sensor multi-task network called Pointcloud-Image RCNN (PI-RCNN for short), which handles the image segmentation and 3D object detection tasks. PI-RCNN employs a segmentation sub-network to extract full-resolution semantic feature maps from images and then fuses the multi-sensor features via the PACF module. Benefiting from the effectiveness of the PACF module and the expressive semantic features from the segmentation module, PI-RCNN improves considerably in 3D object detection. We demonstrate the effectiveness of the PACF module and PI-RCNN on the KITTI 3D Detection benchmark, where our method achieves state-of-the-art results on the 3D AP metric.
Tasks	3D Object Detection, Object Detection, Semantic Segmentation
Published	2019-11-14
URL	https://arxiv.org/abs/1911.06084v3
PDF	https://arxiv.org/pdf/1911.06084v3.pdf
PWC	https://paperswithcode.com/paper/pi-rcnn-an-efficient-multi-sensor-3d-object
Repo
Framework

Chinese Spelling Error Detection Using a Fusion Lattice LSTM


Title	Chinese Spelling Error Detection Using a Fusion Lattice LSTM
Authors	Hao Wang, Bing Wang, Jianyong Duan, Jiajun Zhang
Abstract	Spelling error detection serves as a crucial preprocessing step in many natural language processing applications. Due to the characteristics of the Chinese language, Chinese spelling error detection is more challenging than error detection in English. Existing methods mainly follow a pipeline framework, which artificially divides the error detection process into two steps. Thus, these methods suffer from error propagation and cannot always work well due to the complexity of the language environment. Besides, existing methods only adopt character or word information, and ignore the positive effect of fusing character, word, and pinyin information together. We propose an LF-LSTM-CRF model, which is an extension of the LSTM-CRF with word lattices and character-pinyin-fusion inputs. Our model takes advantage of the end-to-end framework to detect errors as a whole process, and dynamically integrates character, word and pinyin information. Experiments on the SIGHAN data show that our LF-LSTM-CRF consistently outperforms existing methods with similar external resources, and confirm the feasibility of adopting the end-to-end framework and the usefulness of integrating character, word and pinyin information.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10750v1
PDF	https://arxiv.org/pdf/1911.10750v1.pdf
PWC	https://paperswithcode.com/paper/chinese-spelling-error-detection-using-a
Repo
Framework

Causal inference for climate change events from satellite image time series using computer vision and deep learning


Title	Causal inference for climate change events from satellite image time series using computer vision and deep learning
Authors	Vikas Ramachandra
Abstract	We propose a method for causal inference using satellite image time series, in order to determine the treatment effects of interventions which impact climate change, such as deforestation. Simply put, the aim is to quantify the ‘before versus after’ effect of climate-related human-driven interventions, such as urbanization, as well as natural disasters, such as hurricanes and forest fires. As a concrete example, we focus on quantifying forest tree cover change/deforestation due to human-led causes. The proposed method involves the following steps. First, we use computer vision and machine learning/deep learning techniques to detect and quantify forest tree coverage levels over time, at every time epoch. We then look at this time series to identify changepoints. Next, we estimate the expected (forest tree cover) values using a Bayesian structural causal model and projecting/forecasting the counterfactual. This is compared to the values actually observed post-intervention, and the difference between the two values gives us the effect of the intervention (as compared to the non-intervention scenario, i.e. what would possibly have happened without the intervention). As a specific use case, we analyze deforestation levels before and after the hyperinflation event (intervention) in Brazil (which ended in 1993-94), for the Amazon rainforest region, around Rondonia, Brazil. For this deforestation use case, our causal inference framework can help causally attribute change/reduction in forest tree cover and increasing deforestation rates to human activities at various points in time.
Tasks	Causal Inference, Time Series
Published	2019-10-25
URL	https://arxiv.org/abs/1910.11492v1
PDF	https://arxiv.org/pdf/1910.11492v1.pdf
PWC	https://paperswithcode.com/paper/causal-inference-for-climate-change-events
Repo
Framework
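
The "forecast the counterfactual, then subtract" step described in the abstract above can be sketched as follows. Note the substitution: the abstract uses a Bayesian structural causal model, whereas this sketch swaps in a plain least-squares linear trend fitted to the pre-intervention period. The tree-cover numbers and the changepoint index are made up for illustration.

```python
import numpy as np

def intervention_effect(series, changepoint):
    """Observed minus counterfactual values after the changepoint.

    The counterfactual is a linear trend fitted on the pre-intervention
    samples only and extrapolated over the post-intervention period.
    """
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t[:changepoint], series[:changepoint], deg=1)
    counterfactual = slope * t[changepoint:] + intercept
    return series[changepoint:] - counterfactual

# Hypothetical tree-cover percentages; the intervention happens at index 4.
cover = [80.0, 79.0, 78.0, 77.0, 70.0, 68.0, 66.0]
effect = intervention_effect(cover, changepoint=4)
```

Negative values in `effect` indicate tree cover below what the pre-intervention trend alone would predict.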

Deep Model Transferability from Attribution Maps


Title	Deep Model Transferability from Attribution Maps
Authors	Jie Song, Yixin Chen, Xinchao Wang, Chengchao Shen, Mingli Song
Abstract	Exploring the transferability between heterogeneous tasks sheds light on their intrinsic interconnections, and consequently enables knowledge transfer from one task to another so as to reduce the training effort of the latter. In this paper, we propose an embarrassingly simple yet very efficacious approach to estimating the transferability of deep networks, especially those handling vision tasks. Unlike the seminal work of taskonomy that relies on a large number of annotations as supervision and is thus computationally cumbersome, the proposed approach requires no human annotations and imposes no constraints on the architectures of the networks. This is achieved, specifically, via projecting deep networks into a model space, wherein each network is treated as a point and the distance between two points is measured by the deviation between their produced attribution maps. The proposed approach is several orders of magnitude faster than taskonomy, while preserving a task-wise topological structure highly similar to the one obtained by taskonomy. Code is available at https://github.com/zju-vipa/TransferbilityFromAttributionMaps.
Tasks	Transfer Learning
Published	2019-09-26
URL	https://arxiv.org/abs/1909.11902v2
PDF	https://arxiv.org/pdf/1909.11902v2.pdf
PWC	https://paperswithcode.com/paper/deep-model-transferability-from-attribution
Repo
Framework
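
The notion of measuring the distance between two networks by the deviation of their attribution maps, as described in the abstract above, can be sketched as below. The choice of cosine distance averaged over probe inputs is an assumption for illustration; the abstract does not state the exact deviation measure, and the arrays here are made-up stand-ins for attribution maps.

```python
import numpy as np

def attribution_distance(maps_a, maps_b):
    """Mean cosine distance between paired attribution maps of two models.

    maps_a and maps_b have shape (num_probe_inputs, height, width); entry k of
    each array is the attribution map the model produced for probe input k.
    """
    a = np.asarray(maps_a, dtype=float).reshape(len(maps_a), -1)
    b = np.asarray(maps_b, dtype=float).reshape(len(maps_b), -1)
    cosine = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(np.mean(1.0 - cosine))

rng = np.random.default_rng(0)
maps_model_1 = rng.random((5, 8, 8))                           # stand-in attribution maps
maps_model_2 = maps_model_1 + 0.1 * rng.random((5, 8, 8))      # a similar model
maps_model_3 = rng.random((5, 8, 8))                           # an unrelated model

near = attribution_distance(maps_model_1, maps_model_2)
far = attribution_distance(maps_model_1, maps_model_3)
```

Computing this distance for every pair of models yields a matrix that places each network as a point in a "model space", in the spirit of the abstract.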