January 28, 2020

2680 words 13 mins read

Paper Group ANR 790

Gauge theory and twins paradox of disentangled representations. Deep Griffin-Lim Iteration. Trust-Region Variational Inference with Gaussian Mixture Models. Proceedings of AAAI 2019 Workshop on Network Interpretability for Deep Learning. Deep Robotic Prediction with hierarchical RGB-D Fusion. Supervised Learning of the Next-Best-View for 3D Object …

Gauge theory and twins paradox of disentangled representations

Title Gauge theory and twins paradox of disentangled representations
Authors X. Dong, L. Zhou
Abstract Achieving disentangled representations of information is one of the key goals of deep-network-based machine learning systems. Recently, this issue has attracted increasing discussion. In this paper, by comparing the geometric structure of disentangled representations with the geometry of the evolution of mixed states in quantum mechanics, we give a fibre-bundle-based geometric picture of disentangled representation, which can be regarded as a kind of gauge theory. From this perspective we can build a connection between disentangled representations and the twins paradox in relativity. This helps to clarify some problems about disentangled representations.
Tasks
Published 2019-06-24
URL https://arxiv.org/abs/1906.10545v1
PDF https://arxiv.org/pdf/1906.10545v1.pdf
PWC https://paperswithcode.com/paper/gauge-theory-and-twins-paradox-of
Repo
Framework

Deep Griffin-Lim Iteration

Title Deep Griffin-Lim Iteration
Authors Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada
Abstract This paper presents a novel method for reconstructing the phase only from a given amplitude spectrogram by combining a signal-processing-based approach and a deep neural network (DNN). To retrieve a time-domain signal from its amplitude spectrogram, the corresponding phase is required. One of the popular phase reconstruction methods is the Griffin-Lim algorithm (GLA), which is based on the redundancy of the short-time Fourier transform. However, GLA often involves many iterations and produces low-quality signals owing to the lack of prior knowledge of the target signal. In order to address these issues, in this study, we propose an architecture which stacks a sub-block including two GLA-inspired fixed layers and a DNN. The number of stacked sub-blocks is adjustable, so performance can be traded against computational load based on application requirements. The effectiveness of the proposed method is investigated by reconstructing phases from amplitude spectrograms of speech signals.
Tasks
Published 2019-03-10
URL http://arxiv.org/abs/1903.03971v1
PDF http://arxiv.org/pdf/1903.03971v1.pdf
PWC https://paperswithcode.com/paper/deep-griffin-lim-iteration
Repo
Framework
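For context, here is a minimal sketch of the classical Griffin-Lim algorithm (GLA) that the paper builds on: alternating projections between STFT consistency and the prescribed amplitude. SciPy's STFT/ISTFT and the parameter values are illustrative assumptions; the DNN sub-blocks of the proposed DeGLI architecture are not modeled here.

```python
# Classical Griffin-Lim phase reconstruction sketch (not the paper's DeGLI network).
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(amplitude, fs=16000, n_iter=100, nperseg=1024, noverlap=768):
    """Estimate a phase for the given amplitude spectrogram by alternating
    projections: (1) onto spectrograms consistent with some time-domain signal,
    (2) onto spectrograms having the prescribed amplitude."""
    rng = np.random.default_rng(0)
    spec = amplitude * np.exp(1j * rng.uniform(0, 2 * np.pi, amplitude.shape))
    for _ in range(n_iter):
        # Projection 1: enforce STFT consistency (inverse, then forward STFT).
        _, x = istft(spec, fs=fs, nperseg=nperseg, noverlap=noverlap)
        _, _, spec = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
        spec = spec[:, :amplitude.shape[1]]             # guard against off-by-one frames
        # Projection 2: keep the estimated phase, restore the known amplitude.
        spec = amplitude * np.exp(1j * np.angle(spec))
    _, x = istft(spec, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return x
```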

Trust-Region Variational Inference with Gaussian Mixture Models

Title Trust-Region Variational Inference with Gaussian Mixture Models
Authors Oleg Arenz, Mingjun Zhong, Gerhard Neumann
Abstract Many methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can be subsequently used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search by establishing information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective, enabling us to update the components independently. The use of the lower bound ensures convergence to a local optimum of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multi-modal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude fewer computational resources.
Tasks
Published 2019-07-10
URL https://arxiv.org/abs/1907.04710v1
PDF https://arxiv.org/pdf/1907.04710v1.pdf
PWC https://paperswithcode.com/paper/trust-region-variational-inference-with
Repo
Framework
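As a rough illustration of the setting (not the authors' exact formulation), methods of this kind maximize the standard variational objective for a GMM approximation $q$ of an unnormalized target $\tilde p$, while constraining each update by an information-geometric trust region:

$$\max_{q}\; \mathbb{E}_{q(\mathbf{x})}\!\left[\log \tilde p(\mathbf{x})\right] + \mathcal{H}(q) \quad \text{s.t.}\quad \mathrm{KL}\!\left(q \,\Vert\, q_{\text{old}}\right) \le \epsilon, \qquad q(\mathbf{x}) = \sum_{o} \pi_o\, \mathcal{N}\!\left(\mathbf{x}\mid \boldsymbol{\mu}_o, \boldsymbol{\Sigma}_o\right).$$

The lower bound mentioned in the abstract is what allows this coupled problem to be split into independent per-component updates.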

Proceedings of AAAI 2019 Workshop on Network Interpretability for Deep Learning

Title Proceedings of AAAI 2019 Workshop on Network Interpretability for Deep Learning
Authors Quanshi Zhang, Lixin Fan, Bolei Zhou
Abstract These are the proceedings of the AAAI 2019 Workshop on Network Interpretability for Deep Learning.
Tasks
Published 2019-01-25
URL http://arxiv.org/abs/1901.08813v2
PDF http://arxiv.org/pdf/1901.08813v2.pdf
PWC https://paperswithcode.com/paper/proceedings-of-aaai-2019-workshop-on-network
Repo
Framework

Deep Robotic Prediction with hierarchical RGB-D Fusion

Title Deep Robotic Prediction with hierarchical RGB-D Fusion
Authors Yaoxian Song, Jun Wen, Yuejiao Fei, Changbin Yu
Abstract Robotic arm grasping is a fundamental operation in robotic control tasks. Most current methods for robotic grasping focus on RGB-D policies in table-surface scenarios or on 3D point cloud analysis and inference in 3D space. Compared to these methods, we propose a novel real-time multimodal hierarchical encoder-decoder neural network that fuses RGB and depth data to realize robotic humanoid grasping in 3D space with only partial observation. We quantify the uncertainty of the raw depth data and consider depth estimation that fuses RGB information. We develop a general labeling method to produce ground truth on common RGB-D datasets. We evaluate the effectiveness and performance of our method on a physical robot setup, and our method achieves an over 90% success rate in both table-surface and 3D-space scenarios.
Tasks Depth Estimation, Robotic Grasping
Published 2019-09-14
URL https://arxiv.org/abs/1909.06585v2
PDF https://arxiv.org/pdf/1909.06585v2.pdf
PWC https://paperswithcode.com/paper/deep-robotic-prediction-with-hierarchical-rgb
Repo
Framework
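The abstract describes a multimodal hierarchical encoder-decoder that fuses RGB and depth; the paper's exact architecture is not reproduced in this digest. Below is a hedged PyTorch sketch of the general idea, with two encoder branches fused at multiple scales. Channel counts, depths, the concatenation-based fusion, and the single-channel output head are illustrative assumptions.

```python
# Hedged sketch of a two-branch RGB-D encoder-decoder with multi-scale fusion.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class RGBDFusionNet(nn.Module):
    def __init__(self, out_channels=1):
        super().__init__()
        self.rgb_enc1, self.rgb_enc2 = conv_block(3, 32), conv_block(32, 64)
        self.d_enc1, self.d_enc2 = conv_block(1, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(128, 128)        # fused RGB + depth features, level 2
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(128 + 64, 64)          # skip connection from level-1 fusion
        self.head = nn.Conv2d(64, out_channels, 1)    # e.g. a per-pixel grasp quality map

    def forward(self, rgb, depth):
        r1, d1 = self.rgb_enc1(rgb), self.d_enc1(depth)
        r2, d2 = self.rgb_enc2(self.pool(r1)), self.d_enc2(self.pool(d1))
        fused = self.bottleneck(torch.cat([r2, d2], dim=1))     # hierarchical fusion, level 2
        x = torch.cat([self.up(fused), r1, d1], dim=1)          # hierarchical fusion, level 1
        return self.head(self.dec1(x))

# Example usage with an illustrative 224x224 RGB-D frame:
# out = RGBDFusionNet()(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
```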

Supervised Learning of the Next-Best-View for 3D Object Reconstruction

Title Supervised Learning of the Next-Best-View for 3D Object Reconstruction
Authors Miguel Mendoza, J. Irving Vasquez-Gomez, Hind Taud, Luis Enrique Sucar, Carolina Reta
Abstract Motivated by the advances in 3D sensing technology and the spread of low-cost robotic platforms, 3D object reconstruction has become a common task in many areas. Nevertheless, selecting the optimal sensor pose that maximizes the reconstructed surface remains an open problem, known in the literature as the next-best-view planning problem. In this paper, we propose a novel next-best-view planning scheme based on supervised deep learning. The scheme contains an algorithm for automatic generation of datasets and an original three-dimensional convolutional neural network (3D-CNN) used to learn the next-best-view. Unlike previous work where the problem is addressed as a search, the trained 3D-CNN directly predicts the sensor pose. We compare the proposed network against a similar one, and we present several experiments on the reconstruction of unknown objects that validate the effectiveness of the proposed scheme.
Tasks 3D Object Reconstruction, Object Reconstruction
Published 2019-05-14
URL https://arxiv.org/abs/1905.05833v1
PDF https://arxiv.org/pdf/1905.05833v1.pdf
PWC https://paperswithcode.com/paper/supervised-learning-of-the-next-best-view-for
Repo
Framework
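A hedged sketch of the kind of 3D-CNN the abstract describes: a network that maps a probabilistic voxel grid of the partially reconstructed object directly to a sensor pose. The grid size, layer widths, and the 7-D position-plus-quaternion pose parameterization are illustrative assumptions, not the paper's exact network.

```python
# Hedged 3D-CNN sketch for next-best-view pose prediction from a voxel grid.
import torch
import torch.nn as nn

class NBVNet3D(nn.Module):
    def __init__(self, grid=32, pose_dim=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool3d(2),
        )
        flat = 64 * (grid // 8) ** 3
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(flat, 256), nn.ReLU(inplace=True),
                                  nn.Linear(256, pose_dim))

    def forward(self, voxels):                    # voxels: (B, 1, grid, grid, grid)
        return self.head(self.features(voxels))   # predicted next-best sensor pose

# voxels = torch.rand(2, 1, 32, 32, 32)          # occupancy probabilities in [0, 1]
# pose = NBVNet3D()(voxels)                      # shape (2, 7): position + quaternion
```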

Multiplicative Up-Drift

Title Multiplicative Up-Drift
Authors Benjamin Doerr, Timo Kötzing
Abstract Drift analysis aims at translating the expected progress of an evolutionary algorithm (or, more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms, but only for the situation in which the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a $(1+\delta)$-multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible near-linear dependence on $1/\delta$ (the previous results had an at least near-quadratic dependence), and it only requires a population size near-linear in $\delta$ (this was super-quadratic in previous results). These improvements immediately lead to stronger run time guarantees for a number of applications. We also discuss the case of large $\delta$ and show stronger results for this setting.
Tasks
Published 2019-04-11
URL https://arxiv.org/abs/1904.05682v3
PDF https://arxiv.org/pdf/1904.05682v3.pdf
PWC https://paperswithcode.com/paper/multiplicative-up-drift
Repo
Framework
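To make the setting concrete (a schematic reading of the abstract, not the paper's precise theorem), up-drift concerns processes $X_0, X_1, \dots$ whose expected progress is multiplicative below a target $n$:

$$\mathbb{E}\left[X_{t+1} \mid X_t = x\right] \;\ge\; (1+\delta)\, x \qquad \text{for } 0 < x < n,$$

and the drift theorem bounds the hitting time $T = \min\{t \ge 0 : X_t \ge n\}$. Heuristically, $X_t \approx (1+\delta)^t X_0$ suggests a hitting time on the order of $\log(n)/\delta$ for small $\delta$; turning the expectation condition into such a guarantee, with near-linear dependence on $1/\delta$, is what the paper's theorem provides under its stated conditions.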

Building 3D Object Models during Manipulation by Reconstruction-Aware Trajectory Optimization

Title Building 3D Object Models during Manipulation by Reconstruction-Aware Trajectory Optimization
Authors Kanrun Huang, Tucker Hermans
Abstract Object shape provides important information for robotic manipulation; for instance, selecting an effective grasp depends on both the global and local shape of the object of interest, while reaching into clutter requires accurate surface geometry to avoid unintended contact with the environment. Model-based 3D object manipulation is a widely studied problem; however, obtaining accurate 3D object models for multiple objects often requires tedious work. In this letter, we exploit Gaussian process implicit surfaces (GPIS) extracted from RGB-D sensor data to grasp an unknown object. We propose a reconstruction-aware trajectory optimization that makes use of the extracted GPIS model to plan a motion that improves the ability to estimate the object's 3D geometry while performing a pick-and-place action. We present a probabilistic approach for a robot to autonomously learn and track the object while achieving the manipulation task. We use a sampling-based trajectory generation method to explore the unseen parts of the object using the estimated conditional entropy of the GPIS model. We validate our method with physical robot experiments across eleven different objects of varying shape from the YCB object dataset. Our experiments show that our reconstruction-aware trajectory optimization provides higher-quality 3D object reconstruction when compared with directly solving the manipulation task or using a heuristic to view unseen portions of the object.
Tasks 3D Object Reconstruction, Object Reconstruction
Published 2019-05-10
URL https://arxiv.org/abs/1905.03907v1
PDF https://arxiv.org/pdf/1905.03907v1.pdf
PWC https://paperswithcode.com/paper/building-3d-object-models-during-manipulation
Repo
Framework
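The abstract relies on Gaussian process implicit surfaces (GPIS) and their uncertainty to decide where to look next. Below is a hedged sketch of a basic GPIS, assuming scikit-learn's GP regressor: the GP is fit to signed-distance-like labels (0 on observed surface points, +1 outside, -1 inside), the posterior mean's zero level set approximates the surface, and high posterior variance marks unseen regions. The kernel, length scale, and labeling convention are illustrative assumptions; the paper's entropy-based trajectory optimization is not modeled.

```python
# Hedged GPIS sketch: GP regression on signed-distance-like labels.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_gpis(surface_pts, exterior_pts, interior_pts):
    """surface/exterior/interior points are (N, 3) arrays in the object frame."""
    X = np.vstack([surface_pts, exterior_pts, interior_pts])
    y = np.concatenate([np.zeros(len(surface_pts)),
                        np.ones(len(exterior_pts)),
                        -np.ones(len(interior_pts))])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.05) + WhiteKernel(1e-4))
    return gp.fit(X, y)

def query_gpis(gp, pts):
    """Implicit-surface value and uncertainty at query points; high standard
    deviation flags unseen parts of the object worth viewing or touching next."""
    mean, std = gp.predict(pts, return_std=True)
    return mean, std
```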

Scalable High Performance SDN Switch Architecture on FPGA for Core Networks

Title Scalable High Performance SDN Switch Architecture on FPGA for Core Networks
Authors Sasindu Wijeratne, Ashen Ekanayake, Sandaruwan Jayaweera, Danuka Ravishan, Ajith Pasqual
Abstract Due to the increasing heterogeneity in network user requirements, dynamically varying day-to-day network traffic patterns, and delays in network service deployment, there is a huge demand for scalability and flexibility in modern networking infrastructure, which in turn has paved the way for the introduction of Software Defined Networking (SDN) in core networks. In this paper, we present an FPGA-based switch that is fully compliant with OpenFlow, the pioneering protocol for the southbound interface of SDN. The switch architecture is implemented completely in hardware. The design consists of an OpenFlow southbound agent which can process OpenFlow packets at a rate of 10 Gbps. The proposed architecture scales up to 400 Gbps while consuming only 60% of the resources of a Xilinx Virtex-7 XC7VX485T FPGA.
Tasks
Published 2019-10-30
URL https://arxiv.org/abs/1910.13683v1
PDF https://arxiv.org/pdf/1910.13683v1.pdf
PWC https://paperswithcode.com/paper/scalable-high-performance-sdn-switch
Repo
Framework

A Baseline for Few-Shot Image Classification

Title A Baseline for Few-Shot Image Classification
Authors Guneet S. Dhillon, Pratik Chaudhari, Avinash Ravichandran, Stefano Soatto
Abstract Fine-tuning a deep network trained with the standard cross-entropy loss is a strong baseline for few-shot learning. When fine-tuned transductively, this outperforms the current state-of-the-art on standard datasets such as Mini-ImageNet, Tiered-ImageNet, CIFAR-FS and FC-100 with the same hyper-parameters. The simplicity of this approach enables us to demonstrate the first few-shot learning results on the ImageNet-21k dataset. We find that using a large number of meta-training classes results in high few-shot accuracies even for a large number of few-shot classes. We do not advocate our approach as the solution for few-shot learning, but simply use the results to highlight limitations of current benchmarks and few-shot protocols. We perform extensive studies on benchmark datasets to propose a metric that quantifies the “hardness” of a few-shot episode. This metric can be used to report the performance of few-shot algorithms in a more systematic way.
Tasks Few-Shot Image Classification, Few-Shot Learning, Image Classification
Published 2019-09-06
URL https://arxiv.org/abs/1909.02729v4
PDF https://arxiv.org/pdf/1909.02729v4.pdf
PWC https://paperswithcode.com/paper/a-baseline-for-few-shot-image-classification
Repo
Framework
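A hedged sketch of transductive fine-tuning in the spirit of the abstract: a pretrained backbone and a fresh linear head are fine-tuned on the labeled support set with cross-entropy, plus an entropy penalty on the unlabeled query set. The optimizer, step count, and the exact form of the transductive regularizer are assumptions and may differ from the paper.

```python
# Hedged transductive fine-tuning sketch for one few-shot episode.
import torch
import torch.nn.functional as F

def transductive_finetune(backbone, head, support_x, support_y, query_x,
                          steps=25, lr=5e-3, ent_weight=1.0):
    params = list(backbone.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Supervised cross-entropy on the labeled support examples.
        ce = F.cross_entropy(head(backbone(support_x)), support_y)
        # Transductive term: Shannon entropy of predictions on unlabeled queries.
        q_logp = F.log_softmax(head(backbone(query_x)), dim=1)
        entropy = -(q_logp.exp() * q_logp).sum(dim=1).mean()
        (ce + ent_weight * entropy).backward()
        opt.step()
    return head(backbone(query_x)).argmax(dim=1)   # predicted query labels
```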

Learning about spatial inequalities: Capturing the heterogeneity in the urban environment

Title Learning about spatial inequalities: Capturing the heterogeneity in the urban environment
Authors J. Siqueira-Gay, M. A. Giannotti, M. Sester
Abstract Transportation systems can be conceptualized as an instrument for spreading people and resources over the territory, playing an important role in developing sustainable cities. The current rationale of transport provision is based on population demand, disregarding land use and socioeconomic information. To meet the challenge of promoting a more equitable resource distribution, this work aims at identifying and describing patterns of urban service supply, their accessibility, and household income. Using a multidimensional approach, the spatial inequalities of a large city of the global south reveal that the low-income population has low access mainly to hospitals and cultural centers. A low-income group presents an intermediate level of accessibility to public schools and sports centers, evidencing the diverse condition of citizens in the peripheries. These complex outcomes, generated by the interaction of land use and public transportation, emphasize the importance of comprehensive methodological approaches to support decisions on urban projects, plans, and programs. Reducing spatial inequalities, especially providing services for deprived groups, is fundamental to promoting the sustainable use of resources and optimizing daily commuting.
Tasks
Published 2019-07-24
URL https://arxiv.org/abs/1908.00625v1
PDF https://arxiv.org/pdf/1908.00625v1.pdf
PWC https://paperswithcode.com/paper/learning-about-spatial-inequalities-capturing
Repo
Framework

Pix2Vex: Image-to-Geometry Reconstruction using a Smooth Differentiable Renderer

Title Pix2Vex: Image-to-Geometry Reconstruction using a Smooth Differentiable Renderer
Authors Felix Petersen, Amit H. Bermano, Oliver Deussen, Daniel Cohen-Or
Abstract The long-coveted task of reconstructing 3D geometry from images is still an open problem. In this paper, we build on the power of neural networks and introduce Pix2Vex, a network trained to convert camera-captured images into 3D geometry. We present a novel differentiable renderer ($DR$) as a forward validation means during training. Our key insight is that $DR$s produce images of a particular appearance, different from typical input images. Hence, we propose adding an image-to-image translation component that converts between these rendering styles. This translation closes the training loop while requiring only minimal supervision, without needing any 3D model as ground truth. Unlike state-of-the-art methods, our $DR$ is $C^\infty$ smooth and thus does not display any discontinuities at occlusions or dis-occlusions. Through our novel training scheme, our network can train on different types of images, whereas previous work can typically only train on images of a similar appearance to those rendered by a $DR$.
Tasks 3D Object Reconstruction, Domain Adaptation, Image-to-Image Translation, Object Reconstruction
Published 2019-03-26
URL https://arxiv.org/abs/1903.11149v2
PDF https://arxiv.org/pdf/1903.11149v2.pdf
PWC https://paperswithcode.com/paper/pix2vex-image-to-geometry-reconstruction
Repo
Framework
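The abstract's key property is a $C^\infty$-smooth differentiable renderer with no discontinuities at occlusions. As a toy illustration of how smooth occlusion handling can be achieved in general (this is generic soft rasterization, not the paper's renderer), per-pixel colors of competing surfaces can be blended with softmax weights on negative depth instead of a hard z-buffer argmin:

```python
# Toy smooth occlusion: softmax depth blending instead of a hard z-buffer.
import torch

def soft_zbuffer(colors, depths, temperature=0.05):
    """colors: (K, H, W, 3) per-surface colors; depths: (K, H, W) per-surface
    depths. Returns a smoothly blended (H, W, 3) image, differentiable in both."""
    weights = torch.softmax(-depths / temperature, dim=0)   # closer surface -> larger weight
    return (weights.unsqueeze(-1) * colors).sum(dim=0)
```

As the temperature shrinks, the blend approaches the hard z-buffer result while staying smooth for any positive temperature.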

On The Evaluation of Machine Translation Systems Trained With Back-Translation

Title On The Evaluation of Machine Translation Systems Trained With Back-Translation
Authors Sergey Edunov, Myle Ott, Marc’Aurelio Ranzato, Michael Auli
Abstract Back-translation is a widely used data augmentation technique which leverages target monolingual data. However, its effectiveness has been challenged since automatic metrics such as BLEU only show significant improvements for test examples where the source itself is a translation, or translationese. This is believed to be due to translationese inputs better matching the back-translated training data. In this work, we show that this conjecture is not empirically supported and that back-translation improves translation quality of both naturally occurring text as well as translationese according to professional human translators. We provide empirical evidence to support the view that back-translation is preferred by humans because it produces more fluent outputs. BLEU cannot capture human preferences because references are translationese when source sentences are natural text. We recommend complementing BLEU with a language model score to measure fluency.
Tasks Data Augmentation, Language Modelling, Machine Translation
Published 2019-08-14
URL https://arxiv.org/abs/1908.05204v1
PDF https://arxiv.org/pdf/1908.05204v1.pdf
PWC https://paperswithcode.com/paper/on-the-evaluation-of-machine-translation
Repo
Framework
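The abstract's recommendation is to complement BLEU with a language-model score for fluency. Below is a hedged sketch of such an evaluation, assuming sacrebleu for BLEU and a GPT-2 mean token negative log-likelihood as the fluency proxy; the authors' actual protocol relies on professional human translators and may score fluency differently.

```python
# Hedged sketch: corpus BLEU plus a language-model fluency proxy.
import sacrebleu
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def bleu_score(hypotheses, references):
    """hypotheses and references are parallel lists of sentence strings."""
    return sacrebleu.corpus_bleu(hypotheses, [references]).score

def lm_fluency(hypotheses, model_name="gpt2"):
    """Mean token negative log-likelihood under a pretrained LM (lower = more fluent)."""
    tok = GPT2TokenizerFast.from_pretrained(model_name)
    lm = GPT2LMHeadModel.from_pretrained(model_name).eval()
    losses = []
    with torch.no_grad():
        for sent in hypotheses:
            ids = tok(sent, return_tensors="pt").input_ids
            losses.append(lm(ids, labels=ids).loss.item())
    return float(torch.tensor(losses).mean())
```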

Learning to Impute: A General Framework for Semi-supervised Learning

Title Learning to Impute: A General Framework for Semi-supervised Learning
Authors Wei-Hong Li, Chuan-Sheng Foo, Hakan Bilen
Abstract Recent semi-supervised learning methods have been shown to achieve results comparable to their supervised counterparts while using only a small portion of labels in image classification tasks, thanks to their regularization strategies. In this paper, we take a more direct approach to semi-supervised learning and propose learning to impute the labels of unlabeled samples such that a network achieves better generalization when it is trained on these labels. We pose the problem in a learning-to-learn formulation which can easily be incorporated into state-of-the-art semi-supervised techniques and boost their performance, especially when labels are limited. We demonstrate that our method is applicable to both classification and regression problems, including image classification and facial landmark detection tasks.
Tasks Facial Landmark Detection, Image Classification
Published 2019-12-22
URL https://arxiv.org/abs/1912.10364v1
PDF https://arxiv.org/pdf/1912.10364v1.pdf
PWC https://paperswithcode.com/paper/learning-to-impute-a-general-framework-for-1
Repo
Framework

Software Tools for Big Data Resources in Family Names Dictionaries

Title Software Tools for Big Data Resources in Family Names Dictionaries
Authors Adam Rambousek, Harry Parkin, Ales Horak
Abstract This paper describes the design and development of specific software tools used during the creation of the Family Names in Britain and Ireland (FaNBI) research project, started by the University of the West of England in 2010 and finished successfully in 2016. First, an overview of the project and its methodology is provided. The next section contains a description of the dictionary management tools and the software tools used to combine input data resources.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.09234v1
PDF http://arxiv.org/pdf/1904.09234v1.pdf
PWC https://paperswithcode.com/paper/190409234
Repo
Framework