October 20, 2019

3161 words 15 mins read

Paper Group AWR 237



Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation

Title Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation
Authors Eddy Ilg, Tonmoy Saikia, Margret Keuper, Thomas Brox
Abstract Occlusions play an important role in disparity and optical flow estimation, since matching costs are not available in occluded areas and occlusions indicate depth or motion boundaries. Moreover, occlusions are relevant for motion segmentation and scene flow estimation. In this paper, we present an efficient learning-based approach to estimate occlusion areas jointly with disparities or optical flow. The estimated occlusions and motion boundaries clearly improve over the state-of-the-art. Moreover, we present networks with state-of-the-art performance on the popular KITTI benchmark and good generic performance. Making use of the estimated occlusions, we also show improved results on motion segmentation and scene flow estimation.
Tasks Motion Segmentation, Optical Flow Estimation, Scene Flow Estimation
Published 2018-08-06
URL http://arxiv.org/abs/1808.01838v2
PDF http://arxiv.org/pdf/1808.01838v2.pdf
PWC https://paperswithcode.com/paper/occlusions-motion-and-depth-boundaries-with-a
Repo https://github.com/FilippoAleotti/Dwarf-Tensorflow
Framework tf
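The paper learns occlusions jointly with disparity or flow; as context for why occlusions matter, a classical (non-learned) baseline is the forward-backward consistency check, sketched below. This is not the paper's method, and the thresholds `alpha1`/`alpha2` are conventional values, not taken from the paper.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    """Classical forward-backward consistency check: a pixel is marked
    occluded when the backward flow, sampled at the forward-warped
    position, fails to cancel the forward flow. Flows are (H, W, 2)."""
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Forward-warped coordinates, nearest-neighbour sampled and clamped to the image.
    xw = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    yw = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    bw_at_fw = flow_bw[yw, xw]                      # backward flow at warped points
    diff = np.sum((flow_fw + bw_at_fw) ** 2, axis=-1)
    bound = alpha1 * (np.sum(flow_fw ** 2, axis=-1)
                      + np.sum(bw_at_fw ** 2, axis=-1)) + alpha2
    return diff > bound                             # True where occluded
```

In occluded regions there is no matching pixel in the second frame, so the round trip fails the check; the paper's contribution is to predict this mask directly instead.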

Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

Title Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
Authors Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang
Abstract Previous works utilized the “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance. When applied to two image classification benchmarks, our method validates its usefulness and strengths. Notably, on CIFAR-10, FPGM reduces FLOPs by more than 52% on ResNet-110 with even a 2.69% relative accuracy improvement. Moreover, on ILSVRC-2012, FPGM reduces FLOPs by more than 42% on ResNet-101 without a top-5 accuracy drop, which has advanced the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/filter-pruning-geometric-median
Tasks Image Classification
Published 2018-11-01
URL https://arxiv.org/abs/1811.00250v3
PDF https://arxiv.org/pdf/1811.00250v3.pdf
PWC https://paperswithcode.com/paper/pruning-filter-via-geometric-median-for-deep
Repo https://github.com/he-y/filter-pruning-geometric-median
Framework pytorch
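The FPGM criterion can be sketched in a few lines: filters whose summed distance to all other filters in the layer is smallest are the ones closest to the (approximate) geometric median, i.e. the most redundant, and are the ones pruned. This is a simplified numpy sketch, not the authors' released implementation.

```python
import numpy as np

def fpgm_prune_indices(filters, prune_ratio=0.3):
    """Return indices of the filters to prune under the FPGM criterion.
    filters: array of shape (num_filters, ...) holding one conv layer's weights."""
    flat = filters.reshape(filters.shape[0], -1)
    # Pairwise Euclidean distances between flattened filters.
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    redundancy = dists.sum(axis=1)            # small sum => near the geometric median
    n_prune = int(round(prune_ratio * len(flat)))
    return np.argsort(redundancy)[:n_prune]   # most redundant filters first
```

Note how this differs from norm-based pruning: a filter with a tiny norm but far from all others survives, while a large-norm filter duplicated by its neighbours is removed.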

Bayesian Modeling of Intersectional Fairness: The Variance of Bias

Title Bayesian Modeling of Intersectional Fairness: The Variance of Bias
Authors James Foulds, Rashidul Islam, Kamrun Keya, Shimei Pan
Abstract Intersectionality is a framework that analyzes how interlocking systems of power and oppression affect individuals along overlapping dimensions including race, gender, sexual orientation, class, and disability. Intersectionality theory therefore implies it is important that fairness in artificial intelligence systems be protected with regard to multi-dimensional protected attributes. However, the measurement of fairness becomes statistically challenging in the multi-dimensional setting due to data sparsity, which increases rapidly in the number of dimensions, and in the values per dimension. We present a Bayesian probabilistic modeling approach for the reliable, data-efficient estimation of fairness with multi-dimensional protected attributes, which we apply to two existing intersectional fairness metrics. Experimental results on census data and the COMPAS criminal justice recidivism dataset demonstrate the utility of our methodology, and show that Bayesian methods are valuable for the modeling and measurement of fairness in an intersectional context.
Tasks
Published 2018-11-18
URL https://arxiv.org/abs/1811.07255v2
PDF https://arxiv.org/pdf/1811.07255v2.pdf
PWC https://paperswithcode.com/paper/bayesian-modeling-of-intersectional-fairness
Repo https://github.com/summerscope/fair-ml-reading-group
Framework none
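A minimal sketch of the underlying idea, under loose assumptions (the paper's actual model and metrics are richer): estimate per-intersectional-group positive-outcome rates with a Beta prior (posterior mean), then report the largest absolute log-ratio of rates between any two groups as a differential-fairness epsilon. The smoothing is what keeps the estimate finite for sparse groups, which is the point of the Bayesian treatment.

```python
import numpy as np

def smoothed_edf(counts_pos, counts_total, alpha=1.0):
    """Epsilon of differential fairness from smoothed per-group rates.
    counts_pos[i] / counts_total[i] are positive outcomes / totals for group i;
    alpha is the Beta(alpha, alpha) prior strength."""
    pos = np.asarray(counts_pos, dtype=float)
    tot = np.asarray(counts_total, dtype=float)
    rates = (pos + alpha) / (tot + 2 * alpha)   # posterior-mean rate per group
    log_r = np.log(rates)
    return float(log_r.max() - log_r.min())     # bounds |log p_i/p_j| over all pairs
```

With many protected dimensions some intersectional groups may have zero observations; the prior still yields a finite, if uncertain, rate for them.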

Recurrent Auto-Encoder Model for Large-Scale Industrial Sensor Signal Analysis

Title Recurrent Auto-Encoder Model for Large-Scale Industrial Sensor Signal Analysis
Authors Timothy Wong, Zhiyuan Luo
Abstract A recurrent auto-encoder model summarises sequential data through an encoder structure into a fixed-length vector, then reconstructs the original sequence through a decoder structure. The summarised vector can be used to represent time series features. In this paper, we propose relaxing the dimensionality of the decoder output so that it performs partial reconstruction. The fixed-length vector therefore represents features in the selected dimensions only. In addition, we propose using a rolling fixed-window approach to generate training samples from unbounded time series data. The change of time series features over time can be summarised as a smooth trajectory path. The fixed-length vectors are further analysed using additional visualisation and unsupervised clustering techniques. The proposed method can be applied in large-scale industrial processes for sensor signal analysis, where clusters of the vector representations can reflect the operating states of the industrial system.
Tasks Time Series
Published 2018-07-10
URL http://arxiv.org/abs/1807.03710v1
PDF http://arxiv.org/pdf/1807.03710v1.pdf
PWC https://paperswithcode.com/paper/recurrent-auto-encoder-model-for-large-scale
Repo https://github.com/lifesailor/data-driven-predictive-maintenance
Framework none
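The rolling fixed-window sampling the paper proposes can be sketched directly; this is a generic implementation of the idea (the window length, step, and which channels the decoder partially reconstructs are all choices the paper leaves to the application).

```python
import numpy as np

def rolling_windows(series, window, step=1):
    """Cut an unbounded multichannel series (T x D) into overlapping
    fixed-length samples for training the recurrent auto-encoder."""
    starts = range(0, len(series) - window + 1, step)
    return np.stack([series[s:s + window] for s in starts])
```

For the partial-reconstruction variant, the decoder target would simply be `rolling_windows(series, window)[..., selected_dims]` for some subset of sensor channels.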

A Preliminary Study of Neural Network-based Approximation for HPC Applications

Title A Preliminary Study of Neural Network-based Approximation for HPC Applications
Authors Wenqian Dong, Anzheng Guolu, Dong Li
Abstract Machine learning, as a tool to learn and model complicated (non)linear relationships between input and output data sets, has shown preliminary success in some HPC problems. Using machine learning, scientists are able to augment existing simulations by improving accuracy and significantly reducing latencies. Our ongoing research work is to create a general framework to apply neural network-based models to HPC applications. In particular, we want to use the neural network to approximate and replace code regions within the HPC application to improve performance (i.e., reducing the execution time) of the HPC application. In this paper, we present our preliminary study and results. Using two applications (the Newton-Raphson method and the Lennard-Jones (LJ) potential in LAMMPS) for our case study, we achieve up to 2.7x and 2.46x speedup, respectively.
Tasks
Published 2018-12-18
URL http://arxiv.org/abs/1812.07561v1
PDF http://arxiv.org/pdf/1812.07561v1.pdf
PWC https://paperswithcode.com/paper/a-preliminary-study-of-neural-network-based
Repo https://github.com/daniel-e/papr
Framework none
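To make the target concrete: below is the kind of iterative kernel the paper proposes replacing. A trained network would approximate the whole loop in one forward pass for inputs within its training range (the network itself is not sketched here, since the paper's architecture details are not given in this summary).

```python
def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    """Plain Newton-Raphson root finder: the per-call iteration whose cost
    a learned surrogate would amortize."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x
```

The trade-off the paper studies is exactly this: the surrogate's single forward pass versus the solver's data-dependent number of iterations, at some controlled loss of accuracy.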

Learning Pose Specific Representations by Predicting Different Views

Title Learning Pose Specific Representations by Predicting Different Views
Authors Georg Poier, David Schinagl, Horst Bischof
Abstract The labeled data required to learn pose estimation for articulated objects is difficult to provide in the desired quantity, realism, density, and accuracy. To address this issue, we develop a method to learn representations, which are very specific for articulated poses, without the need for labeled training data. We exploit the observation that the object pose of a known object is predictive for the appearance in any known view. That is, given only the pose and shape parameters of a hand, the hand’s appearance from any viewpoint can be approximated. To exploit this observation, we train a model that – given input from one view – estimates a latent representation, which is trained to be predictive for the appearance of the object when captured from another viewpoint. Thus, the only necessary supervision is the second view. The training process of this model reveals an implicit pose representation in the latent space. Importantly, at test time the pose representation can be inferred using only a single view. In qualitative and quantitative experiments we show that the learned representations capture detailed pose information. Moreover, when training the proposed method jointly with labeled and unlabeled data, it consistently surpasses the performance of its fully supervised counterpart, while reducing the amount of needed labeled samples by at least one order of magnitude.
Tasks Hand Pose Estimation, Pose Estimation
Published 2018-04-10
URL http://arxiv.org/abs/1804.03390v2
PDF http://arxiv.org/pdf/1804.03390v2.pdf
PWC https://paperswithcode.com/paper/learning-pose-specific-representations-by
Repo https://github.com/poier/PreView
Framework pytorch

Manifold Learning of Four-dimensional Scanning Transmission Electron Microscopy

Title Manifold Learning of Four-dimensional Scanning Transmission Electron Microscopy
Authors Xin Li, Ondrej E. Dyck, Mark P. Oxley, Andrew R. Lupini, Leland McInnes, John Healy, Stephen Jesse, Sergei V. Kalinin
Abstract Four-dimensional scanning transmission electron microscopy (4D-STEM) of local atomic diffraction patterns is emerging as a powerful technique for probing intricate details of atomic structure and atomic electric fields. However, efficient processing and interpretation of large volumes of data remain challenging, especially for two-dimensional or light materials because the diffraction signal recorded on the pixelated arrays is weak. Here we employ data-driven manifold learning approaches for straightforward visualization and exploratory analysis of the 4D-STEM datasets, distilling real-space neighboring effects on atomically resolved deflection patterns from single-layer graphene, with single dopant atoms, as recorded on a pixelated detector. These extracted patterns relate to both individual atom sites and sublattice structures, effectively discriminating single dopant anomalies via multi-mode views. We believe manifold learning analysis will accelerate physics discoveries coupled between data-rich imaging mechanisms and materials such as ferroelectric, topological spin and van der Waals heterostructures.
Tasks
Published 2018-10-18
URL http://arxiv.org/abs/1811.00080v3
PDF http://arxiv.org/pdf/1811.00080v3.pdf
PWC https://paperswithcode.com/paper/manifold-learning-of-four-dimensional
Repo https://github.com/nonmin/4D-STEM
Framework none
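The data layout behind this kind of analysis is simple to sketch: each scan position's 2D diffraction pattern is flattened into one row, and a dimensionality-reduction method embeds the rows. The paper uses manifold learning (the repo suggests UMAP, given one of the author affiliations); here plain PCA via SVD stands in, as an assumption, just to show the pipeline shape.

```python
import numpy as np

def embed_patterns(patterns, n_components=2):
    """Embed a stack of diffraction patterns (n_positions, H, W) into
    n_components dimensions via PCA (SVD on centered, flattened rows)."""
    n = patterns.shape[0]
    flat = patterns.reshape(n, -1).astype(float)
    flat -= flat.mean(axis=0)                 # center each pixel across positions
    u, s, _ = np.linalg.svd(flat, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Swapping `embed_patterns` for a nonlinear embedding (UMAP, t-SNE) is what lets the clusters separate sublattice sites and dopant anomalies in the paper's multi-mode views.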

N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification

Title N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification
Authors Sami Abu-El-Haija, Amol Kapoor, Bryan Perozzi, Joonseok Lee
Abstract Graph Convolutional Networks (GCNs) have shown significant improvements in semi-supervised learning on graph-structured data. Concurrently, unsupervised learning of graph embeddings has benefited from the information contained in random walks. In this paper, we propose a model: Network of GCNs (N-GCN), which marries these two lines of work. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs which optimizes the classification objective. Our experiments show that our proposed N-GCN model improves state-of-the-art baselines on all of the challenging node classification tasks we consider: Cora, Citeseer, Pubmed, and PPI. In addition, our proposed method has other desirable properties, including generalization to recently proposed semi-supervised learning methods such as GraphSAGE, allowing us to propose N-SAGE, and resilience to adversarial input perturbations.
Tasks Node Classification
Published 2018-02-24
URL http://arxiv.org/abs/1802.08888v1
PDF http://arxiv.org/pdf/1802.08888v1.pdf
PWC https://paperswithcode.com/paper/n-gcn-multi-scale-graph-convolution-for-semi
Repo https://github.com/samihaija/mixhop
Framework tf
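The core of N-GCN's input construction can be sketched without a deep-learning framework: each GCN instance receives features propagated by a different power of the row-normalized adjacency (random-walk) matrix, and a learned combiner (not sketched here) mixes the instances' outputs. This is a hedged numpy sketch of that propagation step only.

```python
import numpy as np

def random_walk_features(adj, x, powers=(0, 1, 2)):
    """Return one propagated feature matrix per GCN instance: walk^k @ x
    for each k in `powers` (which must be ascending). adj: (N, N), x: (N, D)."""
    deg = adj.sum(axis=1, keepdims=True)
    walk = adj / np.maximum(deg, 1)            # row-stochastic transition matrix
    outs, prop, p = [], x, 0
    for k in powers:
        while p < k:                           # reuse lower powers incrementally
            prop = walk @ prop
            p += 1
        outs.append(prop)
    return outs
```

Each element of `outs` would feed a separate GCN; the classifier then learns how much weight to give information arriving from 0-hop, 1-hop, 2-hop, … random-walk distances.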

3D Hand Pose Estimation using Simulation and Partial-Supervision with a Shared Latent Space

Title 3D Hand Pose Estimation using Simulation and Partial-Supervision with a Shared Latent Space
Authors Masoud Abdi, Ehsan Abbasnejad, Chee Peng Lim, Saeid Nahavandi
Abstract Tremendous amounts of expensive annotated data are a vital ingredient for state-of-the-art 3d hand pose estimation. Therefore, synthetic data has been popularized as annotations are automatically available. However, models trained only with synthetic samples do not generalize to real data, mainly due to the gap between the distribution of synthetic and real data. In this paper, we propose a novel method that seeks to predict the 3d position of the hand using both synthetic and partially-labeled real data. Accordingly, we form a shared latent space between three modalities: synthetic depth image, real depth image, and pose. We demonstrate that by carefully learning the shared latent space, we can find a regression model that is able to generalize to real data. As such, we show that our method produces accurate predictions in both semi-supervised and unsupervised settings. Additionally, the proposed model is capable of generating novel, meaningful, and consistent samples from all of the three domains. We evaluate our method qualitatively and quantitatively on two highly competitive benchmarks (i.e., NYU and ICVL) and demonstrate its superiority over the state-of-the-art methods. The source code will be made available at https://github.com/masabdi/LSPS.
Tasks Hand Pose Estimation, Pose Estimation
Published 2018-07-14
URL http://arxiv.org/abs/1807.05380v1
PDF http://arxiv.org/pdf/1807.05380v1.pdf
PWC https://paperswithcode.com/paper/3d-hand-pose-estimation-using-simulation-and
Repo https://github.com/masabdi/LSPS
Framework none

Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination

Title Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
Authors Zhirong Wu, Yuanjun Xiong, Stella Yu, Dahua Lin
Abstract Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether this observation can be extended beyond the conventional domain of supervised learning: Can we learn a good feature representation that captures apparent similarity among instances, instead of classes, by merely asking the feature to be discriminative of individual instances? We formulate this intuition as a non-parametric classification problem at the instance-level, and use noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes. Our experimental results demonstrate that, under unsupervised learning settings, our method surpasses the state-of-the-art on ImageNet classification by a large margin. Our method is also remarkable for consistently improving test performance with more training data and better network architectures. By fine-tuning the learned feature, we further obtain competitive results for semi-supervised learning and object detection tasks. Our non-parametric model is highly compact: With 128 features per image, our method requires only 600MB storage for a million images, enabling fast nearest neighbour retrieval at the run time.
Tasks Object Detection
Published 2018-05-05
URL http://arxiv.org/abs/1805.01978v1
PDF http://arxiv.org/pdf/1805.01978v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-feature-learning-via-non
Repo https://github.com/DianaSHV/lemniscate_edit
Framework pytorch
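The non-parametric instance-level classifier at the heart of this method has a compact form: the probability that a feature belongs to instance i is a softmax over cosine similarities to every stored L2-normalized instance vector in the memory bank, sharpened by a small temperature. The sketch below shows this classifier; the paper additionally uses noise-contrastive estimation to avoid summing over all instances during training.

```python
import numpy as np

def instance_probs(feature, memory_bank, temperature=0.07):
    """Non-parametric instance softmax: P(i | feature) over memory-bank rows.
    feature: (D,), memory_bank: (num_instances, D)."""
    f = feature / np.linalg.norm(feature)
    bank = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    logits = bank @ f / temperature            # cosine similarity / temperature
    logits -= logits.max()                     # numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

With 128-dimensional features this bank is the "600MB for a million images" store the abstract mentions, and nearest-neighbour retrieval is a single matrix-vector product.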

Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms

Title Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
Authors Avikalp Srivastava, Hsin Wen Liu, Sumio Fujita
Abstract Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question & answering (CQA) platforms. Research in this area, however, has dealt with only the text modality. With the increasing multimodal nature of web content, we focus on extending these methods for CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain, and adapt the underlying concept and architecture for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual questions. Our analysis of the differences between visual QA and community QA data drives our proposal of novel augmentations of an attention method tailored for CQA, and use of auxiliary tasks for learning better grounding features. Our final model markedly outperforms the text-only and VQA model baselines for both tasks of classification and expert retrieval on real-world multimodal CQA data.
Tasks Community Question Answering, Question Answering, Representation Learning, Visual Question Answering
Published 2018-08-29
URL https://arxiv.org/abs/1808.09648v2
PDF https://arxiv.org/pdf/1808.09648v2.pdf
PWC https://paperswithcode.com/paper/from-vqa-to-multimodal-cqa-adapting-visual-qa
Repo https://github.com/avikalp7/VQAtoCQA
Framework tf

MTNT: A Testbed for Machine Translation of Noisy Text

Title MTNT: A Testbed for Machine Translation of Noisy Text
Authors Paul Michel, Graham Neubig
Abstract Noisy or non-standard input text can cause disastrous mistranslations in most modern Machine Translation (MT) systems, and there has been growing research interest in creating noise-robust MT systems. However, as of yet there are no publicly available parallel corpora with naturally occurring noisy inputs and translations, and thus previous work has resorted to evaluating on synthetically created datasets. In this paper, we propose a benchmark dataset for Machine Translation of Noisy Text (MTNT), consisting of noisy comments on Reddit (www.reddit.com) and professionally sourced translations. We commissioned translations of English comments into French and Japanese, as well as French and Japanese comments into English, on the order of 7k-37k sentences per language pair. We qualitatively and quantitatively examine the types of noise included in this dataset, then demonstrate that existing MT models fail badly on a number of noise-related phenomena, even after performing adaptation on a small training set of in-domain data. This indicates that this dataset can provide an attractive testbed for methods tailored to handling noisy text in MT. The data is publicly available at www.cs.cmu.edu/~pmichel1/mtnt/.
Tasks Machine Translation
Published 2018-09-02
URL http://arxiv.org/abs/1809.00388v1
PDF http://arxiv.org/pdf/1809.00388v1.pdf
PWC https://paperswithcode.com/paper/mtnt-a-testbed-for-machine-translation-of
Repo https://github.com/MysteryVaibhav/robust_mtnt
Framework pytorch

Classification by Re-generation: Towards Classification Based on Variational Inference

Title Classification by Re-generation: Towards Classification Based on Variational Inference
Authors Shideh Rezaeifar, Olga Taran, Slava Voloshynovskiy
Abstract As Deep Neural Networks (DNNs) are considered the state-of-the-art in many classification tasks, the question of their semantic generalizations has been raised. To address semantic interpretability of learned features, we introduce a novel idea of classification by re-generation based on a variational autoencoder (VAE), in which a separate VAE encoder-decoder pair is trained for each class. Moreover, the proposed architecture overcomes the scalability issue in current DNNs, as there is no need to re-train the whole network with the addition of new classes; training can be done for each class separately. We also introduce a criterion based on Kullback-Leibler divergence to reject doubtful examples. This rejection criterion should improve the trust in the obtained results and can be further exploited to reject adversarial examples.
Tasks
Published 2018-09-10
URL http://arxiv.org/abs/1809.03259v1
PDF http://arxiv.org/pdf/1809.03259v1.pdf
PWC https://paperswithcode.com/paper/classification-by-re-generation-towards
Repo https://github.com/StijnVerdenius/Boosting_Text_Classifiers_by_Generative_Modelling
Framework pytorch
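The decision rule implied by the abstract can be sketched independently of the VAEs themselves: with one model per class, pick the class whose model reconstructs the input best, but reject the sample when even that model's KL term exceeds a threshold (the input looks unlike everything seen in training). The function below assumes the per-class reconstruction errors and KL divergences have already been computed by the trained VAEs; the threshold is a tunable assumption.

```python
import numpy as np

def classify_with_rejection(recon_errors, kl_divs, kl_threshold):
    """Pick the class with the smallest reconstruction error, or return None
    (reject) when that class's KL divergence is above the threshold."""
    best = int(np.argmin(recon_errors))
    if kl_divs[best] > kl_threshold:
        return None                        # doubtful example: rejected
    return best
```

Because each class owns its own encoder-decoder pair, adding a class means training one new pair and extending these lists, with no retraining of the others.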

Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net

Title Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net
Authors Xingang Pan, Ping Luo, Jianping Shi, Xiaoou Tang
Abstract Convolutional neural networks (CNNs) have achieved great successes in many computer vision problems. Unlike existing works that designed CNN architectures to improve performance on a single task of a single domain and not generalizable, we present IBN-Net, a novel convolutional architecture, which remarkably enhances a CNN’s modeling ability on one domain (e.g. Cityscapes) as well as its generalization capacity on another domain (e.g. GTA5) without finetuning. IBN-Net carefully integrates Instance Normalization (IN) and Batch Normalization (BN) as building blocks, and can be wrapped into many advanced deep networks to improve their performances. This work has three key contributions. (1) By delving into IN and BN, we disclose that IN learns features that are invariant to appearance changes, such as colors, styles, and virtuality/reality, while BN is essential for preserving content related information. (2) IBN-Net can be applied to many advanced deep architectures, such as DenseNet, ResNet, ResNeXt, and SENet, and consistently improve their performance without increasing computational cost. (3) When applying the trained networks to new domains, e.g. from GTA5 to Cityscapes, IBN-Net achieves comparable improvements as domain adaptation methods, even without using data from the target domain. With IBN-Net, we won the 1st place on the WAD 2018 Challenge Drivable Area track, with an mIoU of 86.18%.
Tasks Domain Adaptation
Published 2018-07-25
URL https://arxiv.org/abs/1807.09441v3
PDF https://arxiv.org/pdf/1807.09441v3.pdf
PWC https://paperswithcode.com/paper/two-at-once-enhancing-learning-and
Repo https://github.com/XingangPan/IBN-Net
Framework pytorch
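The IBN building block is simple enough to sketch in numpy: split the channels, apply Instance Normalization (per sample, per channel) to one half, removing appearance statistics, and Batch Normalization (per channel across the batch) to the other half, preserving content statistics. This omits the learnable affine parameters and running statistics a real implementation would carry.

```python
import numpy as np

def ibn_block(x, eps=1e-5):
    """IBN-style normalization for x of shape (N, C, H, W): IN on the first
    half of the channels, BN on the second half."""
    c = x.shape[1] // 2
    a, b = x[:, :c], x[:, c:]
    # Instance norm: statistics over each sample's spatial dimensions.
    inorm = (a - a.mean(axis=(2, 3), keepdims=True)) / np.sqrt(
        a.var(axis=(2, 3), keepdims=True) + eps)
    # Batch norm: statistics over batch and spatial dimensions per channel.
    bnorm = (b - b.mean(axis=(0, 2, 3), keepdims=True)) / np.sqrt(
        b.var(axis=(0, 2, 3), keepdims=True) + eps)
    return np.concatenate([inorm, bnorm], axis=1)
```

The split reflects the paper's finding: the IN half is invariant to per-image style shifts (colors, virtuality/reality), while the BN half keeps batch-level content statistics intact.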

Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text

Title Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text
Authors Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, William W. Cohen
Abstract Open Domain Question Answering (QA) is evolving from complex pipelined systems to end-to-end deep neural networks. Specialized neural models have been developed for extracting answers from either text alone or Knowledge Bases (KBs) alone. In this paper we look at a more practical setting, namely QA over the combination of a KB and entity-linked text, which is appropriate when an incomplete KB is available with a large text corpus. Building on recent advances in graph representation learning we propose a novel model, GRAFT-Net, for extracting answers from a question-specific subgraph containing text and KB entities and relations. We construct a suite of benchmark tasks for this problem, varying the difficulty of questions, the amount of training data, and KB completeness. We show that GRAFT-Net is competitive with the state-of-the-art when tested using either KBs or text alone, and vastly outperforms existing methods in the combined setting. Source code is available at https://github.com/OceanskySun/GraftNet .
Tasks Graph Representation Learning, Open-Domain Question Answering, Question Answering, Representation Learning
Published 2018-09-04
URL http://arxiv.org/abs/1809.00782v1
PDF http://arxiv.org/pdf/1809.00782v1.pdf
PWC https://paperswithcode.com/paper/open-domain-question-answering-using-early
Repo https://github.com/OceanskySun/GraftNet
Framework none