Paper Group ANR 700
Doubly Semi-Implicit Variational Inference. Stereo obstacle detection for unmanned surface vehicles by IMU-assisted semantic segmentation. Three for one and one for three: Flow, Segmentation, and Surface Normals. Low-Latency Human Action Recognition with Weighted Multi-Region Convolutional Neural Network. Neural Architecture Search Over a Graph Sea …
Doubly Semi-Implicit Variational Inference
Title | Doubly Semi-Implicit Variational Inference |
Authors | Dmitry Molchanov, Valery Kharitonov, Artem Sobolev, Dmitry Vetrov |
Abstract | We extend the existing framework of semi-implicit variational inference (SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way to perform variational inference and learning when both the approximate posterior and the prior distribution are semi-implicit. In other words, DSIVI performs inference in models where the prior and the posterior can be expressed as an intractable infinite mixture of some analytic density with a highly flexible implicit mixing distribution. We provide a sandwich bound on the evidence lower bound (ELBO) objective that can be made arbitrarily tight. Unlike discriminator-based and kernel-based approaches to implicit variational inference, DSIVI optimizes a proper lower bound on ELBO that is asymptotically exact. We evaluate DSIVI on a set of problems that benefit from implicit priors. In particular, we show that DSIVI gives rise to a simple modification of VampPrior, the current state-of-the-art prior for variational autoencoders, which improves its performance. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02789v2 |
http://arxiv.org/pdf/1810.02789v2.pdf | |
PWC | https://paperswithcode.com/paper/doubly-semi-implicit-variational-inference |
Repo | |
Framework | |
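The sandwich idea is easy to see in code. Below is a minimal NumPy sketch (not the authors' implementation) of the K-sample surrogates that semi-implicit methods build on: for a semi-implicit density q(z) = E_psi q(z|psi), a log-mean-exp over fresh mixing samples underestimates log q(z) in expectation, while including the psi that generated z overestimates it, and both tighten as K grows. The Gaussian conditional and the tanh mixing map are illustrative assumptions.

```python
# Minimal sketch of the K-sample log-density surrogates behind the
# DSIVI-style sandwich bound. Mixing distribution and shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mix_sample(n):
    """Implicit mixing distribution: a nonlinear push-forward of Gaussian noise."""
    eps = rng.standard_normal((n, 2))
    return np.tanh(eps @ np.array([[1.5, -0.3], [0.2, 0.8]]))  # psi ~ q(psi)

def log_cond(z, psi, sigma=0.5):
    """log q(z | psi): analytic 2-D Gaussian conditional centred at psi."""
    d = z - psi
    return -0.5 * np.sum(d * d, axis=-1) / sigma**2 - np.log(2 * np.pi * sigma**2)

def logmeanexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.mean(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

# Draw z from the semi-implicit q: first psi_0, then z | psi_0.
psi0 = mix_sample(1)
z = psi0 + 0.5 * rng.standard_normal((1, 2))

K = 1000
psis = mix_sample(K)                       # K fresh mixing samples
lc = log_cond(z, psis)                     # (K,) values of log q(z | psi_k)
lower = logmeanexp(lc, axis=0)             # E[.] <= log q(z)  (Jensen)
upper = logmeanexp(np.append(lc, log_cond(z, psi0)), axis=0)  # includes generating psi0
print(f"log q(z) is sandwiched in expectation near [{float(lower):.3f}, {float(upper):.3f}]")
```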
Stereo obstacle detection for unmanned surface vehicles by IMU-assisted semantic segmentation
Title | Stereo obstacle detection for unmanned surface vehicles by IMU-assisted semantic segmentation |
Authors | Borja Bovcon, Rok Mandeljc, Janez Perš, Matej Kristan |
Abstract | A new obstacle detection algorithm for unmanned surface vehicles (USVs) is presented. A state-of-the-art graphical model for semantic segmentation is extended to incorporate boat pitch and roll measurements from the on-board inertial measurement unit (IMU), and a stereo verification algorithm that consolidates tentative detections obtained from the segmentation is proposed. The IMU readings are used to estimate the location of the horizon line in the image, which automatically adjusts the priors in the probabilistic semantic segmentation model. We derive the equations for projecting the horizon into images, propose an efficient optimization algorithm for the extended graphical model, and offer a practical IMU-camera-USV calibration procedure. Using a USV equipped with multiple synchronized sensors, we captured a new challenging multi-modal dataset and annotated its images with the water edge and obstacles. Experimental results show that the proposed algorithm significantly outperforms the state of the art, with nearly 30% improvement in water-edge detection accuracy, an over 21% reduction of the false positive rate, an almost 60% reduction of the false negative rate, and an over 65% increase of the true positive rate, while its Matlab implementation runs in real time. |
Tasks | Calibration, Edge Detection, Semantic Segmentation |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.07956v1 |
http://arxiv.org/pdf/1802.07956v1.pdf | |
PWC | https://paperswithcode.com/paper/stereo-obstacle-detection-for-unmanned |
Repo | |
Framework | |
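The horizon projection step lends itself to a compact geometric sketch: for a pinhole camera, the horizon is the vanishing line of the sea plane, l ~ K^{-T} n_c, where n_c is the water-plane normal rotated into the camera frame by the IMU pitch and roll. The intrinsics, axis conventions, and zero-yaw simplification below are illustrative assumptions, not the paper's calibrated setup.

```python
# Hedged sketch of IMU-assisted horizon estimation for a pinhole camera.
# Convention assumed: camera x right, y down, z forward, so the world "up"
# direction is (0, -1, 0) in camera coordinates at zero pitch/roll.
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def horizon_line(K, pitch, roll):
    """Return (a, b, c) with a*u + b*v + c = 0: the vanishing line of the
    sea plane, l ~ K^{-T} n_c (yaw does not move the horizon of a level plane)."""
    up_cam = rot_x(pitch) @ rot_z(roll) @ np.array([0.0, -1.0, 0.0])
    return np.linalg.inv(K).T @ up_cam

K = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])  # illustrative intrinsics
a, b, c = horizon_line(K, pitch=np.deg2rad(3), roll=np.deg2rad(5))
for u in (0.0, 1280.0):  # horizon row at the left and right image borders
    print(f"u={u:6.1f}  v={-(a * u + c) / b:.1f}")
```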
Three for one and one for three: Flow, Segmentation, and Surface Normals
Title | Three for one and one for three: Flow, Segmentation, and Surface Normals |
Authors | Hoang-An Le, Anil S. Baslamisli, Thomas Mensink, Theo Gevers |
Abstract | Optical flow, semantic segmentation, and surface normals represent different information modalities, yet together they provide better cues for scene understanding problems. In this paper, we study the influence among the three modalities: how each impacts the others and their efficiency in combination. We employ a modular approach using a convolutional refinement network which is trained with supervision but isolated from RGB images to enforce joint modality features. To assist the training process, we create a large-scale synthetic outdoor dataset that supports dense annotation of semantic segmentation, optical flow, and surface normals. The experimental results show positive influence among the three modalities, especially for objects’ boundaries, region consistency, and scene structures. |
Tasks | Optical Flow Estimation, Scene Understanding, Semantic Segmentation |
Published | 2018-07-19 |
URL | http://arxiv.org/abs/1807.07473v1 |
http://arxiv.org/pdf/1807.07473v1.pdf | |
PWC | https://paperswithcode.com/paper/three-for-one-and-one-for-three-flow |
Repo | |
Framework | |
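A hedged PyTorch sketch of the kind of modular refinement module described above: it sees only the three modality predictions (no RGB) and refines one of them from the joint input. The channel counts, depth, and residual refinement of normals are illustrative assumptions.

```python
# Illustrative cross-modality refinement module: flow + segmentation + normals
# in, a refined modality out. Architecture details are assumptions.
import torch
import torch.nn as nn

class CrossModalRefiner(nn.Module):
    def __init__(self, n_classes=8, out_channels=3, width=64):
        super().__init__()
        in_ch = 2 + n_classes + 3          # flow + seg logits + surface normals
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, out_channels, 3, padding=1),
        )

    def forward(self, flow, seg_logits, normals):
        x = torch.cat([flow, seg_logits, normals], dim=1)
        return self.net(x)                 # a correction predicted from joint cues

refiner = CrossModalRefiner()
flow = torch.randn(1, 2, 64, 64)
seg = torch.randn(1, 8, 64, 64)
normals = torch.randn(1, 3, 64, 64)
refined = normals + refiner(flow, seg, normals)   # residual refinement of normals
print(refined.shape)                               # torch.Size([1, 3, 64, 64])
```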
Low-Latency Human Action Recognition with Weighted Multi-Region Convolutional Neural Network
Title | Low-Latency Human Action Recognition with Weighted Multi-Region Convolutional Neural Network |
Authors | Yunfeng Wang, Wengang Zhou, Qilin Zhang, Xiaotian Zhu, Houqiang Li |
Abstract | Spatio-temporal contexts are crucial in understanding human actions in videos. Recent state-of-the-art Convolutional Neural Network (ConvNet) based action recognition systems frequently involve 3D spatio-temporal ConvNet filters, chunking videos into fixed-length clips, and Long Short-Term Memory (LSTM) networks. Such architectures are designed to take advantage of both short-term and long-term temporal contexts, but also require the accumulation of a predefined number of video frames (e.g., to construct video clips for 3D ConvNet filters, or to generate enough inputs for LSTMs). For applications that require low-latency online predictions of fast-changing action scenes, a new action recognition system is proposed in this paper. Termed the “Weighted Multi-Region Convolutional Neural Network” (WMR ConvNet), the proposed system is LSTM-free and is based on a 2D ConvNet that does not require the accumulation of video frames for 3D ConvNet filtering. Unlike early 2D ConvNets that are based purely on RGB frames and optical flow frames, the WMR ConvNet is designed to simultaneously capture multiple spatial and short-term temporal cues (e.g., human poses, occurrences of objects in the background) with both the primary region (foreground) and secondary regions (mostly background). On both the UCF101 and HMDB51 datasets, the proposed WMR ConvNet achieves state-of-the-art performance among competing low-latency algorithms. Furthermore, the WMR ConvNet even outperforms the 3D ConvNet based C3D algorithm that requires video frame accumulation. In an ablation study with the optical flow ConvNet stream removed, the ablated WMR ConvNet nevertheless outperforms competing algorithms. |
Tasks | Chunking, Optical Flow Estimation, Temporal Action Localization |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.02877v1 |
http://arxiv.org/pdf/1805.02877v1.pdf | |
PWC | https://paperswithcode.com/paper/low-latency-human-action-recognition-with |
Repo | |
Framework | |
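The weighted multi-region idea reduces to a simple fusion rule at inference time: per-region class scores are combined with a larger weight on the primary (foreground) region. A small sketch, with the weights, region count, and score shapes as illustrative assumptions rather than the paper's values:

```python
# Hedged sketch of weighted multi-region score fusion for action recognition.
import numpy as np

def fuse_regions(primary_scores, secondary_scores, w_primary=0.6):
    """primary_scores: (C,); secondary_scores: (R, C) for R secondary regions."""
    w_secondary = (1.0 - w_primary) / len(secondary_scores)
    fused = w_primary * primary_scores + w_secondary * secondary_scores.sum(axis=0)
    return np.argmax(fused), fused

rng = np.random.default_rng(1)
primary = rng.random(101)          # e.g. 101 UCF101 action classes
secondary = rng.random((4, 101))   # 4 secondary (mostly background) regions
label, fused = fuse_regions(primary, secondary)
print(label, fused.shape)
```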
Neural Architecture Search Over a Graph Search Space
Title | Neural Architecture Search Over a Graph Search Space |
Authors | Stanisław Jastrzębski, Quentin de Laroussilhe, Mingxing Tan, Xiao Ma, Neil Houlsby, Andrea Gesmundo |
Abstract | Neural Architecture Search (NAS) enabled the discovery of state-of-the-art architectures in many domains. However, the success of NAS depends on the definition of the search space. Current search spaces are defined as a static sequence of decisions and a set of available actions for each decision. Each possible sequence of actions defines an architecture. We propose a more expressive class of search space: directed graphs. In our formalism, each decision is a vertex and each action is an edge. This allows us to model iterative and branching architecture design decisions. We demonstrate in simulation, and on image classification experiments, basic iterative and branching search structures, and show that the graph representation improves sample efficiency. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2018-12-27 |
URL | https://arxiv.org/abs/1812.10666v2 |
https://arxiv.org/pdf/1812.10666v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-architecture-search-over-a-graph |
Repo | |
Framework | |
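The formalism is straightforward to prototype: a search space is a directed graph whose vertices are decision states and whose edges are actions, and sampling an architecture is a walk from the start vertex to a terminal one. The toy graph below, with a branch and a loop for iteration, is purely illustrative:

```python
# Illustrative graph-structured search space: vertices are decisions,
# edges are actions, and an architecture is the action sequence of a walk.
import random

GRAPH = {
    "start": [("conv3x3", "block"), ("conv5x5", "block")],
    "block": [("add_block", "block"), ("branch", "merge"), ("stop", "head")],
    "merge": [("concat", "block")],
    "head":  [],                           # terminal decision
}

def sample_architecture(graph, start="start", max_steps=10, seed=None):
    rng = random.Random(seed)
    vertex, actions = start, []
    while graph[vertex] and len(actions) < max_steps:
        action, vertex = rng.choice(graph[vertex])
        actions.append(action)
    return actions

print(sample_architecture(GRAPH, seed=3))  # e.g. ['conv5x5', 'add_block', ...]
```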
Learning the Synthesizability of Dynamic Texture Samples
Title | Learning the Synthesizability of Dynamic Texture Samples |
Authors | Feng Yang, Gui-Song Xia, Dengxin Dai, Liangpei Zhang |
Abstract | A dynamic texture (DT) refers to a sequence of images that exhibit temporal regularities and has many applications in computer vision and graphics. Given an exemplar of dynamic texture, it is a challenging task to generate new samples of high quality that are perceptually similar to the input exemplar, a problem known as example-based dynamic texture synthesis (EDTS). Numerous approaches have been devoted to this problem in the past decades, but none of them is able to tackle all kinds of dynamic textures equally well. In this paper, we investigate the synthesizability of dynamic texture samples: given a dynamic texture sample, how synthesizable is it by EDTS, and which EDTS method is the most suitable to synthesize it? To this end, we propose to learn regression models that connect dynamic texture samples with synthesizability scores, with the help of a compiled dynamic texture dataset annotated in terms of synthesizability. More precisely, we first define the synthesizability of DT samples and characterize them by a set of spatiotemporal features. Based on these features and an annotated dynamic texture dataset, we then train regression models to predict the synthesizability scores of texture samples and learn classifiers to select the most suitable EDTS methods. We further complete the selection, partition, and synthesizability prediction of dynamic texture samples in a hierarchical scheme. We finally apply the learned synthesizability to detecting synthesizable regions in videos. The experiments demonstrate that our method can effectively learn and predict the synthesizability of DT samples. |
Tasks | Texture Synthesis |
Published | 2018-02-03 |
URL | http://arxiv.org/abs/1802.00941v1 |
http://arxiv.org/pdf/1802.00941v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-the-synthesizability-of-dynamic |
Repo | |
Framework | |
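The learning components map naturally onto standard tooling. A hedged sklearn sketch of the two pieces, a regressor for synthesizability scores and a classifier for EDTS method selection, with synthetic stand-ins for the spatiotemporal features and annotations:

```python
# Hedged sketch: regress a synthesizability score and pick an EDTS method.
# Features, score, and method labels below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 32))          # stand-in spatiotemporal features
score = 1 / (1 + np.exp(-X[:, 0]))          # stand-in synthesizability in [0, 1]
method = (X[:, 1] > 0).astype(int)          # stand-in best-EDTS-method label

reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, score)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, method)

x_new = rng.standard_normal((1, 32))
print("predicted synthesizability:", float(reg.predict(x_new)[0]))
print("suggested EDTS method id:", int(clf.predict(x_new)[0]))
```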
Semi-tied Units for Efficient Gating in LSTM and Highway Networks
Title | Semi-tied Units for Efficient Gating in LSTM and Highway Networks |
Authors | Chao Zhang, Philip Woodland |
Abstract | Gating is a key technique used for integrating information from multiple sources by long short-term memory (LSTM) models and has recently also been applied to other models such as the highway network. Although gating is powerful, it is rather expensive in terms of both computation and storage as each gating unit uses a separate full weight matrix. This issue can be severe since several gates can be used together in, e.g., an LSTM cell. This paper proposes a semi-tied unit (STU) approach to solve this efficiency issue, which uses one shared weight matrix to replace those in all the units in the same layer. The approach is termed “semi-tied” since extra parameters are used to separately scale each of the shared output values. These extra scaling factors are associated with the network activation functions and result in the use of parameterised sigmoid, hyperbolic tangent, and rectified linear unit functions. Speech recognition experiments using British English multi-genre broadcast data showed that using STUs can reduce the calculation and storage cost by a factor of three for highway networks and four for LSTMs, while giving similar word error rates to the original models. |
Tasks | Speech Recognition |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06513v1 |
http://arxiv.org/pdf/1806.06513v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-tied-units-for-efficient-gating-in-lstm |
Repo | |
Framework | |
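A minimal PyTorch sketch of the semi-tied idea applied to an LSTM cell: one shared weight matrix replaces the four per-gate matrices, and each gate keeps only cheap per-unit scales and biases inside its activation (the parameterised sigmoid/tanh). The exact cell wiring and parameterisation here are simplified assumptions, not the paper's recipe:

```python
# Sketch of a semi-tied LSTM cell: one shared projection, per-gate scaling.
import torch
import torch.nn as nn

class SemiTiedLSTMCell(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.shared = nn.Linear(in_dim + hid_dim, hid_dim)   # one matrix for all units
        # per-gate scale/bias: 4 gates (i, f, o, g), hid_dim units each
        self.scale = nn.Parameter(torch.ones(4, hid_dim))
        self.bias = nn.Parameter(torch.zeros(4, hid_dim))

    def forward(self, x, state):
        h, c = state
        z = self.shared(torch.cat([x, h], dim=-1))           # shared projection
        i = torch.sigmoid(self.scale[0] * z + self.bias[0])  # parameterised sigmoids
        f = torch.sigmoid(self.scale[1] * z + self.bias[1])
        o = torch.sigmoid(self.scale[2] * z + self.bias[2])
        g = torch.tanh(self.scale[3] * z + self.bias[3])     # parameterised tanh
        c = f * c + i * g
        return o * torch.tanh(c), c

cell = SemiTiedLSTMCell(40, 128)
h = c = torch.zeros(2, 128)
h, c = cell(torch.randn(2, 40), (h, c))
print(h.shape)  # torch.Size([2, 128]); one weight matrix instead of four
```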
Examining Scientific Writing Styles from the Perspective of Linguistic Complexity
Title | Examining Scientific Writing Styles from the Perspective of Linguistic Complexity |
Authors | Chao Lu, Yi Bu, Jie Wang, Ying Ding, Vetle Torvik, Matthew Schnaars, Chengzhi Zhang |
Abstract | Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. In order to uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactic and lexical complexity. |
Tasks | |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08374v2 |
http://arxiv.org/pdf/1807.08374v2.pdf | |
PWC | https://paperswithcode.com/paper/examining-scientific-writing-styles-from-the |
Repo | |
Framework | |
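The complexity measurements are easy to illustrate. The sketch below computes mean sentence length (syntactic) and type-token ratio plus lexical density (lexical), with a crude stop-word heuristic standing in for proper POS tagging; it is not the study's measurement pipeline:

```python
# Toy versions of the syntactic and lexical complexity measures named above.
import re

def complexity(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    function_words = {"the", "a", "an", "of", "in", "on", "to", "and", "or",
                      "is", "are", "was", "were", "that", "this", "with", "for"}
    content = [t for t in tokens if t not in function_words]  # crude heuristic
    return {
        "mean_sentence_length": len(tokens) / len(sentences),
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "lexical_density": len(content) / len(tokens),
    }

print(complexity("We collected a large-scale data set. We examined writing styles."))
```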
Bias Reduction via End-to-End Shift Learning: Application to Citizen Science
Title | Bias Reduction via End-to-End Shift Learning: Application to Citizen Science |
Authors | Di Chen, Carla P. Gomes |
Abstract | Citizen science projects are successful at gathering rich datasets for various applications. However, the data collected by citizen scientists are often biased — in particular, aligned more with the citizens’ preferences than with scientific objectives. We propose the Shift Compensation Network (SCN), an end-to-end learning scheme which learns the shift from the scientific objectives to the biased data while compensating for the shift by re-weighting the training data. Applied to bird observational data from the citizen science project eBird, we demonstrate how SCN quantifies the data distribution shift and outperforms supervised learning models that do not address the data bias. Compared with competing models in the context of covariate shift, we further demonstrate the advantage of SCN in both its effectiveness and its capability of handling massive high-dimensional data. |
Tasks | |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00458v4 |
http://arxiv.org/pdf/1811.00458v4.pdf | |
PWC | https://paperswithcode.com/paper/bias-reduction-via-end-to-end-shift-learning |
Repo | |
Framework | |
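The re-weighting step can be sketched with the classic density-ratio trick: a probabilistic classifier separating the biased sample from the target sample yields importance weights w(x) proportional to P(target|x)/P(biased|x), which then multiply each example's training loss. This standard trick stands in for SCN's end-to-end learned shift; the data below is synthetic:

```python
# Hedged sketch of shift compensation by density-ratio re-weighting.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_biased = rng.normal(loc=0.0, size=(2000, 2))   # citizen-science-like sample
x_target = rng.normal(loc=0.7, size=(2000, 2))   # scientific-objective sample

# A classifier distinguishing target (1) from biased (0) gives the ratio
# w(x) = P(target|x) / P(biased|x) up to a constant.
dom = LogisticRegression().fit(
    np.vstack([x_biased, x_target]),
    np.concatenate([np.zeros(2000), np.ones(2000)]),
)
p = dom.predict_proba(x_biased)[:, 1]
weights = p / (1.0 - p)
weights *= len(weights) / weights.sum()          # normalise to mean 1

# These weights then multiply each example's loss in ordinary supervised training.
print("weight range:", weights.min().round(3), weights.max().round(3))
```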
Hybrid Adaptive Fuzzy Extreme Learning Machine for text classification
Title | Hybrid Adaptive Fuzzy Extreme Learning Machine for text classification |
Authors | Ming Li, Peilun Xiao, Ju Zhang |
Abstract | Traditional ELM and its improved versions suffer from problems with outliers or noise due to overfitting, and from imbalance due to the data distribution. We propose a novel hybrid adaptive fuzzy ELM (HA-FELM), which introduces a fuzzy membership function into the traditional ELM method to deal with the above problems. We define the fuzzy membership function based not only on the distance between each sample and the center of its class but also on the density among samples, which is derived from the quantum harmonic oscillator model. The proposed fuzzy membership function overcomes the shortcomings of the traditional fuzzy membership function and adjusts itself adaptively to the specific distribution of different samples. Experiments show the proposed HA-FELM can produce better performance than SVM, ELM, and RELM in text classification. |
Tasks | Text Classification |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.06524v1 |
http://arxiv.org/pdf/1805.06524v1.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-adaptive-fuzzy-extreme-learning |
Repo | |
Framework | |
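A hedged sketch of a fuzzy membership built from the two ingredients the abstract names, distance to the class center and density among samples; a Gaussian kernel density stands in for the quantum-harmonic-oscillator-based form, which the sketch does not reproduce:

```python
# Illustrative fuzzy membership: distance-to-center term + density term.
import numpy as np

def fuzzy_membership(X, alpha=0.5, bandwidth=1.0, eps=1e-8):
    center = X.mean(axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    dist_term = 1.0 - dist / (dist.max() + eps)              # closer to center -> higher
    pair = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    density = np.exp(-(pair / bandwidth) ** 2).mean(axis=1)  # denser -> higher
    density_term = density / density.max()
    return alpha * dist_term + (1 - alpha) * density_term    # membership in (0, 1]

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
m = fuzzy_membership(X)            # per-sample weights for the ELM objective
print(m.min().round(3), m.max().round(3))
```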
An Explicit Neural Network Construction for Piecewise Constant Function Approximation
Title | An Explicit Neural Network Construction for Piecewise Constant Function Approximation |
Authors | Kailiang Wu, Dongbin Xiu |
Abstract | We present an explicit construction for a feedforward neural network (FNN), which provides a piecewise constant approximation for multivariate functions. The proposed FNN has two hidden layers, where the weights and thresholds are explicitly defined and do not require numerical optimization for training. Unlike most of the existing work on explicit FNN construction, the proposed FNN does not rely on tensor structure in multiple dimensions. Instead, it automatically creates a Voronoi tessellation of the domain, based on the given data of the target function, and a piecewise constant approximation of the function. This makes the construction more practical for applications. We present both theoretical analysis and numerical examples to demonstrate its properties. |
Tasks | |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07390v1 |
http://arxiv.org/pdf/1808.07390v1.pdf | |
PWC | https://paperswithcode.com/paper/an-explicit-neural-network-construction-for |
Repo | |
Framework | |
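The end behaviour of the construction is a Voronoi piecewise-constant fit: the predictor returns the sampled function value at the nearest data site. The sketch below reproduces that behaviour with a direct nearest-site lookup (the paper instead realizes it with two explicitly defined hidden layers):

```python
# Voronoi piecewise-constant approximation via nearest-site lookup.
import numpy as np

def voronoi_piecewise_constant(sites, values):
    def predict(x):
        d = np.linalg.norm(sites[None, :, :] - x[:, None, :], axis=-1)
        return values[np.argmin(d, axis=1)]   # value at the nearest Voronoi site
    return predict

rng = np.random.default_rng(0)
sites = rng.uniform(-1, 1, size=(200, 2))             # scattered data, no tensor grid
values = np.sin(np.pi * sites[:, 0]) * sites[:, 1]    # samples of the target function
f_hat = voronoi_piecewise_constant(sites, values)

x_test = rng.uniform(-1, 1, size=(5, 2))
print(f_hat(x_test))   # constant within each Voronoi cell of the sites
```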
The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary
Title | The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary |
Authors | Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao |
Abstract | The 3rd annual installment of the ActivityNet Large-Scale Activity Recognition Challenge, held as a full-day workshop at CVPR 2018, focused on the recognition of daily-life, high-level, goal-oriented activities from user-generated videos such as those found in internet video portals. The 2018 challenge hosted six diverse tasks which aimed to push the limits of semantic visual understanding of videos as well as bridge visual content with human captions. Three out of the six tasks were based on the ActivityNet dataset, which was introduced in CVPR 2015 and organized hierarchically in a semantic taxonomy. These tasks focused on tracing evidence of activities in time in the form of proposals, class labels, and captions. In this installment of the challenge, we hosted three guest tasks to enrich the understanding of visual information in videos. The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research. |
Tasks | Activity Recognition |
Published | 2018-08-11 |
URL | http://arxiv.org/abs/1808.03766v2 |
http://arxiv.org/pdf/1808.03766v2.pdf | |
PWC | https://paperswithcode.com/paper/the-activitynet-large-scale-activity |
Repo | |
Framework | |
Identification of Internal Faults in Indirect Symmetrical Phase Shift Transformers Using Ensemble Learning
Title | Identification of Internal Faults in Indirect Symmetrical Phase Shift Transformers Using Ensemble Learning |
Authors | Pallav Kumar Bera, Rajesh Kumar, Can Isik |
Abstract | This paper proposes methods to identify 40 different types of internal faults in an Indirect Symmetrical Phase Shift Transformer (ISPST). The ISPST was modeled using Power System Computer Aided Design (PSCAD)/Electromagnetic Transients including DC (EMTDC). The internal faults were simulated by varying the transformer tapping, backward and forward phase shifts, loading, and percentage of winding faulted. Data for 960 cases of each type of fault were recorded. A series of features was extracted for the a, b, and c phases from the time, frequency, time-frequency, and information theory domains. The importance of the extracted features was evaluated through univariate tests, which helped to reduce the number of features. The selected features were then used for training five state-of-the-art machine learning classifiers. Extremely Random Trees and Random Forest, the ensemble-based learners, achieved accuracies of 98.76% and 97.54% respectively, outperforming Multilayer Perceptron (96.13%), Logistic Regression (93.54%), and Support Vector Machines (92.60%). |
Tasks | |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04537v1 |
http://arxiv.org/pdf/1811.04537v1.pdf | |
PWC | https://paperswithcode.com/paper/identification-of-internal-faults-in-indirect |
Repo | |
Framework | |
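The pipeline maps onto standard sklearn components: univariate tests to rank and prune the extracted features, followed by an ensemble classifier over the 40 fault classes. The arrays below are random stand-ins for the PSCAD/EMTDC simulation features, so the reported accuracy is only a smoke test:

```python
# Hedged sketch: univariate feature selection + Extra Trees over 40 fault classes.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((960, 120))    # stand-in multi-domain features
y = rng.integers(0, 40, size=960)      # 40 internal fault types

pipe = make_pipeline(
    SelectKBest(f_classif, k=30),      # univariate importance filtering
    ExtraTreesClassifier(n_estimators=200, random_state=0),
)
scores = cross_val_score(pipe, X, y, cv=3)
print("CV accuracy (random stand-in data):", scores.mean().round(3))
```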
Investigating Human + Machine Complementarity for Recidivism Predictions
Title | Investigating Human + Machine Complementarity for Recidivism Predictions |
Authors | Sarah Tan, Julius Adebayo, Kori Inkpen, Ece Kamar |
Abstract | When might human input help (or not) when assessing risk in fairness domains? Dressel and Farid (2018) asked Mechanical Turk workers to evaluate a subset of defendants in the ProPublica COMPAS data for risk of recidivism, and concluded that COMPAS predictions were no more accurate or fair than predictions made by humans. We delve deeper into this claim to explore differences in human and algorithmic decision making. We construct a Human Risk Score based on the predictions made by multiple Turk workers, characterize the features that determine agreement and disagreement between COMPAS and Human Scores, and construct hybrid Human+Machine models to predict recidivism. Our key finding is that on this data set, Human and COMPAS decision making differed, but not in ways that could be leveraged to significantly improve ground-truth prediction. We present the results of our analyses and suggestions for data collection best practices to leverage complementary strengths of human and machines in the fairness domain. |
Tasks | Decision Making |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09123v2 |
http://arxiv.org/pdf/1808.09123v2.pdf | |
PWC | https://paperswithcode.com/paper/investigating-human-machine-complementarity |
Repo | |
Framework | |
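A purely illustrative sketch of the hybrid setup: worker judgments are aggregated into a Human Risk Score, then combined with an algorithmic score and case features in a simple hybrid model. All data below is synthetic and does not reflect the COMPAS analysis itself:

```python
# Illustrative Human Risk Score aggregation and a simple hybrid model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
features = rng.standard_normal((n, 5))                    # synthetic case features
recidivated = (features[:, 0] + 0.5 * rng.standard_normal(n)) > 0

worker_votes = rng.integers(0, 2, size=(n, 20))           # 20 workers per case
human_score = worker_votes.mean(axis=1)                   # Human Risk Score in [0, 1]
algo_score = 1 / (1 + np.exp(-features[:, 0] + 0.3 * rng.standard_normal(n)))

X_hybrid = np.column_stack([features, human_score, algo_score])
hybrid = LogisticRegression(max_iter=1000).fit(X_hybrid, recidivated)
print("hybrid train accuracy (synthetic):", hybrid.score(X_hybrid, recidivated).round(3))
```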
Intention Oriented Image Captions with Guiding Objects
Title | Intention Oriented Image Captions with Guiding Objects |
Authors | Yue Zheng, Yali Li, Shengjin Wang |
Abstract | Although existing image caption models can produce promising results using recurrent neural networks (RNNs), it is difficult to guarantee that an object we care about is contained in the generated descriptions, for example when the object is inconspicuous in the image. The problem becomes even harder when these objects did not appear at the training stage. In this paper, we propose a novel approach for generating image captions with guiding objects (CGO). CGO constrains the model to involve a human-concerned object when the object is in the image, ensuring that the object appears in the generated description while maintaining fluency. Instead of generating the sequence from left to right, we start the description with a selected object and generate the other parts of the sequence based on this object. To achieve this, we design a novel framework combining two LSTMs in opposite directions. We demonstrate the characteristics of our method on MSCOCO, where we generate descriptions for each detected object in the images. With CGO, we can extend the ability of description to the objects neglected in image caption labels and provide a set of more comprehensive and diverse descriptions for an image. CGO shows advantages when applied to the task of describing novel objects. We show experimental results on both the MSCOCO and ImageNet datasets. Evaluations show that our method outperforms the state-of-the-art models in this task with an average F1 of 75.8, leading to better descriptions in terms of both content accuracy and fluency. |
Tasks | Image Captioning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07662v2 |
http://arxiv.org/pdf/1811.07662v2.pdf | |
PWC | https://paperswithcode.com/paper/intention-oriented-image-captions-with |
Repo | |
Framework | |
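A compact, heavily simplified sketch of the CGO decoding idea: start from a guiding object word and grow the caption in both directions with two LSTMs, one generating the words to the right and one the words to the left (whose output is finally reversed). The untrained toy decoders, greedy decoding, and vocabulary ids are all illustrative assumptions:

```python
# Toy two-direction decoding around a guiding object word.
import torch
import torch.nn as nn

class OneDirectionDecoder(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.cell = nn.LSTMCell(dim, dim)
        self.out = nn.Linear(dim, vocab)

    def generate(self, start_tok, steps, eos=0):
        h = c = torch.zeros(1, self.cell.hidden_size)
        tok, seq = torch.tensor([start_tok]), []
        for _ in range(steps):
            h, c = self.cell(self.emb(tok), (h, c))
            tok = self.out(h).argmax(dim=-1)      # greedy next (or previous) word
            if tok.item() == eos:
                break
            seq.append(tok.item())
        return seq

vocab, obj_tok = 100, 42                          # hypothetical guiding-object word id
right = OneDirectionDecoder(vocab).generate(obj_tok, steps=8)
left = OneDirectionDecoder(vocab).generate(obj_tok, steps=8)
caption_ids = list(reversed(left)) + [obj_tok] + right
print(caption_ids)                                # object guaranteed to appear
```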