Paper Group AWR 285
Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures using Spatial Information
Title | Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures using Spatial Information |
Authors | Efthymios Tzinis, Shrikant Venkataramani, Paris Smaragdis |
Abstract | We present a monophonic source separation system that is trained by only observing mixtures with no ground truth separation information. We use a deep clustering approach which trains on multi-channel mixtures and learns to project spectrogram bins to source clusters that correlate with various spatial features. We show that using such a training process we can obtain separation performance that is as good as making use of ground truth separation information. Once trained, this system is capable of performing sound separation on monophonic inputs, despite having learned how to do so using multi-channel recordings. |
Tasks | Multi-Speaker Source Separation, Speech Separation |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01531v2 |
PDF | http://arxiv.org/pdf/1811.01531v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-deep-clustering-for-source |
Repo | https://github.com/etzinis/unsupervised_spatial_dc |
Framework | none |
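A minimal sketch of the inference step this abstract describes: a trained network maps each time-frequency bin of the mixture spectrogram to an embedding, and clustering those embeddings yields per-source masks. `embed` is a hypothetical stand-in for the trained network; in the paper, its training targets come from spatial features of multi-channel recordings rather than ground-truth labels.

```python
import numpy as np
from sklearn.cluster import KMeans

def separate(mixture_spec, embed, n_sources=2):
    """Cluster per-bin embeddings and mask the mixture, one mask per source."""
    T, F = mixture_spec.shape
    emb = embed(mixture_spec).reshape(T * F, -1)        # (T*F, D) bin embeddings
    labels = KMeans(n_clusters=n_sources, n_init=10).fit_predict(emb)
    masks = [(labels == k).reshape(T, F) for k in range(n_sources)]
    return [mixture_spec * m for m in masks]            # masked spectrograms
```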
Unsupervised Detection of Lesions in Brain MRI using constrained adversarial auto-encoders
Title | Unsupervised Detection of Lesions in Brain MRI using constrained adversarial auto-encoders |
Authors | Xiaoran Chen, Ender Konukoglu |
Abstract | Lesion detection in brain Magnetic Resonance Images (MRI) remains a challenging task. State-of-the-art approaches are mostly based on supervised learning making use of large annotated datasets. Human beings, on the other hand, even non-experts, can detect most abnormal lesions after seeing a handful of healthy brain images. Replicating this capability of using prior information on the appearance of healthy brain structure to detect lesions can help computers achieve human-level abnormality detection, specifically reducing the need for numerous labeled examples and improving generalization to previously unseen lesions. To this end, we study detection of lesion regions in an unsupervised manner by learning the data distribution of brain MRI of healthy subjects using auto-encoder based methods. We hypothesize that one of the main limitations of the current models is the lack of consistency in the latent representation. We propose a simple yet effective constraint that helps map an image bearing a lesion close to its corresponding healthy image in the latent space. We use the Human Connectome Project dataset to learn the distribution of healthy-appearing brain MRI and report improved detection, in terms of AUC, of the lesions in the BRATS challenge dataset. |
Tasks | Anomaly Detection |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.04972v1 |
PDF | http://arxiv.org/pdf/1806.04972v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-detection-of-lesions-in-brain |
Repo | https://github.com/aubreychen9012/cAAE |
Framework | tf |
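A hedged sketch of the underlying detection recipe: train an auto-encoder on healthy scans only, then score voxels by reconstruction error. The adversarial and latent-consistency terms that constitute the paper's contribution are omitted; `HealthyAE` is a toy stand-in for the authors' architecture.

```python
import torch
import torch.nn as nn

class HealthyAE(nn.Module):
    """Toy auto-encoder over flattened 64x64 slices."""
    def __init__(self, dim=64 * 64, latent=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 512), nn.ReLU(), nn.Linear(512, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_map(model, scans):                    # scans: (N, 64*64) tensor
    with torch.no_grad():
        return (scans - model(scans)).abs()       # per-pixel residual as lesion score
```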
Improved Image Segmentation via Cost Minimization of Multiple Hypotheses
Title | Improved Image Segmentation via Cost Minimization of Multiple Hypotheses |
Authors | Marc Bosch, Christopher M. Gifford, Austin G. Dress, Clare W. Lau, Jeffrey G. Skibo, Gordon A. Christie |
Abstract | Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree of over-segmentation produced. It still remains a challenge to properly select such parameters for human-like perceptual grouping. In this work, we exploit the diversity of segments produced by different choices of parameters. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent among all the segmentation hypotheses. We compare our algorithm’s performance with state-of-the-art algorithms, showing that we can achieve improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses. |
Tasks | Semantic Segmentation |
Published | 2018-01-31 |
URL | http://arxiv.org/abs/1802.00088v1 |
PDF | http://arxiv.org/pdf/1802.00088v1.pdf |
PWC | https://paperswithcode.com/paper/improved-image-segmentation-via-cost |
Repo | https://github.com/pubgeo/cmmh_segmentation |
Framework | none |
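A rough sketch, assuming scikit-image's Felzenszwalb segmenter as the kernel: sweep the scale parameter to generate hypotheses, then score each segment by how often it stays intact across them. The paper's full cost minimization (which also rewards contour fit) is reduced here to this simple persistence score.

```python
import numpy as np
from skimage.segmentation import felzenszwalb

def hypotheses(image, scales=(50, 100, 200, 400, 800)):
    """Scan the scale parameter from over- to under-segmentation."""
    return [felzenszwalb(image, scale=s) for s in scales]

def stability(seg, hyps):
    """Fraction of hypotheses in which each segment of `seg` stays in one piece."""
    scores = {}
    for label in np.unique(seg):
        pixels = seg == label
        intact = sum(len(np.unique(h[pixels])) == 1 for h in hyps)
        scores[label] = intact / len(hyps)
    return scores
```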
Dual Memory Neural Computer for Asynchronous Two-view Sequential Learning
Title | Dual Memory Neural Computer for Asynchronous Two-view Sequential Learning |
Authors | Hung Le, Truyen Tran, Svetha Venkatesh |
Abstract | One of the core tasks in multi-view learning is to capture relations among views. For sequential data, the relations not only span across views, but also extend throughout the view length to form long-term intra-view and inter-view interactions. In this paper, we present a new memory augmented neural network model that aims to model these complex interactions between two asynchronous sequential views. Our model uses two encoders for reading from and writing to two external memories for encoding input views. The intra-view interactions and the long-term dependencies are captured by the use of memories during this encoding process. There are two modes of memory accessing in our system: late-fusion and early-fusion, corresponding to late and early inter-view interactions. In the late-fusion mode, the two memories are separated, containing only view-specific contents. In the early-fusion mode, the two memories share the same addressing space, allowing cross-memory accessing. In both cases, the knowledge from the memories will be combined by a decoder to make predictions over the output space. The resulting dual memory neural computer is demonstrated on a comprehensive set of experiments, including a synthetic task of summing two sequences and the tasks of drug prescription and disease progression in healthcare. The results demonstrate competitive performance over both traditional algorithms and deep learning methods designed for multi-view problems. |
Tasks | Multi-View Learning |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00662v2 |
PDF | http://arxiv.org/pdf/1802.00662v2.pdf |
PWC | https://paperswithcode.com/paper/dual-memory-neural-computer-for-asynchronous |
Repo | https://github.com/thaihungle/DMNC |
Framework | tf |
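An illustrative numpy sketch (not the authors' DNC machinery) of the late- vs early-fusion distinction: in late fusion each view reads its own memory and the results are combined afterwards; in early fusion both views address one shared memory.

```python
import numpy as np

def read(memory, key):
    """Content-based read: softmax-weighted sum over memory rows."""
    scores = memory @ key
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ memory

mem_a, mem_b = np.random.randn(16, 8), np.random.randn(16, 8)  # two view memories
key = np.random.randn(8)
late = np.concatenate([read(mem_a, key), read(mem_b, key)])    # separate memories
early = read(np.vstack([mem_a, mem_b]), key)                   # shared address space
```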
Geometric Constellation Shaping for Fiber Optic Communication Systems via End-to-end Learning
Title | Geometric Constellation Shaping for Fiber Optic Communication Systems via End-to-end Learning |
Authors | Rasmus T. Jones, Tobias A. Eriksson, Metodi P. Yankov, Benjamin J. Puttnam, Georg Rademacher, Ruben S. Luis, Darko Zibar |
Abstract | In this paper, an unsupervised machine learning method for geometric constellation shaping is investigated. By embedding a differentiable fiber channel model within two neural networks, the learning algorithm optimizes the geometric constellation shape. The learned constellations yield improved performance compared to state-of-the-art geometrically shaped constellations, and include an implicit trade-off between amplification noise and nonlinear effects. Further, the method allows joint optimization of system parameters, such as the optimal launch power, simultaneously with the constellation shape. An experimental demonstration validates the findings. Improved performance is reported, up to 0.13 bit/4D in simulation and up to 0.12 bit/4D experimentally. |
Tasks | |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00774v1 |
PDF | http://arxiv.org/pdf/1810.00774v1.pdf |
PWC | https://paperswithcode.com/paper/geometric-constellation-shaping-for-fiber |
Repo | https://github.com/Rassibassi/claude |
Framework | tf |
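A minimal end-to-end training loop in the spirit of the abstract, with one large simplification: an AWGN channel stands in for the paper's differentiable fiber model. An encoder places M messages in the 2D (I/Q) plane, the channel perturbs them, and a decoder classifies; gradients through the channel shape the constellation.

```python
import torch
import torch.nn as nn

M = 16                                                  # constellation size
enc = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, 2))
dec = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, M))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for _ in range(1000):
    bits = torch.eye(M)[torch.randint(0, M, (256,))]    # one-hot messages
    x = enc(bits)
    x = x / x.pow(2).sum(1).mean().sqrt()               # average-power constraint
    y = x + 0.1 * torch.randn_like(x)                   # AWGN stand-in channel
    loss = nn.functional.cross_entropy(dec(y), bits.argmax(1))
    opt.zero_grad(); loss.backward(); opt.step()
```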
Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
Title | Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks |
Authors | Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, Jiawei Han |
Abstract | Heterogeneous information networks (HINs) are ubiquitous in real-world applications. In the meantime, network embedding has emerged as a convenient tool to mine and learn from networked data. As a result, it is of interest to develop HIN embedding methods. However, the heterogeneity in HINs introduces not only rich information but also potentially incompatible semantics, which poses special challenges to embedding learning in HINs. With the intention to preserve the rich yet potentially incompatible information in HIN embedding, we propose to study the problem of comprehensive transcription of heterogeneous information networks. The comprehensive transcription of HINs also provides an easy-to-use approach to unleash the power of HINs, since it requires no additional supervision, expertise, or feature engineering. To cope with the challenges in the comprehensive transcription of HINs, we propose the HEER algorithm, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics. To corroborate the efficacy of HEER, we conducted experiments on two large-scale real-world datasets with an edge reconstruction task and multiple case studies. Experiment results demonstrate the effectiveness of the proposed HEER model and the utility of edge representations and heterogeneous metrics. The code and data are available at https://github.com/GentleZhu/HEER. |
Tasks | Feature Engineering, Network Embedding |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03490v1 |
PDF | http://arxiv.org/pdf/1807.03490v1.pdf |
PWC | https://paperswithcode.com/paper/easing-embedding-learning-by-comprehensive |
Repo | https://github.com/GentleZhu/HEER |
Framework | pytorch |
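A loose sketch of the edge-representation idea: score an edge by combining the two node embeddings with a learned per-edge-type metric vector, so that different semantics get different metrics. The class below is a simplified stand-in for HEER's actual formulation and loss.

```python
import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    def __init__(self, n_nodes, n_edge_types, dim=64):
        super().__init__()
        self.node = nn.Embedding(n_nodes, dim)
        self.metric = nn.Embedding(n_edge_types, dim)    # one metric per edge type

    def forward(self, u, v, etype):
        edge = self.node(u) * self.node(v)               # Hadamard edge embedding
        return (edge * self.metric(etype)).sum(-1)       # metric-weighted score

scorer = EdgeScorer(n_nodes=1000, n_edge_types=5)
score = scorer(torch.tensor([0]), torch.tensor([42]), torch.tensor([2]))
```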
Investigating Rumor News Using Agreement-Aware Search
Title | Investigating Rumor News Using Agreement-Aware Search |
Authors | Jingbo Shang, Tianhang Sun, Jiaming Shen, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam Lelkes, Cong Yu, Jiawei Han |
Abstract | Recent years have witnessed a widespread increase of rumor news generated by humans and machines. Therefore, tools for investigating rumor news have become an urgent necessity. One useful function of such tools is to show how a specific topic or event is represented, by presenting different points of view from multiple sources. In this paper, we propose Maester, a novel agreement-aware search framework for investigating rumor news. Given an investigative question, Maester will retrieve articles related to that question, and assign and display top articles from agree, disagree, and discuss categories to users. Splitting the results into these three categories provides the user with a holistic view of the investigative question. We build Maester based on the following two key observations: (1) relatedness can commonly be determined by keywords and entities occurring in both questions and articles, and (2) the level of agreement between the investigative question and the related news article can often be decided by a few key sentences. Accordingly, we use gradient boosting tree models with keyword/entity matching features for relatedness detection, and leverage a recurrent neural network to infer the level of agreement. Our experiments on the Fake News Challenge (FNC) dataset demonstrate up to an order of magnitude improvement of Maester over the original FNC winning solution, for agreement-aware search. |
Tasks | |
Published | 2018-02-21 |
URL | https://arxiv.org/abs/1802.07398v2 |
PDF | https://arxiv.org/pdf/1802.07398v2.pdf |
PWC | https://paperswithcode.com/paper/investigating-rumor-news-using-agreement |
Repo | https://github.com/shangjingbo1226/Maester |
Framework | none |
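A hedged sketch of the relatedness stage only: keyword/entity-overlap features fed to a gradient-boosted tree classifier, as the abstract describes. The actual feature set, and the recurrent network that infers agreement, are not reproduced here.

```python
from sklearn.ensemble import GradientBoostingClassifier

def overlap_features(question_tokens, article_tokens):
    """Simple overlap statistics between a question and an article."""
    q, a = set(question_tokens), set(article_tokens)
    shared = q & a
    return [len(shared), len(shared) / max(len(q), 1), len(shared) / max(len(a), 1)]

clf = GradientBoostingClassifier()
# X = [overlap_features(q, a) for q, a in pairs]; y = related/unrelated labels
# clf.fit(X, y)
```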
Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning
Title | Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning |
Authors | Charles Schaff, David Yunis, Ayan Chakrabarti, Matthew R. Walter |
Abstract | The physical design of a robot and the policy that controls its motion are inherently coupled, and should be determined according to the task and environment. In an increasing number of applications, data-driven and learning-based approaches, such as deep reinforcement learning, have proven effective at designing control policies. For most tasks, the only way to evaluate a physical design with respect to such control policies is empirical, i.e., by picking a design and training a control policy for it. Since training these policies is time-consuming, it is computationally infeasible to train separate policies for all possible designs as a means to identify the best one. In this work, we address this limitation by introducing a method that performs simultaneous joint optimization of the physical design and control network. Our approach maintains a distribution over designs and uses reinforcement learning to optimize a control policy to maximize expected reward over the design distribution. We give the controller access to design parameters to allow it to tailor its policy to each design in the distribution. Throughout training, we shift the distribution towards higher-performing designs, eventually converging to a design and control policy that are jointly optimal. We evaluate our approach in the context of legged locomotion, and demonstrate that it discovers novel designs and walking gaits, outperforming baselines in both performance and efficiency. |
Tasks | |
Published | 2018-01-04 |
URL | http://arxiv.org/abs/1801.01432v3 |
PDF | http://arxiv.org/pdf/1801.01432v3.pdf |
PWC | https://paperswithcode.com/paper/jointly-learning-to-construct-and-control |
Repo | https://github.com/wangzizhao/Jointly-Learning-to-Construct-and-Control-Agents-using-Deep-Reinforcement-Learning |
Framework | tf |
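A simplified sketch of maintaining a distribution over designs. The paper optimizes a design-conditioned policy with reinforcement learning; here a cross-entropy-method stand-in (with a toy reward in place of policy training) illustrates the outer loop of shifting the distribution toward higher-performing designs.

```python
import numpy as np

mu, sigma = np.zeros(4), np.ones(4)            # Gaussian over 4 design parameters

def rollout_reward(design):
    """Placeholder for training/evaluating a policy on this design."""
    return -np.sum((design - 1.0) ** 2)        # toy objective peaking at design=1

for _ in range(200):
    designs = mu + sigma * np.random.randn(32, 4)          # sample designs
    rewards = np.array([rollout_reward(d) for d in designs])
    elite = designs[np.argsort(rewards)[-8:]]              # keep the best designs
    mu, sigma = elite.mean(0), elite.std(0) + 1e-3         # shift the distribution
```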
HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification
Title | HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification |
Authors | Amos Sironi, Manuele Brambilla, Nicolas Bourdis, Xavier Lagorce, Ryad Benosman |
Abstract | Event-based cameras have recently drawn the attention of the Computer Vision community thanks to their advantages in terms of high temporal resolution, low power consumption and high dynamic range, compared to traditional frame-based cameras. These properties make event-based cameras an ideal choice for autonomous vehicles, robot navigation or UAV vision, among others. However, the accuracy of event-based object classification algorithms, which is of crucial importance for any reliable system working in real-world conditions, is still far behind their frame-based counterparts. Two main reasons for this performance gap are: 1. The lack of effective low-level representations and architectures for event-based object classification and 2. The absence of large real-world event-based datasets. In this paper we address both problems. First, we introduce a novel event-based feature representation together with a new machine learning architecture. Compared to previous approaches, we use local memory units to efficiently leverage past temporal information and build a robust event-based representation. Second, we release the first large real-world event-based dataset for object classification. We compare our method to the state-of-the-art with extensive experiments, showing better classification performance and real-time computation. |
Tasks | Autonomous Vehicles, Object Classification, Robot Navigation |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.07913v1 |
PDF | http://arxiv.org/pdf/1803.07913v1.pdf |
PWC | https://paperswithcode.com/paper/hats-histograms-of-averaged-time-surfaces-for |
Repo | https://github.com/muzishen/awesome-vehicle_reid-dataset |
Framework | none |
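A condensed sketch of the descriptor's flavor: exponentially decayed event activity ("time surfaces") averaged into per-cell histograms. The real HATS feature also uses polarity channels and a spatial neighbourhood around each event, which are simplified away here.

```python
import numpy as np

def hats_like(events, width, height, cell=8, tau=0.5):
    """events: (N, 3) array of (x, y, t) rows with t ascending, in seconds."""
    n_cy, n_cx = height // cell, width // cell
    hist = np.zeros((n_cy, n_cx))
    count = np.zeros((n_cy, n_cx))
    last_t = np.full((height, width), -np.inf)        # last event time per pixel
    for x, y, t in events:
        x, y = int(x), int(y)
        surface = np.exp((last_t[y, x] - t) / tau)    # decayed previous activity
        cy, cx = y // cell, x // cell
        hist[cy, cx] += surface
        count[cy, cx] += 1
        last_t[y, x] = t
    return hist / np.maximum(count, 1)                # average per cell
```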
Targeted Syntactic Evaluation of Language Models
Title | Targeted Syntactic Evaluation of Language Models |
Authors | Rebecca Marvin, Tal Linzen |
Abstract | We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM’s accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model. |
Tasks | CCG Supertagging, Language Modelling |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.09031v1 |
PDF | http://arxiv.org/pdf/1808.09031v1.pdf |
PWC | https://paperswithcode.com/paper/targeted-syntactic-evaluation-of-language |
Repo | https://github.com/icewing1996/bert-syntax |
Framework | none |
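The evaluation protocol itself fits in a few lines: a model passes a minimal pair if it assigns a higher probability to the grammatical sentence. `sentence_logprob` is a hypothetical hook into whatever language model is being tested.

```python
def accuracy(pairs, sentence_logprob):
    """pairs: list of (grammatical, ungrammatical) sentence strings."""
    hits = sum(sentence_logprob(good) > sentence_logprob(bad) for good, bad in pairs)
    return hits / len(pairs)

# Example minimal pair for subject-verb agreement:
pairs = [("the author laughs", "the author laugh")]
```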
A Unified Model with Structured Output for Fashion Images Classification
Title | A Unified Model with Structured Output for Fashion Images Classification |
Authors | Beatriz Quintino Ferreira, Luís Baía, João Faria, Ricardo Gamelas Sousa |
Abstract | A picture is worth a thousand words. Albeit a cliché, for the fashion industry, an image of a clothing piece allows one to perceive its category (e.g., dress), sub-category (e.g., day dress) and properties (e.g., white colour with floral patterns). The seasonal nature of the fashion industry creates a highly dynamic and creative domain with evermore data, making it impractical to manually describe a large set of images (of products). In this paper, we explore the concept of visual recognition for fashion images through an end-to-end architecture embedding the hierarchical nature of the annotations directly into the model. Towards that goal, and inspired by the work of [7], we have modified and adapted the original architecture proposal. Namely, we have removed the message passing layer symmetry to cope with the Farfetch category tree, added extra layers for hierarchy level specificity, and moved the message passing layer into an enriched latent space. We compare the proposed unified architecture against state-of-the-art models and demonstrate the performance advantage of our model for structured multi-level categorization on a dataset of about 350k fashion product images. |
Tasks | |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09445v1 |
PDF | http://arxiv.org/pdf/1806.09445v1.pdf |
PWC | https://paperswithcode.com/paper/a-unified-model-with-structured-output-for |
Repo | https://github.com/dpaddon/product_image_categorisation |
Framework | none |
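A generic sketch of hierarchy-aware structured output: a shared image encoder feeds one head per level, with each level conditioned on its parent's logits. The paper's message-passing layer in an enriched latent space is omitted; names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    def __init__(self, feat_dim, n_cat, n_subcat, n_props):
        super().__init__()
        self.cat = nn.Linear(feat_dim, n_cat)
        self.subcat = nn.Linear(feat_dim + n_cat, n_subcat)   # conditioned on parent
        self.props = nn.Linear(feat_dim + n_subcat, n_props)

    def forward(self, feats):
        c = self.cat(feats)
        s = self.subcat(torch.cat([feats, c], dim=-1))
        p = self.props(torch.cat([feats, s], dim=-1))
        return c, s, p
```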
ViTac: Feature Sharing between Vision and Tactile Sensing for Cloth Texture Recognition
Title | ViTac: Feature Sharing between Vision and Tactile Sensing for Cloth Texture Recognition |
Authors | Shan Luo, Wenzhen Yuan, Edward Adelson, Anthony G. Cohn, Raul Fuentes |
Abstract | Vision and touch are two of the important sensing modalities for humans and they offer complementary information for sensing the environment. Robots could also benefit from such multi-modal sensing ability. In this paper, addressing for the first time (to the best of our knowledge) texture recognition from tactile images and vision, we propose a new fusion method named Deep Maximum Covariance Analysis (DMCA) to learn a joint latent space for sharing features through vision and tactile sensing. The features of camera images and tactile data acquired from a GelSight sensor are learned by deep neural networks. However, the learned features are high-dimensional and redundant due to the differences between the two sensing modalities, which deteriorates perception performance. To address this, the learned features are paired using maximum covariance analysis. Results of the algorithm on a newly collected dataset of paired visual and tactile data relating to cloth textures show that a good recognition performance of greater than 90% can be achieved by using the proposed DMCA framework. In addition, we find that the perception performance of either vision or tactile sensing can be improved by employing the shared representation space, compared to learning from unimodal data. |
Tasks | |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07490v2 |
PDF | http://arxiv.org/pdf/1802.07490v2.pdf |
PWC | https://paperswithcode.com/paper/vitac-feature-sharing-between-vision-and |
Repo | https://github.com/jettdlee/vis_tac_cross_modal |
Framework | tf |
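The pairing step can be sketched directly: maximum covariance analysis takes the SVD of the cross-covariance between centred visual and tactile feature matrices, yielding paired projection bases into a shared space. The deep feature extractors are assumed given.

```python
import numpy as np

def mca(X, Y, k=10):
    """X: (n, dx) visual features; Y: (n, dy) tactile features, paired by row."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    C = Xc.T @ Yc / (len(X) - 1)             # cross-covariance matrix (dx, dy)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k], Vt[:k].T                # top-k paired projection directions
```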
SAI, a Sensible Artificial Intelligence that plays Go
Title | SAI, a Sensible Artificial Intelligence that plays Go |
Authors | Francesco Morandin, Gianluca Amato, Rosa Gini, Carlo Metta, Maurizio Parton, Gian-Carlo Pascutto |
Abstract | We propose a multiple-komi modification of the AlphaGo Zero/Leela Zero paradigm. The winrate as a function of the komi is modeled with a two-parameter sigmoid function, so that the neural network must predict just one more variable to assess the winrate for all komi values. A second novel feature is that training is based on self-play games that occasionally branch – with changed komi – when the position is uneven. With this setting, reinforcement learning is shown to work on 7x7 Go, obtaining very strong playing agents. As a useful byproduct, the sigmoid parameters given by the network allow estimation of the score difference on the board, and evaluation of how much the game is decided. |
Tasks | |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.03928v2 |
PDF | http://arxiv.org/pdf/1809.03928v2.pdf |
PWC | https://paperswithcode.com/paper/sai-a-sensible-artificial-intelligence-that |
Repo | https://github.com/sai-dev/sai |
Framework | none |
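One plausible parameterization of the two-parameter sigmoid (the exact form and sign conventions in SAI may differ): alpha acts as the estimated score difference on the board and beta as a steepness scale, so a single network evaluation yields the winrate at any komi.

```python
import math

def winrate(komi, alpha, beta):
    """Predicted winrate as a function of komi under a two-parameter sigmoid."""
    return 1.0 / (1.0 + math.exp(beta * (komi - alpha)))
```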
Tangent-Space Regularization for Neural-Network Models of Dynamical Systems
Title | Tangent-Space Regularization for Neural-Network Models of Dynamical Systems |
Authors | Fredrik Bagge Carlson, Rolf Johansson, Anders Robertsson |
Abstract | This work introduces the concept of tangent space regularization for neural-network models of dynamical systems. The tangent space to the dynamics function of many physical systems of interest in control applications exhibits useful properties, e.g., smoothness, motivating regularization of the model Jacobian along system trajectories using assumptions on the tangent space of the dynamics. Without assumptions, large amounts of training data are required for a neural network to learn the full non-linear dynamics without overfitting. We compare different network architectures on one-step prediction and simulation performance and investigate the propensity of different architectures to learn models with correct input-output Jacobian. Furthermore, the influence of $L_2$ weight regularization on the learned Jacobian eigenvalue spectrum, and hence system stability, is investigated. |
Tasks | |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09919v1 |
PDF | http://arxiv.org/pdf/1806.09919v1.pdf |
PWC | https://paperswithcode.com/paper/tangent-space-regularization-for-neural |
Repo | https://github.com/baggepinnen/JacProp.jl |
Framework | none |
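A sketch of penalizing the model Jacobian along trajectories, reducing the paper's tangent-space assumptions to one concrete instance: encourage the input-output Jacobian to vary slowly between consecutive states.

```python
import torch
from torch.autograd.functional import jacobian

def tangent_penalty(f, x_t, x_next):
    """f: dynamics network mapping a 1-D state tensor to the next state."""
    J_t = jacobian(f, x_t)
    J_next = jacobian(f, x_next)
    return (J_next - J_t).pow(2).sum()       # penalize fast-varying Jacobians
```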
Image Completion on CIFAR-10
Title | Image Completion on CIFAR-10 |
Authors | Mason Swofford |
Abstract | This project performed image completion on CIFAR-10, a dataset of 60,000 32x32 RGB images, using three different neural network architectures: fully convolutional networks, convolutional networks with fully connected layers, and encoder-decoder convolutional networks. The highest-performing model was a deep fully convolutional network, which achieved a mean squared error of 0.015 when comparing the original image pixel values with the predicted pixel values. Moreover, this network was able to output in-painted images that appeared real to the human eye. |
Tasks | |
Published | 2018-10-07 |
URL | http://arxiv.org/abs/1810.03213v1 |
PDF | http://arxiv.org/pdf/1810.03213v1.pdf |
PWC | https://paperswithcode.com/paper/image-completion-on-cifar-10 |
Repo | https://github.com/mswoff/Image-Completion-on-CIFAR-10 |
Framework | tf |
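A minimal training-step sketch of the setup described: zero out a region, ask a fully convolutional network to reconstruct the full image, and score with MSE against the original pixels. The network below is a toy; the paper's best model is a deeper fully convolutional architecture.

```python
import torch
import torch.nn as nn

net = nn.Sequential(                                    # toy fully convolutional net
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

imgs = torch.rand(8, 3, 32, 32)                         # stand-in for a CIFAR-10 batch
masked = imgs.clone()
masked[:, :, 8:24, 8:24] = 0.0                          # hide a central patch
loss = nn.functional.mse_loss(net(masked), imgs)        # compare predicted vs original
opt.zero_grad(); loss.backward(); opt.step()
```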