February 1, 2020

3393 words 16 mins read

Paper Group AWR 107

MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection. YOLO Nano: a Highly Compact You Only Look Once Convolutional Neural Network for Object Detection. Gesture-to-Gesture Translation in the Wild via Category-Independent Conditional Maps. Bag of Tricks and A Strong Baseline for Deep Person Re-identification …

MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection

Title MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection
Authors Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, Yohei Kawaguchi
Abstract Factory machinery is prone to failure or breakdown, resulting in significant expenses for companies. Hence, there is a rising interest in machine monitoring using different sensors including microphones. In the scientific community, the emergence of public datasets has led to advancements in acoustic detection and classification of scenes and events, but there are no public datasets that focus on the sound of industrial machines under normal and anomalous operating conditions in real factory environments. In this paper, we present a new dataset of industrial machine sounds that we call a sound dataset for malfunctioning industrial machine investigation and inspection (MIMII dataset). Normal sounds were recorded for different types of industrial machines (i.e., valves, pumps, fans, and slide rails), and to resemble a real-life scenario, various anomalous sounds were recorded (e.g., contamination, leakage, rotating unbalance, and rail damage). The purpose of releasing the MIMII dataset is to assist the machine-learning and signal-processing community with their development of automated facility maintenance. The MIMII dataset is freely available for download at: https://zenodo.org/record/3384388
Tasks
Published 2019-09-20
URL https://arxiv.org/abs/1909.09347v1
PDF https://arxiv.org/pdf/1909.09347v1.pdf
PWC https://paperswithcode.com/paper/mimii-dataset-sound-dataset-for
Repo https://github.com/MIMII-hitachi/mimii_baseline
Framework tf
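
The MIMII dataset is intended for unsupervised anomalous sound detection: train only on normal machine sounds and flag recordings that the model reconstructs poorly. Below is a minimal sketch of that common baseline idea (an autoencoder over stacked log-mel spectrogram frames) using librosa and Keras; the layer sizes, frame stacking, and file paths are illustrative assumptions, not the configuration of the official baseline repo.

```python
# Minimal anomaly-detection sketch for machine-sound data (illustrative, not the official baseline).
import numpy as np
import librosa
import tensorflow as tf

def logmel_frames(wav_path, n_mels=64, frames=5):
    """Load a wav file and stack consecutive log-mel frames into feature vectors."""
    y, sr = librosa.load(wav_path, sr=None, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel).T                      # (time, n_mels)
    feats = [logmel[i:i + frames].ravel()                    # stack `frames` consecutive frames
             for i in range(len(logmel) - frames + 1)]
    return np.asarray(feats, dtype=np.float32)

# Dense autoencoder trained on normal sounds only; reconstruction error is the anomaly score.
dim = 64 * 5
ae = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(dim,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(dim),
])
ae.compile(optimizer="adam", loss="mse")

# x_normal = np.vstack([logmel_frames(p) for p in normal_wav_paths])  # hypothetical path list
# ae.fit(x_normal, x_normal, epochs=50, batch_size=512)
# score = np.mean((ae.predict(x_test) - x_test) ** 2, axis=1)         # higher = more anomalous
```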

YOLO Nano: a Highly Compact You Only Look Once Convolutional Neural Network for Object Detection

Title YOLO Nano: a Highly Compact You Only Look Once Convolutional Neural Network for Object Detection
Authors Alexander Wong, Mahmoud Famuori, Mohammad Javad Shafiee, Francis Li, Brendan Chwyl, Jonathan Chung
Abstract Object detection remains an active area of research in the field of computer vision, and considerable advances and successes have been achieved in this area through the design of deep convolutional neural networks for tackling object detection. Despite these successes, one of the biggest challenges to widespread deployment of such object detection networks in edge and mobile scenarios is their high computational and memory requirements. As such, there has been growing research interest in the design of efficient deep neural network architectures catered for edge and mobile usage. In this study, we introduce YOLO Nano, a highly compact deep convolutional neural network for the task of object detection. A human-machine collaborative design strategy is leveraged to create YOLO Nano, where principled network design prototyping, based on design principles from the YOLO family of single-shot object detection network architectures, is coupled with machine-driven design exploration to create a compact network with highly customized module-level macroarchitecture and microarchitecture designs tailored for the task of embedded object detection. The proposed YOLO Nano possesses a model size of ~4.0MB (>15.1x and >8.3x smaller than Tiny YOLOv2 and Tiny YOLOv3, respectively) and requires 4.57B operations for inference (>34% and ~17% lower than Tiny YOLOv2 and Tiny YOLOv3, respectively) while still achieving an mAP of ~69.1% on the VOC 2007 dataset (~12% and ~10.7% higher than Tiny YOLOv2 and Tiny YOLOv3, respectively). Experiments on inference speed and power efficiency on a Jetson AGX Xavier embedded module at different power budgets further demonstrate the efficacy of YOLO Nano for embedded scenarios.
Tasks Object Detection
Published 2019-10-03
URL https://arxiv.org/abs/1910.01271v1
PDF https://arxiv.org/pdf/1910.01271v1.pdf
PWC https://paperswithcode.com/paper/yolo-nano-a-highly-compact-you-only-look-once
Repo https://github.com/david8862/keras-YOLOv3-model-set
Framework tf
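
For context on where the compactness comes from, the sketch below shows a generic depthwise-separable convolution block, the kind of lightweight module compact detectors typically rely on to cut parameters and operations. This is only an assumption-level illustration: YOLO Nano's actual modules are machine-designed and its macro-/microarchitecture differs.

```python
# Illustrative only: a generic depthwise-separable conv block, not a reimplementation of YOLO Nano.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise 3x3: one filter per input channel (groups=in_ch), then 1x1 pointwise mixing.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

x = torch.randn(1, 32, 208, 208)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 208, 208])
```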

Gesture-to-Gesture Translation in the Wild via Category-Independent Conditional Maps

Title Gesture-to-Gesture Translation in the Wild via Category-Independent Conditional Maps
Authors Yahui Liu, Marco De Nadai, Gloria Zen, Nicu Sebe, Bruno Lepri
Abstract Recent works have shown Generative Adversarial Networks (GANs) to be particularly effective in image-to-image translations. However, in tasks such as body pose and hand gesture translation, existing methods usually require precise annotations, e.g. key-points or skeletons, which are time-consuming to draw. In this work, we propose a novel GAN architecture that decouples the required annotations into a category label - that specifies the gesture type - and a simple-to-draw category-independent conditional map - that expresses the location, rotation and size of the hand gesture. Our architecture synthesizes the target gesture while preserving the background context, thus effectively dealing with gesture translation in the wild. To this aim, we use an attention module and a rolling guidance approach, which loops the generated images back into the network and produces higher quality images compared to competing works. Thus, our GAN learns to generate new images from simple annotations without requiring key-points or skeleton labels. Results on two public datasets show that our method outperforms state of the art approaches both quantitatively and qualitatively. To the best of our knowledge, no work so far has addressed the gesture-to-gesture translation in the wild by requiring user-friendly annotations.
Tasks Gesture-to-Gesture Translation
Published 2019-07-12
URL https://arxiv.org/abs/1907.05916v3
PDF https://arxiv.org/pdf/1907.05916v3.pdf
PWC https://paperswithcode.com/paper/gesture-to-gesture-translation-in-the-wild
Repo https://github.com/yhlleo/TriangleGAN
Framework pytorch
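
The "rolling guidance" idea from the abstract, feeding the generator's first output back in as an extra condition for a refinement pass, can be sketched as follows. The `Generator` here is a placeholder with hypothetical layers; the actual TriangleGAN conditioning scheme, attention module, and losses differ.

```python
# Sketch of the rolling-guidance loop: the first generated image is looped back into the generator.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Placeholder conditional generator: image + condition map (+ feedback image) -> image."""
    def __init__(self, img_ch=3, cond_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + cond_ch + img_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, img_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, img, cond_map, feedback):
        return self.net(torch.cat([img, cond_map, feedback], dim=1))

G = Generator()
src = torch.randn(2, 3, 128, 128)           # source gesture image
cond = torch.randn(2, 1, 128, 128)          # simple-to-draw, category-independent map
fake = G(src, cond, torch.zeros_like(src))  # first pass, no feedback yet
fake = G(src, cond, fake)                   # rolling guidance: loop the output back in
```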

Bag of Tricks and A Strong Baseline for Deep Person Re-identification

Title Bag of Tricks and A Strong Baseline for Deep Person Re-identification
Authors Hao Luo, Youzhi Gu, Xingyu Liao, Shenqi Lai, Wei Jiang
Abstract This paper explores a simple and efficient baseline for person re-identification (ReID). Person re-identification (ReID) with deep neural networks has made progress and achieved high performance in recent years. However, many state-of-the-art methods design complex network structures and concatenate multi-branch features. In the literature, some effective training tricks appear only briefly in a few papers or source code. This paper collects and evaluates these effective training tricks for person ReID. By combining these tricks, the model achieves 94.5% rank-1 and 85.9% mAP on Market1501 using only global features. Our codes and models are available at https://github.com/michuanhaohao/reid-strong-baseline.
Tasks Person Re-Identification
Published 2019-03-17
URL http://arxiv.org/abs/1903.07071v3
PDF http://arxiv.org/pdf/1903.07071v3.pdf
PWC https://paperswithcode.com/paper/bags-of-tricks-and-a-strong-baseline-for-deep
Repo https://github.com/mangye16/ReID-Survey
Framework pytorch
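
Two of the tricks the paper evaluates, label smoothing on the ID loss and the BNNeck (a BatchNorm layer inserted between the feature used for the triplet loss and the classifier), look roughly like this in PyTorch. Dimensions, the 751-identity Market1501 head, and the backbone are illustrative, not the released configuration.

```python
# Rough sketch of the label-smoothing and BNNeck tricks (requires torch >= 1.10 for label_smoothing).
import torch
import torch.nn as nn

class BNNeckHead(nn.Module):
    def __init__(self, feat_dim=2048, num_ids=751):
        super().__init__()
        self.bnneck = nn.BatchNorm1d(feat_dim)
        self.bnneck.bias.requires_grad_(False)       # BN without a learnable shift, as commonly done
        self.classifier = nn.Linear(feat_dim, num_ids, bias=False)

    def forward(self, feat):
        feat_bn = self.bnneck(feat)                  # normalized feature -> ID (classification) loss
        logits = self.classifier(feat_bn)
        return logits, feat                          # raw feature -> triplet loss

id_loss = nn.CrossEntropyLoss(label_smoothing=0.1)  # label-smoothing trick
head = BNNeckHead()
feat = torch.randn(16, 2048)                         # backbone output (e.g. ResNet-50 after GAP)
labels = torch.randint(0, 751, (16,))
logits, raw_feat = head(feat)
loss = id_loss(logits, labels)                       # + a triplet loss on raw_feat in practice
```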

The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images

Title The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images
Authors Ángela Casado-García, César Domínguez, Jónathan Heras, Eloy Mata, Vico Pascual
Abstract A correct localisation of tables in a document is instrumental for determining their structure and extracting their contents; therefore, table detection is a key step in table understanding. Nowadays, the most successful methods for table detection in document images employ deep learning algorithms and, particularly, a technique known as fine-tuning. In this context, such a technique transfers the knowledge acquired for detecting objects in natural images to detecting tables in document images. However, there is only a vague relation between natural and document images, and fine-tuning works better when there is a close relation between the source and target tasks. In this paper, we show that it is more beneficial to employ fine-tuning from a closer domain. To this aim, we train different object detection algorithms (namely, Mask R-CNN, RetinaNet, SSD and YOLO) using the TableBank dataset (a dataset of images of academic documents designed for table detection and recognition), and fine-tune them for several heterogeneous table detection datasets. Using this approach, we considerably improve the accuracy of the detection models fine-tuned from natural images (by 17% on average and, in the best case, by up to 60%).
Tasks Object Detection, Table Detection
Published 2019-12-12
URL https://arxiv.org/abs/1912.05846v1
PDF https://arxiv.org/pdf/1912.05846v1.pdf
PWC https://paperswithcode.com/paper/the-benefits-of-close-domain-fine-tuning-for
Repo https://github.com/holms-ur/fine-tuning
Framework none
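
A minimal torchvision sketch of the fine-tuning step: start from pretrained detection weights (close-domain weights, e.g. from TableBank training, would be loaded the same way via `load_state_dict`) and replace the predictor heads for a single "table" class. The paper's experiments use their own pipelines for Mask R-CNN, RetinaNet, SSD and YOLO; the checkpoint name below is hypothetical.

```python
# Fine-tuning sketch with torchvision's Mask R-CNN for table detection (background + table).
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + table
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Swap the box-classification head for the new number of classes.
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)

# Swap the mask head as well.
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)

# model.load_state_dict(torch.load("tablebank_maskrcnn.pth"))  # hypothetical close-domain weights
# ...then train on the target table-detection dataset with a low learning rate.
```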

SCARLET-NAS: Bridging the gap between Stability and Scalability in Weight-sharing Neural Architecture Search

Title SCARLET-NAS: Bridging the gap between Stability and Scalability in Weight-sharing Neural Architecture Search
Authors Xiangxiang Chu, Bo Zhang, Jixiang Li, Qingyuan Li, Ruijun Xu
Abstract The goal of neural architecture search is to discover compact models of great power. Previous one-shot approaches are limited by fixed-depth search spaces. Simply paralleling skip connections with other choices can make depths variable. Unfortunately, it creates a large range of perturbation for supernet training, which makes it difficult to evaluate models. In this paper, we unveil its root cause under single-path settings and tackle the problem by imposing an equivariant learnable stabilizer on each skip connection. It has threefold benefits: improved convergence, more reliable evaluation, and retained equivalence. The third benefit is of the utmost importance for scalability. As appending stabilizers to a model doesn’t change its representational capacity, we can now evaluate the stabilized counterpart as an identical proxy. With an evolutionary search backend that treats the supernet as an evaluator, we derive a family of state-of-the-art architectures, the SCARLET (SCAlable supeRnet with Learnable Equivariant sTablizer) series, at a tremendously reduced cost compared with EfficientNet. The models and evaluation code are released online at https://github.com/xiaomi-automl/ScarletNAS.
Tasks AutoML, Image Classification, Neural Architecture Search
Published 2019-08-16
URL https://arxiv.org/abs/1908.06022v4
PDF https://arxiv.org/pdf/1908.06022v4.pdf
PWC https://paperswithcode.com/paper/scarletnas-bridging-the-gap-between
Repo https://github.com/xiaomi-automl/SCARLET-NAS
Framework pytorch
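
A sketch of the stabilizer idea: instead of a bare identity skip in the variable-depth search space, the "skip" choice is a purely linear transform (1x1 conv plus BN, no activation). Because it is linear, it does not change representational capacity and can be folded away after search, so the stabilized path acts as an identical-capacity proxy for the skip connection. The details of SCARLET-NAS's actual stabilizer and supernet training differ; this is only an assumption-level illustration.

```python
# Sketch of a learnable linear stabilizer replacing the bare skip choice in a single-path supernet.
import torch
import torch.nn as nn

class LearnableStabilizer(nn.Module):
    """Linear 1x1 conv + BN, no non-linearity: stabilizes training of 'skip' choices."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return self.bn(self.proj(x))

# In a single-path supernet, the per-layer choices might then include the stabilized skip:
channels = 32
choices = nn.ModuleList([
    nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()),  # a regular op choice
    LearnableStabilizer(channels),                                          # the "skip" choice
])
x = torch.randn(4, channels, 16, 16)
out = choices[1](x)  # sampling the stabilized skip during supernet training
```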

i-RIM applied to the fastMRI challenge

Title i-RIM applied to the fastMRI challenge
Authors Patrick Putzky, Dimitrios Karkalousos, Jonas Teuwen, Nikita Miriakov, Bart Bakker, Matthan Caan, Max Welling
Abstract We, team AImsterdam, summarize our submission to the fastMRI challenge (Zbontar et al., 2018). Our approach builds on recent advances in invertible learning to infer models as presented in Putzky and Welling (2019). Both our single-coil and our multi-coil models share the same basic architecture.
Tasks
Published 2019-10-20
URL https://arxiv.org/abs/1910.08952v1
PDF https://arxiv.org/pdf/1910.08952v1.pdf
PWC https://paperswithcode.com/paper/i-rim-applied-to-the-fastmri-challenge
Repo https://github.com/pputzky/irim_fastMRI
Framework pytorch

Learning Sample-Specific Models with Low-Rank Personalized Regression

Title Learning Sample-Specific Models with Low-Rank Personalized Regression
Authors Benjamin Lengerich, Bryon Aragam, Eric P. Xing
Abstract Modern applications of machine learning (ML) deal with increasingly heterogeneous datasets comprised of data collected from overlapping latent subpopulations. As a result, traditional models trained over large datasets may fail to recognize highly predictive localized effects in favour of weakly predictive global patterns. This is a problem because localized effects are critical to developing individualized policies and treatment plans in applications ranging from precision medicine to advertising. To address this challenge, we propose to estimate sample-specific models that tailor inference and prediction at the individual level. In contrast to classical ML models that estimate a single, complex model (or only a few complex models), our approach produces a model personalized to each sample. These sample-specific models can be studied to understand subgroup dynamics that go beyond coarse-grained class labels. Crucially, our approach does not assume that relationships between samples (e.g. a similarity network) are known a priori. Instead, we use unmodeled covariates to learn a latent distance metric over the samples. We apply this approach to financial, biomedical, and electoral data as well as simulated data and show that sample-specific models provide fine-grained interpretations of complicated phenomena without sacrificing predictive accuracy compared to state-of-the-art models such as deep neural networks.
Tasks
Published 2019-10-15
URL https://arxiv.org/abs/1910.06939v1
PDF https://arxiv.org/pdf/1910.06939v1.pdf
PWC https://paperswithcode.com/paper/learning-sample-specific-models-with-low-rank
Repo https://github.com/blengerich/personalized_regression
Framework none
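
A simplified sketch of the low-rank personalization idea: each sample i gets its own coefficient vector theta_i = z_i Q, with a low-dimensional loading z_i per sample and a shared dictionary Q. The actual method additionally ties the loadings together through a learned distance metric over unmodeled covariates (a distance-matching regularizer), which is omitted here; all shapes and hyperparameters are illustrative.

```python
# Toy low-rank personalized regression: one coefficient vector per sample via Theta = Z @ Q.
import torch

n, p, rank = 200, 10, 3
X = torch.randn(n, p)                              # features
y = torch.randn(n)                                 # outcomes
Z = (0.01 * torch.randn(n, rank)).requires_grad_() # per-sample loadings
Q = torch.randn(rank, p, requires_grad=True)       # shared low-rank dictionary

opt = torch.optim.Adam([Z, Q], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    theta = Z @ Q                                  # (n, p): one coefficient vector per sample
    pred = (X * theta).sum(dim=1)                  # y_i ≈ x_i . theta_i
    loss = ((pred - y) ** 2).mean() + 1e-3 * (Z.pow(2).mean() + Q.pow(2).mean())
    # + a distance-matching penalty on Z computed from covariates, in the full method
    loss.backward()
    opt.step()
```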

Estimating Causal Effects of Tone in Online Debates

Title Estimating Causal Effects of Tone in Online Debates
Authors Dhanya Sridhar, Lise Getoor
Abstract Statistical methods applied to social media posts shed light on the dynamics of online dialogue. For example, users’ wording choices predict their persuasiveness, and users adopt the language patterns of other dialogue participants. In this paper, we estimate the causal effect of reply tones in debates on linguistic and sentiment changes in subsequent responses. The challenge for this estimation is that a reply’s tone and subsequent responses are confounded by the users’ ideologies on the debate topic and their emotions. To overcome this challenge, we learn representations of ideology using generative models of text. We study debates from 4Forums and compare annotated reply tones such as emotional versus factual, or reasonable versus attacking. We show that our latent confounder representation reduces bias in ATE estimation. Our results suggest that factual and asserting tones affect dialogue and provide a methodology for estimating causal effects from text.
Tasks
Published 2019-06-10
URL https://arxiv.org/abs/1906.04177v2
PDF https://arxiv.org/pdf/1906.04177v2.pdf
PWC https://paperswithcode.com/paper/estimating-causal-effects-of-tone-in-online
Repo https://github.com/dsridhar91/debate-causal-effects
Framework none
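
A simplified plug-in (regression adjustment) sketch of the estimation target: fit an outcome model on treatment plus a confounder representation, then average the difference between predictions under treatment and control. In the paper the confounder representation is learned with generative models of the debate text; here `C` is a stand-in array and the data are simulated, so the numbers are illustrative only.

```python
# Regression-adjustment ATE sketch with a (given) confounder representation, on simulated data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
C = rng.normal(size=(n, 5))                      # confounder representation (e.g. learned ideology)
T = rng.binomial(1, 1 / (1 + np.exp(-C[:, 0])))  # treatment (reply tone) depends on the confounder
Y = 0.5 * T + C[:, 0] + rng.normal(size=n)       # outcome (response change), true ATE = 0.5

model = LinearRegression().fit(np.column_stack([T, C]), Y)
y1 = model.predict(np.column_stack([np.ones(n), C]))
y0 = model.predict(np.column_stack([np.zeros(n), C]))
print("ATE estimate:", (y1 - y0).mean())         # ≈ 0.5 once the confounder is adjusted for
```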

The Probabilistic Fault Tolerance of Neural Networks in the Continuous Limit

Title The Probabilistic Fault Tolerance of Neural Networks in the Continuous Limit
Authors El-Mahdi El-Mhamdi, Rachid Guerraoui, Andrei Kucharavy, Sergei Volodin
Abstract The loss of a few neurons in a brain rarely results in any visible loss of function. However, the insight into what “few” means in this context is unclear. How many random neuron failures will it take to lead to a visible loss of function? In this paper, we address the fundamental question of the impact of the crash of a random subset of neurons on the overall computation of a neural network and the error in the output it produces. We study fault tolerance of neural networks subject to small random neuron/weight crash failures in a probabilistic setting. We give provable guarantees on the robustness of the network to these crashes. Our main contribution is a bound on the error in the output of a network under small random Bernoulli crashes proved by using a Taylor expansion in the continuous limit, where close-by neurons at a layer are similar. The failure mode we adopt in our model is characteristic of neuromorphic hardware, a promising technology to speed up artificial neural networks, as well as of biological networks. We show that our theoretical bounds can be used to compare the fault tolerance of different architectures and to design a regularizer improving the fault tolerance of a given architecture. We design an algorithm achieving fault tolerance using a reasonable number of neurons. In addition to the theoretical proof, we also provide experimental validation of our results and suggest a connection to the generalization capacity problem.
Tasks
Published 2019-02-05
URL https://arxiv.org/abs/1902.01686v2
PDF https://arxiv.org/pdf/1902.01686v2.pdf
PWC https://paperswithcode.com/paper/fatal-brain-damage
Repo https://github.com/LPD-EPFL/FatalBrainDamage
Framework tf
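
An empirical companion to the question the abstract poses: crash a random Bernoulli subset of hidden neurons and measure the error this induces in the network output. This is a toy numpy MLP, not the paper's theoretical bound or regularizer; it only illustrates the failure model being analyzed.

```python
# Simulate Bernoulli neuron crashes in a random MLP and measure the induced output error.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(100, 20), scale=0.1), np.zeros(20)
W2, b2 = rng.normal(size=(20, 1), scale=0.1), np.zeros(1)

def forward(x, crash_p=0.0):
    h = np.maximum(x @ W1 + b1, 0.0)                         # hidden layer (ReLU)
    if crash_p > 0:
        h = h * rng.binomial(1, 1 - crash_p, size=h.shape)   # Bernoulli neuron crashes
    return h @ W2 + b2

x = rng.normal(size=(1000, 100))
clean = forward(x)
for p in [0.01, 0.05, 0.1]:
    err = np.mean(np.abs(forward(x, crash_p=p) - clean))
    print(f"crash prob {p}: mean |output error| = {err:.4f}")
```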

GOGGLES: Automatic Image Labeling with Affinity Coding

Title GOGGLES: Automatic Image Labeling with Affinity Coding
Authors Nilaksh Das, Sanya Chaba, Renzhi Wu, Sakshi Gandhi, Duen Horng Chau, Xu Chu
Abstract Generating large labeled training data is becoming the biggest bottleneck in building and deploying supervised machine learning models. Recently, the data programming paradigm has been proposed to reduce the human cost in labeling training data. However, data programming relies on designing labeling functions which still requires significant domain expertise. Also, it is prohibitively difficult to write labeling functions for image datasets as it is hard to express domain knowledge using raw features for images (pixels). We propose affinity coding, a new domain-agnostic paradigm for automated training data labeling. The core premise of affinity coding is that the affinity scores of instance pairs belonging to the same class on average should be higher than those of pairs belonging to different classes, according to some affinity functions. We build the GOGGLES system that implements affinity coding for labeling image datasets by designing a novel set of reusable affinity functions for images, and propose a novel hierarchical generative model for class inference using a small development set. We compare GOGGLES with existing data programming systems on 5 image labeling tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a minimum of 71% to a maximum of 98% without requiring any extensive human annotation. In terms of end-to-end performance, GOGGLES outperforms the state-of-the-art data programming system Snuba by 21% and a state-of-the-art few-shot learning technique by 5%, and is only 7% away from the fully supervised upper bound.
Tasks Few-Shot Learning
Published 2019-03-11
URL https://arxiv.org/abs/1903.04552v3
PDF https://arxiv.org/pdf/1903.04552v3.pdf
PWC https://paperswithcode.com/paper/goggles-automatic-training-data-generation
Repo https://github.com/chu-data-lab/GOGGLES
Framework none
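
A toy sketch of the affinity-coding premise: with some affinity function (here, cosine similarity of hypothetical pretrained embeddings), same-class pairs should score higher on average than different-class pairs, and a tiny labeled development set can then be used to assign labels. GOGGLES itself uses a set of reusable affinity functions and a hierarchical generative model for class inference; this only illustrates the core premise on synthetic data.

```python
# Affinity-coding toy: same-class affinity > cross-class affinity, plus naive dev-set labeling.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
# Hypothetical pretrained embeddings for two latent classes.
emb = np.vstack([rng.normal(0, 1, (50, 16)) + 3, rng.normal(0, 1, (50, 16)) - 3])
labels = np.array([0] * 50 + [1] * 50)

A = cosine_similarity(emb)                                   # affinity matrix
same = A[labels[:, None] == labels[None, :]].mean()
diff = A[labels[:, None] != labels[None, :]].mean()
print(f"mean same-class affinity {same:.2f} > mean cross-class affinity {diff:.2f}")

# Label inference with a tiny dev set: assign each instance the label of its highest-affinity
# dev example (GOGGLES replaces this step with class inference in a generative model).
dev_idx = np.array([0, 50])                                  # one labeled example per class
pred = labels[dev_idx][A[:, dev_idx].argmax(axis=1)]
print("labeling accuracy:", (pred == labels).mean())
```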

Adaptive Cross-Modal Few-Shot Learning

Title Adaptive Cross-Modal Few-Shot Learning
Authors Chen Xing, Negar Rostamzadeh, Boris N. Oreshkin, Pedro O. Pinheiro
Abstract Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods. Visual and semantic feature spaces have different structures by definition. For certain concepts, visual features might be richer and more discriminative than text ones. While for others, the inverse might be true. Moreover, when the support from visual information is limited in image classification, semantic representations (learned from unsupervised text corpora) can provide strong prior knowledge and context to help learning. Based on these two intuitions, we propose a mechanism that can adaptively combine information from both modalities according to new image categories to be learned. Through a series of experiments, we show that by this adaptive combination of the two modalities, our model outperforms current uni-modality few-shot learning methods and modality-alignment methods by a large margin on all benchmarks and few-shot scenarios tested. Experiments also show that our model can effectively adjust its focus on the two modalities. The improvement in performance is particularly large when the number of shots is very small.
Tasks Few-Shot Image Classification, Few-Shot Learning, Image Classification, Meta-Learning
Published 2019-02-19
URL https://arxiv.org/abs/1902.07104v3
PDF https://arxiv.org/pdf/1902.07104v3.pdf
PWC https://paperswithcode.com/paper/adaptive-cross-modal-few-shot-learning
Repo https://github.com/ElementAI/am3
Framework tf
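
The adaptive combination described in the abstract can be sketched as a convex mixture of each class's visual prototype and its semantic (word) embedding, with the mixing coefficient predicted per class. Dimensions and the gating network below are illustrative assumptions; the released AM3 model has its own architecture and training procedure.

```python
# Sketch of an adaptive cross-modal prototype: lambda * visual + (1 - lambda) * semantic.
import torch
import torch.nn as nn

class AdaptiveMixture(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, visual_proto, semantic_emb):
        lam = torch.sigmoid(self.gate(semantic_emb))          # per-class mixing coefficient
        return lam * visual_proto + (1 - lam) * semantic_emb  # adaptive convex combination

mix = AdaptiveMixture()
visual_proto = torch.randn(5, 512)   # e.g. mean of support embeddings per class (5-way episode)
semantic_emb = torch.randn(5, 512)   # e.g. projected word embeddings of the class names
prototypes = mix(visual_proto, semantic_emb)
# Queries are then classified by distance to these prototypes, as in prototypical networks.
```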

Explainable Authorship Verification in Social Media via Attention-based Similarity Learning

Title Explainable Authorship Verification in Social Media via Attention-based Similarity Learning
Authors Benedikt Boenninghoff, Steffen Hessler, Dorothea Kolossa, Robert M. Nickel
Abstract Authorship verification is the task of analyzing the linguistic patterns of two or more texts to determine whether they were written by the same author or not. The analysis is traditionally performed by experts who consider linguistic features, which include, for example, spelling mistakes, grammatical inconsistencies, and stylistic choices. Machine learning algorithms, on the other hand, can be trained to accomplish the same, but have traditionally relied on so-called stylometric features. The disadvantage of such features is that their reliability is greatly diminished for short and topically varied social media texts. In this interdisciplinary work, we propose a substantial extension of a recently published hierarchical Siamese neural network approach, with which it is feasible to learn neural features and to visualize the decision-making process. For this purpose, a new large-scale corpus of short Amazon reviews for text comparison research is compiled, and we show that the Siamese network topologies outperform state-of-the-art approaches built on stylometric features. Our linguistic analysis of the internal attention weights of the network shows that the proposed method is indeed able to latch on to some traditional linguistic categories.
Tasks Decision Making
Published 2019-10-17
URL https://arxiv.org/abs/1910.08144v2
PDF https://arxiv.org/pdf/1910.08144v2.pdf
PWC https://paperswithcode.com/paper/explainable-authorship-verification-in-social
Repo https://github.com/boenninghoff/AdHominem
Framework tf
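
A compact sketch of the Siamese-with-attention pattern the abstract describes: both documents are encoded with shared weights, attention pooling produces a single vector for each, and a similarity between the two vectors drives the same-author/different-author decision, with the attention weights offering some interpretability. The paper's hierarchical architecture, features, and loss are considerably more elaborate; everything below is schematic.

```python
# Schematic Siamese encoder with attention pooling for authorship verification.
import torch
import torch.nn as nn

class AttentiveEncoder(nn.Module):
    def __init__(self, vocab=5000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.attn = nn.Linear(dim, 1)

    def forward(self, tokens):                       # tokens: (batch, seq_len) int ids
        h = self.emb(tokens)                         # (batch, seq, dim)
        w = torch.softmax(self.attn(h), dim=1)       # attention weights (inspectable)
        return (w * h).sum(dim=1)                    # attention-pooled document vector

enc = AttentiveEncoder()
doc_a = torch.randint(0, 5000, (8, 120))
doc_b = torch.randint(0, 5000, (8, 120))
za, zb = enc(doc_a), enc(doc_b)                      # shared (Siamese) encoder
same_author_score = torch.cosine_similarity(za, zb)  # thresholded for the verification decision
```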

Inducing brain-relevant bias in natural language processing models

Title Inducing brain-relevant bias in natural language processing models
Authors Dan Schwartz, Mariya Toneva, Leila Wehbe
Abstract Progress in natural language processing (NLP) models that estimate representations of word sequences has recently been leveraged to improve the understanding of language processing in the brain. However, these models have not been specifically designed to capture the way the brain represents language meaning. We hypothesize that fine-tuning these models to predict recordings of brain activity of people reading text will lead to representations that encode more brain-activity-relevant language information. We demonstrate that a version of BERT, a recently introduced and powerful language model, can improve the prediction of brain activity after fine-tuning. We show that the relationship between language and brain activity learned by BERT during this fine-tuning transfers across multiple participants. We also show that, for some participants, the fine-tuned representations learned from both magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) are better for predicting fMRI than the representations learned from fMRI alone, indicating that the learned representations capture brain-activity-relevant information that is not simply an artifact of the modality. While changes to language representations help the model predict brain activity, they also do not harm the model’s ability to perform downstream NLP tasks. Our findings are notable for research on language understanding in the brain.
Tasks Language Modelling
Published 2019-10-29
URL https://arxiv.org/abs/1911.03268v1
PDF https://arxiv.org/pdf/1911.03268v1.pdf
PWC https://paperswithcode.com/paper/inducing-brain-relevant-bias-in-natural
Repo https://github.com/danrsc/bert_brain_neurips_2019
Framework pytorch
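
A sketch of the fine-tuning setup described in the abstract: a pretrained BERT encoder with a linear readout trained to predict brain recordings (e.g. fMRI voxels) from the text a participant is reading, so that gradients flow back into BERT. It uses the Hugging Face transformers API; the pooling, alignment of text to recordings, voxel count, and multi-participant sharing in the paper are considerably more involved.

```python
# Fine-tune a pretrained BERT with a linear readout to predict (stand-in) brain activity.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
n_voxels = 1000                                        # illustrative number of fMRI voxels
readout = nn.Linear(bert.config.hidden_size, n_voxels)

batch = tok(["the dog chased the ball"], return_tensors="pt")
hidden = bert(**batch).last_hidden_state.mean(dim=1)   # crude pooling over tokens
pred = readout(hidden)                                 # predicted brain activity

target = torch.randn(1, n_voxels)                      # stand-in for a recorded fMRI response
loss = nn.functional.mse_loss(pred, target)
loss.backward()                                        # gradients flow into BERT: fine-tuning
```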

RecVAE: a New Variational Autoencoder for Top-N Recommendations with Implicit Feedback

Title RecVAE: a New Variational Autoencoder for Top-N Recommendations with Implicit Feedback
Authors Ilya Shenbin, Anton Alekseev, Elena Tutubalina, Valentin Malykh, Sergey I. Nikolenko
Abstract Recent research has shown the advantages of using autoencoders based on deep neural networks for collaborative filtering. In particular, the recently proposed Mult-VAE model, which used the multinomial likelihood variational autoencoders, has shown excellent results for top-N recommendations. In this work, we propose the Recommender VAE (RecVAE) model that originates from our research on regularization techniques for variational autoencoders. RecVAE introduces several novel ideas to improve Mult-VAE, including a novel composite prior distribution for the latent codes, a new approach to setting the $\beta$ hyperparameter for the $\beta$-VAE framework, and a new approach to training based on alternating updates. In experimental evaluation, we show that RecVAE significantly outperforms previously proposed autoencoder-based models, including Mult-VAE and RaCT, across classical collaborative filtering datasets, and present a detailed ablation study to assess our new developments. Code and models are available at https://github.com/ilya-shenbin/RecVAE.
Tasks Recommendation Systems
Published 2019-12-24
URL https://arxiv.org/abs/1912.11160v1
PDF https://arxiv.org/pdf/1912.11160v1.pdf
PWC https://paperswithcode.com/paper/recvae-a-new-variational-autoencoder-for-top
Repo https://github.com/ilya-shenbin/RecVAE
Framework pytorch
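
For reference, the multinomial-likelihood VAE objective that Mult-VAE introduced and RecVAE builds on can be sketched as below: the decoder scores all items, and the loss is the negative multinomial log-likelihood of the user's implicit-feedback vector plus a beta-weighted KL term. RecVAE's composite prior, beta handling, and alternating updates are not reproduced here; shapes and the beta value are illustrative.

```python
# Multinomial-likelihood VAE loss (Mult-VAE style), the starting point RecVAE improves on.
import torch
import torch.nn.functional as F

def multvae_loss(logits, x, mu, logvar, beta=0.2):
    """logits: decoder scores over items; x: binary implicit-feedback vector per user."""
    neg_ll = -(F.log_softmax(logits, dim=-1) * x).sum(dim=-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return neg_ll + beta * kl

n_items, latent = 1000, 64
x = torch.bernoulli(torch.full((8, n_items), 0.01))   # toy user-item interactions
mu, logvar = torch.randn(8, latent), torch.randn(8, latent)
logits = torch.randn(8, n_items)                      # would come from the decoder
print(multvae_loss(logits, x, mu, logvar))
```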