February 1, 2020

Paper Group AWR 361

ART: A machine learning Automated Recommendation Tool for synthetic biology


Title	ART: A machine learning Automated Recommendation Tool for synthetic biology
Authors	Tijana Radivojević, Zak Costello, Kenneth Workman, Hector Garcia Martin
Abstract	Biology has changed radically in the last two decades, transitioning from a descriptive science into a design science. Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as experimental data from real metabolic engineering projects producing renewable biofuels, hoppy flavored beer without hops, and fatty acids. Finally, we discuss the limitations of this approach, and the practical consequences of the underlying assumptions failing.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.11091v2
PDF	https://arxiv.org/pdf/1911.11091v2.pdf
PWC	https://paperswithcode.com/paper/art-a-machine-learning-automated
Repo	https://github.com/JBEI/ART
Framework	none

Localization of Fake News Detection via Multitask Transfer Learning


Title	Localization of Fake News Detection via Multitask Transfer Learning
Authors	Jan Christian Blaise Cruz, Julianne Agatha Tan, Charibeth Cheng
Abstract	The use of the internet as a fast medium of spreading fake news reinforces the need for computational tools that combat it. Techniques that train fake news classifiers exist, but they all assume an abundance of resources including large labeled datasets and expert-curated corpora, which low-resource languages may not have. In this paper, we show that Transfer Learning (TL) can be used to train robust fake news classifiers from little data, achieving 91% accuracy on a fake news dataset in the low-resourced Filipino language, reducing the error by 14% compared to established few-shot baselines. Furthermore, lifting ideas from multitask learning, we show that augmenting transformer-based transfer techniques with auxiliary language modeling losses improves their performance by adapting to stylometry. Using this, we improve TL performance by 4-6%, achieving an accuracy of 96% on our best model. We perform ablations that establish the causality of attention-based TL techniques to state-of-the-art results, as well as the model’s capability to learn and predict via stylometry. Lastly, we show that our method generalizes well to different types of news articles, including political news, entertainment news, and opinion articles.
Tasks	Fake News Detection, Language Modelling, Transfer Learning
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09295v2
PDF	https://arxiv.org/pdf/1910.09295v2.pdf
PWC	https://paperswithcode.com/paper/localization-of-fake-news-detection-via
Repo	https://github.com/jcblaisecruz02/Tagalog-fake-news
Framework	none

AdaOja: Adaptive Learning Rates for Streaming PCA


Title	AdaOja: Adaptive Learning Rates for Streaming PCA
Authors	Amelia Henriksen, Rachel Ward
Abstract	Oja’s algorithm has been the cornerstone of streaming methods in Principal Component Analysis (PCA) since it was first proposed in 1982. However, Oja’s algorithm does not have a standardized choice of learning rate (step size) that both performs well in practice and truly conforms to the online streaming setting. In this paper, we propose a new learning rate scheme for Oja’s method called AdaOja. This new algorithm requires only a single pass over the data and does not depend on knowing properties of the data set a priori. AdaOja is a novel variation of the Adagrad algorithm to Oja’s algorithm in the single eigenvector case and extended to the multiple eigenvector case. We demonstrate for dense synthetic data, sparse real-world data and dense real-world data that AdaOja outperforms common learning rate choices for Oja’s method. We also show that AdaOja performs comparably to state-of-the-art algorithms (History PCA and Streaming Power Method) in the same streaming PCA setting.
Tasks
Published	2019-05-28
URL	https://arxiv.org/abs/1905.12115v2
PDF	https://arxiv.org/pdf/1905.12115v2.pdf
PWC	https://paperswithcode.com/paper/adaoja-adaptive-learning-rates-for-streaming
Repo	https://github.com/aamcbee/AdaOja
Framework	none
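
The abstract describes AdaOja as an Adagrad-style learning rate for Oja's method in the single-eigenvector case. The following is a minimal plain-Python sketch of that idea, not the authors' implementation: the function name, the initial accumulator value `b0`, and the one-sample-at-a-time update are assumptions made for illustration.

```python
import math
import random


def adaoja_top_component(stream, dim, b0=1e-5, seed=0):
    """Estimate the top principal direction in a single pass over `stream`.

    Assumed update (Adagrad-style step size applied to Oja's rule):
        g   = x * (x . w)            # Oja gradient for one sample
        b^2 = b^2 + ||g||^2          # accumulated squared gradient norm
        w   = normalize(w + g / b)   # step size 1/b, no hand-tuned rate
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    w_norm = math.sqrt(sum(v * v for v in w))
    w = [v / w_norm for v in w]
    b_sq = b0 * b0
    for x in stream:
        proj = sum(xi * wi for xi, wi in zip(x, w))
        g = [proj * xi for xi in x]
        b_sq += sum(gi * gi for gi in g)
        step = 1.0 / math.sqrt(b_sq)
        w = [wi + step * gi for wi, gi in zip(w, g)]
        w_norm = math.sqrt(sum(v * v for v in w))
        w = [v / w_norm for v in w]
    return w
```

Fed a stream whose variance is concentrated along one axis, the returned unit vector should align with that axis up to sign. The step size comes entirely from the accumulated gradient norms, which is the point of the scheme.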

MixConv: Mixed Depthwise Convolutional Kernels


Title	MixConv: Mixed Depthwise Convolutional Kernels
Authors	Mingxing Tan, Quoc V. Le
Abstract	Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation, we propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution. As a simple drop-in replacement of vanilla depthwise convolution, our MixConv improves the accuracy and efficiency for existing MobileNets on both ImageNet classification and COCO object detection. To demonstrate the effectiveness of MixConv, we integrate it into AutoML search space and develop a new family of models, named as MixNets, which outperform previous mobile models including MobileNetV2 [20] (ImageNet top-1 accuracy +4.2%), ShuffleNetV2 [16] (+3.5%), MnasNet [26] (+1.3%), ProxylessNAS [2] (+2.2%), and FBNet [27] (+2.0%). In particular, our MixNet-L achieves a new state-of-the-art 78.9% ImageNet top-1 accuracy under typical mobile settings (<600M FLOPS). Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet
Tasks	AutoML, Image Classification, Object Detection
Published	2019-07-22
URL	https://arxiv.org/abs/1907.09595v3
PDF	https://arxiv.org/pdf/1907.09595v3.pdf
PWC	https://paperswithcode.com/paper/mixnet-mixed-depthwise-convolutional-kernels
Repo	https://github.com/zsef123/MixNet-PyTorch
Framework	pytorch
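
The core idea in the abstract, mixing several kernel sizes inside one depthwise convolution, can be illustrated with a small plain-Python sketch. It is not the MixNet code: the function names are hypothetical, the kernels are passed in rather than learned, and the rule for splitting channels into groups (equal shares, remainder to the first group) is an assumption.

```python
def depthwise_conv2d(channel, kernel):
    """Zero-padded 'same' correlation of one channel with one odd-sized square kernel."""
    h, w = len(channel), len(channel[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += channel[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def split_channels(num_channels, num_groups):
    """Assumed partition rule: equal shares, with any remainder given to the first group."""
    sizes = [num_channels // num_groups] * num_groups
    sizes[0] += num_channels - sum(sizes)
    return sizes


def mixconv(feature_map, kernels_by_group):
    """Apply a depthwise convolution whose kernel size differs per channel group.

    feature_map:      list of C channels, each an H x W list of lists.
    kernels_by_group: list of groups; each group is a list of per-channel
                      kernels that share one size (e.g. 3x3, 5x5, 7x7).
    """
    out = []
    c = 0
    for group in kernels_by_group:
        for kernel in group:
            out.append(depthwise_conv2d(feature_map[c], kernel))
            c += 1
    if c != len(feature_map):
        raise ValueError("need exactly one kernel per input channel")
    return out
```

Each channel is still filtered independently, as in an ordinary depthwise convolution. Only the kernel size varies from group to group, so the output keeps the input's channel count and spatial size.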

GhostNet: More Features from Cheap Operations


Title	GhostNet: More Features from Cheap Operations
Authors	Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu
Abstract	Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight GhostNet can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative of convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https://github.com/huawei-noah/ghostnet
Tasks	Image Classification
Published	2019-11-27
URL	https://arxiv.org/abs/1911.11907v2
PDF	https://arxiv.org/pdf/1911.11907v2.pdf
PWC	https://paperswithcode.com/paper/ghostnet-more-features-from-cheap-operations
Repo	https://github.com/iamhankai/ghostnet
Framework	tf

A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations


Title	A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations
Authors	Zekun Yang, Juan Feng
Abstract	Word embedding has become essential for natural language processing as it boosts empirical performances of various tasks. However, recent research discovers that gender bias is incorporated in neural word embeddings, and downstream tasks that rely on these biased word vectors also produce gender-biased results. While some word-embedding gender-debiasing methods have been developed, these methods mainly focus on reducing gender bias associated with gender direction and fail to reduce the gender bias presented in word embedding relations. In this paper, we design a causal and simple approach for mitigating gender bias in word vector relation by utilizing the statistical dependency between gender-definition word embeddings and gender-biased word embeddings. Our method attains state-of-the-art results on gender-debiasing tasks, lexical- and sentence-level evaluation tasks, and downstream coreference resolution tasks.
Tasks	Causal Inference, Coreference Resolution, Word Embeddings
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10787v1
PDF	https://arxiv.org/pdf/1911.10787v1.pdf
PWC	https://paperswithcode.com/paper/a-causal-inference-method-for-reducing-gender
Repo	https://github.com/KunkunYang/GenderBiasHSR
Framework	tf

Backprop with Approximate Activations for Memory-efficient Network Training


Title	Backprop with Approximate Activations for Memory-efficient Network Training
Authors	Ayan Chakrabarti, Benjamin Moseley
Abstract	Training convolutional neural network models is memory intensive since back-propagation requires storing activations of all intermediate layers. This presents a practical concern when seeking to deploy very deep architectures in production, especially when models need to be frequently re-trained on updated datasets. In this paper, we propose a new implementation for back-propagation that significantly reduces memory usage, by enabling the use of approximations with negligible computational cost and minimal effect on training performance. The algorithm reuses common buffers to temporarily store full activations and compute the forward pass exactly. It also stores approximate per-layer copies of activations, at significant memory savings, that are used in the backward pass. Compared to simply approximating activations within standard back-propagation, our method limits accumulation of errors across layers. This allows the use of much lower-precision approximations without affecting training accuracy. Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method yields performance close to exact training, while storing activations compactly with as low as 4-bit precision.
Tasks
Published	2019-01-23
URL	https://arxiv.org/abs/1901.07988v2
PDF	https://arxiv.org/pdf/1901.07988v2.pdf
PWC	https://paperswithcode.com/paper/backprop-with-approximate-activations-for
Repo	https://github.com/ayanc/blpa
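Framework	tf

A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams


Title	A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams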
Authors	Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez
Abstract	With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people’s lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models (such as SVM, MNB, Neural Networks, etc.) are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF’s eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale.
Tasks	Text Classification
Published	2019-05-18
URL	https://arxiv.org/abs/1905.08772v1
PDF	https://arxiv.org/pdf/1905.08772v1.pdf
PWC	https://paperswithcode.com/paper/a-text-classification-framework-for-simple
Repo	https://github.com/sergioburdisso/pyss3
Framework	none

Salient Object Detection in the Deep Learning Era: An In-Depth Survey


Title	Salient Object Detection in the Deep Learning Era: An In-Depth Survey
Authors	Wenguan Wang, Qiuxia Lai, Huazhu Fu, Jianbing Shen, Haibin Ling, Ruigang Yang
Abstract	As an important problem in computer vision, salient object detection (SOD) from images has been attracting an increasing amount of research effort over the years. Recent advances in SOD, not surprisingly, are dominantly led by deep learning-based solutions (named deep SOD) and reflected by hundreds of published papers. To facilitate the in-depth understanding of deep SODs, in this paper we provide a comprehensive survey covering various aspects ranging from algorithm taxonomy to unsolved open issues. In particular, we first review deep SOD algorithms from different perspectives including network architecture, level of supervision, learning paradigm and object/instance level detection. Following that, we summarize existing SOD evaluation datasets and metrics. Then, we carefully compile a thorough benchmark results of SOD methods based on previous work, and provide detailed analysis of the comparison results. Moreover, we study the performance of SOD algorithms under different attributes, which have been barely explored previously, by constructing a novel SOD dataset with rich attribute annotations. We further analyze, for the first time in the field, the robustness and transferability of deep SOD models w.r.t. adversarial attacks. We also look into the influence of input perturbations, and the generalization and hardness of existing SOD datasets. Finally, we discuss several open issues and challenges of SOD, and point out possible research directions in future. All the saliency prediction maps, our constructed dataset with annotations, and codes for evaluation are made publicly available at https://github.com/wenguanwang/SODsurvey.
Tasks	Object Detection, Saliency Prediction, Salient Object Detection
Published	2019-04-19
URL	https://arxiv.org/abs/1904.09146v3
PDF	https://arxiv.org/pdf/1904.09146v3.pdf
PWC	https://paperswithcode.com/paper/salient-object-detection-in-the-deep-learning
Repo	https://github.com/wenguanwang/SODsurvey
Framework	none

Pedestrian Detection in Thermal Images using Saliency Maps


Title	Pedestrian Detection in Thermal Images using Saliency Maps
Authors	Debasmita Ghose, Shasvat Mukeshkumar Desai, Sneha Bhattacharya, Deep Chakraborty, Madalina Fiterau, Tauhidur Rahman
Abstract	Thermal images are mainly used to detect the presence of people at night or in bad lighting conditions, but perform poorly at daytime. To solve this problem, most state-of-the-art techniques employ a fusion network that uses features from paired thermal and color images. Instead, we propose to augment thermal images with their saliency maps, to serve as an attention mechanism for the pedestrian detector especially during daytime. We investigate how such an approach results in improved performance for pedestrian detection using only thermal images, eliminating the need for paired color images. For our experiments, we train the Faster R-CNN for pedestrian detection and report the added effect of saliency maps generated using static and deep methods (PiCA-Net and R3-Net). Our best performing model results in an absolute reduction of miss rate by 13.4% and 19.4% over the baseline in day and night images respectively. We also annotate and release pixel level masks of pedestrians on a subset of the KAIST Multispectral Pedestrian Detection dataset, which is a first publicly available dataset for salient pedestrian detection.
Tasks	Pedestrian Detection, Salient Object Detection
Published	2019-04-15
URL	http://arxiv.org/abs/1904.06859v1
PDF	http://arxiv.org/pdf/1904.06859v1.pdf
PWC	https://paperswithcode.com/paper/pedestrian-detection-in-thermal-images-using
Repo	https://github.com/Information-Fusion-Lab-Umass/Salient-Pedestrian-Detection
Framework	pytorch

MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation


Title	MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation
Authors	Yazan Abu Farha, Juergen Gall
Abstract	Temporally locating and classifying action segments in long untrimmed videos is of particular interest to many applications like surveillance and robotics. While traditional approaches follow a two-step pipeline, by generating frame-wise probabilities and then feeding them to high-level temporal models, recent approaches use temporal convolutions to directly classify the video frames. In this paper, we introduce a multi-stage architecture for the temporal action segmentation task. Each stage features a set of dilated temporal convolutions to generate an initial prediction that is refined by the next one. This architecture is trained using a combination of a classification loss and a proposed smoothing loss that penalizes over-segmentation errors. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our model achieves state-of-the-art results on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
Tasks	Action Segmentation
Published	2019-03-05
URL	http://arxiv.org/abs/1903.01945v2
PDF	http://arxiv.org/pdf/1903.01945v2.pdf
PWC	https://paperswithcode.com/paper/ms-tcn-multi-stage-temporal-convolutional
Repo	https://github.com/yabufarha/ms-tcn
Framework	pytorch
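
The abstract mentions a smoothing loss that penalizes over-segmentation but does not give its form. The sketch below assumes one plausible form, a truncated mean squared error over frame-to-frame changes in log-probability. The function name and the truncation threshold `tau` are illustrative assumptions, not values taken from this listing.

```python
import math


def smoothing_loss(probs, tau=4.0):
    """Truncated MSE over adjacent-frame log-probability differences (assumed form).

    probs: list of T frames, each a list of C strictly positive class probabilities.
    Large jumps are clipped at `tau` so that genuine action boundaries are not
    penalized without bound, while rapid flickering between classes still adds up.
    """
    num_frames, num_classes = len(probs), len(probs[0])
    total = 0.0
    for t in range(1, num_frames):
        for c in range(num_classes):
            delta = abs(math.log(probs[t][c]) - math.log(probs[t - 1][c]))
            total += min(delta, tau) ** 2
    return total / (num_frames * num_classes)
```

A prediction sequence that stays constant incurs zero loss, while one that alternates between classes on every frame is penalized at each transition. In training, a term like this would be added to the frame-wise classification loss with a weighting factor.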

The Benefits of Over-parameterization at Initialization in Deep ReLU Networks


Title	The Benefits of Over-parameterization at Initialization in Deep ReLU Networks
Authors	Devansh Arpit, Yoshua Bengio
Abstract	It has been noted in existing literature that over-parameterization in ReLU networks generally improves performance. While there could be several factors involved behind this, we prove some desirable theoretical properties at initialization which may be enjoyed by ReLU networks. Specifically, it is known that He initialization in deep ReLU networks asymptotically preserves variance of activations in the forward pass and variance of gradients in the backward pass for infinitely wide networks, thus preserving the flow of information in both directions. Our paper goes beyond these results and shows novel properties that hold under He initialization: i) the norm of hidden activation of each layer is equal to the norm of the input, and, ii) the norm of weight gradient of each layer is equal to the product of norm of the input vector and the error at output layer. These results are derived using the PAC analysis framework, and hold true for finitely sized datasets such that the width of the ReLU network only needs to be larger than a certain finite lower bound. As we show, this lower bound depends on the depth of the network and the number of samples, and by the virtue of being a lower bound, over-parameterized ReLU networks are endowed with these desirable properties. For the aforementioned hidden activation norm property under He initialization, we further extend our theory and show that this property holds for a finite width network even when the number of data samples is infinite. Thus we overcome several limitations of existing papers, and show new properties of deep ReLU networks at initialization.
Tasks
Published	2019-01-11
URL	https://arxiv.org/abs/1901.03611v3
PDF	https://arxiv.org/pdf/1901.03611v3.pdf
PWC	https://paperswithcode.com/paper/the-benefits-of-over-parameterization-at
Repo	https://github.com/devansharpit/overparametrization_benefits
Framework	none
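
Property (i) in the abstract, that hidden activation norms match the input norm under He initialization, can be checked numerically. The following is a rough sketch under stated assumptions: every layer has the same width as the input (so fan-in and fan-out coincide), weights are drawn from N(0, 2/width), there are no biases, and the function names are hypothetical. It illustrates the claim on one random draw and proves nothing.

```python
import math
import random


def he_relu_forward(x, depth, rng):
    """Pass x through `depth` bias-free ReLU layers of width len(x) with He-initialized weights."""
    width = len(x)
    std = math.sqrt(2.0 / width)  # He initialization: variance 2 / fan_in
    h = x
    for _ in range(depth):
        h_next = []
        for _ in range(width):
            # One output unit: fresh Gaussian weight row dotted with the previous activations.
            z = sum(rng.gauss(0.0, std) * hj for hj in h)
            h_next.append(max(z, 0.0))
        h = h_next
    return h


def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(400)]
    h = he_relu_forward(x, depth=3, rng=rng)
    print("norm ratio ||h|| / ||x||:", l2_norm(h) / l2_norm(x))
```

For a wide layer the printed ratio should be close to 1, and the spread around 1 shrinks as the width grows. This matches the abstract's statement that the property requires the width to exceed a finite lower bound.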

Infrastructure-Agnostic Hypertext


Title	Infrastructure-Agnostic Hypertext
Authors	Jakob Voß
Abstract	This paper presents a novel and formal interpretation of the original vision of hypertext: infrastructure-agnostic hypertext is independent from specific standards such as data formats and network protocols. Its model is illustrated with examples and references to existing technologies that allow for implementation and integration in current information infrastructures such as the Internet.
Tasks
Published	2019-06-29
URL	https://arxiv.org/abs/1907.00259v1
PDF	https://arxiv.org/pdf/1907.00259v1.pdf
PWC	https://paperswithcode.com/paper/infrastructure-agnostic-hypertext
Repo	https://github.com/jakobib/hypertext2019
Framework	none

Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning


Title	Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning
Authors	Michael Lutter, Christian Ritter, Jan Peters
Abstract	Deep learning has achieved astonishing results on many tasks with large amounts of data and generalization within the proximity of training data. For many important real-world applications, these requirements are unfeasible and additional prior knowledge on the task domain is required to overcome the resulting problems. In particular, learning physics models for model-based control requires robust extrapolation from fewer samples - often collected online in real-time - and model errors may lead to drastic damages of the system. Directly incorporating physical insight has enabled us to obtain a novel deep model learning approach that extrapolates well while requiring fewer samples. As a first example, we propose Deep Lagrangian Networks (DeLaN) as a deep network structure upon which Lagrangian Mechanics have been imposed. DeLaN can learn the equations of motion of a mechanical system (i.e., system dynamics) with a deep network efficiently while ensuring physical plausibility. The resulting DeLaN network performs very well at robot tracking control. The proposed method did not only outperform previous model learning approaches at learning speed but exhibits substantially improved and more robust extrapolation to novel trajectories and learns online in real-time.
Tasks
Published	2019-07-10
URL	https://arxiv.org/abs/1907.04490v1
PDF	https://arxiv.org/pdf/1907.04490v1.pdf
PWC	https://paperswithcode.com/paper/deep-lagrangian-networks-using-physics-as-1
Repo	https://github.com/powertj/EECS545_Project_DeLaN
Framework	none

PidginUNMT: Unsupervised Neural Machine Translation from West African Pidgin to English


Title	PidginUNMT: Unsupervised Neural Machine Translation from West African Pidgin to English
Authors	Kelechi Ogueji, Orevaoghene Ahia
Abstract	Over 800 languages are spoken across West Africa. Despite the obvious diversity among people who speak these languages, one language significantly unifies them all - West African Pidgin English. There are at least 80 million speakers of West African Pidgin English. However, there is no known natural language processing (NLP) work on this language. In this work, we perform the first NLP work on the most popular variant of the language, providing three major contributions. First, the provision of a Pidgin corpus of over 56000 sentences, which is the largest we know of. Secondly, the training of the first ever cross-lingual embedding between Pidgin and English. This aligned embedding will be helpful in the performance of various downstream tasks between English and Pidgin. Thirdly, the training of an Unsupervised Neural Machine Translation model between Pidgin and English which achieves BLEU scores of 7.93 from Pidgin to English, and 5.18 from English to Pidgin. In all, this work greatly reduces the barrier of entry for future NLP works on West African Pidgin English.
Tasks	Machine Translation
Published	2019-12-07
URL	https://arxiv.org/abs/1912.03444v1
PDF	https://arxiv.org/pdf/1912.03444v1.pdf
PWC	https://paperswithcode.com/paper/pidginunmt-unsupervised-neural-machine
Repo	https://github.com/Kelechukwu1/PidginUNMT
Framework	pytorch