Paper Group AWR 361
ART: A machine learning Automated Recommendation Tool for synthetic biology. Localization of Fake News Detection via Multitask Transfer Learning. AdaOja: Adaptive Learning Rates for Streaming PCA. MixConv: Mixed Depthwise Convolutional Kernels. GhostNet: More Features from Cheap Operations. A Causal Inference Method for Reducing Gender Bias in Word …
ART: A machine learning Automated Recommendation Tool for synthetic biology
Title | ART: A machine learning Automated Recommendation Tool for synthetic biology |
Authors | Tijana Radivojević, Zak Costello, Kenneth Workman, Hector Garcia Martin |
Abstract | Biology has changed radically in the last two decades, transitioning from a descriptive science into a design science. Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as experimental data from real metabolic engineering projects producing renewable biofuels, hoppy flavored beer without hops, and fatty acids. Finally, we discuss the limitations of this approach, and the practical consequences of the underlying assumptions failing. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11091v2 |
https://arxiv.org/pdf/1911.11091v2.pdf | |
PWC | https://paperswithcode.com/paper/art-a-machine-learning-automated |
Repo | https://github.com/JBEI/ART |
Framework | none |
Localization of Fake News Detection via Multitask Transfer Learning
Title | Localization of Fake News Detection via Multitask Transfer Learning |
Authors | Jan Christian Blaise Cruz, Julianne Agatha Tan, Charibeth Cheng |
Abstract | The use of the internet as a fast medium of spreading fake news reinforces the need for computational tools that combat it. Techniques that train fake news classifiers exist, but they all assume an abundance of resources including large labeled datasets and expert-curated corpora, which low-resource languages may not have. In this paper, we show that Transfer Learning (TL) can be used to train robust fake news classifiers from little data, achieving 91% accuracy on a fake news dataset in the low-resourced Filipino language, reducing the error by 14% compared to established few-shot baselines. Furthermore, lifting ideas from multitask learning, we show that augmenting transformer-based transfer techniques with auxiliary language modeling losses improves their performance by adapting to stylometry. Using this, we improve TL performance by 4-6%, achieving an accuracy of 96% on our best model. We perform ablations that establish the causality of attention-based TL techniques to state-of-the-art results, as well as the model’s capability to learn and predict via stylometry. Lastly, we show that our method generalizes well to different types of news articles, including political news, entertainment news, and opinion articles. |
Tasks | Fake News Detection, Language Modelling, Transfer Learning |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09295v2 |
https://arxiv.org/pdf/1910.09295v2.pdf | |
PWC | https://paperswithcode.com/paper/localization-of-fake-news-detection-via |
Repo | https://github.com/jcblaisecruz02/Tagalog-fake-news |
Framework | none |
AdaOja: Adaptive Learning Rates for Streaming PCA
Title | AdaOja: Adaptive Learning Rates for Streaming PCA |
Authors | Amelia Henriksen, Rachel Ward |
Abstract | Oja’s algorithm has been the cornerstone of streaming methods in Principal Component Analysis (PCA) since it was first proposed in 1982. However, Oja’s algorithm does not have a standardized choice of learning rate (step size) that both performs well in practice and truly conforms to the online streaming setting. In this paper, we propose a new learning rate scheme for Oja’s method called AdaOja. This new algorithm requires only a single pass over the data and does not depend on knowing properties of the data set a priori. AdaOja is a novel adaptation of the Adagrad algorithm to Oja’s algorithm in the single-eigenvector case, extended to the multiple-eigenvector case. We demonstrate for dense synthetic data, sparse real-world data and dense real-world data that AdaOja outperforms common learning rate choices for Oja’s method. We also show that AdaOja performs comparably to state-of-the-art algorithms (History PCA and Streaming Power Method) in the same streaming PCA setting. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.12115v2 |
https://arxiv.org/pdf/1905.12115v2.pdf | |
PWC | https://paperswithcode.com/paper/adaoja-adaptive-learning-rates-for-streaming |
Repo | https://github.com/aamcbee/AdaOja |
Framework | none |
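The single-eigenvector case described in the AdaOja abstract can be illustrated with a minimal sketch: an Oja update whose step size is the inverse square root of the accumulated squared update norms, in the style of Adagrad. This is an illustration under assumptions, not the authors' implementation; the names `adaoja_top_component` and `make_stream`, the initial value `b0`, and the synthetic data are all hypothetical.

```python
import math
import random


def adaoja_top_component(stream, dim, b0=1e-5, seed=0):
    """One-pass estimate of the leading principal direction (sketch)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    b_sq = b0 * b0  # running sum of squared update norms (assumed start value)
    for x in stream:
        proj = sum(xi * wi for xi, wi in zip(x, w))  # x^T w
        g = [xi * proj for xi in x]                  # Oja direction x (x^T w)
        b_sq += sum(gi * gi for gi in g)
        step = 1.0 / math.sqrt(b_sq)                 # Adagrad-style step size
        w = [wi + step * gi for wi, gi in zip(w, g)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]                    # keep w on the unit sphere
    return w


def make_stream(n, seed=1):
    """Synthetic samples whose dominant variance lies along the first axis."""
    rng = random.Random(seed)
    for _ in range(n):
        yield [3.0 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1)]


w = adaoja_top_component(make_stream(5000), dim=3)
alignment = abs(w[0])  # close to 1 when w points along the first axis
```

No learning rate has to be tuned by hand here: the step shrinks automatically as update magnitudes accumulate, which is the property the abstract emphasizes.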
MixConv: Mixed Depthwise Convolutional Kernels
Title | MixConv: Mixed Depthwise Convolutional Kernels |
Authors | Mingxing Tan, Quoc V. Le |
Abstract | Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation, we propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution. As a simple drop-in replacement of vanilla depthwise convolution, our MixConv improves the accuracy and efficiency for existing MobileNets on both ImageNet classification and COCO object detection. To demonstrate the effectiveness of MixConv, we integrate it into AutoML search space and develop a new family of models, named as MixNets, which outperform previous mobile models including MobileNetV2 [20] (ImageNet top-1 accuracy +4.2%), ShuffleNetV2 [16] (+3.5%), MnasNet [26] (+1.3%), ProxylessNAS [2] (+2.2%), and FBNet [27] (+2.0%). In particular, our MixNet-L achieves a new state-of-the-art 78.9% ImageNet top-1 accuracy under typical mobile settings (<600M FLOPS). Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet |
Tasks | AutoML, Image Classification, Object Detection |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09595v3 |
https://arxiv.org/pdf/1907.09595v3.pdf | |
PWC | https://paperswithcode.com/paper/mixnet-mixed-depthwise-convolutional-kernels |
Repo | https://github.com/zsef123/MixNet-PyTorch |
Framework | pytorch |
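The idea of mixing several kernel sizes in one depthwise convolution can be sketched in one dimension: the channels are split into groups and each group is filtered with a different kernel size. The sketch below is a simplified illustration, not the MixNet code; `mixconv1d`, `depthwise_conv1d`, and the box-filter weights are hypothetical stand-ins for learned kernels.

```python
def depthwise_conv1d(channel, kernel):
    """Zero-padded 'same' filtering of one channel with one odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(channel) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(channel))]


def mixconv1d(channels, kernel_sizes):
    """Split channels into groups; each group uses its own (odd) kernel size."""
    groups = len(kernel_sizes)
    if len(channels) < groups:
        raise ValueError("need at least one channel per kernel size")
    group_size = len(channels) // groups
    out = []
    for c, channel in enumerate(channels):
        g = min(c // group_size, groups - 1)  # leftover channels join the last group
        k = kernel_sizes[g]
        kernel = [1.0 / k] * k                # placeholder weights (assumption)
        out.append(depthwise_conv1d(channel, kernel))
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
mixed = mixconv1d([signal] * 4, kernel_sizes=(1, 3))
```

With `kernel_sizes=(1, 3)`, the first two channels pass through unchanged and the last two are smoothed over three neighbouring positions, so a single layer produces features at two receptive-field sizes.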
GhostNet: More Features from Cheap Operations
Title | GhostNet: More Features from Cheap Operations |
Authors | Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu |
Abstract | Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design. This paper proposes a novel Ghost module to generate more feature maps from cheap operations. Based on a set of intrinsic feature maps, we apply a series of linear transformations with cheap cost to generate many ghost feature maps that could fully reveal information underlying intrinsic features. The proposed Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks. Ghost bottlenecks are designed to stack Ghost modules, and then the lightweight GhostNet can be easily established. Experiments conducted on benchmarks demonstrate that the proposed Ghost module is an impressive alternative of convolution layers in baseline models, and our GhostNet can achieve higher recognition performance (e.g. 75.7% top-1 accuracy) than MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012 classification dataset. Code is available at https://github.com/huawei-noah/ghostnet |
Tasks | Image Classification |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.11907v2 |
https://arxiv.org/pdf/1911.11907v2.pdf | |
PWC | https://paperswithcode.com/paper/ghostnet-more-features-from-cheap-operations |
Repo | https://github.com/iamhankai/ghostnet |
Framework | tf |
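The Ghost module in the GhostNet abstract can be sketched in two steps: an ordinary convolution produces a few intrinsic feature maps, and a cheap per-map linear filter then derives extra "ghost" maps that are concatenated with them. The sketch below works on 1-D signals and is an illustration only; `ghost_module`, `pointwise_conv`, `cheap_op`, and all weights are hypothetical and not taken from the authors' code.

```python
def pointwise_conv(channels, weights):
    """1x1 convolution: each output map is a weighted sum of the input channels."""
    length = len(channels[0])
    return [[sum(w * ch[i] for w, ch in zip(row, channels)) for i in range(length)]
            for row in weights]


def cheap_op(feature_map, kernel):
    """Depthwise 3-tap filter with zero padding, standing in for a cheap linear map."""
    padded = [0.0] + list(feature_map) + [0.0]
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(feature_map))]


def ghost_module(channels, primary_weights, ghost_kernels):
    """Return the intrinsic maps followed by the ghost maps derived from them."""
    intrinsic = pointwise_conv(channels, primary_weights)
    ghosts = [cheap_op(fm, k) for fm in intrinsic for k in ghost_kernels]
    return intrinsic + ghosts


inputs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
features = ghost_module(
    inputs,
    primary_weights=[[1.0, 0.0], [0.5, 0.5]],  # two intrinsic maps (assumed weights)
    ghost_kernels=[[-1.0, 0.0, 1.0]],          # one ghost per intrinsic map
)
```

Here four output maps are produced while only two require the full channel-mixing step; the other two cost a three-tap filter each.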
A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations
Title | A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations |
Authors | Zekun Yang, Juan Feng |
Abstract | Word embedding has become essential for natural language processing as it boosts empirical performances of various tasks. However, recent research discovers that gender bias is incorporated in neural word embeddings, and downstream tasks that rely on these biased word vectors also produce gender-biased results. While some word-embedding gender-debiasing methods have been developed, these methods mainly focus on reducing gender bias associated with gender direction and fail to reduce the gender bias presented in word embedding relations. In this paper, we design a causal and simple approach for mitigating gender bias in word vector relation by utilizing the statistical dependency between gender-definition word embeddings and gender-biased word embeddings. Our method attains state-of-the-art results on gender-debiasing tasks, lexical- and sentence-level evaluation tasks, and downstream coreference resolution tasks. |
Tasks | Causal Inference, Coreference Resolution, Word Embeddings |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10787v1 |
https://arxiv.org/pdf/1911.10787v1.pdf | |
PWC | https://paperswithcode.com/paper/a-causal-inference-method-for-reducing-gender |
Repo | https://github.com/KunkunYang/GenderBiasHSR |
Framework | tf |
Backprop with Approximate Activations for Memory-efficient Network Training
Title | Backprop with Approximate Activations for Memory-efficient Network Training |
Authors | Ayan Chakrabarti, Benjamin Moseley |
Abstract | Training convolutional neural network models is memory intensive since back-propagation requires storing activations of all intermediate layers. This presents a practical concern when seeking to deploy very deep architectures in production, especially when models need to be frequently re-trained on updated datasets. In this paper, we propose a new implementation for back-propagation that significantly reduces memory usage, by enabling the use of approximations with negligible computational cost and minimal effect on training performance. The algorithm reuses common buffers to temporarily store full activations and compute the forward pass exactly. It also stores approximate per-layer copies of activations, at significant memory savings, that are used in the backward pass. Compared to simply approximating activations within standard back-propagation, our method limits accumulation of errors across layers. This allows the use of much lower-precision approximations without affecting training accuracy. Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method yields performance close to exact training, while storing activations compactly with as low as 4-bit precision. |
Tasks | |
Published | 2019-01-23 |
URL | https://arxiv.org/abs/1901.07988v2 |
https://arxiv.org/pdf/1901.07988v2.pdf | |
PWC | https://paperswithcode.com/paper/backprop-with-approximate-activations-for |
Repo | https://github.com/ayanc/blpa |
Framework | tf |
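The memory-saving scheme in the abstract above (exact forward pass, low-precision stored activations for the backward pass) can be sketched on a toy network of scalar-weight ReLU layers. This is a simplified illustration under assumptions, not the authors' implementation: the uniform quantizer, the exactly stored 1-bit ReLU mask, and the names `quantize`, `forward`, and `backward` are all hypothetical.

```python
def quantize(values, bits):
    """Uniform quantization over the values' own range; bits=None keeps them exact."""
    if bits is None:
        return list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    scale = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / scale) * scale for v in values]


def forward(x, weights, bits=4):
    """Exact forward pass; only a compact copy of each layer input is kept."""
    saved = []
    h = list(x)
    for w in weights:
        mask = [w * v > 0.0 for v in h]          # exact ReLU mask (assumed 1 bit each)
        saved.append((quantize(h, bits), mask))  # approximate input for backward
        h = [w * v if m else 0.0 for v, m in zip(h, mask)]  # exact value feeds next layer
    return h, saved


def backward(weights, saved):
    """Weight gradients of loss = sum(output), using the stored approximations."""
    grads = [0.0] * len(weights)
    g = [1.0] * len(saved[-1][1])  # d(loss)/d(output)
    for layer in reversed(range(len(weights))):
        approx_in, mask = saved[layer]
        grads[layer] = sum(gi * a for gi, a, m in zip(g, approx_in, mask) if m)
        # The gradient passed to earlier layers never touches the approximations.
        g = [gi * weights[layer] if m else 0.0 for gi, m in zip(g, mask)]
    return grads


x = [0.1, 0.5, -0.3, 0.9, 1.2, -0.7, 0.33, 0.8]
weights = [1.5, 0.8, 2.0]
out_approx, saved_approx = forward(x, weights, bits=4)
out_exact, saved_exact = forward(x, weights, bits=None)
approx_grads = backward(weights, saved_approx)
exact_grads = backward(weights, saved_exact)
```

Because each layer's exact output is passed forward and the back-propagated signal `g` depends only on weights and masks, the quantization error enters each weight gradient once and does not compound across layers.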
A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams
Title | A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams |
Authors | Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez |
Abstract | With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people’s lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models (such as SVM, MNB, Neural Networks, etc.) are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF’s eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale. |
Tasks | Text Classification |
Published | 2019-05-18 |
URL | https://arxiv.org/abs/1905.08772v1 |
https://arxiv.org/pdf/1905.08772v1.pdf | |
PWC | https://paperswithcode.com/paper/a-text-classification-framework-for-simple |
Repo | https://github.com/sergioburdisso/pyss3 |
Framework | none |
Salient Object Detection in the Deep Learning Era: An In-Depth Survey
Title | Salient Object Detection in the Deep Learning Era: An In-Depth Survey |
Authors | Wenguan Wang, Qiuxia Lai, Huazhu Fu, Jianbing Shen, Haibin Ling, Ruigang Yang |
Abstract | As an important problem in computer vision, salient object detection (SOD) from images has been attracting an increasing amount of research effort over the years. Recent advances in SOD, not surprisingly, are dominantly led by deep learning-based solutions (named deep SOD) and reflected by hundreds of published papers. To facilitate the in-depth understanding of deep SODs, in this paper we provide a comprehensive survey covering various aspects ranging from algorithm taxonomy to unsolved open issues. In particular, we first review deep SOD algorithms from different perspectives including network architecture, level of supervision, learning paradigm and object/instance level detection. Following that, we summarize existing SOD evaluation datasets and metrics. Then, we carefully compile thorough benchmark results of SOD methods based on previous work, and provide detailed analysis of the comparison results. Moreover, we study the performance of SOD algorithms under different attributes, which have been barely explored previously, by constructing a novel SOD dataset with rich attribute annotations. We further analyze, for the first time in the field, the robustness and transferability of deep SOD models w.r.t. adversarial attacks. We also look into the influence of input perturbations, and the generalization and hardness of existing SOD datasets. Finally, we discuss several open issues and challenges of SOD, and point out possible research directions in the future. All the saliency prediction maps, our constructed dataset with annotations, and codes for evaluation are made publicly available at https://github.com/wenguanwang/SODsurvey. |
Tasks | Object Detection, Saliency Prediction, Salient Object Detection |
Published | 2019-04-19 |
URL | https://arxiv.org/abs/1904.09146v3 |
https://arxiv.org/pdf/1904.09146v3.pdf | |
PWC | https://paperswithcode.com/paper/salient-object-detection-in-the-deep-learning |
Repo | https://github.com/wenguanwang/SODsurvey |
Framework | none |
Pedestrian Detection in Thermal Images using Saliency Maps
Title | Pedestrian Detection in Thermal Images using Saliency Maps |
Authors | Debasmita Ghose, Shasvat Mukeshkumar Desai, Sneha Bhattacharya, Deep Chakraborty, Madalina Fiterau, Tauhidur Rahman |
Abstract | Thermal images are mainly used to detect the presence of people at night or in bad lighting conditions, but perform poorly in the daytime. To solve this problem, most state-of-the-art techniques employ a fusion network that uses features from paired thermal and color images. Instead, we propose to augment thermal images with their saliency maps, to serve as an attention mechanism for the pedestrian detector especially during daytime. We investigate how such an approach results in improved performance for pedestrian detection using only thermal images, eliminating the need for paired color images. For our experiments, we train the Faster R-CNN for pedestrian detection and report the added effect of saliency maps generated using static and deep methods (PiCA-Net and R3-Net). Our best performing model results in an absolute reduction of miss rate by 13.4% and 19.4% over the baseline in day and night images respectively. We also annotate and release pixel level masks of pedestrians on a subset of the KAIST Multispectral Pedestrian Detection dataset, which is the first publicly available dataset for salient pedestrian detection. |
Tasks | Pedestrian Detection, Salient Object Detection |
Published | 2019-04-15 |
URL | http://arxiv.org/abs/1904.06859v1 |
http://arxiv.org/pdf/1904.06859v1.pdf | |
PWC | https://paperswithcode.com/paper/pedestrian-detection-in-thermal-images-using |
Repo | https://github.com/Information-Fusion-Lab-Umass/Salient-Pedestrian-Detection |
Framework | pytorch |
MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation
Title | MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation |
Authors | Yazan Abu Farha, Juergen Gall |
Abstract | Temporally locating and classifying action segments in long untrimmed videos is of particular interest to many applications like surveillance and robotics. While traditional approaches follow a two-step pipeline, by generating frame-wise probabilities and then feeding them to high-level temporal models, recent approaches use temporal convolutions to directly classify the video frames. In this paper, we introduce a multi-stage architecture for the temporal action segmentation task. Each stage features a set of dilated temporal convolutions to generate an initial prediction that is refined by the next one. This architecture is trained using a combination of a classification loss and a proposed smoothing loss that penalizes over-segmentation errors. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our model achieves state-of-the-art results on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. |
Tasks | Action Segmentation |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.01945v2 |
http://arxiv.org/pdf/1903.01945v2.pdf | |
PWC | https://paperswithcode.com/paper/ms-tcn-multi-stage-temporal-convolutional |
Repo | https://github.com/yabufarha/ms-tcn |
Framework | pytorch |
The Benefits of Over-parameterization at Initialization in Deep ReLU Networks
Title | The Benefits of Over-parameterization at Initialization in Deep ReLU Networks |
Authors | Devansh Arpit, Yoshua Bengio |
Abstract | It has been noted in existing literature that over-parameterization in ReLU networks generally improves performance. While there could be several factors involved behind this, we prove some desirable theoretical properties at initialization which may be enjoyed by ReLU networks. Specifically, it is known that He initialization in deep ReLU networks asymptotically preserves variance of activations in the forward pass and variance of gradients in the backward pass for infinitely wide networks, thus preserving the flow of information in both directions. Our paper goes beyond these results and shows novel properties that hold under He initialization: i) the norm of hidden activation of each layer is equal to the norm of the input, and, ii) the norm of weight gradient of each layer is equal to the product of norm of the input vector and the error at output layer. These results are derived using the PAC analysis framework, and hold true for finitely sized datasets such that the width of the ReLU network only needs to be larger than a certain finite lower bound. As we show, this lower bound depends on the depth of the network and the number of samples, and by the virtue of being a lower bound, over-parameterized ReLU networks are endowed with these desirable properties. For the aforementioned hidden activation norm property under He initialization, we further extend our theory and show that this property holds for a finite width network even when the number of data samples is infinite. Thus we overcome several limitations of existing papers, and show new properties of deep ReLU networks at initialization. |
Tasks | |
Published | 2019-01-11 |
URL | https://arxiv.org/abs/1901.03611v3 |
https://arxiv.org/pdf/1901.03611v3.pdf | |
PWC | https://paperswithcode.com/paper/the-benefits-of-over-parameterization-at |
Repo | https://github.com/devansharpit/overparametrization_benefits |
Framework | none |
Infrastructure-Agnostic Hypertext
Title | Infrastructure-Agnostic Hypertext |
Authors | Jakob Voß |
Abstract | This paper presents a novel and formal interpretation of the original vision of hypertext: infrastructure-agnostic hypertext is independent from specific standards such as data formats and network protocols. Its model is illustrated with examples and references to existing technologies that allow for implementation and integration in current information infrastructures such as the Internet. |
Tasks | |
Published | 2019-06-29 |
URL | https://arxiv.org/abs/1907.00259v1 |
https://arxiv.org/pdf/1907.00259v1.pdf | |
PWC | https://paperswithcode.com/paper/infrastructure-agnostic-hypertext |
Repo | https://github.com/jakobib/hypertext2019 |
Framework | none |
Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning
Title | Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning |
Authors | Michael Lutter, Christian Ritter, Jan Peters |
Abstract | Deep learning has achieved astonishing results on many tasks with large amounts of data and generalization within the proximity of training data. For many important real-world applications, these requirements are unfeasible and additional prior knowledge on the task domain is required to overcome the resulting problems. In particular, learning physics models for model-based control requires robust extrapolation from fewer samples - often collected online in real-time - and model errors may lead to drastic damage to the system. Directly incorporating physical insight has enabled us to obtain a novel deep model learning approach that extrapolates well while requiring fewer samples. As a first example, we propose Deep Lagrangian Networks (DeLaN) as a deep network structure upon which Lagrangian Mechanics have been imposed. DeLaN can learn the equations of motion of a mechanical system (i.e., system dynamics) with a deep network efficiently while ensuring physical plausibility. The resulting DeLaN network performs very well at robot tracking control. The proposed method not only outperformed previous model learning approaches in learning speed but also exhibits substantially improved and more robust extrapolation to novel trajectories, and it learns online in real-time. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04490v1 |
https://arxiv.org/pdf/1907.04490v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-lagrangian-networks-using-physics-as-1 |
Repo | https://github.com/powertj/EECS545_Project_DeLaN |
Framework | none |
PidginUNMT: Unsupervised Neural Machine Translation from West African Pidgin to English
Title | PidginUNMT: Unsupervised Neural Machine Translation from West African Pidgin to English |
Authors | Kelechi Ogueji, Orevaoghene Ahia |
Abstract | Over 800 languages are spoken across West Africa. Despite the obvious diversity among people who speak these languages, one language significantly unifies them all - West African Pidgin English. There are at least 80 million speakers of West African Pidgin English. However, there is no known natural language processing (NLP) work on this language. In this work, we perform the first NLP work on the most popular variant of the language, providing three major contributions. First, the provision of a Pidgin corpus of over 56000 sentences, which is the largest we know of. Secondly, the training of the first ever cross-lingual embedding between Pidgin and English. This aligned embedding will be helpful in the performance of various downstream tasks between English and Pidgin. Thirdly, the training of an Unsupervised Neural Machine Translation model between Pidgin and English which achieves BLEU scores of 7.93 from Pidgin to English, and 5.18 from English to Pidgin. In all, this work greatly reduces the barrier of entry for future NLP works on West African Pidgin English. |
Tasks | Machine Translation |
Published | 2019-12-07 |
URL | https://arxiv.org/abs/1912.03444v1 |
https://arxiv.org/pdf/1912.03444v1.pdf | |
PWC | https://paperswithcode.com/paper/pidginunmt-unsupervised-neural-machine |
Repo | https://github.com/Kelechukwu1/PidginUNMT |
Framework | pytorch |