Paper Group ANR 241
Convolutional Tensor-Train LSTM for Spatio-temporal Learning
Title | Convolutional Tensor-Train LSTM for Spatio-temporal Learning |
Authors | Jiahao Su, Wonmin Byeon, Furong Huang, Jan Kautz, Animashree Anandkumar |
Abstract | Higher-order Recurrent Neural Networks (RNNs) are effective for long-term forecasting since such architectures can model higher-order correlations and long-term dynamics more effectively. However, higher-order models are expensive and require exponentially more parameters and operations compared with their first-order counterparts. This problem is particularly pronounced in multidimensional data such as videos. To address this issue, we propose Convolutional Tensor-Train Decomposition (CTTD), a novel tensor decomposition with convolutional operations. With CTTD, we construct Convolutional Tensor-Train LSTM (Conv-TT-LSTM) to capture higher-order space-time correlations in videos. We demonstrate that the proposed model outperforms the conventional (first-order) Convolutional LSTM (ConvLSTM) as well as state-of-the-art ConvLSTM-based approaches in pixel-level video prediction tasks on the Moving-MNIST and KTH action datasets, with far fewer parameters. |
Tasks | Video Prediction |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09131v3 | https://arxiv.org/pdf/2002.09131v3.pdf |
PWC | https://paperswithcode.com/paper/convolutional-tensor-train-lstm-for-spatio |
Repo | |
Framework | |
SlideImages: A Dataset for Educational Image Classification
Title | SlideImages: A Dataset for Educational Image Classification |
Authors | David Morris, Eric Müller-Budack, Ralph Ewerth |
Abstract | In the past few years, convolutional neural networks (CNNs) have achieved impressive results in computer vision tasks, which, however, mainly focus on photos with natural scene content. Non-sensor-derived images such as illustrations, data visualizations, and figures are typically used to convey complex information or to explore large datasets, yet this kind of image has received little attention in computer vision. CNNs and similar techniques use large volumes of training data. Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. We have reserved all the actual educational images as a test dataset in order to ensure that the approaches using this dataset generalize well to new educational images, and potentially other domains. Furthermore, we present a baseline system using a standard deep neural architecture and discuss dealing with the challenge of limited training data. |
Tasks | Image Classification |
Published | 2020-01-19 |
URL | https://arxiv.org/abs/2001.06823v1 | https://arxiv.org/pdf/2001.06823v1.pdf |
PWC | https://paperswithcode.com/paper/slideimages-a-dataset-for-educational-image |
Repo | |
Framework | |
Text Complexity Classification Based on Linguistic Information: Application to Intelligent Tutoring of ESL
Title | Text Complexity Classification Based on Linguistic Information: Application to Intelligent Tutoring of ESL |
Authors | M. Zakaria Kurdi |
Abstract | The goal of this work is to build a classifier that can identify text complexity within the context of teaching reading to English as a Second Language (ESL) learners. To present language learners with texts that are suitable to their level of English, a set of features that can describe the phonological, morphological, lexical, syntactic, discursive, and psychological complexity of a given text was identified. Using a corpus of 6171 texts, which had already been classified into three different levels of difficulty by ESL experts, different experiments were conducted with five machine learning algorithms. The results showed that the adopted linguistic features provide a good overall classification performance (F-Score = 0.97). A scalability evaluation was conducted to test if such a classifier could be used within real applications, where it can be, for example, plugged into a search engine or a web-scraping module. In this evaluation, the texts in the test set are not only different from those in the training set but also of different types (ESL texts vs. children's reading texts). Although the overall performance of the classifier decreased significantly (F-Score = 0.65), the confusion matrix shows that most of the classification errors are between classes two and three (the middle-level classes) and that the system performs robustly in categorizing texts of classes one and four. This behavior can be explained by the difference in classification criteria between the two corpora. Hence, the observed results confirm the usability of such a classifier within a real-world application. |
Tasks | |
Published | 2020-01-07 |
URL | https://arxiv.org/abs/2001.01863v1 | https://arxiv.org/pdf/2001.01863v1.pdf |
PWC | https://paperswithcode.com/paper/text-complexity-classification-based-on |
Repo | |
Framework | |
Multimodal Semantic Transfer from Text to Image. Fine-Grained Image Classification by Distributional Semantics
Title | Multimodal Semantic Transfer from Text to Image. Fine-Grained Image Classification by Distributional Semantics |
Authors | Simon Donig, Maria Christoforaki, Bernhard Bermeitinger, Siegfried Handschuh |
Abstract | In recent years, image classification methods such as neural networks have become widespread in art history and Heritage Informatics (Lang and Ommer 2018). These methods face several challenges, including the handling of comparatively small amounts of data as well as high-dimensional data spaces in the Digital Humanities. Most of these methods map the classification onto a comparatively flat space. In striving for ontological unambiguity, this flat approach loses a number of relevant dimensions, including taxonomic, mereological, and associative relationships between the classes, as well as the non-formalized context. Here, a Convolutional Neural Network (CNN) is used whose output in the training process is based not, as usual, on a series of flat text labels but on a series of semantically loaded vectors. These vectors result from a Distributional Semantic Model (DSM) which is generated from an in-domain text corpus. |
Tasks | Fine-Grained Image Classification, Image Classification |
Published | 2020-01-07 |
URL | https://arxiv.org/abs/2001.02372v1 | https://arxiv.org/pdf/2001.02372v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-semantic-transfer-from-text-to |
Repo | |
Framework | |
IPBoost – Non-Convex Boosting via Integer Programming
Title | IPBoost – Non-Convex Boosting via Integer Programming |
Authors | Marc E. Pfetsch, Sebastian Pokutta |
Abstract | Recently non-convex optimization approaches for solving machine learning problems have gained significant attention. In this paper we explore non-convex boosting in classification by means of integer programming and demonstrate real-world practicability of the approach while circumventing shortcomings of convex boosting approaches. We report results that are comparable to or better than the current state-of-the-art. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04679v1 | https://arxiv.org/pdf/2002.04679v1.pdf |
PWC | https://paperswithcode.com/paper/ipboost-non-convex-boosting-via-integer |
Repo | |
Framework | |
Designing Tools for Semi-Automated Detection of Machine Learning Biases: An Interview Study
Title | Designing Tools for Semi-Automated Detection of Machine Learning Biases: An Interview Study |
Authors | Po-Ming Law, Sana Malik, Fan Du, Moumita Sinha |
Abstract | Machine learning models often make predictions that are biased against certain subgroups of input data. When undetected, machine learning biases can have significant financial and ethical implications. Semi-automated tools that involve humans in the loop could facilitate bias detection. Yet, little is known about the considerations involved in their design. In this paper, we report on an interview study with 11 machine learning practitioners to investigate the needs surrounding semi-automated bias detection tools. Based on the findings, we highlight four design considerations to guide system designers who aim to create future tools for bias detection. |
Tasks | |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.07680v2 | https://arxiv.org/pdf/2003.07680v2.pdf |
PWC | https://paperswithcode.com/paper/designing-tools-for-semi-automated-detection |
Repo | |
Framework | |
A Survey On 3D Inner Structure Prediction from its Outer Shape
Title | A Survey On 3D Inner Structure Prediction from its Outer Shape |
Authors | Mohamed Mejri, Antoine Richard, Cédric Pradalier |
Abstract | The analysis of the internal structure of trees is highly important for forest experts, biological scientists, and the wood industry. Traditionally, CT scanners are considered the most efficient way to get an accurate inner representation of the tree. However, this method requires a significant investment and reduces the cost-effectiveness of the operation. Our goal is to design neural-network-based methods to predict the internal density of the tree from its external bark shape. This paper compares different image-to-image (2D), volume-to-volume (3D), and Convolutional Long Short-Term Memory based neural network architectures for predicting the defect distribution inside trees from their external bark shape. These models are trained on a synthetic dataset of 1800 CT-scan-like volumetric structures of the internal density of the trees and their corresponding external surfaces. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04571v1 | https://arxiv.org/pdf/2002.04571v1.pdf |
PWC | https://paperswithcode.com/paper/a-survey-on-3d-inner-structure-prediction |
Repo | |
Framework | |
Closure Properties for Private Classification and Online Prediction
Title | Closure Properties for Private Classification and Online Prediction |
Authors | Noga Alon, Amos Beimel, Shay Moran, Uri Stemmer |
Abstract | Let H be a class of boolean functions and consider a composed class H’ that is derived from H using some arbitrary aggregation rule (for example, H’ may be the class of all 3-wise majority votes of functions in H). We upper bound the Littlestone dimension of H’ in terms of that of H. The bounds are proved using combinatorial arguments that exploit a connection between the Littlestone dimension and Thresholds. As a corollary, we derive closure properties for online learning and private PAC learning. The derived bounds on the Littlestone dimension exhibit an undesirable super-exponential dependence. For private learning, we prove close to optimal bounds that circumvent this suboptimal dependency. The improved bounds on the sample complexity of private learning are derived algorithmically by transforming a private learner for the original class H into a private learner for the composed class H’. Using the same ideas, we show that any (proper or improper) private algorithm that learns a class of functions H in the realizable case (i.e., when the examples are labeled by some function in the class) can be transformed into a private algorithm that learns the class H in the agnostic case. |
Tasks | |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04509v2 | https://arxiv.org/pdf/2003.04509v2.pdf |
PWC | https://paperswithcode.com/paper/closure-properties-for-private-classification |
Repo | |
Framework | |
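The aggregation rule named in the abstract above (3-wise majority votes) can be illustrated with a small sketch. This is an illustration only, not code from the paper: the helper names `majority3`, `compose_majority3`, and `behaviours`, and the toy class of threshold functions on a five-point domain, are assumptions made for the example.

```python
from itertools import combinations_with_replacement


def majority3(f, g, h):
    """Return the pointwise majority vote of three boolean functions."""
    return lambda x: (int(f(x)) + int(g(x)) + int(h(x))) >= 2


def compose_majority3(hypotheses):
    """Build the composed class H': every 3-wise majority of functions in H."""
    return [
        majority3(f, g, h)
        for f, g, h in combinations_with_replacement(hypotheses, 3)
    ]


def behaviours(hypotheses, domain):
    """Set of distinct labelings the class induces on a finite domain."""
    return {tuple(bool(h(x)) for x in domain) for h in hypotheses}


# Toy class H (an assumption for illustration): thresholds x >= t on {0,...,4}.
domain = range(5)
thresholds = [lambda x, t=t: x >= t for t in range(6)]
composed = compose_majority3(thresholds)
```

For this particular toy class, the majority of three thresholds is the threshold at their median, so `behaviours(composed, domain)` equals `behaviours(thresholds, domain)`. In general H’ can be strictly richer than H, which is the case the paper's bounds address.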
An Optimal Procedure to Check Pareto-Optimality in House Markets with Single-Peaked Preferences
Title | An Optimal Procedure to Check Pareto-Optimality in House Markets with Single-Peaked Preferences |
Authors | Aurélie Beynier, Nicolas Maudet, Simon Rey, Parham Shams |
Abstract | Recently, the problem of allocating one resource per agent with initial endowments (house markets) has seen renewed interest. In the domain of strict preferences, the Top Trading Cycle algorithm is known to be the only procedure guaranteeing Pareto-optimality, individual rationality, and strategy-proofness. However, the situation differs in the single-peaked domain: Bade presented the Crawler, an alternative procedure enjoying the same properties, with the additional advantage of being implementable in obviously dominant strategies. In this paper, we further investigate the Crawler and propose the Diver, a variant which checks optimally whether an allocation is Pareto-optimal for single-peaked preferences, thus improving over known techniques used for checking Pareto-optimality in more general domains. We also prove that the Diver is asymptotically optimal in terms of communication complexity. |
Tasks | |
Published | 2020-02-14 |
URL | https://arxiv.org/abs/2002.11660v1 | https://arxiv.org/pdf/2002.11660v1.pdf |
PWC | https://paperswithcode.com/paper/an-optimal-procedure-to-check-pareto |
Repo | |
Framework | |
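For context on what checking Pareto-optimality means in a house market, here is a minimal sketch of a generic check for strict preferences. It is not the Diver from the paper. It relies on the standard observation that, with strict preferences and one house per agent, an allocation is Pareto-optimal exactly when no cycle of agents can trade so that each one strictly improves. The function name `is_pareto_optimal` and the input encoding are assumptions made for the example.

```python
def is_pareto_optimal(prefs, allocation):
    """Generic Pareto-optimality check for a house market with strict preferences.

    prefs[i] is agent i's ranking of all houses, best first.
    allocation[i] is the house currently held by agent i.
    """
    n = len(allocation)
    rank = [{house: r for r, house in enumerate(p)} for p in prefs]

    # Edge i -> j when agent i strictly prefers agent j's house to its own.
    edges = [
        [j for j in range(n) if rank[i][allocation[j]] < rank[i][allocation[i]]]
        for i in range(n)
    ]

    # Kahn's algorithm: the graph is acyclic iff every node can be removed.
    indegree = [0] * n
    for i in range(n):
        for j in edges[i]:
            indegree[j] += 1
    stack = [i for i in range(n) if indegree[i] == 0]
    removed = 0
    while stack:
        i = stack.pop()
        removed += 1
        for j in edges[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                stack.append(j)
    return removed == n


# Three agents and three houses; agents 0 and 1 would like to swap.
prefs = [[1, 0, 2], [0, 1, 2], [2, 1, 0]]
print(is_pareto_optimal(prefs, [0, 1, 2]))
print(is_pareto_optimal(prefs, [1, 0, 2]))
```

This sketch builds the full preference graph, so it inspects every pair of agents and does not use single-peakedness at all.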
Adversarial Attacks on Machine Learning Systems for High-Frequency Trading
Title | Adversarial Attacks on Machine Learning Systems for High-Frequency Trading |
Authors | Micah Goldblum, Avi Schwarzschild, Ankit B. Patel, Tom Goldstein |
Abstract | Algorithmic trading systems are often completely automated, and deep learning is increasingly receiving attention in this domain. Nonetheless, little is known about the robustness properties of these models. We study valuation models for algorithmic trading from the perspective of adversarial machine learning. We introduce new attacks specific to this domain with size constraints that minimize attack costs. We further discuss how these attacks can be used as an analysis tool to study and evaluate the robustness properties of financial models. Finally, we investigate the feasibility of realistic adversarial attacks in which an adversarial trader fools automated trading systems into making inaccurate predictions. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09565v2 | https://arxiv.org/pdf/2002.09565v2.pdf |
PWC | https://paperswithcode.com/paper/adversarial-attacks-on-machine-learning |
Repo | |
Framework | |
BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization
Title | BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization |
Authors | Henry B. Moss, Vatsal Aggarwal, Nishant Prateek, Javier González, Roberto Barra-Chicote |
Abstract | We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approach for few-shot speaker adaptation. Here, the task is to fine-tune a pre-trained TTS model to mimic a new speaker using a small corpus of target utterances. We demonstrate that there does not exist a one-size-fits-all adaptation strategy, with convincing synthesis requiring a corpus-specific configuration of the hyper-parameters that control fine-tuning. By using Bayesian optimization to efficiently optimize these hyper-parameter values for a target speaker, we are able to perform adaptation with an average 30% improvement in speaker similarity over standard techniques. Results indicate, across multiple corpora, that BOFFIN TTS can learn to synthesize new speakers using less than ten minutes of audio, achieving the same naturalness as produced for the speakers used to train the base model. |
Tasks | |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01953v1 | https://arxiv.org/pdf/2002.01953v1.pdf |
PWC | https://paperswithcode.com/paper/boffin-tts-few-shot-speaker-adaptation-by |
Repo | |
Framework | |
Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms
Title | Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms |
Authors | Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi |
Abstract | We characterize Bayesian regret in a stochastic multi-armed bandit problem with a large but finite number of arms. In particular, we assume the number of arms $k$ is $T^{\alpha}$, where $T$ is the time-horizon and $\alpha$ is in $(0,1)$. We consider a Bayesian setting where the reward distribution of each arm is drawn independently from a common prior, and provide a complete analysis of expected regret with respect to this prior. Our results exhibit a sharp distinction around $\alpha = 1/2$. When $\alpha < 1/2$, the fundamental lower bound on regret is $\Omega(k)$; and it is achieved by a standard UCB algorithm. When $\alpha > 1/2$, the fundamental lower bound on regret is $\Omega(\sqrt{T})$, and it is achieved by an algorithm that first subsamples $\sqrt{T}$ arms uniformly at random, then runs UCB on just this subset. Interestingly, we also find that a sufficiently large number of arms allows the decision-maker to benefit from “free” exploration if she simply uses a greedy algorithm. In particular, this greedy algorithm exhibits a regret of $\tilde{O}(\max(k,T/\sqrt{k}))$, which translates to a sublinear (though not optimal) regret in the time horizon. We show empirically that this is because the greedy algorithm rapidly disposes of underperforming arms, a beneficial trait in the many-armed regime. Technically, our analysis of the greedy algorithm involves a novel application of the Lundberg inequality, an upper bound for the ruin probability of a random walk; this approach may be of independent interest. |
Tasks | Multi-Armed Bandits |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10121v1 | https://arxiv.org/pdf/2002.10121v1.pdf |
PWC | https://paperswithcode.com/paper/optimal-and-greedy-algorithms-for-multi-armed |
Repo | |
Framework | |
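The subsample-then-UCB strategy described in the abstract above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: Bernoulli rewards, the classic UCB1 index, and the function names `ucb1`, `subsample_size`, and `subsampled_ucb` are all choices made for the example.

```python
import math
import random


def ucb1(means, horizon, rng, best_mean=None):
    """Run UCB1 on Bernoulli arms; return cumulative expected regret.

    Regret is measured against best_mean (default: the best arm in `means`).
    """
    k = len(means)
    best = max(means) if best_mean is None else best_mean
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def subsample_size(k, horizon):
    """Number of arms to keep: ceil(sqrt(T)), capped at the number of arms."""
    return min(k, math.ceil(math.sqrt(horizon)))


def subsampled_ucb(means, horizon, rng):
    """Pick about sqrt(T) arms uniformly at random, then run UCB1 on them only."""
    subset = rng.sample(means, subsample_size(len(means), horizon))
    # Regret is still measured against the best arm overall.
    return ucb1(subset, horizon, rng, best_mean=max(means))


# Example run: 400 arms with means drawn from a uniform prior (an assumption).
prior_rng = random.Random(0)
arm_means = [prior_rng.random() for _ in range(400)]
print(subsampled_ucb(arm_means, 1000, random.Random(1)))
```

With `k = 400` and `T = 1000`, plain UCB1 would spend 400 of its 1000 rounds on the initial pass over all arms, whereas the subsampled variant explores only 32 of them.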
Multi-source Domain Adaptation for Visual Sentiment Classification
Title | Multi-source Domain Adaptation for Visual Sentiment Classification |
Authors | Chuang Lin, Sicheng Zhao, Lei Meng, Tat-Seng Chua |
Abstract | Existing domain adaptation methods on visual sentiment classification typically are investigated under the single-source scenario, where the knowledge learned from a source domain of sufficient labeled data is transferred to the target domain of loosely labeled or unlabeled data. However, in practice, data from a single source domain usually have a limited volume and can hardly cover the characteristics of the target domain. In this paper, we propose a novel multi-source domain adaptation (MDA) method, termed Multi-source Sentiment Generative Adversarial Network (MSGAN), for visual sentiment classification. To handle data from multiple source domains, it learns to find a unified sentiment latent space where data from both the source and target domains share a similar distribution. This is achieved via cycle consistent adversarial learning in an end-to-end manner. Extensive experiments conducted on four benchmark datasets demonstrate that MSGAN significantly outperforms the state-of-the-art MDA approaches for visual sentiment classification. |
Tasks | Domain Adaptation, Sentiment Analysis |
Published | 2020-01-12 |
URL | https://arxiv.org/abs/2001.03886v1 | https://arxiv.org/pdf/2001.03886v1.pdf |
PWC | https://paperswithcode.com/paper/multi-source-domain-adaptation-for-visual |
Repo | |
Framework | |
End-to-end Robustness for Sensing-Reasoning Machine Learning Pipelines
Title | End-to-end Robustness for Sensing-Reasoning Machine Learning Pipelines |
Authors | Zhuolin Yang, Zhikuan Zhao, Hengzhi Pei, Boxin Wang, Bojan Karlas, Ji Liu, Heng Guo, Bo Li, Ce Zhang |
Abstract | As machine learning (ML) is applied to many mission-critical scenarios, certifying ML model robustness becomes increasingly important. Many previous works focus on the robustness of independent ML and ensemble models, and can only certify a very small magnitude of the adversarial perturbation. In this paper, we take a different viewpoint and improve learning robustness by going beyond independent ML and ensemble models. We aim at promoting the generic Sensing-Reasoning machine learning pipeline which contains both the sensing (e.g. deep neural networks) and reasoning (e.g. Markov logic networks (MLN)) components enriched with domain knowledge. Can domain knowledge help improve learning robustness? Can we formally certify the end-to-end robustness of such an ML pipeline? We first theoretically analyze the computational complexity of checking the provable robustness in the reasoning component. We then derive the provable robustness bound for several concrete reasoning components. We show that for reasoning components such as MLN and a specific family of Bayesian networks it is possible to certify the robustness of the whole pipeline even with a large magnitude of perturbation which cannot be certified by existing work. Finally, we conduct extensive real-world experiments on large scale datasets to evaluate the certified robustness for Sensing-Reasoning ML pipelines. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2003.00120v2 | https://arxiv.org/pdf/2003.00120v2.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-robustness-for-sensing-reasoning |
Repo | |
Framework | |
Channels’ Confirmation and Predictions’ Confirmation: from the Medical Test to the Raven Paradox
Title | Channels’ Confirmation and Predictions’ Confirmation: from the Medical Test to the Raven Paradox |
Authors | Chenguang Lu |
Abstract | After long arguments between positivism and falsificationism, the verification of universal hypotheses was replaced with the confirmation of uncertain major premises. Unfortunately, Hempel discovered the Raven Paradox (RP). Then, Carnap used the logical probability increment as the confirmation measure. So far, many confirmation measures have been proposed. Among them, measure F, proposed by Kemeny and Oppenheim, possesses the symmetries and asymmetries proposed by Eells and Fitelson, the monotonicity proposed by Greco et al., and the normalizing property suggested by many researchers. Based on the semantic information theory, a measure b* similar to F is derived from the medical test. Like the likelihood ratio, b* and F can only indicate the quality of channels or the testing means instead of the quality of probability predictions. Moreover, it is still not easy to use b*, F, or another measure to clarify the RP. For this reason, a measure c* similar to the correct rate is derived. The measure c* has the simple form (a-c)/max(a, c); it supports the Nicod Criterion and undermines the Equivalence Condition, and hence can be used to eliminate the RP. Some examples are provided to show why it is difficult to use one of the popular confirmation measures to eliminate the RP. Measures F, b*, and c* indicate that the existence of fewer counterexamples is more essential than the existence of more positive examples, and hence are compatible with Popper’s falsification thought. |
Tasks | |
Published | 2020-01-17 |
URL | https://arxiv.org/abs/2001.07566v1 | https://arxiv.org/pdf/2001.07566v1.pdf |
PWC | https://paperswithcode.com/paper/channels-confirmation-and-predictions |
Repo | |
Framework | |
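The closed form of c* quoted in the abstract above, (a-c)/max(a, c), is easy to evaluate directly. The sketch below assumes that a is the number of positive examples and c the number of counterexamples. The abstract does not define these symbols, so this reading is an assumption, as are the function name `c_star` and the choice to reject the case where both counts are zero.

```python
def c_star(a, c):
    """Confirmation measure c* = (a - c) / max(a, c).

    Assumed reading: a = count of positive examples, c = count of counterexamples.
    """
    if a < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if a == 0 and c == 0:
        raise ValueError("c* is undefined when there are no examples")
    return (a - c) / max(a, c)


print(c_star(10, 0))  # no counterexamples
print(c_star(10, 5))  # some counterexamples
print(c_star(5, 10))  # counterexamples dominate
```

Under this reading the measure ranges over [-1, 1]: it reaches 1 only when there are no counterexamples at all, however many positive examples there are, which matches the abstract's remark that having fewer counterexamples matters more than having more positive examples.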