Paper Group ANR 163
Joint Multiclass Debiasing of Word Embeddings. Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback. Constraining the recent star formation history of galaxies: an Approximate Bayesian Computation approach. Teacher-Student chain for efficient semi-supervised histology image classification. One Explanation Does Not Fit All: The Promise of Interactive Explanations for Machine Learning Transparency. Robustness Verification for Transformers. Likelihood-free inference of experimental Neutrino Oscillations using Neural Spline Flows. EEG fingerprinting: subject specific signature based on the aperiodic component of power spectrum. Detecting Deepfakes with Metric Learning. Unsupervised Domain Adaptation for Neural Machine Translation with Iterative Back Translation. Parallel Machine Translation with Disentangled Context Transformer. GraphBGS: Background Subtraction via Recovery of Graph Signals. Explore and Exploit with Heterotic Line Bundle Models. Deep Convolutional Neural Network Model for Short-Term Electricity Price Forecasting. Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning.
Joint Multiclass Debiasing of Word Embeddings
Title | Joint Multiclass Debiasing of Word Embeddings |
Authors | Radomir Popović, Florian Lemmerich, Markus Strohmaier |
Abstract | Bias in Word Embeddings has been a subject of recent interest, along with efforts for its reduction. Current approaches show promising progress towards debiasing single bias dimensions such as gender or race. In this paper, we present a joint multiclass debiasing approach that is capable of debiasing multiple bias dimensions simultaneously. In that direction, we present two approaches, HardWEAT and SoftWEAT, that aim to reduce biases by minimizing the scores of the Word Embeddings Association Test (WEAT). We demonstrate the viability of our methods by debiasing Word Embeddings on three classes of biases (religion, gender and race) in three different publicly available word embeddings and show that our methods can reduce or even completely eliminate bias, while maintaining meaningful relationships between vectors in word embeddings. Our work strengthens the foundation for more unbiased neural representations of textual data. |
Tasks | Word Embeddings |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.11520v1 |
https://arxiv.org/pdf/2003.11520v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-multiclass-debiasing-of-word-embeddings |
Repo | |
Framework | |
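For reference, the WEAT score that HardWEAT and SoftWEAT minimize follows the standard association-test formula. Below is a minimal NumPy sketch of the WEAT effect size; the random vectors stand in for real target and attribute word sets and are not the authors' code.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean cosine similarity to attribute set A minus to attribute set B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # d = (mean_x s(x, A, B) - mean_y s(y, A, B)) / std_{w in X∪Y} s(w, A, B)
    sx = np.array([association(x, A, B) for x in X])
    sy = np.array([association(y, A, B) for y in Y])
    return (sx.mean() - sy.mean()) / np.concatenate([sx, sy]).std(ddof=1)

# Toy example: random 50-d vectors standing in for two target sets and two attribute sets.
rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 50)) for _ in range(4))
print(weat_effect_size(X, Y, A, B))
```

Debiasing in this setting amounts to nudging the embedding so that such effect sizes approach zero for the chosen target and attribute sets.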
Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback
Title | Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback |
Authors | Binyamin Manela |
Abstract | Learning optimal policies from sparse feedback is a known challenge in reinforcement learning. Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm designed to solve such tasks. The algorithm treats every failure as a success for an alternative (virtual) goal that has been achieved in the episode and then generalizes from that virtual goal to real goals. HER has known flaws and is limited to relatively simple tasks. In this thesis, we present three algorithms based on the existing HER algorithm that improve its performance. First, we prioritize virtual goals from which the agent will learn more valuable information. We call this property the instructiveness of the virtual goal and define it by a heuristic measure, which expresses how well the agent will be able to generalize from that virtual goal to actual goals. Second, we design a filtering process that detects and removes misleading samples that may induce bias throughout the learning process. Lastly, we enable the learning of complex, sequential tasks using a form of curriculum learning combined with HER. We call this algorithm Curriculum HER. To test our algorithms, we built three challenging manipulation environments with sparse reward functions. Each environment has three levels of complexity. Our empirical results show vast improvement in the final success rate and sample efficiency when compared to the original HER algorithm. |
Tasks | Multi-Goal Reinforcement Learning |
Published | 2020-01-12 |
URL | https://arxiv.org/abs/2001.03877v1 |
https://arxiv.org/pdf/2001.03877v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-complex-1 |
Repo | |
Framework | |
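For context, the baseline HER mechanism that this thesis extends relabels stored transitions with goals achieved later in the same episode (the common "future" strategy). The sketch below is a generic illustration; the transition fields and the sparse reward convention are assumptions, not the thesis implementation.

```python
import random

def relabel_with_hindsight(episode, reward_fn, k=4):
    """Augment an episode with virtual-goal transitions (HER "future" strategy).

    episode: list of dicts with keys "obs", "action", "next_obs", "achieved_goal", "goal".
    reward_fn(achieved_goal, goal): sparse reward, e.g. 0.0 on success and -1.0 otherwise.
    """
    augmented = list(episode)
    for t, transition in enumerate(episode):
        future = episode[t:]
        for _ in range(k):
            # Pretend a goal achieved later in the episode was the intended goal.
            virtual_goal = random.choice(future)["achieved_goal"]
            relabeled = dict(transition)
            relabeled["goal"] = virtual_goal
            relabeled["reward"] = reward_fn(transition["achieved_goal"], virtual_goal)
            augmented.append(relabeled)
    return augmented
```

The thesis's contributions (instructiveness-based prioritization of virtual goals, filtering of misleading samples, and Curriculum HER) would sit on top of this relabeling step.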
Constraining the recent star formation history of galaxies: an Approximate Bayesian Computation approach
Title | Constraining the recent star formation history of galaxies: an Approximate Bayesian Computation approach |
Authors | G. Aufort, L. Ciesla, P. Pudlo, V. Buat |
Abstract | [Abridged] Although galaxies are found to follow a tight relation between their star formation rate and stellar mass, they are expected to exhibit complex star formation histories (SFH), with short-term fluctuations. The goal of this pilot study is to present a method that identifies galaxies undergoing a strong variation of star formation activity in the last tens to hundreds of Myr. In other words, the proposed method determines whether a variation in the last few hundreds of Myr of the SFH is needed to properly model the SED, rather than a smooth normal SFH. To do so, we analyze a sample of COSMOS galaxies using high signal-to-noise ratio broad-band photometry. We apply Approximate Bayesian Computation, a state-of-the-art statistical method for model choice, combined with machine learning algorithms, to provide the probability that a flexible SFH is preferred based on the observed flux density ratios of galaxies. We present the method and test it on a sample of simulated SEDs. The input information fed to the algorithm is a set of broadband UV to NIR (rest-frame) flux ratios for each galaxy. The method has an error rate of 21% in recovering the right SFH and is sensitive to SFR variations larger than 1 dex. A more traditional SED fitting method using CIGALE is tested to achieve the same goal, based on comparisons of fits through the Bayesian Information Criterion, but the best error rate obtained is higher, at 28%. We apply our new method to the COSMOS galaxy sample. The stellar mass distribution of galaxies with strong to decisive evidence against the smooth delayed-$\tau$ SFH peaks at lower M* compared to galaxies where the smooth delayed-$\tau$ SFH is preferred. We discuss the fact that this result does not come from any bias due to our training. Finally, we argue that flexible SFHs are needed to cover the largest SFR-M* parameter space possible. |
Tasks | |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07815v1 |
https://arxiv.org/pdf/2002.07815v1.pdf | |
PWC | https://paperswithcode.com/paper/constraining-the-recent-star-formation |
Repo | |
Framework | |
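To make the "ABC plus machine learning" model-choice step concrete, here is a toy sketch in which a classifier trained on simulated summaries approximates the posterior probability that the flexible SFH is preferred; the simulators and the six-dimensional "flux ratio" summaries are stand-ins, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def simulate_flux_ratios(flexible, n):
    # Stand-in simulators: toy "UV-to-NIR flux ratio" summaries under each SFH model.
    base = rng.normal(0.0, 1.0, size=(n, 6))
    return base + (0.7 if flexible else 0.0) * rng.normal(size=(n, 6))

# Build reference tables under both models and train a classifier on the summaries.
X = np.vstack([simulate_flux_ratios(False, 5000), simulate_flux_ratios(True, 5000)])
y = np.repeat([0, 1], 5000)  # 0 = smooth delayed SFH, 1 = flexible SFH
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# For an "observed" galaxy, the vote fraction approximates P(flexible SFH | summaries).
observed = simulate_flux_ratios(True, 1)
print("P(flexible SFH | observed ratios) ≈", clf.predict_proba(observed)[0, 1])
```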
Teacher-Student chain for efficient semi-supervised histology image classification
Title | Teacher-Student chain for efficient semi-supervised histology image classification |
Authors | Shayne Shaw, Maciej Pajak, Aneta Lisowska, Sotirios A Tsaftaris, Alison Q O’Neil |
Abstract | Deep learning shows great potential for the domain of digital pathology. An automated digital pathology system could serve as a second reader, perform initial triage in large screening studies, or assist in reporting. However, it is expensive to exhaustively annotate large histology image databases, since medical specialists are a scarce resource. In this paper, we apply the semi-supervised teacher-student knowledge distillation technique proposed by Yalniz et al. (2019) to the task of quantifying prognostic features in colorectal cancer. We obtain accuracy improvements by extending this approach to a chain of students, where each student’s predictions are used to train the next student, i.e. the student becomes the teacher. Using the chain approach, and only 0.5% labelled data (the remaining 99.5% in the unlabelled pool), we match the accuracy of training on 100% labelled data. At lower percentages of labelled data, similar gains in accuracy are seen, allowing some recovery of accuracy even from a poor initial choice of labelled training set. In conclusion, this approach shows promise for reducing the annotation burden, thus increasing the affordability of automated digital pathology systems. |
Tasks | Image Classification |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.08797v2 |
https://arxiv.org/pdf/2003.08797v2.pdf | |
PWC | https://paperswithcode.com/paper/teacher-student-chain-for-efficient-semi |
Repo | |
Framework | |
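The chained pseudo-labelling loop can be sketched generically as below; the linear classifier and synthetic features are placeholders for the histology CNN and image patches, not the authors' code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy stand-in for histology features: a synthetic binary classification problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labelled, unlabelled = np.arange(10), np.arange(10, 2000)  # roughly 0.5% labelled data

teacher = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
for generation in range(3):
    # The current model labels the unlabelled pool ...
    pseudo = teacher.predict(X[unlabelled])
    # ... and a fresh student is trained on those pseudo-labels plus the small
    # labelled set; the student then becomes the teacher for the next link.
    teacher = LogisticRegression(max_iter=1000).fit(
        np.vstack([X[unlabelled], X[labelled]]),
        np.concatenate([pseudo, y[labelled]]),
    )
```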
One Explanation Does Not Fit All: The Promise of Interactive Explanations for Machine Learning Transparency
Title | One Explanation Does Not Fit All: The Promise of Interactive Explanations for Machine Learning Transparency |
Authors | Kacper Sokol, Peter Flach |
Abstract | The need for transparency of predictive systems based on Machine Learning algorithms arises as a consequence of their ever-increasing proliferation in the industry. Whenever black-box algorithmic predictions influence human affairs, the inner workings of these algorithms should be scrutinised and their decisions explained to the relevant stakeholders, including the system engineers, the system’s operators and the individuals whose case is being decided. While a variety of interpretability and explainability methods is available, none of them is a panacea that can satisfy all diverse expectations and competing objectives that might be required by the parties involved. We address this challenge in this paper by discussing the promises of Interactive Machine Learning for improved transparency of black-box systems using the example of contrastive explanations – a state-of-the-art approach to Interpretable Machine Learning. Specifically, we show how to personalise counterfactual explanations by interactively adjusting their conditional statements and extract additional explanations by asking follow-up “What if?” questions. Our experience in building, deploying and presenting this type of system allowed us to list desired properties as well as potential limitations, which can be used to guide the development of interactive explainers. While customising the medium of interaction, i.e., the user interface comprising various communication channels, may give an impression of personalisation, we argue that adjusting the explanation itself and its content is more important. To this end, properties such as breadth, scope, context, purpose and target of the explanation have to be considered, in addition to explicitly informing the explainee about its limitations and caveats… |
Tasks | Interpretable Machine Learning |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09734v1 |
https://arxiv.org/pdf/2001.09734v1.pdf | |
PWC | https://paperswithcode.com/paper/one-explanation-does-not-fit-all-the-promise |
Repo | |
Framework | |
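As a rough illustration of the interactive counterfactual adjustment discussed above, the sketch below restricts a greedy counterfactual search to features the user declares mutable, so re-running it answers a follow-up "What if only these could change?" question; the model, data, and search strategy are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

def counterfactual(x, target, mutable, step=0.25, max_iter=200):
    """Greedy counterfactual search restricted to user-chosen mutable features."""
    cf = x.copy()
    direction = 1.0 if target == 1 else -1.0
    for _ in range(max_iter):
        if model.predict([cf])[0] == target:
            return cf
        grads = model.coef_[0] * direction
        i = max(mutable, key=lambda j: abs(grads[j]))   # most influential allowed feature
        cf[i] += step * np.sign(grads[i])
    return None  # no counterfactual found under these constraints

x = X[0]
flipped = 1 - model.predict([x])[0]
print(counterfactual(x, target=flipped, mutable=[0, 2]))   # what if only features 0 and 2 changed?
```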
Robustness Verification for Transformers
Title | Robustness Verification for Transformers |
Authors | Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh |
Abstract | Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding model behavior and obtaining safety guarantees. However, previous methods can usually only handle neural networks with relatively simple architectures. In this paper, we consider the robustness verification problem for Transformers. Transformers have complex self-attention layers that pose many challenges for verification, including cross-nonlinearity and cross-position dependency, which have not been discussed in previous works. We resolve these challenges and develop the first robustness verification algorithm for Transformers. The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation. These bounds also shed light on interpreting Transformers as they consistently reflect the importance of different words in sentiment analysis. |
Tasks | Sentiment Analysis |
Published | 2020-02-16 |
URL | https://arxiv.org/abs/2002.06622v1 |
https://arxiv.org/pdf/2002.06622v1.pdf | |
PWC | https://paperswithcode.com/paper/robustness-verification-for-transformers-1 |
Repo | |
Framework | |
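For contrast with the paper's tighter certified bounds, the naive Interval Bound Propagation baseline mentioned in the abstract can be written in a few lines; the toy layer sizes below are arbitrary.

```python
import numpy as np

def ibp_linear(lower, upper, W, b):
    # Propagate an axis-aligned box through x -> W x + b.
    center, radius = (upper + lower) / 2, (upper - lower) / 2
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lower, upper):
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Bound the outputs of a toy 2-layer network over an L_inf ball of radius eps around x.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)
x, eps = rng.normal(size=4), 0.1

l, u = x - eps, x + eps
l, u = ibp_relu(*ibp_linear(l, u, W1, b1))
l, u = ibp_linear(l, u, W2, b2)
# The prediction is certified if one logit's lower bound exceeds the other's upper bound.
print("output bounds:", l, u)
```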
Likelihood-free inference of experimental Neutrino Oscillations using Neural Spline Flows
Title | Likelihood-free inference of experimental Neutrino Oscillations using Neural Spline Flows |
Authors | Sebastian Pina-Otey, Federico Sánchez, Vicens Gaitan |
Abstract | We discuss the application of Neural Spline Flows, a neural density estimation algorithm, to the likelihood-free inference problem of the measurement of neutrino oscillation parameters in long-baseline neutrino experiments. A method adapted to physics parameter inference is developed and applied to the case of the muon neutrino disappearance analysis at the T2K experiment. |
Tasks | Density Estimation |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09436v1 |
https://arxiv.org/pdf/2002.09436v1.pdf | |
PWC | https://paperswithcode.com/paper/likelihood-free-inference-of-experimental |
Repo | |
Framework | |
EEG fingerprinting: subject specific signature based on the aperiodic component of power spectrum
Title | EEG fingerprinting: subject specific signature based on the aperiodic component of power spectrum |
Authors | Matteo Demuru, Matteo Fraschini |
Abstract | During the last few years, there has been growing interest in the effects induced by individual variability on activation patterns and brain connectivity. The practical implications of individual variability are of basic relevance for both group-level and subject-level studies. The electroencephalogram (EEG) still represents one of the most used recording techniques to investigate a wide range of brain-related features. In this work, we aim to estimate the effect of individual variability on a set of very simple and easily interpretable features extracted from the EEG power spectra. In particular, in an identification scenario, we investigated how the aperiodic (1/f background) component of the EEG power spectra can accurately identify subjects from a large EEG dataset. The results of this study show that the aperiodic component of the EEG signal is characterized by strong subject-specific properties, that this feature is consistent across different experimental conditions (eyes-open and eyes-closed), and that it outperforms the canonically defined frequency bands. These findings suggest that the simple features (slope and offset) extracted from the aperiodic component of the EEG signal are sensitive to individual traits and may help to characterize and make inferences at the single-subject level. |
Tasks | EEG |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09424v1 |
https://arxiv.org/pdf/2001.09424v1.pdf | |
PWC | https://paperswithcode.com/paper/eeg-fingerprinting-subject-specific-signature |
Repo | |
Framework | |
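The two features the abstract refers to, the slope and offset of the aperiodic 1/f component, can be estimated to a first approximation by a linear fit in log-log space. The sketch below uses synthetic data and a plain least-squares fit rather than a dedicated tool such as FOOOF; the sampling rate and fitting band are arbitrary choices.

```python
import numpy as np
from scipy import signal

# Synthetic 1/f-like segment standing in for a real EEG recording (fs = 250 Hz).
rng = np.random.default_rng(0)
fs, n = 250, 250 * 60
x = np.cumsum(rng.normal(size=n))   # integrated white noise has a steep 1/f-like spectrum

freqs, psd = signal.welch(x, fs=fs, nperseg=4 * fs)
band = (freqs >= 1) & (freqs <= 40)   # fit the aperiodic trend between 1 and 40 Hz

# log10(PSD) ≈ slope * log10(f) + offset; the slope is negative for 1/f-like spectra.
slope, offset = np.polyfit(np.log10(freqs[band]), np.log10(psd[band]), 1)
print("aperiodic slope:", slope, "offset:", offset)
```

In an identification setting, the (slope, offset) pair extracted per subject and condition would serve as the fingerprint features.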
Detecting Deepfakes with Metric Learning
Title | Detecting Deepfakes with Metric Learning |
Authors | Akash Kumar, Arnav Bhavsar |
Abstract | With the arrival of several face-swapping applications such as FaceApp, SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital media content is hanging by a very thin thread. On social media platforms, videos are widely circulated, often at a high compression factor. In this work, we analyze several deep learning approaches in the context of deepfake classification in a high-compression scenario and demonstrate that a proposed approach based on metric learning can be very effective in performing such a classification. Using fewer frames per video to assess its realism, the metric learning approach using a triplet network architecture proves to be fruitful. It learns to enhance the feature-space distance between the clusters of real and fake video embedding vectors. We validated our approaches on two datasets to analyze the behavior in different environments. We achieved a state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and accuracy of 90.71% on a highly compressed Neural Texture dataset. Our approach is especially helpful on social media platforms where data compression is inevitable. |
Tasks | Face Swapping, Metric Learning |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08645v1 |
https://arxiv.org/pdf/2003.08645v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-deepfakes-with-metric-learning |
Repo | |
Framework | |
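The core of the metric-learning formulation, a triplet margin loss over real/fake embeddings, looks roughly as follows in PyTorch; the encoder and random frames are placeholders, not the paper's network.

```python
import torch
import torch.nn as nn

# Placeholder encoder standing in for the frame-level feature network.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Anchor and positive come from the same class (e.g. real frames), negative from the other (fake).
anchor = encoder(torch.randn(8, 3, 64, 64))
positive = encoder(torch.randn(8, 3, 64, 64))
negative = encoder(torch.randn(8, 3, 64, 64))

# Minimizing this pushes negatives at least `margin` farther from the anchor than
# positives, enlarging the feature-space distance between real and fake clusters.
loss = triplet_loss(anchor, positive, negative)
loss.backward()
```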
Unsupervised Domain Adaptation for Neural Machine Translation with Iterative Back Translation
Title | Unsupervised Domain Adaptation for Neural Machine Translation with Iterative Back Translation |
Authors | Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits |
Abstract | State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on domains with little supervised data. As data collection is expensive and infeasible in many cases, unsupervised domain adaptation methods are needed. We apply an Iterative Back Translation (IBT) training scheme on in-domain monolingual data, which repeatedly uses a Transformer-based NMT model to create in-domain pseudo-parallel sentence pairs in one translation direction on the fly and then uses them to train the model in the other direction. Evaluated on three domains of the German-to-English translation task with no supervised data, this simple technique alone (without any out-of-domain parallel data) can already surpass all previous domain adaptation methods—up to +9.48 BLEU over the strongest previous method, and up to +27.77 BLEU over the unadapted baseline. Moreover, given available supervised out-of-domain data on the German-to-English and Romanian-to-English language pairs, we can further enhance the performance and obtain up to a +19.31 BLEU improvement over the strongest baseline, and a +47.69 BLEU increase over the unadapted model. |
Tasks | Domain Adaptation, Machine Translation, Unsupervised Domain Adaptation |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08140v1 |
https://arxiv.org/pdf/2001.08140v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-domain-adaptation-for-neural-2 |
Repo | |
Framework | |
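The IBT scheme alternates between the two translation directions; this high-level sketch shows the loop structure using hypothetical `translate` and `train_step` helpers (they are not the paper's code and would wrap whatever NMT toolkit is in use).

```python
def iterative_back_translation(model_src2tgt, model_tgt2src,
                               mono_src, mono_tgt,
                               translate, train_step, rounds=10):
    """One IBT schedule: each direction trains on pseudo-parallel pairs produced
    on the fly by the opposite direction.

    translate(model, sentences) -> translations and train_step(model, pairs) are
    hypothetical helpers supplied by the caller.
    """
    for _ in range(rounds):
        # In-domain target-side monolingual text -> synthetic sources; train src->tgt.
        synthetic_src = translate(model_tgt2src, mono_tgt)
        train_step(model_src2tgt, list(zip(synthetic_src, mono_tgt)))
        # In-domain source-side monolingual text -> synthetic targets; train tgt->src.
        synthetic_tgt = translate(model_src2tgt, mono_src)
        train_step(model_tgt2src, list(zip(synthetic_tgt, mono_src)))
    return model_src2tgt, model_tgt2src
```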
Parallel Machine Translation with Disentangled Context Transformer
Title | Parallel Machine Translation with Disentangled Context Transformer |
Authors | Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu |
Abstract | State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking-based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts. The DisCo transformer is trained to predict every output token given an arbitrary subset of the other reference tokens. We also develop the parallel easy-first inference algorithm, which iteratively refines every token in parallel and reduces the number of required iterations. Our extensive experiments on 7 directions with varying data sizes demonstrate that our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average. |
Tasks | Machine Translation |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05136v1 |
https://arxiv.org/pdf/2001.05136v1.pdf | |
PWC | https://paperswithcode.com/paper/parallel-machine-translation-with |
Repo | |
Framework | |
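The training-time idea of predicting every output token from an arbitrary subset of the other reference tokens can be illustrated with a per-position boolean context mask; this is a simplified stand-in, not the DisCo implementation.

```python
import torch

def disco_style_context_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask M where M[i, j] = True means position i may attend to token j.

    Each position gets an independent random subset of the *other* positions, so
    every output token is predicted from a different context in the same pass.
    """
    mask = torch.rand(seq_len, seq_len) < 0.5        # random context per row
    mask &= ~torch.eye(seq_len, dtype=torch.bool)    # a token never sees itself
    for i in range(seq_len):                         # keep the softmax well defined
        if not mask[i].any():
            mask[i, (i + 1) % seq_len] = True
    return mask

mask = disco_style_context_mask(6)
scores = torch.randn(6, 6)                            # toy attention logits
scores = scores.masked_fill(~mask, float("-inf"))     # block tokens outside the context
weights = torch.softmax(scores, dim=-1)
```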
GraphBGS: Background Subtraction via Recovery of Graph Signals
Title | GraphBGS: Background Subtraction via Recovery of Graph Signals |
Authors | Jhony H. Giraldo, Thierry Bouwmans |
Abstract | Graph-based algorithms have been successful in approaching the problems of unsupervised and semi-supervised learning. Recently, the theory of graph signal processing and semi-supervised learning have been combined, leading to new developments and insights in the field of machine learning. In this paper, concepts of recovery of graph signals and semi-supervised learning are introduced into the problem of background subtraction. We propose a new algorithm named GraphBGS. This method uses a Mask R-CNN for instance segmentation; a temporal median filter for background initialization; motion, texture, color, and structural features for representing the nodes of a graph; k-nearest neighbors for the construction of the graph; and finally a semi-supervised method inspired by the theory of recovery of graph signals to solve the problem of background subtraction. The method is evaluated on the publicly available change detection and scene background initialization databases. Experimental results show that GraphBGS outperforms unsupervised background subtraction algorithms in some challenges of the change detection dataset. Most significantly, this method outperforms generative adversarial networks on unseen videos in some sequences of the scene background initialization database. |
Tasks | |
Published | 2020-01-17 |
URL | https://arxiv.org/abs/2001.06404v1 |
https://arxiv.org/pdf/2001.06404v1.pdf | |
PWC | https://paperswithcode.com/paper/graphbgs-background-subtraction-via-recovery |
Repo | |
Framework | |
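The final semi-supervised step, recovering a graph signal from a few labeled nodes, can be illustrated with Laplacian-regularized least squares on a k-NN graph; the node features, labels, and solver below are toy stand-ins for the instance-segmentation features used in the paper.

```python
import numpy as np
from scipy.sparse import csgraph, diags
from scipy.sparse.linalg import spsolve
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))      # one feature vector per graph node
labels = np.full(200, np.nan)
labels[:20] = rng.integers(0, 2, 20)       # a few nodes labeled background (0) / foreground (1)

# k-NN graph over node features, symmetrized, and its combinatorial Laplacian.
W = kneighbors_graph(features, n_neighbors=8, mode="connectivity")
W = 0.5 * (W + W.T)
L = csgraph.laplacian(W)

# Recover a smooth graph signal x by minimizing ||M(x - y)||^2 + mu * x^T L x,
# which gives the linear system (M + mu L) x = M y; unknown entries of y are zeroed.
known = ~np.isnan(labels)
M = diags(known.astype(float))
y = np.where(known, labels, 0.0)
x = spsolve((M + 0.1 * L).tocsc(), y)
print((x[~known] > 0.5).sum(), "unlabeled nodes predicted as foreground")
```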
Explore and Exploit with Heterotic Line Bundle Models
Title | Explore and Exploit with Heterotic Line Bundle Models |
Authors | Magdalena Larfors, Robin Schneider |
Abstract | We use deep reinforcement learning to explore a class of heterotic $SU(5)$ GUT models constructed from line bundle sums over Complete Intersection Calabi Yau (CICY) manifolds. We perform several experiments where A3C agents are trained to search for such models. These agents significantly outperform random exploration, in the most favourable settings by a factor of 1700 when it comes to finding unique models. Furthermore, we find evidence that the trained agents also outperform random walkers on new manifolds. We conclude that the agents detect hidden structures in the compactification data, which are partly of a general nature. The experiments scale well with $h^{(1,1)}$, and may thus provide the key to model building on CICYs with large $h^{(1,1)}$. |
Tasks | |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04817v1 |
https://arxiv.org/pdf/2003.04817v1.pdf | |
PWC | https://paperswithcode.com/paper/explore-and-exploit-with-heterotic-line |
Repo | |
Framework | |
Deep Convolutional Neural Network Model for Short-Term Electricity Price Forecasting
Title | Deep Convolutional Neural Network Model for Short-Term Electricity Price Forecasting |
Authors | Hsu-Yung Cheng, Ping-Huan Kuo, Yamin Shen, Chiou-Jye Huang |
Abstract | In the modern power market, electricity trading is an extremely competitive industry. More accurate price forecasts are crucial to help electricity producers and traders make better decisions. In this paper, a novel convolutional neural network (CNN) method is proposed to rapidly provide hourly forecasting in the energy market. To improve prediction accuracy, we divide the annual electricity price data into four categories by season and conduct training and forecasting for each category separately. By comparing the proposed method with other existing methods, we find that the proposed model achieves outstanding results: the mean absolute percentage error (MAPE) and root mean square error (RMSE) for each category are about 5.5% and 3, respectively. |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.07202v1 |
https://arxiv.org/pdf/2003.07202v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-convolutional-neural-network-model-for |
Repo | |
Framework | |
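A minimal version of the seasonal split plus 1-D CNN forecaster described in the abstract, using PyTorch on synthetic hourly prices; the layer sizes, 48-hour input window, and even seasonal split are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
import torch
import torch.nn as nn

# Synthetic hourly prices for one year, split into four seasonal categories.
rng = np.random.default_rng(0)
hours = 364 * 24
prices = 30 + 10 * np.sin(np.arange(hours) * 2 * np.pi / 24) + rng.normal(0, 2, hours)
seasons = np.repeat(np.arange(4), hours // 4)

def make_windows(series, window=48):
    # Predict the next hourly price from the previous `window` hours.
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return torch.tensor(X, dtype=torch.float32).unsqueeze(1), torch.tensor(y, dtype=torch.float32)

def build_cnn(window=48):
    return nn.Sequential(
        nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv1d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Flatten(), nn.Linear(16 * window, 1),
    )

# Train one CNN per seasonal category, mirroring the category-wise scheme.
for s in range(4):
    X, y = make_windows(prices[seasons == s])
    model = build_cnn()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(5):                                  # a few steps for illustration
        loss = nn.functional.mse_loss(model(X).squeeze(-1), y)
        opt.zero_grad(); loss.backward(); opt.step()
    mape = (torch.abs(model(X).squeeze(-1) - y) / y.abs()).mean() * 100
    print(f"season {s}: in-sample MAPE ≈ {mape.item():.1f}%")
```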
Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning
Title | Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning |
Authors | Liad Pollak Zuckerman, Shai Bagon, Eyal Naor, George Pisha, Michal Irani |
Abstract | When a very fast dynamic event is recorded with a low-framerate camera, the resulting video suffers from severe motion blur (due to exposure time) and motion aliasing (due to low sampling rate in time). True Temporal Super-Resolution (TSR) is more than just Temporal-Interpolation (increasing framerate). It can also recover new high temporal frequencies beyond the temporal Nyquist limit of the input video, thus resolving both motion-blur and motion-aliasing effects that temporal frame interpolation (as sophisticated as it may be) cannot undo. In this paper we propose a “Deep Internal Learning” approach for true TSR. We train a video-specific CNN on examples extracted directly from the low-framerate input video. Our method exploits the strong recurrence of small space-time patches inside a single video sequence, both within and across different spatio-temporal scales of the video. We further observe (for the first time) that small space-time patches recur also across dimensions of the video sequence, i.e., by swapping the spatial and temporal dimensions. In particular, the higher spatial resolution of video frames provides strong examples as to how to increase the temporal resolution of that video. Such internal video-specific examples give rise to strong self-supervision, requiring no data but the input video itself. This results in Zero-Shot Temporal-SR of complex videos, which removes both motion blur and motion aliasing, outperforming previous supervised methods trained on external video datasets. |
Tasks | Super-Resolution |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08872v1 |
https://arxiv.org/pdf/2003.08872v1.pdf | |
PWC | https://paperswithcode.com/paper/across-scales-across-dimensions-temporal |
Repo | |
Framework | |
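The cross-dimension observation, that high spatial resolution can supply examples for increasing temporal resolution, can be sketched by swapping axes of the space-time volume; this is a loose NumPy illustration of how internal training pairs could be formed, not the authors' pipeline.

```python
import numpy as np

def spatiotemporal_training_pairs(video, factor=2):
    """Form internal (low-framerate input, high-framerate target) examples from one video.

    video: array of shape (T, H, W). Swapping the temporal axis with a spatial axis lets
    the rows of high-resolution frames play the role of "time"; subsampling that axis
    then yields pairs a video-specific CNN could train on.
    """
    rotated = np.swapaxes(video, 0, 1)   # (H, T, W): the H axis now acts as time
    target = rotated
    inputs = rotated[::factor]           # "temporally" subsampled version of the target
    return inputs, target

# Toy example on a random volume; a real pipeline would also crop small space-time
# patches across several spatio-temporal scales and train a CNN on these pairs.
video = np.random.rand(16, 64, 64)
lo, hi = spatiotemporal_training_pairs(video)
print(lo.shape, hi.shape)   # (32, 16, 64) (64, 16, 64)
```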