January 25, 2020

2922 words 14 mins read

Paper Group ANR 1774



Fair Predictors under Distribution Shift

Title Fair Predictors under Distribution Shift
Authors Harvineet Singh, Rina Singh, Vishwali Mhasawade, Rumi Chunara
Abstract Recent work on fair machine learning adds to a growing set of algorithmic safeguards required for deployment in high societal impact areas. A fundamental concern with model deployment is to guarantee stable performance under changes in data distribution. Extensive work in domain adaptation addresses this concern, albeit with the notion of stability limited to that of predictive performance. We provide conditions under which a stable model both in terms of prediction and fairness performance can be trained. Building on the problem setup of causal domain adaptation, we select a subset of features for training predictors with fairness constraints such that risk with respect to an unseen target data distribution is minimized. Advantages of the approach are demonstrated on synthetic datasets and on the task of diagnosing acute kidney injury in a real-world dataset under an instance of measurement policy shift and selection bias.
Tasks Domain Adaptation
Published 2019-11-02
URL https://arxiv.org/abs/1911.00677v1
PDF https://arxiv.org/pdf/1911.00677v1.pdf
PWC https://paperswithcode.com/paper/fair-predictors-under-distribution-shift
Repo
Framework
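The fairness constraint in the abstract can be grounded with a small example. Below is a minimal sketch of the demographic parity gap, one common fairness criterion; the function name and the choice of criterion are illustrative, not the paper's exact constraint:

```python
import numpy as np

def demographic_parity_gap(y_pred, sensitive):
    """Absolute difference in positive-prediction rates between the two
    groups defined by a binary sensitive attribute; a fairness-constrained
    trainer would penalize or bound this quantity."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    rate_0 = y_pred[sensitive == 0].mean()
    rate_1 = y_pred[sensitive == 1].mean()
    return abs(rate_0 - rate_1)

# A predictor that flags exactly the members of group 1 is maximally unfair:
print(demographic_parity_gap([0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]))  # 1.0
```

Under a distribution shift, both this gap and the predictive risk can change, which is why the paper selects features whose fairness and risk carry over to the unseen target distribution.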

End-to-end Hand Mesh Recovery from a Monocular RGB Image

Title End-to-end Hand Mesh Recovery from a Monocular RGB Image
Authors Xiong Zhang, Qiang Li, Hong Mo, Wenbo Zhang, Wen Zheng
Abstract In this paper, we present a HAnd Mesh Recovery (HAMR) framework to tackle the problem of reconstructing the full 3D mesh of a human hand from a single RGB image. In contrast to existing research on 2D or 3D hand pose estimation from RGB and/or depth image data, HAMR can provide a more expressive and useful mesh representation for monocular hand image understanding. In particular, the mesh representation is achieved by parameterizing a generic 3D hand model with shape and relative 3D joint angles. By utilizing this mesh representation, we can easily compute the 3D joint locations via linear interpolation between the vertices of the mesh, and obtain the 2D joint locations by projecting the 3D joints. To this end, a differentiable re-projection loss can be defined in terms of the derived representations and the ground-truth labels, thus making our framework end-to-end trainable. Qualitative experiments show that our framework is capable of recovering appealing 3D hand meshes even in the presence of severe occlusions. Quantitatively, our approach also outperforms the state-of-the-art methods for both 2D and 3D hand pose estimation from a monocular RGB image on several benchmark datasets.
Tasks Hand Pose Estimation, Pose Estimation
Published 2019-02-25
URL https://arxiv.org/abs/1902.09305v3
PDF https://arxiv.org/pdf/1902.09305v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-hand-mesh-recovery-from-a
Repo
Framework
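The interpolation-plus-projection pipeline in the abstract can be sketched in a few lines. The vertex-weight regressor, the weak-perspective camera, and the loss form below are illustrative assumptions, not HAMR's exact design:

```python
import numpy as np

def mesh_to_joints(vertices, regressor):
    """Linearly interpolate 3D joint locations from mesh vertices.
    Each row of `regressor` holds convex weights over the V vertices (J x V)."""
    return regressor @ vertices                      # (J, 3)

def project(joints3d, scale=1.0, trans=(0.0, 0.0)):
    """Weak-perspective projection of 3D joints onto the image plane
    (an assumed camera model for this sketch)."""
    return scale * joints3d[:, :2] + np.asarray(trans)

def reprojection_loss(joints2d_pred, joints2d_gt):
    """Mean squared 2D distance; every step above is differentiable,
    which is what makes the framework end-to-end trainable."""
    return np.mean(np.sum((joints2d_pred - joints2d_gt) ** 2, axis=1))

# Toy mesh: 4 vertices, 2 joints, each joint the midpoint of two vertices.
verts = np.array([[0., 0., 0.], [2., 0., 0.], [0., 2., 0.], [2., 2., 0.]])
reg = np.array([[0.5, 0.5, 0., 0.], [0., 0., 0.5, 0.5]])
j3d = mesh_to_joints(verts, reg)                     # [[1,0,0],[1,2,0]]
j2d = project(j3d)
print(reprojection_loss(j2d, np.array([[1., 0.], [1., 2.]])))  # 0.0
```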

A New Class of Time Dependent Latent Factor Models with Applications

Title A New Class of Time Dependent Latent Factor Models with Applications
Authors Sinead A. Williamson, Michael Minyi Zhang, Paul Damien
Abstract In many applications, observed data are influenced by some combination of latent causes. For example, suppose sensors are placed inside a building to record responses such as temperature, humidity, power consumption and noise levels. These random, observed responses are typically affected by many unobserved, latent factors (or features) within the building such as the number of individuals, the turning on and off of electrical devices, power surges, etc. These latent factors are usually present for a contiguous period of time before disappearing; further, multiple factors could be present at a time. This paper develops new probabilistic methodology and inference methods for random object generation influenced by latent features exhibiting temporal persistence. Every datum is associated with subsets of a potentially infinite number of hidden, persistent features that account for temporal dynamics in an observation. The ensuing class of dynamic models constructed by adapting the Indian Buffet Process — a probability measure on the space of random, unbounded binary matrices — finds use in a variety of applications arising in operations, signal processing, biomedicine, marketing, image analysis, etc. Illustrations using synthetic and real data are provided.
Tasks
Published 2019-04-18
URL http://arxiv.org/abs/1904.08548v1
PDF http://arxiv.org/pdf/1904.08548v1.pdf
PWC https://paperswithcode.com/paper/a-new-class-of-time-dependent-latent-factor
Repo
Framework
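The key structural assumption above, features that switch on and persist for a contiguous window, can be simulated directly. This is a toy stand-in for the paper's temporally persistent adaptation of the Indian Buffet Process, not its actual generative model or inference:

```python
import numpy as np

def persistent_features(T, K, mean_duration=3, rng=None):
    """Simulate a binary time-by-feature matrix Z (T x K) in which each
    latent feature is active for exactly one contiguous window."""
    rng = np.random.default_rng(rng)
    Z = np.zeros((T, K), dtype=int)
    for k in range(K):
        start = rng.integers(0, T)                    # when the feature appears
        length = 1 + rng.poisson(mean_duration - 1)   # how long it persists
        Z[start:start + length, k] = 1                # clipped at the horizon
    return Z

Z = persistent_features(T=10, K=4, rng=0)
print(Z)  # each column is a single contiguous run of ones
```

In the paper's setting, each datum at time t would then be generated from the subset of features active in row t of such a matrix.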

Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

Title Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
Authors Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner
Abstract Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end, we propose an attention-based sequence-to-sequence model. It combines a convolutional neural network as a generic feature extractor with a recurrent neural network to encode both the visual information and the temporal context between characters in the input image, and uses a separate recurrent neural network to decode the actual character sequence. We make experimental comparisons between various attention mechanisms and positional encodings in order to find an appropriate alignment between the input and output sequences. The model can be trained end-to-end, and the optional integration of a hybrid loss allows the encoder to retain an interpretable and usable output, if desired. We achieve competitive results on the IAM and ICFHR2016 READ data sets compared to the state-of-the-art without the use of a language model, and we significantly improve over recent sequence-to-sequence approaches.
Tasks Image Captioning, Keyword Spotting, Language Modelling, Machine Translation, Speech Recognition
Published 2019-03-18
URL https://arxiv.org/abs/1903.07377v2
PDF https://arxiv.org/pdf/1903.07377v2.pdf
PWC https://paperswithcode.com/paper/evaluating-sequence-to-sequence-models-for
Repo
Framework
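The alignment mechanism the paper compares variants of can be illustrated with additive (Bahdanau-style) attention, one standard choice; all shapes and weight matrices here are illustrative, not the paper's configuration:

```python
import numpy as np

def additive_attention(query, keys, W_q, W_k, v):
    """Score each encoder time step against the decoder state, softmax the
    scores into alignment weights, and return the weighted context vector."""
    scores = np.tanh(query @ W_q + keys @ W_k) @ v   # (T,)
    weights = np.exp(scores - scores.max())          # stable softmax
    weights /= weights.sum()
    context = weights @ keys                         # weighted sum of keys
    return weights, context

rng = np.random.default_rng(1)
T, d_enc, d_dec, d_att = 5, 8, 6, 4
keys = rng.normal(size=(T, d_enc))                   # encoder outputs
query = rng.normal(size=(d_dec,))                    # current decoder state
w, ctx = additive_attention(query, keys,
                            rng.normal(size=(d_dec, d_att)),
                            rng.normal(size=(d_enc, d_att)),
                            rng.normal(size=(d_att,)))
print(w)  # alignment weights over the 5 encoder steps, summing to 1
```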

Albanian Language Identification in Text Documents

Title Albanian Language Identification in Text Documents
Authors Klesti Hoxha, Artur Baxhaku
Abstract In this work we investigate the accuracy of standard and state-of-the-art language identification methods in identifying Albanian in written text documents. A dataset consisting of news articles written in Albanian has been constructed for this purpose. We noticed a considerable decrease in accuracy when using test documents that lack the Albanian alphabet letters "Ë" and "Ç", and created a custom training corpus that solved this problem, achieving an accuracy of more than 99%. Based on our experiments, the best-performing language identification methods for Albanian use a naïve Bayes classifier and n-gram based classification features.
Tasks Language Identification
Published 2019-01-14
URL http://arxiv.org/abs/1901.04216v1
PDF http://arxiv.org/pdf/1901.04216v1.pdf
PWC https://paperswithcode.com/paper/albanian-language-identification-in-text
Repo
Framework
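The winning combination the paper reports, a naïve Bayes classifier over character n-grams, is simple enough to sketch end to end. The tiny two-language corpus and bigram choice below are ours, purely for illustration:

```python
from collections import Counter
import math

def char_ngrams(text, n=2):
    """Overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train(docs_by_lang, n=2):
    """Per-language character n-gram counts for multinomial naive Bayes."""
    return {lang: Counter(g for d in docs for g in char_ngrams(d, n))
            for lang, docs in docs_by_lang.items()}

def classify(text, model, n=2, alpha=1.0):
    """Pick the language maximizing the Laplace-smoothed log-likelihood."""
    vocab = {g for counts in model.values() for g in counts}
    best, best_lp = None, -math.inf
    for lang, counts in model.items():
        total = sum(counts.values())
        lp = sum(math.log((counts[g] + alpha) / (total + alpha * len(vocab)))
                 for g in char_ngrams(text, n))
        if lp > best_lp:
            best, best_lp = lang, lp
    return best

model = train({
    "sq": ["përshëndetje si jeni", "faleminderit shumë"],
    "en": ["hello how are you", "thank you very much"],
})
print(classify("faleminderit", model))  # sq
```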

Light Field Synthesis by Training Deep Network in the Refocused Image Domain

Title Light Field Synthesis by Training Deep Network in the Refocused Image Domain
Authors Chang-Le Liu, Kuang-Tsu Shih, Homer H. Chen
Abstract Light field imaging, which captures spatio-angular information of incident light on the image sensor, enables many interesting applications like image refocusing and augmented reality. However, due to the limited sensor resolution, a trade-off exists between the spatial and angular resolution. To increase the angular resolution, view synthesis techniques have been adopted to generate new views from existing views. However, traditional learning-based view synthesis mainly considers the image quality of each view of the light field and neglects the quality of the refocused images. In this paper, we propose a new loss function called refocused image error (RIE) to address the issue. The main idea is that the image quality of the synthesized light field should be optimized in the refocused image domain because it is where the light field is perceived. We analyze the behavior of RIE in the spectral domain and test the performance of our approach against previous approaches on both real and software-rendered light field datasets using objective assessment metrics such as MSE, MAE, PSNR, SSIM, and GMSD. Experimental results show that the light field generated by our method results in better refocused images than previous methods.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.06072v4
PDF https://arxiv.org/pdf/1910.06072v4.pdf
PWC https://paperswithcode.com/paper/light-field-synthesis-by-training-deep
Repo
Framework
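The idea of measuring error in the refocused domain can be sketched with shift-and-add refocusing over sub-aperture views. Integer shifts, the slope set, and the uniform averaging below are simplifying assumptions; the paper's RIE formulation may weight and interpolate differently:

```python
import numpy as np

def refocus(views, coords, slope):
    """Shift-and-add refocusing: shift each sub-aperture view in proportion
    to its angular coordinate (u, v), then average. Integer shifts only in
    this sketch; real pipelines interpolate sub-pixel shifts."""
    out = np.zeros_like(views[0], dtype=float)
    for img, (u, v) in zip(views, coords):
        out += np.roll(img, (int(round(slope * u)), int(round(slope * v))),
                       axis=(0, 1))
    return out / len(views)

def refocused_image_error(pred_views, gt_views, coords, slopes):
    """MSE between refocused predictions and refocused ground truth,
    averaged over a set of focal slopes."""
    return np.mean([np.mean((refocus(pred_views, coords, s)
                             - refocus(gt_views, coords, s)) ** 2)
                    for s in slopes])

rng = np.random.default_rng(0)
coords = [(u, v) for u in (-1, 0, 1) for v in (-1, 0, 1)]  # 3x3 view grid
gt = [rng.normal(size=(8, 8)) for _ in coords]
print(refocused_image_error(gt, gt, coords, slopes=[0.0, 1.0]))  # 0.0
```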

MetalGAN: Multi-Domain Label-Less Image Synthesis Using cGANs and Meta-Learning

Title MetalGAN: Multi-Domain Label-Less Image Synthesis Using cGANs and Meta-Learning
Authors Tomaso Fontanini, Eleonora Iotti, Luca Donati, Andrea Prati
Abstract Image synthesis is currently one of the most addressed image processing topics in computer vision and deep learning. Researchers have tackled this problem by focusing their efforts on its several challenging sub-problems, e.g. image quality and size, domain and pose changes, network architecture, and so on. Above all, producing images belonging to different domains with a single architecture is a very relevant goal for image generation. In fact, a single multi-domain network would allow greater flexibility and robustness in the image synthesis task than other approaches. This paper proposes a novel architecture and training algorithm able to produce multi-domain outputs using a single network. A small portion of a dataset is intentionally used, and there are no hard-coded labels (or classes). This is achieved by combining a conditional Generative Adversarial Network (cGAN) for image generation with a meta-learning algorithm for domain switching; we call our approach MetalGAN. The approach has proved appropriate for solving the multi-domain problem and is validated on facial attribute transfer using the CelebA dataset.
Tasks Image Generation, Meta-Learning
Published 2019-12-05
URL https://arxiv.org/abs/1912.02494v1
PDF https://arxiv.org/pdf/1912.02494v1.pdf
PWC https://paperswithcode.com/paper/metalgan-multi-domain-label-less-image
Repo
Framework

Spectral Non-Convex Optimization for Dimension Reduction with Hilbert-Schmidt Independence Criterion

Title Spectral Non-Convex Optimization for Dimension Reduction with Hilbert-Schmidt Independence Criterion
Authors Chieh Wu, Jared Miller, Yale Chang, Mario Sznaier, Jennifer Dy
Abstract The Hilbert-Schmidt Independence Criterion (HSIC) is a kernel dependence measure that has applications in various aspects of machine learning. Conveniently, the objectives of different dimensionality reduction applications using HSIC often reduce to the same optimization problem. However, the nonconvexity of the objective function arising from non-linear kernels poses a serious challenge to optimization efficiency and limits the potential of HSIC-based formulations. As a result, only linear kernels have been computationally tractable in practice. This paper proposes a spectral-based optimization algorithm that extends beyond the linear kernel. The algorithm identifies a family of suitable kernels and provides first- and second-order local guarantees when a fixed point is reached. Furthermore, we propose a principled initialization strategy, thereby removing the need to repeat the algorithm at random initialization points. Compared to state-of-the-art optimization algorithms, our empirical results on real data show a run-time improvement by as much as a factor of $10^5$ while consistently achieving lower cost and classification/clustering errors. The implementation source code is publicly available on https://github.com/endsley.
Tasks Dimensionality Reduction
Published 2019-09-06
URL https://arxiv.org/abs/1909.05097v1
PDF https://arxiv.org/pdf/1909.05097v1.pdf
PWC https://paperswithcode.com/paper/spectral-non-convex-optimization-for
Repo
Framework
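For concreteness, the dependence measure at the heart of the paper is straightforward to compute. Below is the standard biased empirical HSIC estimator with Gaussian kernels (the estimator itself is textbook material; the paper's contribution is the optimization over it, which this sketch does not cover):

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, where K and L are
    the Gram matrices of X and Y and H is the centering matrix."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = gaussian_kernel(X, sigma)
    L = gaussian_kernel(Y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
indep = hsic(X, rng.normal(size=(200, 2)))            # Y independent of X
dep = hsic(X, X + 0.1 * rng.normal(size=(200, 2)))    # Y a noisy copy of X
print(indep, dep)  # the dependent pair scores markedly higher
```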

Communication-Efficient and Byzantine-Robust Distributed Learning

Title Communication-Efficient and Byzantine-Robust Distributed Learning
Authors Avishek Ghosh, Raj Kumar Maity, Swanand Kadhe, Arya Mazumdar, Kannan Ramchandran
Abstract We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (like coordinate-wise median or trimmed mean), and is thus optimal. Furthermore, for communication efficiency, we consider a generic class of {\delta}-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients and gradient norms for aggregation and Byzantine removal respectively. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in the regime when the compression factor {\delta} is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence we effectively get the compression for free. Moreover, we extend the compressed gradient descent algorithm with error feedback proposed in [KRSJ19] for the distributed setting. We have experimentally validated our results and shown good performance in convergence for convex (least-square regression) and non-convex (neural network training) problems.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09721v1
PDF https://arxiv.org/pdf/1911.09721v1.pdf
PWC https://paperswithcode.com/paper/communication-efficient-and-byzantine-robust
Repo
Framework
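The norm-thresholding defense described above is easy to sketch: drop the gradients with the largest norms before averaging. The fixed trimming fraction is our simplification, and the compression step the paper pairs with this is omitted:

```python
import numpy as np

def norm_filtered_mean(grads, byzantine_frac=0.2):
    """Robust aggregation by norm thresholding: discard the gradients with
    the largest norms (presumed Byzantine) and average the remainder."""
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1)
    keep = int(np.ceil(len(grads) * (1 - byzantine_frac)))
    idx = np.argsort(norms)[:keep]        # smallest-norm workers survive
    return grads[idx].mean(axis=0)

honest = [[1.0, 1.0]] * 8                 # 8 honest workers agree
attack = [[100.0, -100.0]] * 2            # 2 Byzantine workers send junk
print(norm_filtered_mean(honest + attack, byzantine_frac=0.2))  # [1. 1.]
```

A plain mean over the same ten gradients would be pulled to roughly [20.8, -19.2], so the thresholding is what keeps the update usable.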

Fairness with Dynamics

Title Fairness with Dynamics
Authors Min Wen, Osbert Bastani, Ufuk Topcu
Abstract It has recently been shown that if feedback effects of decisions are ignored, then imposing fairness constraints such as demographic parity or equality of opportunity can actually exacerbate unfairness. We propose to address this challenge by modeling feedback effects as the dynamics of a Markov decision process (MDP). First, we define analogs of fairness properties that have been proposed for supervised learning. Second, we propose algorithms for learning fair decision-making policies for MDPs. We also explore extensions to reinforcement learning, where parts of the dynamical system are unknown and must be learned without violating fairness. Finally, we demonstrate the need to account for dynamical effects using simulations on a loan applicant MDP.
Tasks Decision Making
Published 2019-01-24
URL http://arxiv.org/abs/1901.08568v1
PDF http://arxiv.org/pdf/1901.08568v1.pdf
PWC https://paperswithcode.com/paper/fairness-with-dynamics
Repo
Framework
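The feedback effect the abstract warns about can be seen in a toy loop inspired by the loan example: the decision policy changes the very state it is applied to. All dynamics and parameters here are illustrative assumptions, not the paper's MDP:

```python
def loan_feedback(approve_threshold, score=0.4, steps=50):
    """Toy feedback loop: a group's mean credit score is the state, and
    lending decisions feed back into it. Below a score of 0.5, defaults
    outnumber repayments, so approving loans drags the score down further."""
    for _ in range(steps):
        if score >= approve_threshold:        # policy decision
            score += 0.02 * (score - 0.5)     # feedback on the state
    return score

# A lenient static policy keeps approving and erodes the group's score,
# while a strict one leaves the state untouched:
print(loan_feedback(0.3))   # drifts below the starting score of 0.4
print(loan_feedback(0.5))   # never approved, stays at 0.4
```

This is exactly why the paper argues that one-shot fairness constraints must be re-examined once decisions drive dynamics.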

Probing the Information Encoded in X-vectors

Title Probing the Information Encoded in X-vectors
Authors Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur
Abstract Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by x-vector embeddings. We probe these embeddings for information related to the speaker, channel, transcription (sentence, words, phones), and meta information about the utterance (duration and augmentation type), and compare these with the information encoded by i-vectors across a varying number of dimensions. We also study the effect of data augmentation during extractor training on the information captured by x-vectors. Experiments on the RedDots data set show that x-vectors capture spoken content and channel-related information, while performing well on speaker verification tasks.
Tasks Data Augmentation, Speaker Recognition, Speaker Verification, Text-Independent Speaker Recognition
Published 2019-09-13
URL https://arxiv.org/abs/1909.06351v2
PDF https://arxiv.org/pdf/1909.06351v2.pdf
PWC https://paperswithcode.com/paper/probing-the-information-encoded-in-x-vectors
Repo
Framework
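The probing methodology above reduces to fitting a simple classifier on embeddings and reading accuracy as evidence that a property is encoded. As a sketch, we use a nearest-centroid probe on synthetic "x-vectors"; the paper uses classifiers such as logistic regression, and the centroid probe and synthetic data are our simplifications:

```python
import numpy as np

def fit_centroid_probe(emb, labels):
    """Fit a minimal probe: one centroid per class in embedding space."""
    classes = np.unique(labels)
    cents = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    return classes, cents

def probe_accuracy(emb, labels, classes, cents):
    """Classify each embedding by its nearest centroid and score accuracy."""
    d = np.linalg.norm(emb[:, None, :] - cents[None, :, :], axis=2)
    return np.mean(classes[d.argmin(axis=1)] == labels)

# Synthetic embeddings where a binary property shifts the vectors: the
# probe recovers it, i.e. the property is "encoded" in the embedding.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
emb = rng.normal(size=(200, 16)) + 3.0 * y[:, None]
classes, cents = fit_centroid_probe(emb, y)
acc = probe_accuracy(emb, y, classes, cents)
print(acc)  # near-perfect probe accuracy
```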

Training with the Invisibles: Obfuscating Images to Share Safely for Learning Visual Recognition Models

Title Training with the Invisibles: Obfuscating Images to Share Safely for Learning Visual Recognition Models
Authors Tae-hoon Kim, Dongmin Kang, Kari Pulli, Jonghyun Choi
Abstract High-performance visual recognition systems generally require a large collection of labeled images to train. The expensive data curation can be an obstacle for improving recognition performance. Sharing more data allows training better models, but personal and private information in the data prevents such sharing. To promote sharing visual data for learning a recognition model, we propose to obfuscate the images so that humans are not able to recognize their detailed contents, while machines can still utilize them to train new models. We validate our approach by comprehensive experiments on three challenging visual recognition tasks: image classification, attribute classification, and facial landmark detection on several datasets including SVHN, CIFAR10, Pascal VOC 2012, CelebA, and MTFL. Our method successfully obfuscates the images from human recognition, but a machine model trained with them performs within about a 1% margin (up to 0.48%) of the performance of a model trained with the original, non-obfuscated data.
Tasks Facial Landmark Detection, Image Classification
Published 2019-01-01
URL https://arxiv.org/abs/1901.00098v2
PDF https://arxiv.org/pdf/1901.00098v2.pdf
PWC https://paperswithcode.com/paper/training-with-the-invisibles-obfuscating
Repo
Framework

CrevNet: Conditionally Reversible Video Prediction

Title CrevNet: Conditionally Reversible Video Prediction
Authors Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler
Abstract Applying resolution-preserving blocks is a common practice to maximize information preservation in video prediction, yet their high memory consumption greatly limits their application scenarios. We propose CrevNet, a Conditionally Reversible Network that uses reversible architectures to build a bijective two-way autoencoder and its complementary recurrent predictor. Our model enjoys the theoretically guaranteed property of no information loss during the feature extraction, much lower memory consumption and computational efficiency.
Tasks Video Prediction
Published 2019-10-25
URL https://arxiv.org/abs/1910.11577v1
PDF https://arxiv.org/pdf/1910.11577v1.pdf
PWC https://paperswithcode.com/paper/crevnet-conditionally-reversible-video
Repo
Framework
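The memory saving behind CrevNet comes from reversible architectures: inputs can be recomputed from outputs, so intermediate activations need not be stored. A minimal sketch of the generic additive coupling block follows; CrevNet's actual layers and two-way autoencoder are more elaborate, and the sub-networks below are stand-in functions:

```python
import numpy as np

def forward(x1, x2, f, g):
    """Additive coupling over a split input: the standard reversible-block
    construction. No information is lost, since (x1, x2) can be recovered
    exactly from (y1, y2)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2, f, g):
    """Exact inverse of `forward`, reusing the same sub-networks f and g."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

f = lambda z: np.tanh(z)          # stand-ins for learned sub-networks
g = lambda z: 0.5 * z
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=4), rng.normal(size=4)
y1, y2 = forward(x1, x2, f, g)
r1, r2 = inverse(y1, y2, f, g)
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # True
```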

The Power of Batching in Multiple Hypothesis Testing

Title The Power of Batching in Multiple Hypothesis Testing
Authors Tijana Zrnic, Daniel L. Jiang, Aaditya Ramdas, Michael I. Jordan
Abstract One important partition of algorithms for controlling the false discovery rate (FDR) in multiple testing is into offline and online algorithms. The former generally achieve significantly higher power of discovery, while the latter allow making decisions sequentially as well as adaptively formulating hypotheses based on past observations. Using existing methodology, it is unclear how one could trade off the benefits of these two broad families of algorithms while preserving their formal FDR guarantees. To this end, we introduce $\text{Batch}_{\text{BH}}$ and $\text{Batch}_{\text{St-BH}}$, algorithms for controlling the FDR when a possibly infinite sequence of batches of hypotheses is tested by repeated application of one of the most widely used offline algorithms, the Benjamini-Hochberg (BH) method, or Storey's improvement of the BH method. We show that our algorithms interpolate between existing online and offline methodology, thus trading off the best of both worlds.
Tasks
Published 2019-10-11
URL https://arxiv.org/abs/1910.04968v2
PDF https://arxiv.org/pdf/1910.04968v2.pdf
PWC https://paperswithcode.com/paper/the-power-of-batching-in-multiple-hypothesis
Repo
Framework
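The offline building block that the batch algorithms repeatedly apply is the BH step-up procedure. A minimal implementation follows; the batch algorithms additionally adapt the level across batches to preserve the overall FDR guarantee, which this sketch does not do:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure: reject the k smallest p-values, where k is the
    largest index with p_(k) <= alpha * k / n. Returns a boolean mask."""
    p = np.asarray(pvals)
    n = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, n + 1) / n
    k = np.flatnonzero(below).max() + 1 if below.any() else 0
    rejected = np.zeros(n, dtype=bool)
    rejected[order[:k]] = True
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.2, 0.6]
print(benjamini_hochberg(pvals, alpha=0.05))  # rejects the two smallest
```

At level 0.05 with six hypotheses the step-up thresholds are 0.0083, 0.0167, 0.025, ..., 0.05, so only 0.001 and 0.008 clear their thresholds here.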

Impact of Artificial Intelligence on Businesses: from Research, Innovation, Market Deployment to Future Shifts in Business Models

Title Impact of Artificial Intelligence on Businesses: from Research, Innovation, Market Deployment to Future Shifts in Business Models
Authors Neha Soni, Enakshi Khular Sharma, Narotam Singh, Amita Kapoor
Abstract The fast pace of artificial intelligence (AI) and automation is propelling strategists to reshape their business models. This is fostering the integration of AI into business processes, but the consequences of this adoption are underexplored and need attention. This paper focuses on the overall impact of AI on businesses - from research, innovation, and market deployment to future shifts in business models. To assess this overall impact, we design a three-dimensional research model based upon Neo-Schumpeterian economics and its three forces, viz. innovation, knowledge, and entrepreneurship. The first dimension deals with research and innovation in AI. In the second dimension, we explore the influence of AI on the global market and the strategic objectives of businesses, and finally, the third dimension examines how AI is shaping business contexts. Additionally, the paper explores the implications of AI for various actors, as well as its dark sides.
Tasks
Published 2019-05-03
URL https://arxiv.org/abs/1905.02092v1
PDF https://arxiv.org/pdf/1905.02092v1.pdf
PWC https://paperswithcode.com/paper/impact-of-artificial-intelligence-on
Repo
Framework