A probabilistic programming language (PPL) is designed for defining probabilistic models and performing inference in them. Although these languages are closely related to Bayesian networks, they are more flexible and expressive.

There are three basic aims of PPLs:

  • To let users write down a Bayesian model, which consists of a generative model, unknown model parameters, and prior assumptions about those parameters
  • To let users specify the data that the model should be conditioned on
  • To automatically compute the inference results from the model, its parameters, and the data provided
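
The three aims above can be sketched in plain Python without any PPL library. This is a minimal illustration, not how any particular PPL works internally: it assumes a coin-flip model with a uniform prior and uses a simple grid approximation for inference. The names `log_likelihood`, `log_prior`, `grid_posterior`, and `posterior_mean` are hypothetical and chosen only for this example.

```python
import math

def log_likelihood(theta, data):
    """Generative model: each observation is a Bernoulli(theta) coin flip."""
    heads = sum(data)
    tails = len(data) - heads
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def log_prior(theta):
    """Prior assumption (illustrative): uniform on (0, 1), so log density 0."""
    return 0.0

def grid_posterior(data, grid_size=999):
    """Inference: normalise prior * likelihood over a grid of theta values."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    log_w = [log_prior(t) + log_likelihood(t, data) for t in grid]
    largest = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - largest) for lw in log_w]
    total = sum(weights)
    return grid, [w / total for w in weights]

def posterior_mean(data):
    """Summarise the posterior by its mean."""
    grid, probs = grid_posterior(data)
    return sum(t * p for t, p in zip(grid, probs))

# The data the user chooses to condition on: 6 heads, 2 tails.
observed = [1, 1, 0, 1, 0, 1, 1, 1]
estimate = posterior_mean(observed)
print(f"posterior mean of theta: {estimate:.3f}")
```

In a real PPL the user would write only the model and supply the data, and the inference step would be handled by the language. For this particular model the exact posterior is Beta(7, 3), so the grid estimate can be checked against its mean of 0.7.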

Ideally, a probabilistic programming language should be able to perform inference accurately on any new Bayesian model. In practice, there are a number of difficulties.

Bayesian inference requires substantial resources, such as time, memory, and processing power, especially when it comes to big data or big models (those with many unknown parameters). Several strategies have been developed to address this problem.

The first strategy is to design inference algorithms that work more efficiently on big datasets, so that they perform better without becoming too computationally expensive.

The second strategy is to use multiple computers. This requires parallel algorithms that let the computers work simultaneously. Such algorithms allow for learning in “data-distributed settings”, in which the data are too large to handle in one place, so they are divided into groups and each group is processed separately.
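
As a rough sketch of the data-distributed idea, the example below splits coin-flip data into shards, summarises each shard independently, and then combines the summaries. It assumes a conjugate Beta-Bernoulli model, where this combination is exact because the counts simply add up; general models need more elaborate schemes. Threads stand in for separate machines, and the names `shard_counts` and `distributed_posterior` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_counts(shard):
    """Work done on one worker: reduce a shard to (heads, tails)."""
    heads = sum(shard)
    return heads, len(shard) - heads

def distributed_posterior(shards, prior_a=1.0, prior_b=1.0):
    """Combine per-shard counts into the parameters of a Beta posterior."""
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(shard_counts, shards))
    heads = sum(h for h, _ in counts)
    tails = sum(t for _, t in counts)
    return prior_a + heads, prior_b + tails

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
# Divide the data into groups of four observations each.
shards = [data[i:i + 4] for i in range(0, len(data), 4)]
a, b = distributed_posterior(shards)
print(f"posterior: Beta({a}, {b})")
```

Processing all of the data as a single shard gives the same posterior parameters, which is what makes the split safe in this simple case.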

The third strategy is to change the goal: aim for approximate results instead of exact ones, since approximations are easier to obtain. This strategy also covers research into different kinds of approximation, including work on amortized inference and variational inference.
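
To make the trade-off between exact and approximate results concrete, here is a small sketch that approximates a posterior mean by self-normalised importance sampling, using the prior as the proposal. Note that this is a simpler technique than the amortized or variational methods mentioned above; it is used here only because it fits in a few lines. The coin-flip model, the uniform prior, and the name `approx_posterior_mean` are assumptions made for the example.

```python
import random

def likelihood(theta, heads, tails):
    """Probability of the observed flips given the coin bias theta."""
    return theta ** heads * (1.0 - theta) ** tails

def approx_posterior_mean(heads, tails, num_samples=20000, seed=0):
    """Weight prior samples by their likelihood and average them."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        theta = rng.random()  # draw from the uniform prior
        weight = likelihood(theta, heads, tails)
        weighted_sum += weight * theta
        weight_total += weight
    return weighted_sum / weight_total

# With a uniform prior, the exact posterior is Beta(heads + 1, tails + 1).
exact = (6 + 1) / (6 + 2 + 2)
approx = approx_posterior_mean(heads=6, tails=2)
print(f"exact: {exact:.3f}, approximate: {approx:.3f}")
```

The approximation gets closer to the exact value as `num_samples` grows, so accuracy can be traded against computation time.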

Interest in complex Bayesian models has grown along with the number of tasks to be performed and the amount of available computational resources. Nowadays, when specifying a model in a probabilistic programming language, there are a few ways to use new kinds of model components, such as:

  • Simulators
  • Deep neural networks
  • Visual graphics engines
  • Any other computer program
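
As an illustration of using an arbitrary program as a model component, the sketch below treats a small simulator as a black box and infers its parameter by rejection sampling: draw a parameter from the prior, run the simulator, and keep the draw only if the output matches the observation. The simulator, the uniform prior, and the names `simulator` and `rejection_posterior_samples` are all hypothetical, and real simulators usually require approximate matching rather than the exact match used here.

```python
import random

def simulator(theta, num_flips, rng):
    """Black-box program: returns the number of heads in num_flips flips."""
    return sum(1 for _ in range(num_flips) if rng.random() < theta)

def rejection_posterior_samples(observed_heads, num_flips,
                                num_draws=50000, seed=1):
    """Keep prior draws whose simulated output equals the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_draws):
        theta = rng.random()  # draw from the uniform prior
        if simulator(theta, num_flips, rng) == observed_heads:
            accepted.append(theta)
    return accepted

samples = rejection_posterior_samples(observed_heads=6, num_flips=8)
sim_mean = sum(samples) / len(samples)
print(f"accepted {len(samples)} draws, posterior mean ~ {sim_mean:.3f}")
```

The inference code never looks inside `simulator`; it only calls it. That is what allows the same approach to wrap other programs, at the cost of discarding most of the draws.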

Still, the general-purpose inference algorithms previously built into PPLs cannot always produce accurate results for these advanced Bayesian models, so new techniques are needed. For this reason, probabilistic programming languages are constantly being developed so that they can combine old and new methods for sophisticated models.

There is another problem: it is difficult to verify the accuracy and quality of the results produced by PPLs. Sometimes it is even hard to define what accurate, or accurate enough, means. For probabilistic programming languages to be considered a powerful part of a computational pipeline, they must produce correct results within a certain amount of time.

Lastly, let’s see how PPLs can be used. These systems can serve as easy-to-use frameworks for designing new models. This matters because building models requires not only a lot of time but also a set of skills that usually only computer scientists or statisticians possess. It is also beneficial to design PPLs as part of a larger machine learning system rather than as independent systems.

To conclude, these are some of the main features of probabilistic programming. If you care to learn more, check out this website.