Add diagrams for rand_distr #1

Open
wants to merge 16 commits into
base: main

Conversation

MichaelOwenDyer
Member

@MichaelOwenDyer MichaelOwenDyer commented Apr 10, 2024

Summary

This PR adds some plots to the documentation of distributions in rand_distr to illustrate their behavior.

Motivation

Make the distributions in rand easier to understand via visual aids.

Details

The distributions folder contains a separate Python file for each plot being created.
main.py is a script which saves all of the distribution plots into the charts folder.
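
As a rough sketch of the layout described above (not the PR's actual code), a driver could loop over the per-distribution plot modules and ask each one to write its chart into a shared folder. The `save_to(directory, extension)` signature follows the `dirichlet.py` excerpt quoted in this thread; the `save_all` helper and the dummy plotter are hypothetical stand-ins.

```python
import os
import tempfile
from types import SimpleNamespace


def save_all(plot_modules, directory: str, extension: str = "svg") -> list:
    """Call save_to(directory, extension) on every plot module.

    Hypothetical helper: the real main.py may be organised differently.
    """
    os.makedirs(directory, exist_ok=True)
    saved = []
    for module in plot_modules:
        module.save_to(directory, extension)
        saved.append(getattr(module, "__name__", repr(module)))
    return saved


def _dummy_save_to(directory: str, extension: str) -> None:
    # Stand-in for a real plotting module; writes a minimal placeholder SVG.
    path = os.path.join(directory, f"dummy.{extension}")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<svg xmlns='http://www.w3.org/2000/svg'/>")


dummy_module = SimpleNamespace(__name__="dummy", save_to=_dummy_save_to)
output_dir = os.path.join(tempfile.mkdtemp(), "charts")
saved_names = save_all([dummy_module], output_dir)
```

Keeping one module per distribution means a single chart can be regenerated or reviewed without touching the others.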

Used by rust-random/rand#1434.

Charts ready for review:

  • binomial::Binomial
  • cauchy::Cauchy
  • dirichlet::Dirichlet
  • exponential::Exp
  • exponential::Exp1
  • frechet::Frechet
  • gamma::Gamma
  • gamma::ChiSquared
  • gamma::FisherF
  • gamma::StudentT
  • gamma::Beta
  • geometric::Geometric
  • geometric::StandardGeometric
  • gumbel::Gumbel
  • hypergeometric::Hypergeometric
  • inverse_gaussian::InverseGaussian
  • normal::Normal
  • normal::StandardNormal
  • normal::LogNormal
  • normal_inverse_gaussian::NormalInverseGaussian
  • pareto::Pareto
  • pert::Pert
  • poisson::Poisson
  • skew_normal::SkewNormal
  • triangular::Triangular
  • unit_ball::UnitBall
  • unit_circle::UnitCircle
  • unit_disc::UnitDisc
  • unit_sphere::UnitSphere
  • weibull::Weibull
  • zipf::Zeta
  • zipf::Zipf

@newpavlov
Member

newpavlov commented Apr 10, 2024

Unfortunately, it does not look like browsers can open SVGZ files (tested in Firefox and Chromium) without messing with Content-Encoding, so we probably have to use plain SVGs. Fortunately, compression should be applied by default while transferring the images over HTTP(S), so the only drawback will be larger diffs and size of the repository.
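
To get a feel for the trade-off, the sketch below gzip-compresses a synthetic SVG with Python's standard library. An `.svgz` file is a gzip-compressed SVG, and gzip is also a common HTTP `Content-Encoding`, so the ratio gives a rough idea of what transfer compression saves. The SVG content here is made up for illustration; real plot files will compress differently.

```python
import gzip

# Synthetic SVG: a polyline with many points, loosely imitating a plotted curve.
points = " ".join(f"{x},{(x * x) % 97}" for x in range(2000))
svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">'
    f'<polyline fill="none" stroke="black" points="{points}"/>'
    "</svg>"
)

raw = svg_text.encode("utf-8")
compressed = gzip.compress(raw)  # same container format as a .svgz file
ratio = len(compressed) / len(raw)  # fraction of the original size after gzip
```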

@newpavlov
Member

BTW it may be worth changing the license from Apache 2.0 to CC-BY or CC-0.

@dhardy
Member

dhardy commented Apr 10, 2024

For the most part these plots look good.

Can you increase the default size of the SVGs? At least when opened stand-alone, they are small.

@dhardy
Member

dhardy commented Apr 10, 2024

@newpavlov can you open a new issue regarding the licence? Also, we need a README, but not in this PR.

Member

@dhardy dhardy left a comment

Nice that we can now review the plots in the web interface.

The only issue I spotted at a glance is two identical plots, but let me know if you want a more complete inspection.

(Already, this is a huge improvement from what we had.)

Resolved review threads (outdated): distr/zeta.py, main.py (two threads)

@MichaelOwenDyer
Member Author

I have a few diagrams I know I still need to do / fix, which I'll work on today. I suggest the following: I will tick the checkboxes of any diagrams which are ready for review, and you (or I) can uncheck them again if there is potential for improvement 🤠

@MichaelOwenDyer
Member Author

I also want to be sure, since these charts will be in the documentation, that they are consistent with the terminology used in the documentation with respect to naming (and potentially also examples). For instance, I noticed that the documentation for the Chi squared distribution calls the degrees of freedom parameter k and not df.
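
A minimal sketch of what matching the documentation's naming could look like for the chi-squared plot: the density function and the legend labels both use `k` rather than `df`. The function name and the parameter values are hypothetical, not taken from the PR.

```python
import math


def chi_squared_pdf(x: float, k: float) -> float:
    """Density of the chi-squared distribution with k degrees of freedom.

    Simplification: returns 0.0 for x <= 0, ignoring the boundary value at x = 0.
    """
    if x <= 0.0:
        return 0.0
    half_k = k / 2.0
    return x ** (half_k - 1.0) * math.exp(-x / 2.0) / (2.0 ** half_k * math.gamma(half_k))


# Legend labels use the documentation's symbol `k`, not `df`.
legend_labels = [f"k = {k}" for k in (1, 2, 3, 5)]  # hypothetical parameter choices
```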

@MichaelOwenDyer
Member Author

Also, I should ask: do we want plots of cumulative distribution functions as well, or would this be excessive? The workload would not be much more, and I'd be willing.

@dhardy
Member

dhardy commented Apr 11, 2024

Wikipedia does often show CDFs as well as PDFs, but since one is a simple translation of the other I don't think there's much value in both. @vks thoughts?
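
To make the relationship concrete: the CDF is the running integral of the PDF, so one curve can be derived numerically from the other. The sketch below does this for an exponential distribution with a trapezoidal rule and compares against the closed form 1 − exp(−λx). The function names and the rate value are illustrative assumptions.

```python
import math


def exp_pdf(x: float, lam: float) -> float:
    """Density of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0


def cdf_from_pdf(pdf, upper: float, steps: int = 10_000) -> float:
    """Trapezoidal integral of pdf from 0 to upper."""
    width = upper / steps
    total = 0.5 * (pdf(0.0) + pdf(upper))
    for i in range(1, steps):
        total += pdf(i * width)
    return total * width


numeric = cdf_from_pdf(lambda x: exp_pdf(x, 1.5), 2.0)
exact = 1.0 - math.exp(-1.5 * 2.0)  # closed-form CDF at x = 2
```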

Member

@dhardy dhardy left a comment

Reviewed:

  • Beta
  • Binomial
  • Cauchy
  • Chi squared
  • Exponential, exp1
  • Fisher
  • Frechet
  • Gamma: has six plots of uncoordinated colours; could e.g. saturation correspond to θ parameter?
  • Geometric, standard geometric
  • Gumbel
  • Hypergeometric
  • Inverse Gaussian
  • Log normal
  • Normal, standard normal
  • Normal Inverse Gaussian: our implementation does not use scale or location parameters
  • Pareto: three parametrisations is enough I think?
  • Poisson
  • Skew-normal
  • Student's T
  • Triangular
  • Unit ball
  • Unit circle: should not be solid
  • Unit disc: should be solid
  • Unit sphere: I suppose a wireframe is the best way to draw this...
  • Weibull: we use parameters λ=scale, k=shape
  • Zeta, Zipf
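
For the Weibull note above, a small sketch of the density written in the λ = scale, k = shape parameterisation may help when checking the plot labels. This uses the standard Weibull density formula; the function name is hypothetical and this is not the plotting code from the PR.

```python
import math


def weibull_pdf(x: float, scale: float, shape: float) -> float:
    """Weibull density with scale = lambda and shape = k."""
    if x < 0.0:
        return 0.0
    if x == 0.0 and shape < 1.0:
        return math.inf  # the density diverges at zero when k < 1
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))
```

With k = 1 the expression reduces to an exponential density with rate 1/λ, which is a quick sanity check for the parameter order.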

@MichaelOwenDyer
Member Author

Thanks for the review. I've just started my Master's, so I haven't had quite as much time to work on this, but I'll implement your feedback and hopefully finally figure out how to get the Dirichlet code to work.

@MichaelOwenDyer MichaelOwenDyer marked this pull request as ready for review May 2, 2024 10:46
@MichaelOwenDyer
Member Author

@dhardy I've updated the diagrams which you gave feedback on, have a look:

  • Gamma: Lowered the alpha values of the lines with θ = 2, and coordinated the colors. If you think the lines still look too similar to one another I can tweak it some more.
  • Normal Inverse Gaussian: Removed the scale and location parameters, chose a few different parameterizations
  • Pareto: Removed α = 4
  • Unit Circle/Disc: swapped plots
  • Weibull: Changed parameters to λ=scale, k=shape.
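
One way to coordinate colours as described for the Gamma plot is to give each shape value its own hue and lower the opacity for the θ = 2 lines. The sketch below builds such a palette with the standard-library `colorsys` module. The parameter values, the alpha of 0.5, and the `line_color` helper are assumptions for illustration, not the values used in the PR.

```python
import colorsys

shapes = [1.0, 2.0, 5.0]  # hypothetical k values
scales = [1.0, 2.0]  # hypothetical theta values


def line_color(shape_index: int, n_shapes: int, theta: float) -> tuple:
    """RGBA colour: hue encodes the shape, alpha encodes theta."""
    hue = shape_index / n_shapes
    red, green, blue = colorsys.hsv_to_rgb(hue, 0.8, 0.8)
    alpha = 0.5 if theta == 2.0 else 1.0  # fade the theta = 2 lines
    return (red, green, blue, alpha)


palette = {
    (k, theta): line_color(i, len(shapes), theta)
    for i, k in enumerate(shapes)
    for theta in scales
}
```

Each RGBA tuple holds floats between 0 and 1, a form that matplotlib accepts as a `color` argument.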

Also, I added the Dirichlet distribution. I couldn't figure out how to do it on my own, so I eventually gave up and used some code I found on the internet, which I cited in dirichlet.py. The plot looks a little different from the others, but I think it's not too bad. Also, I uploaded it in PNG format because the SVG file was 22 MB.

Member

@newpavlov newpavlov left a comment

Some minor suggestions:

  • It may be worth enabling a grid for most plots. In my opinion, it makes plots easier to read.
  • I would prefer the origin points to be at zero; right now they are slightly shifted.
  • The Dirichlet plot lacks axes and a color legend.
  • Ideally, the plotted distributions should be generated by rand/rand_distr. (It's not important and we can leave it for later)

@MichaelOwenDyer
Member Author

Thanks for the review @newpavlov. I will add grid lines.
With your second point, you are referring to the white margin between the axes and the actual plot, right? I also considered removing this at one point. I recall that it made a plot a little bit harder to read, but I'll give it another try for all of them and we can decide together what looks nicest.
I'll see about adding axes and a legend to the Dirichlet plot.
What do you mean with your last point - that the Rust implementation should be generating the plots, not separate Python code? I do agree with that, but this is at least a start.

Member

@dhardy dhardy left a comment

I think that's all the major issues resolved, so I'll go ahead and approve. Thanks for your work on this.

from math import gamma


# Code source: https://blog.bogatron.net/blog/2014/02/02/visualizing-dirichlet-distributions/
Member

Nice little article.


def save_to(directory: str, _: str):
    extension = "png"  # Hardcode png output format. SVG output for Dirichlet is ~22 MB, while png is ~115 KB.
    corners = np.array([[0, 0], [1, 0], [0.5, 0.75 ** 0.5]])
Member

I notice that your plots use the same parameters as on Wikipedia but with a different coordinate layout. This is fine, but it would be nice to have some sort of labelling of axes on the plots.
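
For context on the coordinate layout: a point on the 3-component simplex can be mapped into the plotting triangle by weighting the three corners with its components, so each corner is where one component equals 1. The sketch below uses the corner coordinates from the excerpt above together with the standard Dirichlet density. Which component belongs to which corner is an assumption here, and the function names are hypothetical.

```python
import math

# Triangle corners as in the dirichlet.py excerpt (equilateral, side length 1).
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.75 ** 0.5)]


def barycentric_to_cartesian(weights):
    """Map simplex weights (summing to 1) to a 2D point inside the triangle."""
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return (x, y)


def dirichlet_pdf(weights, alpha):
    """Dirichlet density at a point in the interior of the simplex."""
    norm = math.gamma(sum(alpha)) / math.prod(math.gamma(a) for a in alpha)
    return norm * math.prod(w ** (a - 1.0) for w, a in zip(weights, alpha))
```

Under this mapping, labelling the plot amounts to annotating each corner with the component that equals 1 there.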

Member Author

I agree. I haven't been able to figure out how to achieve that with matplotlib yet; I'll keep researching.

@dhardy
Member

dhardy commented May 3, 2024

  • Dirichlet: good (though lacking labels on the axes)
  • PERT: x-axis could be constrained to -1 .. 1, but not important

@dhardy
Member

dhardy commented May 3, 2024

> What do you mean with your last point - that the Rust implementation should be generating the plots, not separate Python code? I do agree with that, but this is at least a start.

Unlike scipy and statrs, our implementations do not implement PDF functions so this is not possible (aside from stochastically). We could do something like this later, but it falls under the title of "testing distributions", not "generating nice plots" (in my opinion).
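
The "stochastic" route mentioned here would mean drawing many samples and estimating the density from a histogram. The sketch below illustrates the idea with Python's `random` module as a stand-in sampler; in practice the samples would have to come from the Rust implementation, which is not shown. The bin width, sample count, and seed are arbitrary choices.

```python
import math
import random

rng = random.Random(12345)  # fixed seed so the run is repeatable
rate = 1.0
samples = [rng.expovariate(rate) for _ in range(200_000)]

bin_width = 0.25
n_bins = 16
counts = [0] * n_bins
for value in samples:
    index = int(value / bin_width)
    if index < n_bins:  # samples beyond the last bin are ignored
        counts[index] += 1

# Density estimate per bin: count / (total samples * bin width).
density = [c / (len(samples) * bin_width) for c in counts]
centers = [(i + 0.5) * bin_width for i in range(n_bins)]
exact = [rate * math.exp(-rate * c) for c in centers]
max_error = max(abs(d - e) for d, e in zip(density, exact))
```

The estimate is noisy and depends on the binning, which fits the point that this belongs with testing distributions rather than producing clean plots.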

Member

@dhardy dhardy left a comment

@MichaelOwenDyer is this ready? If so, please merge.

@MichaelOwenDyer
Member Author

Sorry for the inactivity, I am still working on getting proper axes on the Dirichlet plot but have been so busy lately... will get it done soon
