**Why not estimate the value of π yourself?**

All you need to do is draw parallel straight lines on a sheet of paper, spaced exactly one needle length apart, and then drop the needle onto the sheet. With some simplification, let's separate the outcomes into two cases: the dropped needle either crosses a line or does not touch any line at all. If you repeat this experiment a couple of thousand times, calculate the ratio of crossings to the total number of trials. This ratio approximates 2/π, so twice the number of trials divided by the number of crossings gives your estimate of π.
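The procedure above can be sketched as a short Monte Carlo simulation. This is a minimal illustration, not part of the original experiment; the function name and the fixed seed are my own choices. It takes the line spacing (and needle length) as 1, so a drop is fully described by the distance of the needle's centre to the nearest line and the acute angle the needle makes with the lines:

```python
import math
import random

def buffon_pi(trials, seed=0):
    """Estimate pi by dropping a needle whose length equals the line spacing."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        # Distance from the needle's centre to the nearest line (0 .. 1/2).
        d = rng.uniform(0, 0.5)
        # Acute angle between the needle and the lines (0 .. pi/2).
        theta = rng.uniform(0, math.pi / 2)
        # The needle crosses a line when its half-projection reaches the line.
        if d <= 0.5 * math.sin(theta):
            crossings += 1
    # P(crossing) = 2/pi, so pi is approximately 2 * trials / crossings.
    return 2 * trials / crossings

print(buffon_pi(1_000_000))
```

Even with a million simulated drops the estimate typically agrees with π only to two or three decimal places, which is exactly the slow convergence the next paragraph complains about.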

This approach was originally proposed by **Georges-Louis Leclerc, Comte de Buffon** in the 18th century. It is probably the most famous demonstration of applying geometry to probability theory (or statistics).

Anyway, you would have to spend a great deal of time with this to get anywhere near the accuracy of the best existing estimation (which has been calculated to 12.1 trillion digits by Alexander J. Yee and Shigeru Kondo, according to their update on the number of digits of π). (If you do not use trillions every day: that is about 1.21 × 10^13 digits!)

As Wikipedia describes, this experimental design is particularly prone to the so-called confirmation bias,

“a tendency to search for, interpret, favour and recall information in a way that confirms one’s pre-existing beliefs or hypotheses, while giving disproportionately less consideration to alternative possibilities.”

*Lazzarini, an Italian mathematician, performed Buffon's needle experiment by tossing a needle 3408 times and arriving at the already "famous" approximation 355/113 for π*. He artificially set up the experiment so that he could expect about 113n/213 crossings (where n denotes the length of the trial, that is, the number of needle drops). He only had to repeat a run of 213 tosses 16 times to reproduce the magic 355/113. However, it is important to note that Lazzarini did not do anything wrong or unethical. He committed a bias – he was imprudent to some degree – but avoiding confirmation bias is a real challenge for the majority of researchers.
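The arithmetic behind the trick is easy to verify. Lazzarini used a needle 5/6 as long as the line spacing, so the crossing probability is 2·(5/6)/π = 5/(3π) and the estimate becomes π ≈ 5n/(3·crossings). Assuming the historically reported count of 1808 crossings out of 3408 tosses, the fraction reduces exactly to 355/113:

```python
from fractions import Fraction

# Needle length = 5/6 of the line spacing, so pi ~= 5*n / (3*crossings).
n = 3408            # 16 repetitions of 213 tosses
crossings = 1808    # the reported number of crossings

estimate = Fraction(5 * n, 3 * crossings)
print(estimate)          # 355/113
print(float(estimate))   # 3.1415929..., correct to six decimal places
```

Since 3408 = 16 × 213, stopping the experiment at just the right multiple of 213 tosses is what makes the "lucky" fraction appear.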

If you are interested in this experiment, you can try it yourself with easy-to-access tools such as Java applets or Flash animations. Probably the nicest solution can be found at http://www.ventrella.com/buffon/ (by Jeffrey Ventrella), while a simpler, but still very impressive one can be found at http://www.metablake.com/pi.swf .

## What is Data Visualization?

Data Visualization is a way of representing complex data and stats in a pleasing, visually-appealing way. Visual data may include components like pie and graph charts, maps or tables, and can be presented in different forms, such as infographics, videos, illustrations and interactive reports.

Why is it important? The answer is simple: our brains absorb visual information better, faster and more easily.

### Benefits of Data Visualization

The benefits of visualizing data include:

- providing clearer information for clients
- making it easier to view and analyze patterns and trends
- enabling interaction with the data
- allowing for more information to be absorbed, and more quickly
- identifying peaks and troughs more easily.

Sitepoint’s article is going to assess how a new tool, Google Data Studio, can help us build beautiful and interactive reports.

## Google Data Studio

Google Data Studio (GDS) is a new tool by Google that makes it easy to create beautiful, engaging, responsive, branded and interactive reports. It does this by pulling metrics from Google’s properties, such as Google Analytics, Adwords and YouTube Analytics, as well as spreadsheets and SQL databases.

In this article, Data Studio will be used to create a visual report from Google Analytics data. To follow along, you first need an active Google Analytics property that is properly integrated with your website.

The same applies to other reports. If you wish to pull the data from your Adwords or YouTube Analytics, make sure to sign in with an appropriate Google account that has that data.

Read the Setup Guide on Sitepoint: Here

The well-known quote from Andrew Lang reads as follows: “The statistician *uses statistics* as a drunken man *uses lamp posts*—for support rather than illumination.” It is easy for a mathematician or a statistician to interpret the result of a statistical analysis with caution, but someone who is only interested in the result and less familiar with the mathematical background of the methods used can easily jump to a wrong conclusion. The simplest example of why prudence is needed in interpreting statistical results is Simpson’s paradox (first described by Edward H. Simpson in 1951).

Consider the following study. A new drug is being tested on a group of 800 people (400 men and 400 women) with a particular disease. The aim is to establish whether there is a link between taking the drug and recovery from the disease. In a standard scenario half of the people (randomly selected) are given the drug and the other half are given placebo. The results in the following table show that, of the 400 given the drug, 200 (50 %) recover from the disease; this compares favourably with just 160 out of the 400 (40 %) given the placebo who recover.

| Recovered | Drug not taken | Drug taken |
| --- | --- | --- |
| No | 240 | 200 |
| Yes | 160 | 200 |
| **Recovery rate** | 40% | 50% |

So clearly we can conclude that the drug has a positive effect. Or can we? A more detailed look at the data leads to exactly the opposite conclusion. Specifically, the following table shows the results broken down into male and female subjects.

| Recovered | Female, no drug | Female, drug | Male, no drug | Male, drug |
| --- | --- | --- | --- | --- |
| No | 210 | 80 | 30 | 120 |
| Yes | 90 | 20 | 70 | 180 |
| **Recovery rate** | 30% | 20% | 70% | 60% |

Focusing first on the men, we find that 70% of those taking the placebo recover, but only 60% of those taking the drug recover. So, for men, the recovery rate is better without the drug. Similarly, for the women we find that 30% of those taking the placebo recover, but only 20% of those taking the drug recover. So, for women, the recovery rate is also better without the drug. We can therefore conclude that in every subcategory the drug performs worse than the placebo.
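The reversal is easy to reproduce from the counts in the tables above. The following sketch (names and layout are my own) recomputes the per-sex recovery rates and then pools the two sexes, showing the paradox in a few lines:

```python
def recovery_rate(recovered, total):
    return recovered / total

# Stratified counts read off the tables above: (recovered, total).
groups = {
    ("female", "placebo"): (90, 300),
    ("female", "drug"):    (20, 100),
    ("male",   "placebo"): (70, 100),
    ("male",   "drug"):    (180, 300),
}

# Within each sex, the placebo arm does better...
for sex in ("female", "male"):
    p = recovery_rate(*groups[(sex, "placebo")])
    d = recovery_rate(*groups[(sex, "drug")])
    print(f"{sex}: placebo {p:.0%} vs drug {d:.0%}")
# female: placebo 30% vs drug 20%
# male: placebo 70% vs drug 60%

# ...yet after pooling the sexes, the drug appears to win.
def pooled_rate(arm):
    rec = sum(groups[(sex, arm)][0] for sex in ("female", "male"))
    tot = sum(groups[(sex, arm)][1] for sex in ("female", "male"))
    return recovery_rate(rec, tot)

print(f"pooled: placebo {pooled_rate('placebo'):.0%} "
      f"vs drug {pooled_rate('drug'):.0%}")
# pooled: placebo 40% vs drug 50%
```

The pooled numbers match the first table exactly (160/400 = 40% versus 200/400 = 50%), even though every stratum points the other way.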

The process of drilling down into the data this way (in this case by looking at men and women separately) is called *stratification*. Simpson’s paradox is simply the observation that, on the same data, stratified versions of the data can produce the opposite result to non-stratified versions. Often there is a *causal* explanation. In this case, men are much more likely to recover naturally from this disease than women. Although an equal number of subjects overall were given the drug as were given the placebo, and although there were an equal number of men and women overall in the trial, the drug was **not** equally distributed between men and women. More men than women were given the drug. Because of the men’s higher natural recovery rate, overall more people in the trial recovered when given the drug than when given the placebo.

Someone may ask the questions, ‘Does this difficulty arise in more general cases (e.g. if we stratify the data into more subgroups)?’ or ‘How can we avoid this kind of effect?’. For answers and more details, please refer to the following articles: