Fitting a histogram with Python

Question

I have a histogram:

H=hist(my_data,bins=my_bin,histtype='step',color='r')

I can see that its shape is almost Gaussian, but I would like to fit this histogram with a Gaussian function and print the values of the mean and sigma that I get. Can you help me?

joaquin · Answer

Here is an example, originally written for py2.6 and py3.2:

from scipy.stats import norm
import matplotlib.pyplot as plt

# read data from a text file. One number per line
Arch = "test/Log(2)_ACRatio.txt"
datos = []
with open(Arch, 'r') as f:
    for item in f:
        item = item.strip()
        if item != '':
            try:
                datos.append(float(item))
            except ValueError:
                pass  # skip lines that are not numbers

# best fit of data
(mu, sigma) = norm.fit(datos)

# the histogram of the data (density=True normalizes it so it is comparable with the pdf)
n, bins, patches = plt.hist(datos, 60, density=True, facecolor='green', alpha=0.75)

# add a 'best fit' line
y = norm.pdf(bins, mu, sigma)
l = plt.plot(bins, y, 'r--', linewidth=2)

# plot
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'$\mathrm{Histogram\ of\ IQ:}\ \mu=%.3f,\ \sigma=%.3f$' % (mu, sigma))
plt.grid(True)
plt.show()

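To see what `norm.fit` returns: for the normal distribution it gives the maximum likelihood estimates, which are the sample mean and the standard deviation with an N denominator (NumPy's default `ddof=0`). A minimal check, assuming synthetic normally distributed data (the generator seed, `loc=2.0` and `scale=0.5` are illustrative choices, not part of the answer) in place of the file read above:

```python
import numpy as np
from scipy.stats import norm

# Synthetic data for illustration only (assumption: stands in for `datos` read from the file)
rng = np.random.default_rng(0)
datos = rng.normal(loc=2.0, scale=0.5, size=1000)

# norm.fit returns the maximum likelihood estimates (loc, scale)
mu, sigma = norm.fit(datos)

# For the normal distribution these equal the sample mean and the
# standard deviation computed with denominator N (np.std default ddof=0)
mu_mle = np.mean(datos)
sigma_mle = np.std(datos)

print("norm.fit: mu = %.4f, sigma = %.4f" % (mu, sigma))
print("numpy:    mu = %.4f, sigma = %.4f" % (mu_mle, sigma_mle))
```

So printing `mu` and `sigma` after `norm.fit` already answers the "mean and sigma" part of the question.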

Ralph · Answer

Here is an example that uses scipy.optimize to fit nonlinear functions such as a Gaussian, even when the data are in a histogram whose range is not well defined, so that a simple mean estimate fails. An offset constant would also make simple normal statistics fail (for plain Gaussian data, just remove p[3] and c[3]).

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

# Model A*exp(-0.5*((x-mu)/sigma)**2) + k, with p = [A, mu, sigma, k]
fitfunc = lambda p, x: p[0]*np.exp(-0.5*((x-p[1])/p[2])**2) + p[3]
# Residuals minimized by leastsq
errfunc = lambda p, x, y: (y - fitfunc(p, x))

filename = "gaussdata.csv"
data = np.loadtxt(filename, skiprows=1, delimiter=',')
xdata = data[:, 0]
ydata = data[:, 1]

init = [1.0, 0.5, 0.5, 0.5]  # initial guess for [A, mu, sigma, k]
out = leastsq(errfunc, init, args=(xdata, ydata))
c = out[0]  # best-fit parameters

print("A exp[-0.5((x-mu)/sigma)^2] + k ")
print("Parent Coefficients:")
print("1.000, 0.200, 0.300, 0.625")
print("Fit Coefficients:")
print(c[0], c[1], abs(c[2]), c[3])

plt.plot(xdata, fitfunc(c, xdata))
plt.plot(xdata, ydata)
plt.title(r'$A = %.3f\ \mu = %.3f\ \sigma = %.3f\ k = %.3f $' % (c[0], c[1], abs(c[2]), c[3]))
plt.show()

Output:

A exp[-0.5((x-mu)/sigma)^2] + k
Parent Coefficients:
1.000, 0.200, 0.300, 0.625
Fit Coefficients:
0.961231625289 0.197254597618 0.293989275502 0.65370344131

Figure: Gaussian plot with fit
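The same least-squares idea can be applied directly to the histogram in the question with `scipy.optimize.curve_fit`, which (without bounds) uses the same MINPACK routine as `leastsq`: bin the samples, use the bin centers as x and the counts as y. A minimal sketch, assuming synthetic normal data in place of the asker's `my_data` and 40 bins in place of `my_bin`; the model below drops Ralph's offset k, and `gauss` is a hypothetical helper name:

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, a, mu, sigma):
    """Gaussian with amplitude a, mean mu and standard deviation sigma (no offset)."""
    return a * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Synthetic data for illustration only (assumption: replace with your own samples)
rng = np.random.default_rng(42)
my_data = rng.normal(loc=1.5, scale=0.8, size=5000)

# Bin the data; np.histogram gives the same counts and edges as plt.hist
counts, edges = np.histogram(my_data, bins=40)
centers = 0.5 * (edges[:-1] + edges[1:])

# Initial guess from the data: peak height, mean and standard deviation
p0 = [counts.max(), my_data.mean(), my_data.std()]
popt, pcov = curve_fit(gauss, centers, counts, p0=p0)
a_fit, mu_fit, sigma_fit = popt
sigma_fit = abs(sigma_fit)  # the model is symmetric in sigma, so report |sigma|
perr = np.sqrt(np.diag(pcov))  # 1-sigma uncertainties on (a, mu, sigma)

print("mu    = %.3f +/- %.3f" % (mu_fit, perr[1]))
print("sigma = %.3f +/- %.3f" % (sigma_fit, perr[2]))
```

Starting from the data's own mean and spread keeps the fit from wandering off, which matters more when the histogram range is truncated.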

Bouliech · Answer

Here is another solution that uses only the matplotlib.pyplot and numpy packages. It only works for Gaussian fitting. It is based on maximum likelihood estimation, which has already been mentioned in this topic. Here is the corresponding code:

# Python version : 2.7.9
from __future__ import division
import numpy as np
from matplotlib import pyplot as plt

# For the explanation, I simulate the data:
N = 1000
data = np.random.randn(N)
# But in reality, you would read data from file, for example with:
# data = np.loadtxt("data.txt")

# Empirical average and variance are computed
avg = np.mean(data)
var = np.var(data)
# From that, we know the shape of the fitted Gaussian.
pdf_x = np.linspace(np.min(data), np.max(data), 100)
pdf_y = 1.0/np.sqrt(2*np.pi*var)*np.exp(-0.5*(pdf_x-avg)**2/var)

# Then we plot (density=True normalizes the histogram like a pdf):
plt.figure()
plt.hist(data, 30, density=True, label="Data")
plt.plot(pdf_x, pdf_y, 'k--', label="Fit")
plt.legend(loc="best")
plt.show()

And here is the output.
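The Gaussian drawn by this code follows from the maximum likelihood estimates for a normal distribution. Written out for samples $x_1,\dots,x_N$ (this notation is ours, not the answer's), `avg` is $\hat\mu$, `var` is $\hat\sigma^2$ (`np.var` uses the N denominator by default), and `pdf_y` evaluates $f$:

```latex
\hat\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \hat\mu\right)^2,
\qquad
f(x) = \frac{1}{\sqrt{2\pi\hat\sigma^2}}
       \exp\!\left(-\frac{(x - \hat\mu)^2}{2\hat\sigma^2}\right)
```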

Xavier Guihot · Answer

Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

A NormalDist object can be built from a dataset with the NormalDist.from_samples method, and it gives access to the mean (NormalDist.mean) and the standard deviation (NormalDist.stdev):

from statistics import NormalDist

data = [0.7237248252340628, 0.6402731706462489, -1.0616113628912391, -1.7796451823371144,
        -0.1475852030122049, 0.5617952240065559, -0.6371760932160501, -0.7257277223562687,
        1.699633029946764, 0.2155375969350495, -0.33371076371293323, 0.1905125348631894,
        -0.8175477853425216, -1.7549449090704003, -0.512427115804309, 0.9720486316086447,
        0.6248742504909869, 0.7450655841312533, -0.1451632129830228, -1.0252663611514108]
norm = NormalDist.from_samples(data)
# NormalDist(mu=-0.12836704320073597, sigma=0.9240861018557649)
norm.mean
# -0.12836704320073597
norm.stdev
# 0.9240861018557649
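One detail matters when comparing this with the other answers: `NormalDist.from_samples` estimates sigma with the sample standard deviation (denominator N - 1, i.e. `statistics.stdev`), whereas `scipy.stats.norm.fit` and `np.std`/`np.var` with default arguments use N. A minimal sketch with a small made-up sample (the values are illustrative, not from the answer) shows the relationship:

```python
import statistics
from statistics import NormalDist

# Small illustrative sample (assumption: any sequence of floats works)
data = [2.1, 1.9, 2.4, 2.0, 1.6, 2.3, 1.8, 2.2]
n = len(data)

nd = NormalDist.from_samples(data)

# from_samples uses the sample standard deviation (N - 1 denominator) ...
print(nd.stdev, statistics.stdev(data))
# ... which is the population value (N denominator) scaled by sqrt(N / (N - 1))
print(statistics.pstdev(data) * (n / (n - 1)) ** 0.5)
```

For large samples the two conventions give nearly identical sigmas; for small ones the difference is visible.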