Scikit Learn
## Linear Regression

### Code

```python
import numpy as np
```

### Function Plot

### Program Walkthrough
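First, import a few third-party packages. numpy and matplotlib are needed almost every time, so I won't mention them again from here on. Next, import train_test_split from sklearn.model_selection; its main job is to split the data into training and test sets. Finally, import LinearRegression from sklearn.linear_model so we can train a linear regression model later.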
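Next, build some fake data: x is 500 numbers between 0 and 5, and y is x multiplied by 1.2 plus 0.8, with random noise added so the data looks more like real measurements. Then we split x and y into four parts: x training data, x test data, y training data, and y test data. In train_test_split(), first pass the arrays to split, then give test_size as the third argument: a value between 0 and 1 that sets the fraction of the data held out for testing. random_state is the random seed; fixing it makes the split reproducible. Finally, I reshape the x training and test data into n×1 matrices, because scikit-learn expects 2-D input of shape (samples, features), and here each sample has exactly one feature.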
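The split and reshape steps can be checked directly. The sketch below assumes noise-free data from np.linspace (noise does not matter for the shapes) and a second, hypothetical call with the same random_state to show that the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 500 x values in [0, 5] and y = 1.2x + 0.8 (noise omitted here)
x = np.linspace(0, 5, 500)
y = 1.2 * x + 0.8

# test_size=0.2 holds out 20% of the samples for testing;
# a fixed random_state makes the shuffle, and therefore the split, reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
x_train2, _, _, _ = train_test_split(x, y, test_size=0.2, random_state=0)

print(x_train.shape, x_test.shape)        # (400,) (100,)
print(np.array_equal(x_train, x_train2))  # True

# Estimators expect shape (n_samples, n_features), so one feature
# has to become an n x 1 column; -1 lets numpy infer n
x_train = x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)
print(x_train.shape, x_test.shape)        # (400, 1) (100, 1)
```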
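Our model is a linear regression, and calling model.fit() fits it to the training data: it finds the straight line that best fits all the data points in the least-squares sense, that is, the line minimizing the sum of squared errors. Finding that best-fitting line is exactly what linear regression does. Next comes prediction: we pass in the test data we split off earlier, and to check the result I also appended two extra data points of my own. When adding your own data, remember to keep it as an n×1 matrix. Finally, plot the points and draw the regression line, and we're done.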
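The original listing for this section is truncated, so here is a minimal sketch of the pipeline described above. Assumptions not taken from the post: Gaussian noise with standard deviation 0.5, a seeded np.random.default_rng generator, and the two extra inputs 1.0 and 4.0. Plotting is left out to keep the focus on fit and predict.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # assumption: seeded for reproducible noise

# Fake data: y = 1.2x + 0.8 plus noise (noise scale 0.5 is an assumption)
x = np.linspace(0, 5, 500)
y = 1.2 * x + 0.8 + rng.normal(0, 0.5, 500)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
x_train = x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)

model = LinearRegression()
model.fit(x_train, y_train)           # ordinary least-squares fit
print(model.coef_, model.intercept_)  # should be close to [1.2] and 0.8

# Predict on the test data plus two hypothetical extra inputs;
# the extra points must also be an n x 1 array
x_new = np.vstack([x_test, [[1.0], [4.0]]])
y_new = model.predict(x_new)
print(y_new[-2:])                     # roughly 2.0 and 5.6
```

Because the noise is random, the fitted slope and intercept only approximate 1.2 and 0.8; with 400 training points the error is typically a few hundredths.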
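## Boston Housing Price Prediction (Real Data)

### Code

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Note: load_boston was removed in scikit-learn 1.2,
# so this import requires scikit-learn < 1.2
from sklearn.datasets import load_boston

sns.set()  # seaborn's default plot style

boston_dataset = load_boston()
# print(boston_dataset.DESCR)    # dataset description
# print(boston_dataset.data[:5])

# 13 feature columns plus the target MEDV (median home value, in $1000s)
boston = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)
boston['MEDV'] = boston_dataset.target
boston.head()  # first rows (displayed in a notebook)

# Distribution of the target; histplot replaces the deprecated sns.distplot
sns.histplot(boston.MEDV, bins=30, kde=True)

# Correlation matrix of all columns, drawn as an annotated heatmap
corr_matrix = boston.corr().round(2)
plt.figure(figsize=(11.7, 8.27))  # (width, height) in inches: width comes first
sns.heatmap(corr_matrix, annot=True)

boston.iloc[0]  # first row (displayed in a notebook)

# Features: every column from CRIM through LSTAT; target: MEDV
x = boston.loc[:, "CRIM":"LSTAT"].values
y = boston.MEDV.values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(x_train, y_train)
y_predict = model.predict(x_test)

# Predicted vs. actual price; points on the red y = x line are perfect predictions
plt.figure(3)
plt.scatter(y_test, y_predict)
plt.xlim(0, 55)
plt.ylim(0, 55)
plt.plot([0, 55], [0, 55], c='r')
plt.xlabel('Actual MEDV')
plt.ylabel('Predicted MEDV')
plt.show()
```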
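The predicted-vs-actual scatter plot can also be summarized with numbers. Because load_boston is not available in recent scikit-learn, the sketch below uses make_regression as a stand-in with the same shape as the Boston table (506 samples, 13 features); the noise level is an arbitrary assumption, so the scores say nothing about the real housing data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Stand-in data shaped like the Boston table: 506 samples x 13 features
x, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(x_train, y_train)
y_predict = model.predict(x_test)

# RMSE: typical size of a prediction error, in the units of y
rmse = np.sqrt(mean_squared_error(y_test, y_predict))
# R^2: 1.0 would put every point exactly on the red y = x line
r2 = r2_score(y_test, y_predict)
print(f"RMSE = {rmse:.2f}, R^2 = {r2:.3f}")
```

For a regressor, model.score(x_test, y_test) returns the same R² value, so either call works.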
## Support Vector Machine (SVM)

### Code

```python
import numpy as np
import matplotlib.pyplot as plt
# Very old scikit-learn versions: from sklearn.datasets.samples_generator import make_blobs
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 200 two-dimensional points in 3 clusters (labels 0, 1, 2)
x, y = make_blobs(n_samples=200, centers=3, n_features=2, random_state=8)
plt.scatter(x[:, 0], x[:, 1], alpha=0.5, s=100, c=y)
plt.show()

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
clf = SVC()  # RBF kernel by default
clf.fit(x_train, y_train)
y_predict = clf.predict(x_test)

# 0 means a correct prediction; any non-zero entry is a misclassified test point
print(y_predict - y_test)
plt.scatter(x_test[:, 0], x_test[:, 1], c=y_predict - y_test)
plt.show()

# Decision regions: predict a class for every point of a fine grid
# (over a million points, so this step can take a while)
x0 = np.arange(-10, 11, 0.02)
y0 = np.arange(-15, 15, 0.02)
X, Y = np.meshgrid(x0, y0)
P = np.c_[X.ravel(), Y.ravel()]  # one (x, y) grid point per row
z = clf.predict(P)
Z = z.reshape(X.shape)           # back to grid shape for contourf
plt.contourf(X, Y, Z, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y)
plt.show()
```
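The meshgrid / np.c_ / reshape pattern used for the decision regions is easier to see on a tiny grid. In this sketch the grid values are arbitrary, and the sum of the two coordinates is a hypothetical stand-in for clf.predict(P), used only to show how a flat prediction array is mapped back onto the grid.

```python
import numpy as np

x0 = np.arange(0, 3)        # [0, 1, 2]
y0 = np.arange(10, 12)      # [10, 11]
X, Y = np.meshgrid(x0, y0)  # both have shape (len(y0), len(x0)) == (2, 3)

# Flatten both grids and pair them column-wise: one (x, y) point per row
P = np.c_[X.ravel(), Y.ravel()]
print(P)  # rows: (0,10), (1,10), (2,10), (0,11), (1,11), (2,11)

# A classifier returns one label per row of P, i.e. a flat array of length 6;
# reshape(X.shape) puts each label back at its grid position for contourf
z = P[:, 0] + P[:, 1]       # hypothetical stand-in for clf.predict(P)
Z = z.reshape(X.shape)
print(Z)                    # [[10 11 12] [11 12 13]]
```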
## Iris

### Code

```python
import numpy as np
```

### Output Image
### PCA

### Code
```python
# Jupyter-only magic that renders plots inline; remove it in a plain .py script
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
x = iris.data    # 150 samples x 4 features
y = iris.target  # 3 species, labels 0, 1, 2

# Reduce the 4 features to 2 principal components so the data can be plotted in 2-D
pca = PCA(n_components=2)
pca.fit(x)
x = pca.transform(x)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
clf = SVC()
clf.fit(x_train, y_train)
y_predict = clf.predict(x_test)
print("test accuracy:", clf.score(x_test, y_test))

# Decision regions over a 500 x 500 grid in PCA space
x0 = np.linspace(-4, 8, 500)
y0 = np.linspace(-2, 3, 500)
X, Y = np.meshgrid(x0, y0)
P = np.c_[X.ravel(), Y.ravel()]
z = clf.predict(P)
Z = z.reshape(X.shape)
plt.contourf(X, Y, Z, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y)
plt.show()
```
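To see what pca.transform actually does, the sketch below recomputes it by hand. It assumes the default whiten=False, in which case transform centers the data with mean_ and projects it onto the principal axes stored in components_. explained_variance_ratio_ then shows how much of the original variance the two kept components retain; for iris it is well over 90%.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

x = load_iris().data             # shape (150, 4)
pca = PCA(n_components=2).fit(x)
x2 = pca.transform(x)            # shape (150, 2)

# With whiten=False (the default): center, then project onto the principal axes
manual = (x - pca.mean_) @ pca.components_.T
print(np.allclose(x2, manual))   # True

# Fraction of the total variance captured by each of the two components
print(pca.explained_variance_ratio_)
```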