Max Coding blog

# Scikit Learn

2021/08/27

## Linear Regression

### Code

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# fake data: 200 points on y = 1.2x + 0.8, plus Gaussian noise
x = np.linspace(0, 5, 200)
y = 1.2 * x + 0.8 + 0.5 * np.random.randn(200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# sklearn expects 2-D inputs, so reshape each x set into an n*1 matrix
x_train = x_train.reshape(len(x_train), 1)
x_test = x_test.reshape(len(x_test), 1)

model = LinearRegression()
model.fit(x_train, y_train)

y_predict = model.predict(x_test)
model.predict([[4.3], [2.5]])  # two extra hand-made points, also n*1

plt.scatter(x_test, y_test)
plt.plot(x_test, y_predict, c='r')
plt.show()
```

### Function Graph

### Code Walkthrough

First we import some third-party packages. numpy and matplotlib are almost always needed, so I won't dwell on them from here on. Next we import train_test_split from sklearn.model_selection, whose main job is to split the data, and finally LinearRegression from linear_model for the regression training later.

Next we build fake data: x is 200 numbers from 0 to 5, and y is x times 1.2 plus 0.8, with some noise added so the data looks more realistic. We then split x and y into x training data, x test data, y training data, and y test data. train_test_split() takes the data to split first; test_size sets the fraction that goes to the test set, a value between 0 and 1, and random_state is the random seed, which makes the split reproducible. I then reshape the x training and test data into n*1 matrices, meaning each input is a single sample.

Our model is linear regression, and model.fit() fits it, i.e. finds the line that best matches all the data points, which is exactly what linear regression does: search for the best straight line. Then we predict, feeding in the test data we split off earlier; to try the model out I also predict two extra points, and remember that any data you add yourself must stay an n*1 matrix. Finally we scatter the points, draw the regression line, and we're done.
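Since the data was generated from a known line, the fit can be checked against the ground truth. A minimal sketch continuing from the listing above (`coef_` and `intercept_` are scikit-learn's standard attributes on a fitted `LinearRegression`; `reshape(-1, 1)` is just an equivalent way to get the required n*1 shape):

```python
# the learned line, available after model.fit(x_train, y_train)
print(model.coef_)       # slope; should be close to the true 1.2
print(model.intercept_)  # intercept; should be close to the true 0.8

# reshape(-1, 1) builds the same n*1 matrix without counting rows by hand
print(model.predict(np.array([4.3, 2.5]).reshape(-1, 1)))
```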

## Boston House Price Prediction (Real Data)

### Code
```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston

sns.set()
boston_dataset = load_boston()
# print(boston_dataset.DESCR)    # dataset description
# print(boston_dataset.data[:5])

# store the data as an excel-like table
boston = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)

# boston.head()

boston['MEDV'] = boston_dataset.target  # MEDV = the house price we predict
boston.head()

sns.distplot(boston.MEDV, bins=30)    # price distribution, 30 bins
corr_matrix = boston.corr().round(2)  # correlations, 2 decimal places
plt.figure(figsize=(11.7, 8.27))      # width first, then height
sns.heatmap(corr_matrix, annot=True)  # draw
plt.figure(3)
boston.iloc[0]
x = boston.loc[:, "CRIM":"LSTAT"].values  # all features CRIM..LSTAT
y = boston.MEDV.values

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
model = LinearRegression()
model.fit(x_train, y_train)

y_predict = model.predict(x_test)
plt.scatter(y_test, y_predict)     # predicted vs. actual price
plt.xlim(0, 55)
plt.ylim(0, 55)
plt.plot([0, 55], [0, 55], c='r')  # perfect predictions would sit on this line
```
### Output Plot

### Code Explanation

The figures here still come out of matplotlib's canvas, but we bring in seaborn to draw them. The datasets in Scikit Learn are very rich; here we use the Boston house prices, which are real data. sns.set() hands plot styling over to seaborn, and we load the Boston data into boston_dataset. Then DataFrame stores the data as an excel-like table, which is much easier to read. Since we want to predict the house price, we add a new column "MEDV" for it, taken from boston_dataset.target, and boston.head() shows the first few rows. Next, a histogram shows the distribution of Boston house prices, where bins sets the number of bins.

Then we build the correlation matrix, rounded to two decimal places, resize the figure window, and draw the matrix as a heatmap, where the shading reflects how strongly the variables move together. Next we take the data we need, every column from "CRIM" to "LSTAT", converted into a numpy array; y is the Boston house price. In other words, we use the "CRIM" through "LSTAT" features to predict the price y.

Splitting the data and training works just like before, so I won't repeat it; finally we predict, draw the plot, and we're done.
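The post stops at the scatter plot; one way to quantify how good the fit actually is (a minimal sketch continuing from the listing above, using `mean_squared_error` and `r2_score` from scikit-learn's `sklearn.metrics`):

```python
from sklearn.metrics import mean_squared_error, r2_score

# compare predictions on the held-out 20% of the data
mse = mean_squared_error(y_test, y_predict)
r2 = r2_score(y_test, y_predict)  # 1.0 would be a perfect fit
print(f"MSE: {mse:.2f}, R^2: {r2:.2f}")
```

`model.score(x_test, y_test)` returns the same R² value directly, without the extra import.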
## SVM

### Code
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# from sklearn.datasets.samples_generator import make_blobs  # old import path
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# three synthetic 2-D clusters
x, y = make_blobs(n_samples=200, centers=3, n_features=2, random_state=8)
plt.scatter(x[:, 0], x[:, 1], alpha=0.5, s=100, c=y)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

clf = SVC()
clf.fit(x_train, y_train)

y_predict = clf.predict(x_test)
print(y_predict - y_test)  # non-zero entries are misclassified points
plt.scatter(x_test[:, 0], x_test[:, 1], c=y_predict - y_test)

# paint the decision regions: predict a class for every point on a fine grid
x0 = np.arange(-10, 11, 0.02)
y0 = np.arange(-15, 15, 0.02)
X, Y = np.meshgrid(x0, y0)
P = np.c_[X.ravel(), Y.ravel()]
z = clf.predict(P)
Z = z.reshape(X.shape)
plt.contourf(X, Y, Z, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y)

plt.show()
```
### Output Image

### Code Walkthrough
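The original post left this walkthrough empty. In short: make_blobs generates three Gaussian clusters, SVC() (an RBF-kernel support vector classifier by default) learns to separate them, and the meshgrid/contourf block at the end paints the decision regions by predicting a class for every point on a fine grid. Since the same grid trick reappears in the iris and PCA sections below, here is that pattern as a reusable helper (a sketch; the function name `plot_decision_boundary` is my own, not from the post):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(clf, x, y, steps=300):
    """Shade the regions a fitted 2-feature classifier assigns to each class."""
    x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
    y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
    X, Y = np.meshgrid(np.linspace(x_min, x_max, steps),
                       np.linspace(y_min, y_max, steps))
    # predict a label for every grid point, then restore the grid shape
    Z = clf.predict(np.c_[X.ravel(), Y.ravel()]).reshape(X.shape)
    plt.contourf(X, Y, Z, alpha=0.3)
    plt.scatter(x[:, 0], x[:, 1], c=y)
    plt.show()
```

For example, `plot_decision_boundary(clf, x, y)` reproduces the SVM figure above.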

## Iris

### Code

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.datasets import load_iris

iris = load_iris()

x = iris.data
y = iris.target
x = x[:, 2:]  # keep only petal length and petal width

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

clf = SVC()
clf.fit(x_train, y_train)
y_predict = clf.predict(x_test)

# decision regions over the petal-length/petal-width plane
x0 = np.linspace(0, 8, 500)
y0 = np.linspace(0, 3, 500)

X, Y = np.meshgrid(x0, y0)
P = np.c_[X.ravel(), Y.ravel()]
z = clf.predict(P)
Z = z.reshape(X.shape)

plt.contourf(X, Y, Z, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y)

plt.show()
```
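The post does not evaluate this classifier; a quick sanity check (a sketch; `clf.score` is scikit-learn's built-in mean-accuracy method):

```python
# fraction of the 30 held-out iris samples classified correctly
print(clf.score(x_test, y_test))
```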

### Output Image

## PCA

### Code

```python
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

x = iris.data
y = iris.target

# project the 4 iris features down to 2 principal components
pca = PCA(n_components=2)
pca.fit(x)
x = pca.transform(x)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

clf = SVC()
clf.fit(x_train, y_train)
y_predict = clf.predict(x_test)

# decision regions in the 2-D principal-component plane
x0 = np.linspace(-4, 8, 500)
y0 = np.linspace(-2, 3, 500)

X, Y = np.meshgrid(x0, y0)
P = np.c_[X.ravel(), Y.ravel()]
z = clf.predict(P)
Z = z.reshape(X.shape)

plt.contourf(X, Y, Z, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y)

plt.show()
```
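How much information survives the 4-to-2 reduction can be read off the fitted PCA object (a sketch; `explained_variance_ratio_` is scikit-learn's standard attribute, and the quoted values are approximate):

```python
# share of the original variance captured by each of the 2 components
print(pca.explained_variance_ratio_)        # e.g. ~[0.92, 0.05] for iris
print(pca.explained_variance_ratio_.sum())  # ~0.97 of the variance is kept
```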
### Output Image

by 中和高中 吳振榮