MDS – Manufacturing Data Science

Assignment 3

Please solve the following questions and justify your answers. Show all analysis results, including equations/calculations or Python code, in your report. Upload a zip file containing your MS Word/PDF report and Python code, with the file name "MDS_Assignment3_ID_Name.zip", to NTU

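1. (25%) Nov. 12 Talk on Data Science Application Cases
(a) (5%) Which topic in the talk impressed you most, and why? Summarize that topic. What insight did it give you?
(b) (5%) Did the talk overturn or challenge your previous understanding of the manufacturing industry? If so, which views changed? If not, which views matched your existing impressions, and do you have any suggestions or see room for improvement?
(c) (5%) According to the talk, where do the difficulties and challenges of applying data science methods in manufacturing lie? Why? What would you suggest, or how would you address them?
(d) (5%) Was anything in the talk unclear or doubtful to you? What questions would you like to ask the speaker?
(e) (5%) Is there anything in the talk on which you would like to offer suggestions? For example: the angle of approach to the problem, the nature of the problem, adjustments to the method, or reflections on validation.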

2. (40%) Decision Tree Algorithms
Data Source: https://www.kaggle.com/uciml/faulty-steel-plates
Dataset provided by Semeion, Research Center of Sciences of Communication, Via Sersale 117, 00128, Rome, Italy. www.semeion.it

This dataset comes from research by Semeion, Research Center of Sciences of Communication. The original aim of the research was to correctly classify the type of surface defects in stainless steel plates, with six types of possible defects (plus "other"). The input vector is made up of 27 indicators that approximately describe the geometric shape of the defect and its outline.

There are 1941 plates with 34 variables. The first 27 columns (i.e., the independent variables X1-X27) describe the steel plate faults seen in the images:
{X_Minimum, X_Maximum, Y_Minimum, Y_Maximum, Pixels_Areas, X_Perimeter, Y_Perimeter,
SumofLuminosity, MinimumofLuminosity, MaximumofLuminosity, LengthofConveyer, TypeOfSteel_A300, TypeOfSteel_A400, SteelPlateThickness, Edges_Index, Empty_Index,
Square_Index, OutsideXIndex, EdgesXIndex, EdgesYIndex, OutsideGlobalIndex, LogOfAreas,
LogXIndex, LogYIndex, Orientation_Index, Luminosity_Index, SigmoidOfAreas}

The last seven columns (i.e., the dependent variables) are one-hot encoded classes: if the plate fault is classified as "Stains", there is a 1 in that column and 0 in the other six columns.
{Pastry, Z_Scratch, K_Scatch, Stains, Dirtiness, Bumps, Other_Faults}
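Most scikit-learn classifiers expect a single label column rather than seven indicator columns. The following is a minimal sketch of collapsing the one-hot block, assuming the data sit in a pandas DataFrame whose fault columns carry the names listed above. The helper name `one_hot_to_label` and the three toy rows are made up for illustration.

```python
import pandas as pd

FAULT_COLS = ["Pastry", "Z_Scratch", "K_Scatch", "Stains",
              "Dirtiness", "Bumps", "Other_Faults"]

def one_hot_to_label(df: pd.DataFrame, cols=FAULT_COLS) -> pd.Series:
    """Collapse one-hot fault columns into a single categorical label."""
    onehot = df[cols]
    # Every plate should belong to exactly one class; fail loudly otherwise.
    if not (onehot.sum(axis=1) == 1).all():
        raise ValueError("some rows do not have exactly one fault flag set")
    # idxmax returns, for each row, the name of the column holding the 1.
    return onehot.idxmax(axis=1).rename("fault")

# Toy rows (illustrative only, not taken from the real dataset).
toy = pd.DataFrame(
    [[0, 0, 0, 1, 0, 0, 0],
     [1, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 1]],
    columns=FAULT_COLS,
)
labels = one_hot_to_label(toy)
print(labels.tolist())  # ['Stains', 'Pastry', 'Other_Faults']
```

Calling `labels.value_counts()` on the real data then gives the per-class counts needed to judge how imbalanced the classes are.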

These data can be found at http://archive.ics.uci.edu/ml/datasets/steel+plates+faults and are attached in the file MDS_Assignment3_Steelplates.xlsx.
(a) (5%) Construct a data science framework and show the data summary.
(b) (5%) What problems does the dataset have? Are there any identical columns, redundant columns, or missing values? How would you handle these issues?
(c) (5%) After data preprocessing, apply the classification and regression tree (CART) to the prepared dataset. Report the classification results from 10-fold cross-validation using several metrics (e.g., accuracy, area under the ROC curve (AUC), and F1-score), and list the hyperparameters you adjusted.
(d) (5%) Suggest a method to address the data imbalance issue, and build a new balanced dataset.
(e) (5%) Apply the classification and regression tree (CART) to the balanced dataset. Report the classification results from 10-fold cross-validation using several metrics (e.g., accuracy, area under the ROC curve (AUC), and F1-score), and list the hyperparameters you adjusted.
(f) (5%) Compare the results of (c) and (e). Any suggestions or insights?
(g) (5%) Use Random Forest on both the prepared dataset and the balanced dataset. Compare the results and provide your insight.
(h) (5%) Use Gradient Boosting Decision Tree (GBDT) on both the prepared dataset and the balanced dataset. Compare the results and provide your insight.
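
The evaluation loop behind items (c) to (e) can be sketched as follows. This is one possible setup, not a reference solution. It assumes scikit-learn's `DecisionTreeClassifier` as the CART implementation, plain random oversampling as the balancing method, and a synthetic dataset from `make_classification` standing in for the steel-plates data. The helper names and hyperparameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until every class matches the largest."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def cv_cart(X, y, balance=False, n_splits=10, seed=0, **tree_params):
    """Stratified k-fold CV for a CART model: mean accuracy, macro-F1, one-vs-rest AUC."""
    rng = np.random.default_rng(seed)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {"accuracy": [], "f1_macro": [], "auc_ovr": []}
    for train_idx, test_idx in skf.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        if balance:
            # Resample the training fold only, so the test fold stays untouched.
            X_tr, y_tr = random_oversample(X_tr, y_tr, rng)
        clf = DecisionTreeClassifier(random_state=seed, **tree_params)
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X[test_idx])
        proba = clf.predict_proba(X[test_idx])
        scores["accuracy"].append(accuracy_score(y[test_idx], pred))
        scores["f1_macro"].append(f1_score(y[test_idx], pred, average="macro"))
        scores["auc_ovr"].append(
            roc_auc_score(y[test_idx], proba, multi_class="ovr", labels=clf.classes_))
    return {k: float(np.mean(v)) for k, v in scores.items()}

# Synthetic, imbalanced stand-in for the steel-plates data (not the real dataset).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)
params = {"max_depth": 6, "min_samples_leaf": 5}  # illustrative values, not tuned
print("prepared:", cv_cart(X, y, balance=False, **params))
print("balanced:", cv_cart(X, y, balance=True, **params))
```

`random_oversample` can also be applied once to the full prepared data to build the balanced dataset that (d) asks for. Resampling inside each training fold, as done here, is a variation that keeps duplicated rows out of the test fold. For (g) and (h), the same loop should work with `RandomForestClassifier` or `GradientBoostingClassifier` from `sklearn.ensemble` in place of the tree.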

3. (20%) Deep Learning
Use Python to build a convolutional neural network (CNN) for "casting product image data for quality inspection".
Data Source:
https://www.kaggle.com/ravirajsinh45/real-life-industrial-dataset-of-casting-product
Dataset provided by Pilot Technocast, Shapar, Rajkot https://pilottechnocast.com/

We decided to automate the inspection process, which requires a deep learning classification model. All photos are top views of a submersible pump impeller (a Google search gives a better idea of the part). Capturing these images requires stable lighting, for which we made a special arrangement.
We focus on the dataset with augmentation, which contains 7348 images in total. All are 300×300-pixel grayscale images with augmentation already applied. For building a classification model, the data are already split into train and test folders, each containing def_front and ok_front subfolders.
train: def_front has 3758 images and ok_front has 2875 images
test: def_front has 453 images and ok_front has 262 images
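
A quick arithmetic check on the stated counts gives the class balance and a trivial baseline for the CNN to beat. The numbers are the ones given above, with the folder names written here as def_front and ok_front. The baseline is simply the accuracy of always predicting the larger class.

```python
# Image counts as stated in the dataset description.
counts = {
    "train": {"def_front": 3758, "ok_front": 2875},
    "test": {"def_front": 453, "ok_front": 262},
}

total = sum(n for split in counts.values() for n in split.values())
print(total)  # 7348, matching the stated dataset size

baselines = {}
for name, split in counts.items():
    n = sum(split.values())
    majority = max(split, key=split.get)
    # Accuracy of a trivial model that always predicts the majority class.
    baselines[name] = split[majority] / n
    print(f"{name}: n={n}, majority={majority}, baseline accuracy={baselines[name]:.3f}")
```

On the test split the always-defective baseline reaches roughly 63% accuracy, so accuracy alone can flatter a weak model. Reporting precision, recall, or AUC alongside it is a reasonable precaution.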

(The dataset also includes 512×512 grayscale images without augmentation: 519 ok_front and 781 def_front impeller images. We do not use this subset.)

If you have any questions, search online (keyword: casting product image data for quality inspection) or refer to the following link: https://www.youtube.com/watch?v=4sDfwS48p0A
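
A small CNN for this binary task might be sketched as below, assuming TensorFlow's bundled Keras. The layer sizes, dropout rate, and optimizer are illustrative choices rather than tuned values. The helper names and the `casting_data/...` paths are hypothetical placeholders for wherever the images are unpacked.

```python
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = (300, 300)  # stated image size of the augmented dataset

def build_cnn(input_shape=(*IMG_SIZE, 1)) -> keras.Model:
    """Small binary CNN for defective vs. OK impellers (illustrative, not tuned)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),            # scale pixel values to [0, 1]
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),        # keeps the parameter count small
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # probability of the class labelled 1
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy", keras.metrics.AUC(name="auc")])
    return model

def load_split(directory: str):
    """Load one split from a folder with one subfolder per class.

    Labels follow the alphabetical order of the subfolder names, so with
    def_front and ok_front the defective class is 0 and the OK class is 1.
    """
    return keras.utils.image_dataset_from_directory(
        directory, label_mode="binary", color_mode="grayscale",
        image_size=IMG_SIZE, batch_size=32)

model = build_cnn()
model.summary()
# Once the data are downloaded, training could look like this (paths are assumptions):
# train_ds = load_split("casting_data/train")
# test_ds = load_split("casting_data/test")
# model.fit(train_ds, validation_data=test_ds, epochs=10)
```

Because the provided images are already augmented, further augmentation layers are left out here. Holding out part of the train folder for validation, rather than monitoring the test folder during training, gives a cleaner final test estimate.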

4. (15%) Time-Series Prediction

Datasets can be found at sources such as the following:
Brent oil price: https://www.investing.com/commodities/brent-oil-historical-data
Commodity prices: https://fred.stlouisfed.org/categories/32217
Commodity prices: https://sdw.ecb.europa.eu/browse.do?node=9691219
Summary table of raw materials: https://just2.entrust.com.tw/z/ze/zeq/zeq.djhtm

(a) (10%) Prepare and transform the data into an appropriate format (e.g., use the Data Generator in https://www.datacamp.com/community/tutorials/lstm-python-stock-market). Build an LSTM model and show the prediction results via time-series nested cross-validation.
(b) (5%) Visualize the time-rolling prediction as in the diagram above.
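
The data preparation and splitting for item (a) can be sketched with NumPy alone. The sketch assumes one common reading of time-series nested cross-validation: an expanding training window, followed by a validation block for model selection and a later test block for evaluation. The function names, the lookback length, and the synthetic series are illustrative; the LSTM itself is left out.

```python
import numpy as np

def make_windows(series, lookback):
    """Turn a 1-D series into (samples, lookback, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]  # y[i] is the value right after window i
    return X[..., np.newaxis], y

def nested_forward_splits(n_samples, n_folds):
    """Yield (train, val, test) index arrays with an expanding training window.

    The samples are cut into n_folds + 2 consecutive blocks. In outer fold k the
    first k + 1 blocks train, the next block validates, and the one after tests.
    """
    blocks = np.array_split(np.arange(n_samples), n_folds + 2)
    for k in range(n_folds):
        train = np.concatenate(blocks[:k + 1])
        yield train, blocks[k + 1], blocks[k + 2]

# Toy series standing in for a price history (synthetic, not real prices).
prices = np.sin(np.linspace(0, 20, 120)) + 0.01 * np.arange(120)
X, y = make_windows(prices, lookback=10)
print(X.shape, y.shape)  # (110, 10, 1) (110,)

splits = list(nested_forward_splits(len(X), n_folds=3))
for train, val, test in splits:
    # Every training index precedes validation, which precedes test.
    print(len(train), len(val), len(test), train[-1] < val[0] < test[0])
```

The `(samples, lookback, 1)` shape matches what recurrent layers usually expect. Any scaling should be fitted on the training indices of each fold only, so that later prices do not leak into earlier predictions. Plotting the test-block predictions fold by fold gives the time-rolling view asked for in (b).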

Note
1. Show all your work in detail. Innovative ideas are encouraged.
