<h3>Decision Boundary Classifier</h3><p><span style="font-size: 18px; color: rgb(156, 156, 148);">Setting a decision boundary is one of the critical topics in data science. Here, we use d3.js to visualize how sensitive a given indicator (profit, in this case) is to the boundary, and to show how a tailor-made d3 visualization can help this discussion (a PC is recommended for the best visualization result).</span></p><p><br></p><h3><span style="color: inherit; font-family: inherit;">Boundary in two distributions</span><br></h3><p>Suppose Tesla launches a new model (let's call it model A in this article), and its motor is known to have a 50% yield, with reliability being the major cause of yield loss. Assume that we could somehow create a reliability score for the motor, ranging from 0 to 100. Higher scores represent a higher likelihood of a good motor unit, while lower scores represent a higher likelihood of a bad motor unit. If we plot the good units and bad units from motor manufacturing against the reliability score, we see two distributions of equal size (since we set the yield at 50%). For simplicity, we assume that both follow a Gaussian distribution.</p><p><br></p><p>Assume that we would like to set a threshold to either screen out or pass each motor to the rest of the manufacturing process, which is assembly. The purpose is to prevent a bad motor from escaping and being assembled with other good parts, which would all end up as scrap.</p><p><br></p><p>The question, then, is how we should set the threshold. We show two scenarios: either A) the two distributions are clearly separated, or B) the two distributions overlap. If the distributions are clearly separated, the answer is straightforward: we could put the threshold between the good and bad unit distributions, which would be a score of about 55 in the graph below. However, where should we put the threshold in the scenario with overlapping distributions? 
Unfortunately, this is the more realistic case, and it is the problem that we would like to visualize to help determine the threshold.</p><p><br></p><p><img src="/media/django-summernote/2020-12-14/c546cc4a-fbbf-4a87-825d-2b16bfdd293c.png" style="width: 100%;"></p><p><br></p><h3>Profit Assumptions</h3><p>Let's assume that Tesla's new model A is aimed at a mass audience, so the sales price is set at $30k, with one third of that ($10k) being cost (for simplicity, all cost is assumed to be manufacturing cost). The motor division is now discussing whether to run this reliability test at the end of motor manufacturing, because the motor is the major contributor to the total manufacturing cost: say the motor manufacturing cost is $5k, and the motor's share of the $30k sales price is $15k. Their concern is the scrap cost at the test drive during final vehicle inspection. If a bad motor passes, is assembled with other good parts, and the whole car is rejected at final vehicle inspection, the company needs to scrap all the parts because of the motor. The total scrapped cost is $10k, which is the entire manufacturing cost: $5k for the bad motor and $5k for the other good parts. Also assume that the lifetime demand for the vehicle is 100k units. In this case, the return on investment of the screening test at the end of the motor manufacturing line is given as:</p><pre><span style="color: rgb(115, 24, 66);">Profit = f_motorprofit(yield) - f_scrap(yield) - motor mfg cost</span></pre><p>where motor profit and scrap cost are functions of the threshold, which we set on the reliability score to decide whether to screen a motor out to avoid escapees or pass it on to the rest of assembly. 
Meanwhile, the motor manufacturing cost is fixed for the given demand volume to be manufactured.</p><p><br></p><h3>Threshold Metrics</h3><p>A couple of additional metrics are used to see the threshold sensitivity: the share of correct and incorrect decisions at each threshold on the score, and the true positive rate and positive rate at that threshold.</p><p><br></p><p>A known issue is that the threshold with the highest percentage of correct decisions does not always give the best profit. In the example here, the highest correctness is 84%, reached for scores from 48 to 52, and it gives an ROI of $220M to $280M. Meanwhile, the best profit is $285M, at a score of 47 and a correctness of 83%. It is also worth remembering that a 100% true positive rate is far from the best profit: we need to accept screening out some good units in order to avoid bad unit escapees. For example, the best profit of $285M is at a 92% true positive rate, which means we screen out 8% of the known good units.</p><p><br></p><p>So, where we should set the threshold depends on which indicator we prioritise, such as the correctness of the threshold or maximizing profit.</p><p><br></p><p>This is based on a simple framework. The simulation could become complicated once we start applying different distributions for good and bad products, or once we consider multiple groups, such as different types of vehicle or manufacturing locations. Here we limit ourselves to showing the fundamentals of framing a decision threshold.</p><p><br></p><p><img src="/media/django-summernote/2020-12-14/1935e652-b269-4142-8eca-ecdd277d4909.png" style="width: 100%;"></p><p><br></p><h3>Potential Applications</h3><p>In manufacturing, this framework could be applied to defining a manufacturing spec or to binning products by performance. The reference article from Google's AI lab uses loan applicants who either repay or default as its example. 
More generally, it could be applied to any classification problem where the classes are not clearly separated.</p><p><br></p><p><span style="color: rgb(99, 99, 99);">Demo Site : https://tak113.github.io/d3showcase/</span></p><p><span style="color: rgb(99, 99, 99);">Reference : https://research.google/teams/brain/pair/</span></p>
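<p>The threshold sweep described under Threshold Metrics can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the demo site. The prices, costs, volume and yield come from the article. The distribution parameters (<code>GOOD_MEAN</code>, <code>BAD_MEAN</code>, <code>SD</code>), the reading of the 100k units as motors built, and the exact form of the profit terms are assumptions made for this sketch, so its numbers will not match the figures quoted above.</p>

```python
import math

# Figures taken from the article
MOTOR_PRICE = 15_000   # motor's share of the $30k sales price
MOTOR_COST = 5_000     # motor manufacturing cost per unit
SCRAP_COST = 10_000    # full manufacturing cost scrapped per escaped bad motor
UNITS_BUILT = 100_000  # assumption: the 100k lifetime volume counts motors built
YIELD = 0.5            # half of the motors are good

# Hypothetical distribution parameters (not given in the article)
GOOD_MEAN, BAD_MEAN, SD = 60.0, 40.0, 10.0


def normal_cdf(x, mean, sd):
    """Cumulative probability of a Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def metrics(threshold):
    """Metrics when motors scoring at or above the threshold are passed."""
    n_good = UNITS_BUILT * YIELD
    n_bad = UNITS_BUILT - n_good
    # True positive rate: share of good motors that pass
    tpr = 1.0 - normal_cdf(threshold, GOOD_MEAN, SD)
    # False positive rate: share of bad motors that escape
    fpr = 1.0 - normal_cdf(threshold, BAD_MEAN, SD)
    passed_good = n_good * tpr
    escaped_bad = n_bad * fpr
    # Correct decisions: good motors passed plus bad motors screened out
    correct = (passed_good + (n_bad - escaped_bad)) / UNITS_BUILT
    # Positive rate: share of all motors that pass
    positive_rate = (passed_good + escaped_bad) / UNITS_BUILT
    # Assumed form of: Profit = motor profit - scrap - motor mfg cost
    profit = (passed_good * MOTOR_PRICE
              - escaped_bad * SCRAP_COST
              - UNITS_BUILT * MOTOR_COST)
    return {"threshold": threshold, "tpr": tpr, "positive_rate": positive_rate,
            "correct": correct, "profit": profit}


# Sweep every integer score from 0 to 100
sweep = [metrics(t) for t in range(0, 101)]
best_correct = max(sweep, key=lambda m: m["correct"])
best_profit = max(sweep, key=lambda m: m["profit"])

for label, m in (("most correct", best_correct), ("best profit", best_profit)):
    print(f"{label}: threshold={m['threshold']}, correct={m['correct']:.1%}, "
          f"TPR={m['tpr']:.1%}, profit=${m['profit'] / 1e6:.1f}M")
```

<p>Under these assumed parameters the sketch shows the same qualitative effect as the visualization: the threshold with the best profit sits below the threshold with the most correct decisions, because each escaped bad motor costs less than the revenue lost on each good motor that is screened out.</p>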