R vs Python: Considerations in 2021
Feb. 6, 2021, 9:59 p.m.

Short Description:
Personal considerations on R vs Python for data science, as of January 2021
source : datak

<p>The R vs Python debate in data science has been going on for almost a decade. Here I would like to put down my considerations as of Jan 2021, for my own understanding. Your input would be appreciated on Twitter with #datak!</p><h3>Python is better from a programming perspective</h3><p>Python is designed as a general-purpose language, while R is designed for statistics. So if we want to automate a machine learning process end to end, from data manipulation through to serving responses on top of an existing system, Python is more useful. When we use general-purpose constructs such as 'for' loops for data manipulation, R is much slower, and we need the 'apply' family of functions to keep up.</p><p><br></p><h3>R is slightly better for exploratory data analysis</h3><p>For EDA (exploratory data analysis), as long as we use Jupyter for Python, the workflow feels almost the same to me, but Python still seems to require more code than R to run the same calculation.<br>This also relates to how easily we can explain the results of an analysis. R can do statistical analysis and plot the results out of the box, while Python needs additional libraries such as pandas/numpy/matplotlib and more detailed arguments.</p><p><br></p><h3>Python is a huge win for integration</h3><p>There is no doubt that Python wins here. If we want to integrate a machine learning application into an existing application, it is much easier to find an integration method for Python. R can still do it, but it is more complicated and there are fewer articles online. Neither language seems particularly fast at processing, but R requires more memory.</p><p><br></p><h3>Not sure about stochastic analysis (e.g. Bayesian analysis)</h3><p>R has had a wrapper library for Stan, which runs on C++, so I have been using R for stochastic analysis to get the uncertainty around a point estimate of the response. Meanwhile, for Python, there are PyMC and PyStan. 
For a Stan/MC implementation, either language requires heavy lifting to set up an environment. I could only get it working in R in the past. Meanwhile, TensorFlow now has a Bayesian framework called TFP (TensorFlow Probability), which I have not used so far. I will report back once I have used TFP.</p><p><br></p><h3>Python is a huge win for deep learning</h3><p>Deep learning tooling has grown up in the Python community, not in R. We can still use deep learning in R, but most options are wrapper libraries that run a Python deep learning library in the background, so new features also tend to reach R with a lag.</p><p><br></p><h3 style="font-family: &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; color: rgb(0, 0, 0);">R is a huge win for publishing a simple machine learning web application</h3><p>If you don't have much background in web applications but would like to publish your data science application to the web, R is much easier than Python. R Shiny, the web framework for R, does not require any other language, not even JavaScript for the UI side. Meanwhile, Python has several web frameworks such as Flask and Django, but they require basic literacy in HTML/CSS/JavaScript as well as Python. Plotly, the graphing library, offers Dash, a newer web framework for Python that makes it much easier to create an AI web app in Python, but it still requires more complicated code than R.</p><p><br></p><p>One note on R Shiny: as long as you use shinyapps.io to deploy to a cloud server, you don't need any DevOps knowledge, but extensibility is very limited. If you would like to build a bigger web application in R, for example one that adds a database, you would need to use Docker Compose/Kubernetes and deploy to a third-party cloud such as AWS or GCP, a topic on which the R community has been gaining many resources recently.</p><p><br></p><p>Another note: Streamlit for Python offers a similar concept to R Shiny. It does not require any knowledge other than Python, and it can now create a rich UI. 
Streamlit's capabilities are still very limited, but eventually R will have less of an advantage in this field.</p><p><br></p><h3>Final thoughts</h3><p>Here are my thoughts. For a POC (proof of concept) level machine learning application, all the way through web publishing, I still prefer R. For EDA, I still prefer R. But I would like to brush up my skills in Jupyter for EDA, and in Dash for simple web apps in Python, since</p><ol><li>my interest is now shifting from machine learning to deep learning, and those libraries are the ones to work with</li><li>there will be more projects that integrate machine learning algorithms into web apps going forward</li></ol><p><br></p><p>Reference: <a href="https://github.com/matloff/R-vs.-Python-for-Data-Science" target="_blank">matloff/R-vs.-Python-for-Data-Science</a></p>
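<p>As a rough illustration of the EDA point above, here is a minimal Python sketch of what it takes to approximate the six-number summary that R prints with a single <code>summary(x)</code> call, using only the standard library. The function name <code>six_number_summary</code> and the sample data are hypothetical, made up for illustration, and the quartile method is picked to be close to R's default rather than guaranteed identical.</p>

```python
import statistics


def six_number_summary(values):
    """Return min, quartiles, median, mean and max, similar in spirit to R's summary()."""
    data = sorted(values)
    # method="inclusive" treats the data as containing the population min and max
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": q2,
        "mean": statistics.mean(data),
        "q3": q3,
        "max": data[-1],
    }


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5]  # made-up data for illustration
    for name, value in six_number_summary(sample).items():
        print(f"{name}: {value}")
```

<p>With pandas installed, <code>DataFrame.describe()</code> gives a comparable table in one call, which is the "additional libraries" trade-off on the Python side.</p>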





© DATAK 2024