My Analytics Portfolio
Academic Data Projects
Technology Stack: Python, seaborn, pandas, operator, Matplotlib, scikit-learn
• Ingested and preprocessed data, visualized it using heatmaps and boxplots, identified credit outliers, and corrected class imbalance.
• Predicted credit risk: performed feature engineering using weight of evidence and trained a logistic regression model, improving accuracy from 75% to 80% and recall from 19% to 22%.
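A minimal sketch of the weight-of-evidence encoding named in the bullet above, not the project's actual code. It assumes a binary target where 1 marks a bad (defaulted) account, the common credit-scoring convention WoE = ln(share of goods / share of bads), and a hypothetical `smoothing` parameter to avoid log(0) for empty cells. The function name is illustrative.

```python
import math
from collections import defaultdict


def weight_of_evidence(categories, targets, smoothing=0.5):
    """Return {category: WoE} for a binary target (1 = bad, 0 = good).

    WoE = ln(share of all goods in the category / share of all bads in it).
    `smoothing` (an assumption of this sketch) is added to every count so
    that a category with no goods or no bads does not produce log(0).
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for cat, y in zip(categories, targets):
        if y == 1:
            bad[cat] += 1
        else:
            good[cat] += 1

    cats = sorted(set(good) | set(bad))
    total_good = sum(good[c] + smoothing for c in cats)
    total_bad = sum(bad[c] + smoothing for c in cats)

    return {
        c: math.log(
            ((good[c] + smoothing) / total_good)
            / ((bad[c] + smoothing) / total_bad)
        )
        for c in cats
    }


# Illustrative data only: home-ownership status vs. default flag.
woe = weight_of_evidence(["own", "own", "rent", "rent"], [0, 0, 1, 1])
```

Under this convention a positive value marks a category with relatively more good accounts, and the encoded column can be passed directly to a logistic regression.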
Technology Stack: Jupyter, scikit-learn, imbalanced-learn (imblearn), pickle
• Predicted fraudulent transactions using machine learning; performed principal component analysis, minority oversampling, and feature transformation.
• Benchmarked Support Vector Machine, Isolation Forest, and One-vs-Rest classifiers [87% accuracy, 83% recall].
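A minimal sketch of minority oversampling as mentioned above. The portfolio does not say which method was used, so this assumes plain random duplication of minority rows in pure Python as a stand-in for whichever imbalanced-learn sampler the project relied on. The function name and sample data are illustrative.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size.

    Returns new (rows, labels) lists; the inputs are left unchanged.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Original rows of this class, used as the sampling pool.
        pool = [r for r, y in zip(rows, labels) if y == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Illustrative data only: four legitimate transactions, one fraudulent.
X, y = random_oversample([[1], [2], [3], [4], [9]], [0, 0, 0, 0, 1])
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.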
Technology Stack/Concepts: RStudio, ggplot, rpart, CART, Random Forest, K-Nearest Neighbors
• Identified correlations influencing share trends by visualizing variables such as weather and wind speed using R libraries.
• Built and evaluated models on train/test splits using Linear Regression (83.8), Regression Trees, Classification, and Random Forest (92.5).
• Calculated RMSE, MAE, and MAPE to assess model fit against a baseline and identified the most significant predictor variable.
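A minimal sketch of the three error metrics named above, compared against a baseline. It is written in Python rather than the project's R. The portfolio does not specify the baseline, so a predict-the-mean baseline is assumed here, and the function names and numbers are illustrative.

```python
import math


def regression_errors(actual, predicted):
    """Return RMSE, MAE, and MAPE (in percent) for paired observations.

    MAPE divides by the actual values, so every actual must be nonzero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


def mean_baseline(actual):
    """Assumed baseline: predict the mean of the actuals for every row."""
    mean = sum(actual) / len(actual)
    return [mean] * len(actual)


# Illustrative numbers only.
actual = [100, 200, 300]
model_scores = regression_errors(actual, [110, 190, 300])
baseline_scores = regression_errors(actual, mean_baseline(actual))
```

A model is worth keeping only if it scores lower than the baseline on these metrics, measured on the held-out test split.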
Skills
1) BI Visualization: Salesforce CRM Analytics, Power BI [Power Query, DAX], Tableau, Qlik Sense, Google Analytics, Plotly, D3.js, Looker, Domo
2) Programming Languages & Tools: Python, R, C#, PowerPoint, MS Excel [Pivot Tables, VLOOKUP], GitHub, SAS, pandas, NumPy, Matplotlib
3) Databases: Oracle SQL, SQL Server [Common Table Expressions, Indexing], AWS Redshift, Snowflake, Apache Hadoop, OLAP, Spark, BigQuery
4) Machine Learning: Logistic & Linear Regression, Hypothesis Testing, Decision Tree, Random Forest, K-means Clustering, Time Series Forecasting
5) Coursework: Data Mining & Wrangling, Database Management, Engineering Statistics, Business Analytics, Agile Systems, Data Lakes & Warehousing
6) Certifications: Scaled Agile Framework 5 Practitioner [SAFe], Tableau Data Analyst, Google Business Intelligence Professional, KPMG Internship, R
Technology Stack/Concepts: CoNVO, Spark SQL, PySpark, Hadoop, Jupyter, NoSQL, Tableau
• Performed business area segmentation on review data ingested into AWS cloud storage [S3 bucket] using Databricks and Trifacta.
• Used Spark SQL to clean and merge multiple datasets, and PySpark for data modeling, forecasting, and visualization on HDFS.
• Analyzed customer churn across praise/pan businesses using box-and-whisker plots and stacked bar graphs in Tableau.
Technology Stack/Concepts: NumPy, pandas, matplotlib, plotly, seaborn
• Visualized confirmed COVID-19 cases worldwide with static and animated choropleth maps as part of a time series analysis.
• Predicted and visualized COVID-19 case counts using polynomial regression, Bayesian ridge regression, and SVM.