Kaggle Competition - Data exploration

Exploring data from Kaggle’s Survey competition
fastpages
jupyter
Author

Eric Vincent

Published

October 19, 2022

::: {.cell execution_count=102}

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/kaggle-survey-2022/kaggle_survey_2022_responses.csv
/kaggle/input/kaggle-survey-2022/Supplementary Data/kaggle_survey_2022_methodology.pdf
/kaggle/input/kaggle-survey-2022/Supplementary Data/kaggle_survey_2022_answer_choices.pdf

:::

df = pd.read_csv('/kaggle/input/kaggle-survey-2022/kaggle_survey_2022_responses.csv', low_memory=False)
df.head()
|  | Duration (in seconds) | Q2 | Q3 | Q4 | Q5 | Q6_1 | Q6_2 | Q6_3 | Q6_4 | Q6_5 | ... | Q44_3 | Q44_4 | Q44_5 | Q44_6 | Q44_7 | Q44_8 | Q44_9 | Q44_10 | Q44_11 | Q44_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Duration (in seconds) | What is your age (# years)? | What is your gender? - Selected Choice | In which country do you currently reside? | Are you currently a student? (high school, uni... | On which platforms have you begun or completed... | On which platforms have you begun or completed... | On which platforms have you begun or completed... | On which platforms have you begun or completed... | On which platforms have you begun or completed... | ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... | Who/what are your favorite media sources that ... |
| 1 | 121 | 30-34 | Man | India | No | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 462 | 30-34 | Man | Algeria | No | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 293 | 18-21 | Man | Egypt | Yes | Coursera | edX | NaN | DataCamp | NaN | ... | NaN | Kaggle (notebooks, forums, etc) | NaN | YouTube (Kaggle YouTube, Cloud AI Adventures, ... | Podcasts (Chai Time Data Science, O’Reilly Dat... | NaN | NaN | NaN | NaN | NaN |
| 4 | 851 | 55-59 | Man | France | No | Coursera | NaN | Kaggle Learn Courses | NaN | NaN | ... | NaN | Kaggle (notebooks, forums, etc) | Course Forums (forums.fast.ai, Coursera forums... | NaN | NaN | Blogs (Towards Data Science, Analytics Vidhya,... | NaN | NaN | NaN | NaN |

5 rows × 296 columns
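
In the preview above, row 0 repeats the question text instead of holding a response, so it is probably worth separating it before any analysis. A minimal sketch, assuming the same layout; the `split_questions` helper and the `toy` frame are hypothetical stand-ins and not part of the original notebook:

```python
import pandas as pd


def split_questions(raw):
    """Split a survey frame whose first data row holds the question text.

    Returns (questions, responses): a Series mapping column code to
    question text, and a DataFrame with only the real answers.
    """
    questions = raw.iloc[0]                            # column code -> question text
    responses = raw.iloc[1:].reset_index(drop=True)    # answers, re-indexed from 0
    return questions, responses


# Made-up miniature of the survey layout (not real survey data).
toy = pd.DataFrame({
    "Q2": ["What is your age (# years)?", "30-34", "18-21"],
    "Q4": ["In which country do you currently reside?", "India", "Egypt"],
})

questions, responses = split_questions(toy)
print(questions["Q2"])   # look up a question by its column code
print(len(responses))    # number of actual respondents
```

With the real data, the same call on `df` would leave `responses` one row shorter than `df`, and `questions` could serve as a lookup from column code to full question text.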

df.describe()
|  | Duration (in seconds) | Q2 | Q3 | Q4 | Q5 | Q6_1 | Q6_2 | Q6_3 | Q6_4 | Q6_5 | ... | Q44_3 | Q44_4 | Q44_5 | Q44_6 | Q44_7 | Q44_8 | Q44_9 | Q44_10 | Q44_11 | Q44_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 23998 | 23998 | 23998 | 23998 | 23998 | 9700 | 2475 | 6629 | 3719 | 945 | ... | 2679 | 11182 | 4007 | 11958 | 2121 | 7767 | 3805 | 1727 | 1269 | 836 |
| unique | 3530 | 12 | 6 | 59 | 3 | 2 | 2 | 2 | 2 | 2 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| top | 272 | 18-21 | Man | India | No | Coursera | edX | Kaggle Learn Courses | DataCamp | Fast.ai | ... | Reddit (r/machinelearning, etc) | Kaggle (notebooks, forums, etc) | Course Forums (forums.fast.ai, Coursera forums... | YouTube (Kaggle YouTube, Cloud AI Adventures, ... | Podcasts (Chai Time Data Science, O’Reilly Dat... | Blogs (Towards Data Science, Analytics Vidhya,... | Journal Publications (peer-reviewed journals, ... | Slack Communities (ods.ai, kagglenoobs, etc) | None | Other |
| freq | 63 | 4559 | 18266 | 8792 | 12036 | 9699 | 2474 | 6628 | 3718 | 944 | ... | 2678 | 11181 | 4006 | 11957 | 2120 | 7766 | 3804 | 1726 | 1268 | 835 |

4 rows × 296 columns
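
The summary above is consistent with a multi-select layout: each `Q6_*` or `Q44_*` column appears to hold either one option label or NaN, and `unique` is 2 only because the question-text row is still counted (for example, `Q6_1` has count 9700 and freq 9699). Under that reading, counting non-null cells per column, once the question-text row is dropped, gives the number of respondents who picked each option. A sketch with a hypothetical `selection_counts` helper and made-up data:

```python
import numpy as np
import pandas as pd


def selection_counts(responses, prefix):
    """Count non-null answers in every column named '<prefix>_<n>'."""
    # Match on 'Q6_' so that a prefix of 'Q6' does not also pick up 'Q60_1'.
    cols = [c for c in responses.columns if c.startswith(prefix + "_")]
    counts = responses[cols].notna().sum()
    return counts.sort_values(ascending=False)


# Made-up responses in the same shape (question-text row already removed).
toy = pd.DataFrame({
    "Q6_1": ["Coursera", np.nan, "Coursera"],
    "Q6_2": [np.nan, np.nan, "edX"],
    "Q60_1": ["x", "x", "x"],   # unrelated column that must not be matched
})

counts = selection_counts(toy, "Q6")
print(counts)
```

Dividing `counts` by `len(responses)` would turn these into shares of all respondents, though respondents who were never shown the question would then count as not selecting anything.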

# Row 0 holds the full question text for each column
questions = df.iloc[0].tolist()
questions
['Duration (in seconds)',
 'What is your age (# years)?',
 'What is your gender? - Selected Choice',
 'In which country do you currently reside?',
 'Are you currently a student? (high school, university, or graduate)',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Coursera',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - edX',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Kaggle Learn Courses',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - DataCamp',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Fast.ai',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Udacity',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Udemy',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - LinkedIn Learning',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Cloud-certification programs (direct from AWS, Azure, GCP, or similar)',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - University Courses (resulting in a university degree)',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - None',
 'On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Other',
 'What products or platforms did you find to be most helpful when you first started studying data science?  (Select all that apply) - Selected Choice - University courses',
 'What products or platforms did you find to be most helpful when you first started studying data science?  (Select all that apply) - Selected Choice - Online courses (Coursera, EdX, etc)',
 'What products or platforms did you find to be most helpful when you first started studying data science?  (Select all that apply) - Selected Choice - Social media platforms (Reddit, Twitter, etc)',
 'What products or platforms did you find to be most helpful when you first started studying data science?  (Select all that apply) - Selected Choice - Video platforms (YouTube, Twitch, etc)',
 'What products or platforms did you find to be most helpful when you first started studying data science?  (Select all that apply) - Selected Choice - Kaggle (notebooks, competitions, etc)',
 'What products or platforms did you find to be most helpful when you first started studying data science?  (Select all that apply) - Selected Choice - None / I do not study data science',
 'What products or platforms did you find to be most helpful when you first started studying data science?  (Select all that apply) - Selected Choice - Other',
 'What is the highest level of formal education that you have attained or plan to attain within the next 2 years?',
 'Have you ever published any academic research (papers, preprints, conference proceedings, etc)?',
 'Did your research make use of machine learning? - Yes, the research made advances related to some novel machine learning method (theoretical research)',
 'Did your research make use of machine learning? - Yes, the research made use of machine learning as a tool (applied research)',
 'Did your research make use of machine learning? - No',
 'For how many years have you been writing code and/or programming?',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C#',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C++',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Java',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Javascript',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Bash',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - PHP',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - MATLAB',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Julia',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Go',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - None',
 'What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Other',
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - JupyterLab ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  RStudio ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Visual Studio ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Visual Studio Code (VSCode) ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  PyCharm ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Spyder  ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Notepad++  ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Sublime Text  ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Vim / Emacs  ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  MATLAB ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Jupyter Notebook",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - IntelliJ",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - None",
 "Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - Other",
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Kaggle Notebooks',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice - Colab Notebooks',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice - Azure Notebooks',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Code Ocean ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  IBM Watson Studio ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Amazon Sagemaker Studio ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Amazon Sagemaker Studio Lab ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Amazon EMR Notebooks ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice - Google Cloud Vertex AI Workbench ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice - Hex Workspaces',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Noteable Notebooks ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Databricks Collaborative Notebooks ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Deepnote Notebooks ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice -  Gradient Notebooks ',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice - None',
 'Do you use any of the following hosted notebook products?  (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Matplotlib ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Seaborn ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Plotly / Plotly Express ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Ggplot / ggplot2 ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Shiny ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  D3 js ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Altair ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Bokeh ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Geoplotlib ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Leaflet / Folium ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Pygal ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Dygraphs ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice -  Highcharter ',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice - None',
 'Do you use any of the following data visualization libraries on a regular basis?  (Select all that apply) - Selected Choice - Other',
 'For how many years have you used machine learning methods?',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -   Scikit-learn ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -   TensorFlow ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Keras ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  PyTorch ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Fast.ai ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Xgboost ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  LightGBM ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  CatBoost ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Caret ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Tidymodels ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  JAX ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  PyTorch Lightning ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Huggingface ',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - None',
 'Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Other',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Linear or Logistic Regression',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Decision Trees or Random Forests',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Gradient Boosting Machines (xgboost, lightgbm, etc)',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Bayesian Approaches',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Evolutionary Approaches',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Dense Neural Networks (MLPs, etc)',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Convolutional Neural Networks',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Generative Adversarial Networks',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Recurrent Neural Networks',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Transformer Networks (BERT, gpt-3, etc)',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Autoencoder Networks (DAE, VAE, etc)',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Graph Neural Networks',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - None',
 'Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Other',
 'Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - General purpose image/video tools (PIL, cv2, skimage, etc)',
 'Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Image segmentation methods (U-Net, Mask R-CNN, etc)',
 'Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Object detection methods (YOLOv6, RetinaNet, etc)',
 'Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc)',
 'Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Generative Networks (GAN, VAE, etc)',
 'Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Vision transformer networks (ViT, DeiT, BiT, BEiT, Swin, etc)',
 'Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - None',
 'Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Other',
 'Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Word embeddings/vectors (GLoVe, fastText, word2vec)',
 'Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Encoder-decoder models (seq2seq, vanilla transformers)',
 'Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Contextualized embeddings (ELMo, CoVe)',
 'Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Transformer language models (GPT-3, BERT, XLnet, etc)',
 'Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - None',
 'Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Other',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice -   TensorFlow Hub ',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice -  PyTorch Hub ',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice -  Huggingface Models ',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice -  Timm ',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice -  Jumpstart ',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice -  ONNX models ',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice -  NVIDIA NGC models  ',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice -  Kaggle datasets ',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice - No, I do not download pre-trained model weights on a regular basis',
 'Do you download pre-trained model weights from any of the following services? (Select all that apply) - Selected Choice - Other storage services (i.e. google drive)',
 'Which of the following ML model hubs/repositories do you use most often? - Selected Choice',
 'Select the title most similar to your current role (or most recent title if retired): - Selected Choice',
 'In what industry is your current employer/contract (or your most recent employer if retired)? - Selected Choice',
 'What is the size of the company where you are employed?',
 'Approximately how many individuals are responsible for data science workloads at your place of business?',
 'Does your current employer incorporate machine learning methods into their business?',
 'Select any activities that make up an important part of your role at work: (Select all that apply) - Analyze and understand data to influence product or business decisions',
 'Select any activities that make up an important part of your role at work: (Select all that apply) - Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data',
 'Select any activities that make up an important part of your role at work: (Select all that apply) - Build prototypes to explore applying machine learning to new areas',
 'Select any activities that make up an important part of your role at work: (Select all that apply) - Build and/or run a machine learning service that operationally improves my product or workflows',
 'Select any activities that make up an important part of your role at work: (Select all that apply) - Experimentation and iteration to improve existing ML models',
 'Select any activities that make up an important part of your role at work: (Select all that apply) - Do research that advances the state of the art of machine learning',
 'Select any activities that make up an important part of your role at work: (Select all that apply) - None of these activities are an important part of my role at work',
 'Select any activities that make up an important part of your role at work: (Select all that apply) - Other',
 'What is your current yearly compensation (approximate $USD)?',
 'Approximately how much money have you spent on machine learning and/or cloud computing services at home or at work in the past 5 years (approximate $USD)?\n (approximate $USD)?',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Amazon Web Services (AWS) ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Microsoft Azure ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Google Cloud Platform (GCP) ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  IBM Cloud / Red Hat ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Oracle Cloud ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  SAP Cloud ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  VMware Cloud ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Alibaba Cloud ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Tencent Cloud ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Huawei Cloud ',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice - None',
 'Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice - Other',
 'Of the cloud platforms that you are familiar with, which has the best developer experience (most enjoyable to use)? - Selected Choice',
 'Do you use any of the following cloud computing products? (Select all that apply) - Selected Choice -  Amazon Elastic Compute Cloud (EC2) ',
 'Do you use any of the following cloud computing products? (Select all that apply) - Selected Choice -  Microsoft Azure Virtual Machines ',
 'Do you use any of the following cloud computing products? (Select all that apply) - Selected Choice -  Google Cloud Compute Engine ',
 'Do you use any of the following cloud computing products? (Select all that apply) - Selected Choice - No / None',
 'Do you use any of the following cloud computing products? (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following data storage products? (Select all that apply) - Selected Choice - Microsoft Azure Blob Storage',
 'Do you use any of the following data storage products? (Select all that apply) - Selected Choice -  Microsoft Azure Files ',
 'Do you use any of the following data storage products? (Select all that apply) - Selected Choice -  Amazon Simple Storage Service (S3)  ',
 'Do you use any of the following data storage products? (Select all that apply) - Selected Choice -  Amazon Elastic File System (EFS)  ',
 'Do you use any of the following data storage products? (Select all that apply) - Selected Choice - Google Cloud Storage (GCS)   ',
 'Do you use any of the following data storage products? (Select all that apply) - Selected Choice -  Google Cloud Filestore ',
 'Do you use any of the following data storage products? (Select all that apply) - Selected Choice - No / None',
 'Do you use any of the following data storage products? (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - MySQL ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - PostgreSQL ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - SQLite ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Oracle Database ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - MongoDB ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Snowflake ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - IBM Db2 ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Microsoft SQL Server ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Microsoft Azure SQL Database ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Amazon Redshift ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Amazon RDS ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Amazon DynamoDB ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Google Cloud BigQuery ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Google Cloud SQL ',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - None',
 'Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Amazon QuickSight',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Microsoft Power BI',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Google Data Studio',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Looker',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Tableau',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Qlik Sense',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Domo',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - TIBCO Spotfire',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Alteryx ',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Sisense ',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - SAP Analytics Cloud ',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Microsoft Azure Synapse ',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Thoughtspot ',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - None',
 'Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Amazon SageMaker ',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Azure Machine Learning Studio ',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Google Cloud Vertex AI',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  DataRobot',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Databricks',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Dataiku',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Alteryx',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Rapidminer',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  C3.ai',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  Domino Data Lab ',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice -  H2O AI Cloud ',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice - No / None',
 'Do you use any of the following managed machine learning products on a regular basis? (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -  Google Cloud AutoML ',
 'Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -  H2O Driverless AI  ',
 'Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -  Databricks AutoML ',
 'Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -  DataRobot AutoML ',
 'Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -   Amazon Sagemaker Autopilot ',
 'Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -   Azure Automated Machine Learning ',
 'Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice - No / None',
 'Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  TensorFlow Extended (TFX) ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  TorchServe ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  ONNX Runtime ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  Triton Inference Server ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  OpenVINO Model Server ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  KServe ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  BentoML ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  Multi Model Server (MMS) ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  Seldon Core ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice -  MLflow ',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice - None',
 'Do you use any of the following products to serve your machine learning models?  (Select all that apply) - Selected Choice - Other',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  Neptune.ai ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  Weights & Biases ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  Comet.ml ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  TensorBoard ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  Guild.ai ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  ClearML ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  MLflow ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  Aporia ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  Evidently AI ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  Arize ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  WhyLabs ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  Fiddler ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice -  DVC ',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice - No / None',
 'Do you use any tools to help monitor your machine learning models and/or experiments? (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice -  Google Responsible AI Toolkit (LIT, What-if, Fairness Indicator, etc) ',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice -  Microsoft Responsible AI Resources (Fairlearn, Counterfit, InterpretML, etc) ',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice -  IBM AI Ethics tools (AI Fairness 360, Adversarial Robustness Toolbox, etc ',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice -  Amazon AI Ethics Tools (Clarify, A2I, etc) ',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice -  The LinkedIn Fairness Toolkit (LiFT) ',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice -  Audit-AI ',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice -  Aequitas ',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice - None',
 'Do you use any of the following responsible or ethical AI products in your machine learning practices?  (Select all that apply) - Selected Choice - Other',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice -  GPUs ',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice -  TPUs ',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice -  IPUs ',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice -  RDUs ',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice -  WSEs ',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice -  Trainium Chips ',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice -  Inferentia Chips ',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice - None',
 'Do you use any of the following types of specialized hardware when training machine learning models?  (Select all that apply) - Selected Choice - Other',
 'Approximately how many times have you used a TPU (tensor processing unit)?',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Twitter (data science influencers)',
 "Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Email newsletters (Data Elixir, O'Reilly Data & AI, etc)",
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Reddit (r/machinelearning, etc)',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Kaggle (notebooks, forums, etc)',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Course Forums (forums.fast.ai, Coursera forums, etc)',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - YouTube (Kaggle YouTube, Cloud AI Adventures, etc)',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Podcasts (Chai Time Data Science, O’Reilly Data Show, etc)',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Blogs (Towards Data Science, Analytics Vidhya, etc)',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Journal Publications (peer-reviewed journals, conference proceedings, etc)',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Slack Communities (ods.ai, kagglenoobs, etc)',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - None',
 'Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Other']
df.stack()
0      Duration (in seconds)                                Duration (in seconds)
       Q2                                             What is your age (# years)?
       Q3                                  What is your gender? - Selected Choice
       Q4                               In which country do you currently reside?
       Q5                       Are you currently a student? (high school, uni...
                                                      ...                        
23997  Q17_2                                                          TensorFlow 
       Q18_1                                        Linear or Logistic Regression
       Q18_2                                     Decision Trees or Random Forests
       Q21_9                    No, I do not download pre-trained model weight...
       Q44_12                                                               Other
Length: 858725, dtype: object
dft = df.copy()
dft.T
0 1 2 3 4 5 6 7 8 9 ... 23988 23989 23990 23991 23992 23993 23994 23995 23996 23997
Duration (in seconds) Duration (in seconds) 121 462 293 851 232 277 1550 501 787 ... 245 402 603 557 153 331 330 860 597 303
Q2 What is your age (# years)? 30-34 30-34 18-21 55-59 45-49 18-21 18-21 30-34 70+ ... 18-21 55-59 35-39 40-44 22-24 22-24 60-69 25-29 35-39 18-21
Q3 What is your gender? - Selected Choice Man Man Man Man Man Woman Man Man Man ... Man Man Man Man Man Man Man Man Woman Man
Q4 In which country do you currently reside? India Algeria Egypt France India India India Germany Australia ... India Ukraine India India India United States of America United States of America Turkey Israel India
Q5 Are you currently a student? (high school, uni... No No Yes No Yes Yes Yes No No ... Yes Yes No No Yes Yes Yes No No Yes
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Q44_8 Who/what are your favorite media sources that ... NaN NaN NaN Blogs (Towards Data Science, Analytics Vidhya,... Blogs (Towards Data Science, Analytics Vidhya,... Blogs (Towards Data Science, Analytics Vidhya,... Blogs (Towards Data Science, Analytics Vidhya,... Blogs (Towards Data Science, Analytics Vidhya,... NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Q44_9 Who/what are your favorite media sources that ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN Journal Publications (peer-reviewed journals, ... NaN NaN NaN NaN
Q44_10 Who/what are your favorite media sources that ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Q44_11 Who/what are your favorite media sources that ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN None NaN NaN NaN NaN NaN
Q44_12 Who/what are your favorite media sources that ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN Other NaN NaN NaN NaN NaN NaN NaN Other

296 rows × 23998 columns
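
Row 0 of the frame holds the full question text rather than a response, which is why it shows up at the top of `df.stack()` and as the first column of the transposed view. Here is a minimal sketch of one way to split it off, using a tiny made-up frame in place of the real survey. The names `toy`, `questions`, and `responses` are illustrative and not from the notebook.

```python
import pandas as pd

# Toy stand-in for the survey: row 0 carries the question text,
# the remaining rows carry answers. All values are made up.
toy = pd.DataFrame({
    "Q2": ["What is your age (# years)?", "30-34", "18-21", "18-21"],
    "Q12_1": ["Languages used regularly - Python", "Python", None, "Python"],
})

# Keep the question text as a lookup: column code -> question wording.
questions = toy.iloc[0]

# Drop the question row so that only real responses remain.
responses = toy.iloc[1:].reset_index(drop=True)

print(questions["Q2"])
print(responses["Q12_1"].value_counts())
```

With the question row removed, `value_counts()` no longer reports the question wording as an extra answer with a count of 1.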

df.isna().sum()
Duration (in seconds)        0
Q2                           0
Q3                           0
Q4                           0
Q5                           0
                         ...  
Q44_9                    20193
Q44_10                   22271
Q44_11                   22729
Q44_12                   23162
aggpro                   20072
Length: 297, dtype: int64
dfid.groupby(['Q2'])[['student?_Yes','student?_No']].sum()
student?_Yes student?_No
Q2
18-21 4386.0 173.0
22-24 2981.0 1302.0
25-29 1976.0 2496.0
30-34 887.0 2085.0
35-39 627.0 1726.0
40-44 464.0 1463.0
45-49 290.0 963.0
50-54 163.0 751.0
55-59 99.0 512.0
60-69 64.0 462.0
70+ 24.0 103.0
dfid.groupby(['Q4'])[['student?_Yes','student?_No']].sum()
student?_Yes student?_No
Q4
Algeria 46.0 16.0
Argentina 110.0 94.0
Australia 56.0 86.0
Bangladesh 182.0 69.0
Belgium 8.0 43.0
Brazil 437.0 396.0
Cameroon 50.0 18.0
Canada 87.0 170.0
Chile 50.0 65.0
China 290.0 163.0
Colombia 137.0 119.0
Czech Republic 10.0 39.0
Ecuador 40.0 14.0
Egypt 240.0 143.0
Ethiopia 57.0 41.0
France 74.0 188.0
Germany 32.0 67.0
Ghana 68.0 39.0
Hong Kong (S.A.R.) 17.0 41.0
I do not wish to disclose my location 23.0 19.0
India 4967.0 3825.0
Indonesia 228.0 148.0
Iran, Islamic Republic of... 74.0 46.0
Ireland 17.0 36.0
Israel 33.0 69.0
Italy 54.0 128.0
Japan 86.0 470.0
Kenya 134.0 67.0
Malaysia 33.0 41.0
Mexico 192.0 188.0
Morocco 119.0 58.0
Nepal 54.0 21.0
Netherlands 13.0 95.0
Nigeria 482.0 249.0
Other 708.0 722.0
Pakistan 418.0 202.0
Peru 73.0 48.0
Philippines 47.0 61.0
Poland 29.0 84.0
Portugal 34.0 53.0
Romania 21.0 29.0
Russia 160.0 164.0
Saudi Arabia 32.0 52.0
Singapore 22.0 46.0
South Africa 58.0 51.0
South Korea 104.0 213.0
Spain 80.0 177.0
Sri Lanka 43.0 34.0
Taiwan 70.0 172.0
Thailand 53.0 79.0
Tunisia 96.0 29.0
Turkey 175.0 170.0
Ukraine 40.0 39.0
United Arab Emirates 20.0 74.0
United Kingdom of Great Britain and Northern Ireland 87.0 171.0
United States of America 924.0 1996.0
Viet Nam 135.0 77.0
Zimbabwe 32.0 22.0
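
The construction of `dfid` is not shown in this part of the notebook. The `student?_Yes` / `student?_No` column names look like one-hot indicators of Q5, so below is a minimal sketch, on made-up data, of how such columns could be built with `pd.get_dummies` and turned into a student share per country. This is an assumption about the approach, not the original code, and `toy`, `toy_id`, and `student_share` are hypothetical names.

```python
import pandas as pd

# Made-up responses: country (Q4) and student status (Q5).
toy = pd.DataFrame({
    "Q4": ["India", "India", "Japan", "Japan", "Japan"],
    "Q5": ["Yes", "No", "No", "No", "Yes"],
})

# One-hot encode Q5; prefix "student?" yields student?_No and student?_Yes.
dummies = pd.get_dummies(toy["Q5"], prefix="student?").astype(float)
toy_id = pd.concat([toy, dummies], axis=1)

# Same groupby pattern as above, then the share of students per country.
counts = toy_id.groupby("Q4")[["student?_Yes", "student?_No"]].sum()
counts["student_share"] = counts["student?_Yes"] / counts.sum(axis=1)

print(counts)
```

Raw counts are dominated by the countries with the most respondents, so a share makes countries of different sizes easier to compare.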
dfid['Q3'].value_counts()
Man                        18266
Woman                       5286
Prefer not to say            334
Nonbinary                     78
Prefer to self-describe       33
Name: Q3, dtype: int64
# Programming languages used on a regular basis (Q12_1 = Python, Q12_2 = R)
# Only the last expression in a cell is displayed, so only Q12_2 shows up below
dfpro['Q12_1'].value_counts()
dfpro['Q12_2'].value_counts()
R                                                                                                          4571
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R       1
Name: Q12_2, dtype: int64
for q in range(1,14):
    print(dfpro[f"Q12_{q}"].value_counts())
Python                                                                                                          18653
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python        1
Name: Q12_1, dtype: int64
R                                                                                                          4571
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R       1
Name: Q12_2, dtype: int64
SQL                                                                                                          9620
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL       1
Name: Q12_3, dtype: int64
C                                                                                                          3801
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C       1
Name: Q12_4, dtype: int64
C#                                                                                                          1473
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C#       1
Name: Q12_5, dtype: int64
C++                                                                                                          4549
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C++       1
Name: Q12_6, dtype: int64
Java                                                                                                          3862
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Java       1
Name: Q12_7, dtype: int64
Javascript                                                                                                          3489
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Javascript       1
Name: Q12_8, dtype: int64
Bash                                                                                                          1674
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Bash       1
Name: Q12_9, dtype: int64
PHP                                                                                                          1443
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - PHP       1
Name: Q12_10, dtype: int64
MATLAB                                                                                                          2441
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - MATLAB       1
Name: Q12_11, dtype: int64
Julia                                                                                                          296
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Julia      1
Name: Q12_12, dtype: int64
Go                                                                                                          322
What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Go      1
Name: Q12_13, dtype: int64
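
The loop above prints one small series per option, and each one includes the question-text row as an extra entry with a count of 1. A possible alternative is to collect all options of a multi-select question into a single ranked series. The sketch below runs on made-up data and assumes that row 0 is the question text and that every option column holds either its own label or NaN. `rank_options` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def rank_options(frame, prefix):
    """Count selections per option column whose name starts with `prefix`."""
    answers = frame.iloc[1:]  # skip the question-text row
    cols = [c for c in answers.columns if c.startswith(prefix)]
    counts = {}
    for col in cols:
        # Strip padding: some labels in the survey carry leading spaces.
        selected = answers[col].dropna().str.strip()
        if selected.empty:
            continue
        counts[selected.iloc[0]] = len(selected)
    return pd.Series(counts).sort_values(ascending=False)

# Toy frame with the same layout as the survey (values are made up).
toy = pd.DataFrame({
    "Q12_1": ["Question - Python", "Python", "Python", None, "Python"],
    "Q12_2": ["Question - R", None, " R ", "R", None],
    "Q13_1": ["Question - JupyterLab", "JupyterLab", "JupyterLab", None, None],
})

ranked = rank_options(toy, "Q12_")
print(ranked)
```

Passing a different prefix such as `"Q13_"` or `"Q17_"` would rank the IDE or framework options in the same way.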
dfpro['Q12_1'].isna().sum()
5344
for q in range(1,14):
    print(dfpro[f"Q13_{q}"].value_counts())
JupyterLab                                                                                                                                                    4887
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - JupyterLab        1
Name: Q13_1, dtype: int64
 RStudio                                                                                                                                                    3824
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  RStudio        1
Name: Q13_2, dtype: int64
 Visual Studio                                                                                                                                                    4416
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Visual Studio        1
Name: Q13_3, dtype: int64
 Visual Studio Code (VSCode)                                                                                                                                                    9976
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Visual Studio Code (VSCode)        1
Name: Q13_4, dtype: int64
 PyCharm                                                                                                                                                    6099
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  PyCharm        1
Name: Q13_5, dtype: int64
  Spyder                                                                                                                                                     2880
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Spyder         1
Name: Q13_6, dtype: int64
  Notepad++                                                                                                                                                     3891
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Notepad++         1
Name: Q13_7, dtype: int64
  Sublime Text                                                                                                                                                     2218
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Sublime Text         1
Name: Q13_8, dtype: int64
  Vim / Emacs                                                                                                                                                     1448
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -   Vim / Emacs         1
Name: Q13_9, dtype: int64
 MATLAB                                                                                                                                                    2302
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  MATLAB        1
Name: Q13_10, dtype: int64
 Jupyter Notebook                                                                                                                                                   13684
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Jupyter Notebook        1
Name: Q13_11, dtype: int64
IntelliJ                                                                                                                                                   1612
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - IntelliJ       1
Name: Q13_12, dtype: int64
None                                                                                                                                                   409
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice - None      1
Name: Q13_13, dtype: int64
dfpro['Q13_11'].value_counts()
# The most used language is Python; the most used IDE is Jupyter Notebook, followed by VS Code
 Jupyter Notebook                                                                                                                                                   13684
Which of the following integrated development environments (IDE's) do you use on a regular basis?  (Select all that apply) - Selected Choice -  Jupyter Notebook        1
Name: Q13_11, dtype: int64
# Most popular machine learning frameworks: scikit-learn is the most popular, followed by TensorFlow, Keras, PyTorch and XGBoost.
# Interestingly enough, fast.ai got 648 responses against 5191 for PyTorch.
for q in range(1,16):
    print(dfpro[f"Q17_{q}"].value_counts())
  Scikit-learn                                                                                                                                   11403
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -   Scikit-learn         1
Name: Q17_1, dtype: int64
  TensorFlow                                                                                                                                   7953
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -   TensorFlow        1
Name: Q17_2, dtype: int64
 Keras                                                                                                                                   6575
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Keras        1
Name: Q17_3, dtype: int64
 PyTorch                                                                                                                                   5191
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  PyTorch        1
Name: Q17_4, dtype: int64
 Fast.ai                                                                                                                                   648
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Fast.ai       1
Name: Q17_5, dtype: int64
 Xgboost                                                                                                                                   4477
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Xgboost        1
Name: Q17_6, dtype: int64
 LightGBM                                                                                                                                   1940
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  LightGBM        1
Name: Q17_7, dtype: int64
 CatBoost                                                                                                                                   1165
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  CatBoost        1
Name: Q17_8, dtype: int64
 Caret                                                                                                                                   821
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Caret       1
Name: Q17_9, dtype: int64
 Tidymodels                                                                                                                                   547
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Tidymodels       1
Name: Q17_10, dtype: int64
 JAX                                                                                                                                   252
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  JAX       1
Name: Q17_11, dtype: int64
 PyTorch Lightning                                                                                                                                   1013
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  PyTorch Lightning        1
Name: Q17_12, dtype: int64
 Huggingface                                                                                                                                   1332
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice -  Huggingface        1
Name: Q17_13, dtype: int64
None                                                                                                                                  1709
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - None       1
Name: Q17_14, dtype: int64
Other                                                                                                                                  620
Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply) - Selected Choice - Other      1
Name: Q17_15, dtype: int64
# most popular ML algorithms - linear or logistic regression, decision trees and random forests, CNNs, gradient boosting machines

for q in range(1,15):
    print(dfpro[f"Q18_{q}"].value_counts())
Linear or Logistic Regression                                                                                                                     11338
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Linear or Logistic Regression        1
Name: Q18_1, dtype: int64
Decision Trees or Random Forests                                                                                                                     9373
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Decision Trees or Random Forests       1
Name: Q18_2, dtype: int64
Gradient Boosting Machines (xgboost, lightgbm, etc)                                                                                                                     5506
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Gradient Boosting Machines (xgboost, lightgbm, etc)       1
Name: Q18_3, dtype: int64
Bayesian Approaches                                                                                                                     3661
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Bayesian Approaches       1
Name: Q18_4, dtype: int64
Evolutionary Approaches                                                                                                                     823
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Evolutionary Approaches      1
Name: Q18_5, dtype: int64
Dense Neural Networks (MLPs, etc)                                                                                                                     3476
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Dense Neural Networks (MLPs, etc)       1
Name: Q18_6, dtype: int64
Convolutional Neural Networks                                                                                                                     6006
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Convolutional Neural Networks       1
Name: Q18_7, dtype: int64
Generative Adversarial Networks                                                                                                                     1166
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Generative Adversarial Networks       1
Name: Q18_8, dtype: int64
Recurrent Neural Networks                                                                                                                     3451
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Recurrent Neural Networks       1
Name: Q18_9, dtype: int64
Transformer Networks (BERT, gpt-3, etc)                                                                                                                     2196
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Transformer Networks (BERT, gpt-3, etc)       1
Name: Q18_10, dtype: int64
Autoencoder Networks (DAE, VAE, etc)                                                                                                                     1234
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Autoencoder Networks (DAE, VAE, etc)       1
Name: Q18_11, dtype: int64
Graph Neural Networks                                                                                                                     1422
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Graph Neural Networks       1
Name: Q18_12, dtype: int64
None                                                                                                                     1326
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - None       1
Name: Q18_13, dtype: int64
Other                                                                                                                     538
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Other      1
Name: Q18_14, dtype: int64
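The loop above prints one count per option, which makes the ranking hard to read at a glance. A minimal sketch of one way to collapse the multi-select columns into a single sorted Series follows. It runs on a small toy frame rather than on `dfpro`, and the helper name `rank_multiselect` is hypothetical; it assumes, as the output above suggests, that the first row holds the question text rather than an answer.

```python
import pandas as pd

# Toy stand-in for dfpro: the first row holds the question text, as in the survey CSV.
toy = pd.DataFrame({
    "Q18_1": ["Question text - Linear or Logistic Regression",
              "Linear or Logistic Regression", "Linear or Logistic Regression", None],
    "Q18_2": ["Question text - Decision Trees or Random Forests",
              None, "Decision Trees or Random Forests", None],
    "Q18_7": ["Question text - Convolutional Neural Networks",
              "Convolutional Neural Networks", "Convolutional Neural Networks",
              "Convolutional Neural Networks"],
})

def rank_multiselect(frame, prefix):
    """Count selections in every column named '<prefix>_<n>', skipping the first row."""
    cols = [c for c in frame.columns if c.startswith(prefix + "_")]
    answers = frame[cols].iloc[1:]  # drop the row that holds the question text
    counts = {}
    for col in cols:
        picked = answers[col].dropna()  # a non-null cell means the option was selected
        if not picked.empty:
            counts[picked.iloc[0].strip()] = len(picked)
    return pd.Series(counts).sort_values(ascending=False)

ranking = rank_multiselect(toy, "Q18")
print(ranking)
```

On the real data the call would presumably be `rank_multiselect(dfpro, "Q18")`, and the same helper could be reused for the other "select all that apply" questions.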
dfpro['Q18_7'].value_counts()
Convolutional Neural Networks                                                                                                                     6006
Which of the following ML algorithms do you use on a regular basis? (Select all that apply): - Selected Choice - Convolutional Neural Networks       1
Name: Q18_7, dtype: int64
# the majority are from India and are men; the most common salary bracket is $0-999, and the majority are not in school
dfpro['Q29'].value_counts()
$0-999                                                          1112
10,000-14,999                                                    493
30,000-39,999                                                    464
1,000-1,999                                                      444
40,000-49,999                                                    421
100,000-124,999                                                  404
5,000-7,499                                                      391
50,000-59,999                                                    366
7,500-9,999                                                      362
150,000-199,999                                                  342
20,000-24,999                                                    337
60,000-69,999                                                    318
15,000-19,999                                                    299
70,000-79,999                                                    289
25,000-29,999                                                    277
2,000-2,999                                                      271
125,000-149,999                                                  269
3,000-3,999                                                      244
4,000-4,999                                                      234
80,000-89,999                                                    222
90,000-99,999                                                    197
200,000-249,999                                                  155
250,000-299,999                                                   78
300,000-499,999                                                   76
$500,000-999,999                                                  48
>$1,000,000                                                       23
What is your current yearly compensation (approximate $USD)?       1
Name: Q29, dtype: int64
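`value_counts()` sorts by frequency, so the salary brackets above appear out of numeric order. As a rough sketch, the brackets can be reordered by their lower bound. The counts below are copied from a few rows of the output above, and `lower_bound` is a hypothetical helper introduced only for illustration.

```python
import re
import pandas as pd

# A few rows copied from the Q29 output above.
counts = pd.Series({
    "$0-999": 1112,
    "10,000-14,999": 493,
    "1,000-1,999": 444,
    "100,000-124,999": 404,
    ">$1,000,000": 23,
})

def lower_bound(label):
    """Return the first number in a bracket label, e.g. '10,000-14,999' -> 10000."""
    match = re.search(r"\d[\d,]*", label)
    return int(match.group().replace(",", ""))

# Reorder the Series by each bracket's lower bound instead of by count.
ordered = counts.loc[sorted(counts.index, key=lower_bound)]
print(ordered)
```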
dfpro['Q30'].value_counts() # most respondents spent nothing or less than $1,000; those who spent more were probably paying for education
$0 ($USD)                                                                                                                                                                           2822
$100-$999                                                                                                                                                                           2078
$1000-$9,999                                                                                                                                                                        1469
$1-$99                                                                                                                                                                              1449
$10,000-$99,999                                                                                                                                                                      480
$100,000 or more ($USD)                                                                                                                                                              186
Approximately how much money have you spent on machine learning and/or cloud computing services at home or at work in the past 5 years (approximate $USD)?\n (approximate $USD)?       1
Name: Q30, dtype: int64
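To check how large the low-spending group is, the counts can be converted to percentages. This is only a sketch: the numbers are typed in from the output above rather than read from `dfpro`, and the question-text row is left out.

```python
import pandas as pd

# Counts copied from the Q30 output above (question-text row omitted).
spend = pd.Series({
    "$0 ($USD)": 2822,
    "$100-$999": 2078,
    "$1000-$9,999": 1469,
    "$1-$99": 1449,
    "$10,000-$99,999": 480,
    "$100,000 or more ($USD)": 186,
})

share = spend / spend.sum() * 100  # percentage of respondents per bracket
under_1000 = share[["$0 ($USD)", "$1-$99", "$100-$999"]].sum()

print(share.round(1))
print(f"Spent under $1,000: {under_1000:.1f}%")
```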
# most popular cloud computing platforms -- AWS, GCP, Azure
for q in range(1,13):
    print(dfpro[f"Q31_{q}"].value_counts())
 Amazon Web Services (AWS)                                                                                                              2346
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Amazon Web Services (AWS)        1
Name: Q31_1, dtype: int64
 Microsoft Azure                                                                                                              1416
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Microsoft Azure        1
Name: Q31_2, dtype: int64
 Google Cloud Platform (GCP)                                                                                                              2056
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Google Cloud Platform (GCP)        1
Name: Q31_3, dtype: int64
 IBM Cloud / Red Hat                                                                                                              287
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  IBM Cloud / Red Hat       1
Name: Q31_4, dtype: int64
 Oracle Cloud                                                                                                              230
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Oracle Cloud       1
Name: Q31_5, dtype: int64
 SAP Cloud                                                                                                              107
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  SAP Cloud       1
Name: Q31_6, dtype: int64
 VMware Cloud                                                                                                              155
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  VMware Cloud       1
Name: Q31_7, dtype: int64
 Alibaba Cloud                                                                                                              76
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Alibaba Cloud      1
Name: Q31_8, dtype: int64
 Tencent Cloud                                                                                                              56
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Tencent Cloud      1
Name: Q31_9, dtype: int64
 Huawei Cloud                                                                                                              47
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice -  Huawei Cloud      1
Name: Q31_10, dtype: int64
None                                                                                                             1167
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice - None       1
Name: Q31_11, dtype: int64
Other                                                                                                             217
Which of the following cloud computing platforms do you use? (Select all that apply) - Selected Choice - Other      1
Name: Q31_12, dtype: int64
# most popular data storage products - MySQL, PostgreSQL, Microsoft SQL Server, SQLite
for q in range(1,17):
    print(dfpro[f"Q35_{q}"].value_counts())
MySQL                                                                                                                                                                2233
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - MySQL        1
Name: Q35_1, dtype: int64
PostgreSQL                                                                                                                                                                1516
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - PostgreSQL        1
Name: Q35_2, dtype: int64
SQLite                                                                                                                                                                1159
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - SQLite        1
Name: Q35_3, dtype: int64
Oracle Database                                                                                                                                                                688
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Oracle Database       1
Name: Q35_4, dtype: int64
MongoDB                                                                                                                                                                1031
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - MongoDB        1
Name: Q35_5, dtype: int64
Snowflake                                                                                                                                                                399
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Snowflake       1
Name: Q35_6, dtype: int64
IBM Db2                                                                                                                                                                192
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - IBM Db2       1
Name: Q35_7, dtype: int64
Microsoft SQL Server                                                                                                                                                                1203
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Microsoft SQL Server        1
Name: Q35_8, dtype: int64
Microsoft Azure SQL Database                                                                                                                                                                520
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Microsoft Azure SQL Database       1
Name: Q35_9, dtype: int64
Amazon Redshift                                                                                                                                                                380
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Amazon Redshift       1
Name: Q35_10, dtype: int64
Amazon RDS                                                                                                                                                                505
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Amazon RDS       1
Name: Q35_11, dtype: int64
Amazon DynamoDB                                                                                                                                                                356
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Amazon DynamoDB       1
Name: Q35_12, dtype: int64
Google Cloud BigQuery                                                                                                                                                                690
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Google Cloud BigQuery       1
Name: Q35_13, dtype: int64
Google Cloud SQL                                                                                                                                                                439
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Google Cloud SQL       1
Name: Q35_14, dtype: int64
None                                                                                                                                                               955
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - None      1
Name: Q35_15, dtype: int64
Other                                                                                                                                                               217
Do you use any of the following data products (relational databases, data warehouses, data lakes, or similar)? (Select all that apply) - Selected Choice - Other      1
Name: Q35_16, dtype: int64
# most popular business intelligence tools - Tableau, Power BI (the most common answer is None)
for q in range(1,16):
    print(dfpro[f"Q36_{q}"].value_counts())
Amazon QuickSight                                                                                                             224
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Amazon QuickSight      1
Name: Q36_1, dtype: int64
Microsoft Power BI                                                                                                             1658
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Microsoft Power BI       1
Name: Q36_2, dtype: int64
Google Data Studio                                                                                                             643
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Google Data Studio      1
Name: Q36_3, dtype: int64
Looker                                                                                                             166
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Looker      1
Name: Q36_4, dtype: int64
Tableau                                                                                                             1732
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Tableau       1
Name: Q36_5, dtype: int64
Qlik Sense                                                                                                             207
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Qlik Sense      1
Name: Q36_6, dtype: int64
Domo                                                                                                             44
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Domo     1
Name: Q36_7, dtype: int64
TIBCO Spotfire                                                                                                             86
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - TIBCO Spotfire     1
Name: Q36_8, dtype: int64
Alteryx                                                                                                              132
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Alteryx       1
Name: Q36_9, dtype: int64
Sisense                                                                                                              38
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Sisense      1
Name: Q36_10, dtype: int64
SAP Analytics Cloud                                                                                                              106
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - SAP Analytics Cloud       1
Name: Q36_11, dtype: int64
Microsoft Azure Synapse                                                                                                              167
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Microsoft Azure Synapse       1
Name: Q36_12, dtype: int64
Thoughtspot                                                                                                              22
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Thoughtspot      1
Name: Q36_13, dtype: int64
None                                                                                                             2050
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - None       1
Name: Q36_14, dtype: int64
Other                                                                                                             191
Do you use any of the following business intelligence tools? (Select all that apply) - Selected Choice - Other      1
Name: Q36_15, dtype: int64
# automated machine learning tools
for q in range(1,8):
    print(dfpro[f"Q38_{q}"].value_counts())
    
 Google Cloud AutoML                                                                                                                    463
Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -  Google Cloud AutoML       1
Name: Q38_1, dtype: int64
 H2O Driverless AI                                                                                                                     122
Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -  H2O Driverless AI        1
Name: Q38_2, dtype: int64
 Databricks AutoML                                                                                                                    193
Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -  Databricks AutoML       1
Name: Q38_3, dtype: int64
 DataRobot AutoML                                                                                                                    125
Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -  DataRobot AutoML       1
Name: Q38_4, dtype: int64
  Amazon Sagemaker Autopilot                                                                                                                    261
Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -   Amazon Sagemaker Autopilot       1
Name: Q38_5, dtype: int64
  Azure Automated Machine Learning                                                                                                                    323
Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice -   Azure Automated Machine Learning       1
Name: Q38_6, dtype: int64
No / None                                                                                                                   3536
Do you use any of the following automated machine learning tools?  (Select all that apply) - Selected Choice - No / None       1
Name: Q38_7, dtype: int64
(dfpro['Q11'].value_counts()/dfpro['Q11'].value_counts().sum()) *100 
1-3 years                                                            27.787816
< 1 years                                                            23.464120
3-5 years                                                            14.623129
5-10 years                                                           10.996386
I have never written code                                             8.763552
10-20 years                                                           7.748236
20+ years                                                             6.612459
For how many years have you been writing code and/or programming?     0.004302
Name: Q11, dtype: float64
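The percentages above still include the question-text row (the 0.004302 entry). One possible alternative, sketched here on a toy Series rather than on `dfpro`, is to drop the first row and let `value_counts(normalize=True)` compute the shares directly.

```python
import pandas as pd

# Toy stand-in for dfpro['Q11']: the first entry is the question text.
toy = pd.Series(
    ["For how many years have you been writing code and/or programming?",
     "1-3 years", "1-3 years", "< 1 years", "20+ years"],
    name="Q11",
)

# Skip the first row, then let pandas return relative frequencies.
pct = toy.iloc[1:].value_counts(normalize=True) * 100
print(pct)
```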
a = pd.Series(dfpro['Q11'])
a.pop(0)  # remove (and return) row 0, which holds the question text rather than an answer
'For how many years have you been writing code and/or programming?'
b = a.value_counts()
b.plot.bar()
<AxesSubplot:>
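The bar chart above orders the bars by frequency. If a chronological order reads better, the Series can be reindexed before plotting. The sketch below uses the rounded percentages from the Q11 output above instead of the raw counts, and the `order` list is my own assumption about a sensible sequence.

```python
import pandas as pd

# Rounded percentages taken from the Q11 output above.
b = pd.Series({
    "1-3 years": 27.79,
    "< 1 years": 23.46,
    "3-5 years": 14.62,
    "5-10 years": 11.00,
    "I have never written code": 8.76,
    "10-20 years": 7.75,
    "20+ years": 6.61,
})

# Assumed display order, from least to most experience.
order = [
    "I have never written code",
    "< 1 years",
    "1-3 years",
    "3-5 years",
    "5-10 years",
    "10-20 years",
    "20+ years",
]

ordered = b.reindex(order)
print(ordered)
# ordered.plot.bar() would then draw the bars in this order (requires matplotlib).
```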