Snowflake SnowPro Advanced: Data Scientist Certification Sample Questions:
1. You are tasked with training a model within Snowflake to predict customer churn for a telecommunications company. The dataset is stored in a Snowflake table named 'CUSTOMER_DATA'. The features include 'age' and 'data_usage'. The target variable is 'churned' (boolean). You want to use the SNOWFLAKE.ML.ANACONDA INTEGRATION to leverage Scikit-learn for model training. Which of the following code snippets correctly performs model training with Snowflake ML, addressing potential issues like feature scaling and data type handling within the stored procedure?
A)
B)
C)
D) 
2. You're a data scientist analyzing sensor data from industrial equipment stored in a Snowflake table named 'SENSOR_READINGS'. The table includes 'TIMESTAMP', 'SENSOR_ID', 'TEMPERATURE', 'PRESSURE', and 'VIBRATION'. You need to identify malfunctioning sensors based on outlier readings in 'TEMPERATURE', 'PRESSURE', and 'VIBRATION'. You want to create a dashboard to visualize these outliers and present a business case to invest in predictive maintenance. Select ALL of the actions that are essential for both effectively identifying sensor outliers within Snowflake and visualizing the data for a business presentation. (Multiple Correct Answers)
A) Implement a clustering algorithm (e.g., DBSCAN) within Snowflake using Snowpark Python to group similar sensor readings, identifying outliers as points that do not belong to any cluster or belong to very small clusters.
B) Calculate basic statistical summaries (mean, standard deviation, min, max) for each sensor and each variable ('TEMPERATURE', 'PRESSURE', and 'VIBRATION') and use that information to filter down to the most important sensor, prior to using the other techniques.
C) Calculate Z-scores for 'TEMPERATURE', 'PRESSURE', and 'VIBRATION' for each 'SENSOR_ID' within a rolling window of the last 24 hours using Snowflake's window functions. Define outliers as readings with Z-scores exceeding a threshold (e.g., 3).
D) Directly connect the 'SENSOR_READINGS' table to a visualization tool and create a 3D scatter plot with 'TEMPERATURE', 'PRESSURE', and 'VIBRATION' on the axes, without any pre-processing or outlier detection in Snowflake.
E) Create a Snowflake stored procedure to automatically flag outlier readings in a new column 'IS_OUTLIER' based on a predefined rule set (e.g., IQR method or Z-score threshold), and then use this column to filter data for visualization in a dashboard.
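The rolling Z-score rule described in option C can be sketched in plain Python to make the arithmetic concrete. This is only a local illustration on toy data, not Snowflake code: the function name `flag_outliers`, the tuple layout, and the sample temperatures are all hypothetical. One design choice to note is that each reading is scored against the earlier readings in its 24-hour window, so a spike does not inflate its own baseline; a SQL window frame that ends at the current row would include the reading itself.

```python
from collections import deque
from datetime import datetime, timedelta
from statistics import mean, pstdev


def flag_outliers(readings, window=timedelta(hours=24), threshold=3.0):
    """Return (sensor_id, timestamp, value, z_score, is_outlier) tuples.

    `readings` is an iterable of (sensor_id, timestamp, value) tuples.
    Each value is scored against the earlier readings of the same sensor
    that fall inside the trailing window.
    """
    history = {}  # sensor_id -> deque of (timestamp, value) inside the window
    results = []
    for sensor_id, ts, value in sorted(readings, key=lambda r: (r[0], r[1])):
        past = history.setdefault(sensor_id, deque())
        # Drop readings that have aged out of the trailing window.
        while past and ts - past[0][0] > window:
            past.popleft()
        values = [v for _, v in past]
        # A z-score needs at least two prior points and a non-zero spread.
        if len(values) >= 2 and pstdev(values) > 0:
            z = (value - mean(values)) / pstdev(values)
        else:
            z = 0.0
        results.append((sensor_id, ts, value, z, abs(z) > threshold))
        past.append((ts, value))
    return results


# Hypothetical hourly temperatures for one sensor; the last one is a spike.
start = datetime(2024, 1, 1)
temps = [20.0, 20.5, 20.1, 20.2, 19.8, 45.0]
demo = [("S1", start + timedelta(hours=h), t) for h, t in enumerate(temps)]
flagged = [row for row in flag_outliers(demo) if row[4]]
```

On this toy series only the final reading is flagged. The boolean in the last tuple position plays the role of the 'IS_OUTLIER' column from option E.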
3. Your team has deployed a machine learning model to Snowflake for predicting customer churn. You need to implement a robust metadata tagging strategy to track model lineage, performance metrics, and usage. Which of the following approaches are the MOST effective for achieving this within Snowflake, ensuring seamless integration with model deployment pipelines and facilitating automated retraining triggers based on data drift?
A) Storing model metadata in a separate relational database (e.g., PostgreSQL) and using Snowflake external tables to access the metadata information. Implement custom stored procedures to synchronize metadata between Snowflake and the external database.
B) Utilizing Snowflake's INFORMATION_SCHEMA views to extract metadata about tables, views, and stored procedures, and then writing custom SQL scripts to generate reports and track model lineage. Combine this with Snowflake's data masking policies to control access to sensitive metadata.
C) Using Snowflake's built-in tag functionality to tag tables, views, and stored procedures related to the model. Implementing custom Python scripts using Snowflake's Python API (Snowpark) to automatically apply tags during model deployment and retraining based on predefined rules and data quality checks.
D) Relying solely on manual documentation and spreadsheets to track model metadata, as automated solutions introduce unnecessary complexity and potential errors.
E) Leveraging a third-party metadata management tool that integrates with Snowflake and provides a centralized repository for model metadata, lineage tracking, and data governance. This tool should support automated tag propagation and data drift monitoring. Use Snowflake external functions to trigger alerts based on metadata changes.
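As a rough illustration of the tagging approach in option C, the sketch below builds the SQL statements a deployment script might issue. The helper `tag_statements`, the table name `CHURN_TRAINING_DATA`, and the tag names are hypothetical; the only Snowflake syntax assumed is `CREATE TAG IF NOT EXISTS` and `ALTER ... SET TAG`. Identifiers are interpolated directly, so this assumes they come from trusted configuration.

```python
def tag_statements(object_type, object_name, tags):
    """Build CREATE TAG / ALTER ... SET TAG statements for one object."""

    def quote(value):
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + str(value).replace("'", "''") + "'"

    # One CREATE TAG per tag name, so the ALTER below cannot fail on a missing tag.
    statements = [f"CREATE TAG IF NOT EXISTS {name}" for name in tags]
    assignments = ", ".join(
        f"{name} = {quote(value)}" for name, value in tags.items()
    )
    statements.append(f"ALTER {object_type} {object_name} SET TAG {assignments}")
    return statements


# Hypothetical object and tag names, for illustration only.
stmts = tag_statements(
    "TABLE",
    "CHURN_TRAINING_DATA",
    {"MODEL_NAME": "churn_classifier", "MODEL_VERSION": "3"},
)
```

In a Snowpark-based pipeline each string could then be passed to `session.sql(stmt).collect()`, assuming an open session and a role with the privileges needed to create and apply tags.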
4. You are using Snowpark for Python to build a feature engineering pipeline for a machine learning model that predicts customer churn. The data is stored in a Snowflake table called 'CUSTOMER_DATA', and you want to create new features based on time-series data within the table. You need to calculate the 'Recency' feature (days since the last transaction) and 'Frequency' feature (number of transactions in the last 3 months). Considering performance and best practices, which Snowpark approach would you choose?
A) Fetch the entire 'CUSTOMER_DATA' table into a Pandas DataFrame, then use Pandas' time-series functions to calculate 'Recency' and 'Frequency'. After feature engineering, load the Pandas DataFrame back into Snowflake.
B) Create a Python UDF using Pandas to calculate 'Recency' and 'Frequency'. Apply this UDF to the 'CUSTOMER_DATA' table through Snowpark, processing the data row by row.
C) Write a stored procedure in SQL that calculates 'Recency' and 'Frequency' using SQL window functions, and then call this stored procedure from your Snowpark Python code.
D) Write custom Python code in a Snowpark UDF to retrieve each transaction for a customer and calculate recency and frequency directly in Python without pandas.
E) Use Snowpark DataFrame API to perform window functions within Snowflake to calculate 'Recency' and 'Frequency' directly, leveraging Snowflake's processing power without transferring data to the client.
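To pin down what the two features mean, here is a small plain-Python sketch that computes them on toy data. It illustrates the definitions only; in the scenario above the same per-customer aggregation would run inside Snowflake rather than on the client. The function name, the `(customer_id, transaction_date)` layout, and the choice of 90 days as "3 months" are assumptions made for this example.

```python
from datetime import date, timedelta


def recency_frequency(transactions, as_of, lookback_days=90):
    """Map customer_id -> (recency_days, frequency).

    `transactions` is an iterable of (customer_id, transaction_date) pairs.
    Recency is the number of days from the latest transaction to `as_of`;
    frequency counts transactions in the `lookback_days` before `as_of`.
    """
    cutoff = as_of - timedelta(days=lookback_days)
    last_seen = {}
    frequency = {}
    for customer_id, tx_date in transactions:
        if tx_date > as_of:
            continue  # ignore transactions after the reference date
        if customer_id not in last_seen or tx_date > last_seen[customer_id]:
            last_seen[customer_id] = tx_date
        frequency.setdefault(customer_id, 0)
        if tx_date > cutoff:
            frequency[customer_id] += 1
    return {c: ((as_of - last_seen[c]).days, frequency[c]) for c in last_seen}


# Hypothetical transactions for two customers.
toy = [
    ("C1", date(2024, 6, 20)),
    ("C1", date(2024, 5, 1)),
    ("C1", date(2023, 12, 31)),
    ("C2", date(2024, 1, 15)),
]
features = recency_frequency(toy, as_of=date(2024, 6, 30))
```

Here `C1` has a recency of 10 days and two transactions in the lookback period, while `C2` has a recency of 167 days and none.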
5. You are working with a Snowflake table named 'CUSTOMER_DATA' that contains personally identifiable information (PII), including customer names, email addresses, and phone numbers. Your team needs to perform exploratory data analysis on this data to understand customer demographics and behavior. However, you must ensure that the PII is protected and that only authorized personnel can access the sensitive information. Which of the following strategies should you implement in Snowflake to achieve secure EDA?
A) Apply dynamic data masking to the entire 'CUSTOMER_DATA' table, masking all columns by default, and provide decryption keys only to authorized users.
B) Grant 'SELECT' privileges on the 'CUSTOMER_DATA' table to all data scientists, and rely on them to avoid querying PII columns directly.
C) Create a view on top of the table that excludes the PII columns (e.g., name, email, phone). Grant 'SELECT' privileges on this view to data scientists. Also implement data masking policies on the 'CUSTOMER_DATA' table for the PII columns and grant 'SELECT' on the table to specific roles requiring access to the masked values.
D) Use transient tables to store the customer data after PII is obfuscated, then drop the table and reload new data daily.
E) Create a copy of the 'CUSTOMER_DATA' table without the PII columns and grant 'SELECT' privileges on this copy to the data scientists. Use masking policies on the original table.
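The view-plus-masking-policy pattern from option C can be sketched as a helper that emits the statements involved. Everything named here is hypothetical (the column list, view name, role names, and the policy name derived from the table), and the sketch assumes the PII columns are of type STRING. It only builds strings; running them would require a Snowflake session and sufficient privileges.

```python
def secure_eda_statements(table, all_columns, pii_columns, view_name,
                          privileged_role, analyst_role):
    """Build SQL for a PII-free view plus a masking policy on the base table."""
    safe_columns = [c for c in all_columns if c not in pii_columns]
    policy = f"{table}_PII_MASK"  # hypothetical policy name
    statements = [
        # Analysts query the view, which never exposes PII columns.
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT {', '.join(safe_columns)} FROM {table}",
        f"GRANT SELECT ON VIEW {view_name} TO ROLE {analyst_role}",
        # On the base table, only the privileged role sees clear-text values.
        f"CREATE MASKING POLICY IF NOT EXISTS {policy} AS (val STRING) "
        f"RETURNS STRING -> CASE WHEN CURRENT_ROLE() = '{privileged_role}' "
        f"THEN val ELSE '***MASKED***' END",
    ]
    statements += [
        f"ALTER TABLE {table} MODIFY COLUMN {c} SET MASKING POLICY {policy}"
        for c in pii_columns
    ]
    return statements


# Hypothetical schema and role names, for illustration only.
sql = secure_eda_statements(
    table="CUSTOMER_DATA",
    all_columns=["CUSTOMER_ID", "NAME", "EMAIL", "PHONE", "AGE"],
    pii_columns=["NAME", "EMAIL", "PHONE"],
    view_name="CUSTOMER_EDA_V",
    privileged_role="PII_ADMIN",
    analyst_role="DATA_SCIENTIST",
)
```

A single STRING policy is reused for all three columns to keep the example short; columns of other types would need a policy with a matching signature.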
Solutions:
| Question # 1 Answer: C | Question # 2 Answer: A,B,C,E | Question # 3 Answer: C,E | Question # 4 Answer: E | Question # 5 Answer: C,E |














