Databricks Updated Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Questions and Answers by cassius

Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Overview:

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.5 – Python
Exam Code: Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Dumps
Vendor: Databricks
Certification: Databricks Certification
Questions: 136 Q&A's
Shared By: cassius
Question 20

19 of 55.

A Spark developer wants to improve the performance of an existing PySpark UDF that runs a hash function not available in the standard Spark functions library.

The existing UDF code is:

import hashlib

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def shake_256(raw):
    return hashlib.shake_256(raw.encode()).hexdigest(20)

shake_256_udf = udf(shake_256, StringType())

The developer replaces this UDF with a Pandas UDF for better performance:

@pandas_udf(StringType())
def shake_256(raw: str) -> str:
    return hashlib.shake_256(raw.encode()).hexdigest(20)

However, the developer receives this error:

TypeError: Unsupported signature: (raw: str) -> str

What should the signature of the shake_256() function be changed to in order to fix this error?

Options:

A.

def shake_256(raw: str) -> str:

B.

def shake_256(raw: [pd.Series]) -> pd.Series:

C.

def shake_256(raw: pd.Series) -> pd.Series:

D.

def shake_256(raw: [str]) -> [str]:
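
For background: a Pandas UDF infers its kind from the Python type hints, and a Series-to-Series UDF receives a whole batch of values as a pandas.Series instead of one value per call. The sketch below shows what a batch-oriented version of the hash function might look like. It is written in plain pandas so it runs without a Spark session, and the sample strings are made up for illustration. In Spark, a function of this shape would additionally be decorated with @pandas_udf(StringType()).

```python
import hashlib

import pandas as pd


def shake_256(raw: pd.Series) -> pd.Series:
    # A Series has no .encode(), so the hash is applied element by element.
    # hexdigest(20) returns 20 bytes, i.e. 40 hexadecimal characters.
    return raw.apply(lambda s: hashlib.shake_256(s.encode()).hexdigest(20))


# Hypothetical batch of input values, standing in for one Arrow batch.
batch = pd.Series(["spark", "pandas"])
hashed = shake_256(batch)
```

Note that changing only the signature is not enough on its own: the body also has to operate on the Series, since the original raw.encode() call assumes a single string.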

Question 21

34 of 55.

A data engineer is investigating a Spark cluster that is experiencing underutilization during scheduled batch jobs.

After checking the Spark logs, they noticed that tasks are often getting killed due to timeout errors, and there are several warnings about insufficient resources in the logs.

Which action should the engineer take to resolve the underutilization issue?

Options:

A.

Set the spark.network.timeout property to allow tasks more time to complete without being killed.

B.

Increase the executor memory allocation in the Spark configuration.

C.

Reduce the size of the data partitions to improve task scheduling.

D.

Increase the number of executor instances to handle more concurrent tasks.

Question 22

A data engineer is building an Apache Spark™ Structured Streaming application to process a stream of JSON events in real time. The engineer wants the application to be fault-tolerant and resume processing from the last successfully processed record in case of a failure. To achieve this, the data engineer decides to implement checkpoints.

Which code snippet should the data engineer use?

Options:

A.

query = streaming_df.writeStream \
    .format("console") \
    .option("checkpoint", "/path/to/checkpoint") \
    .outputMode("append") \
    .start()

B.

query = streaming_df.writeStream \
    .format("console") \
    .outputMode("append") \
    .option("checkpointLocation", "/path/to/checkpoint") \
    .start()

C.

query = streaming_df.writeStream \
    .format("console") \
    .outputMode("complete") \
    .start()

D.

query = streaming_df.writeStream \
    .format("console") \
    .outputMode("append") \
    .start()

Question 23

A developer is trying to join two tables, sales.purchases_fct and sales.customer_dim, using the following code:

fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'))

The developer has discovered that customers in the purchases_fct table that do not exist in the customer_dim table are being dropped from the joined table.

Which change should be made to the code to stop these customer records from being dropped?

Options:

A.

fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'), 'left')

B.

fact_df = cust_df.join(purch_df, F.col('customer_id') == F.col('custid'))

C.

fact_df = purch_df.join(cust_df, F.col('cust_id') == F.col('customer_id'))

D.

fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'), 'right_outer')
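
For background: when no join type is passed, PySpark's DataFrame.join performs an inner join, which keeps only rows that match on both sides. The sketch below illustrates the difference between inner and left join semantics using pandas merge as a stand-in, so it runs without a Spark session. Only the column names customer_id and custid come from the question; the sample rows and the amount and name columns are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: customer 3 has a purchase but no dimension row.
purch = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
cust = pd.DataFrame({"custid": [1, 2], "name": ["Ana", "Bo"]})

# Inner join: unmatched purchases are dropped.
inner = purch.merge(cust, left_on="customer_id", right_on="custid", how="inner")

# Left join: every purchase row is kept; missing customer fields become NaN.
left = purch.merge(cust, left_on="customer_id", right_on="custid", how="left")
```

Here the inner result has two rows, while the left result keeps all three purchases, with the customer columns empty for customer 3.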
