| Exam Name: | Google Professional Data Engineer Exam |
| Exam Code: | Professional-Data-Engineer |
| Vendor: | Google | Certification: | Google Cloud Certified |
| Questions: | 400 Q&As | Shared By: | elowen |
You are building a new application from which you need to collect data in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:
- Decouple the producer from the consumer.
- Space- and cost-efficient storage of the raw ingested data, which is to be stored indefinitely.
- Near real-time SQL queries.
- At least 2 years of historical data, which will be queried with SQL.
Which pipeline should you use to meet these requirements?
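As a sanity check on the scale the question above implies (150 GB of JSON per day, at least 2 years of history), a quick back-of-the-envelope calculation. The daily volume and retention period come from the question; the compression ratio is an illustrative assumption, since raw JSON typically compresses well in columnar or compressed storage:

```python
# Back-of-the-envelope storage estimate for the scenario above.
DAILY_GB = 150             # stated: ~150 GB of JSON per day
RETENTION_DAYS = 2 * 365   # stated: at least 2 years of history

raw_total_gb = DAILY_GB * RETENTION_DAYS
raw_total_tb = raw_total_gb / 1024

# JSON compresses well; a 10:1 ratio is an illustrative assumption,
# not a figure from the question.
ASSUMED_COMPRESSION = 10
compressed_tb = raw_total_tb / ASSUMED_COMPRESSION

print(f"raw: {raw_total_gb:,} GB (~{raw_total_tb:.1f} TB)")
print(f"compressed (assumed 10:1): ~{compressed_tb:.1f} TB")
```

Roughly 107 TB of raw JSON over two years, which is why space- and cost-efficient storage of the raw data is called out as a requirement.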
You monitor and optimize the BigQuery instance for your team. You notice that a particular daily report that uses a large JOIN operation is consistently slow. You want to examine the query's execution plan to identify potential performance bottlenecks within the JOIN as quickly as possible. What should you do?
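For context on what an execution plan contains: BigQuery exposes per-stage statistics for a completed job (the field names below follow the Jobs API's `ExplainQueryStage`; the plan payload itself is invented for illustration, not real job output). A minimal sketch of scanning such a plan for the stage with the highest average compute time:

```python
# Hypothetical excerpt of a job's query plan (statistics.query.queryPlan).
# Stage names and timing values are invented for illustration.
query_plan = [
    {"name": "S00: Input",  "computeMsAvg": 120,  "waitMsAvg": 5,   "recordsRead": 1_000_000},
    {"name": "S01: Join+",  "computeMsAvg": 9800, "waitMsAvg": 400, "recordsRead": 55_000_000},
    {"name": "S02: Output", "computeMsAvg": 60,   "waitMsAvg": 2,   "recordsRead": 10_000},
]

def slowest_stage(plan):
    """Return the stage with the highest average compute time."""
    return max(plan, key=lambda s: s["computeMsAvg"])

bottleneck = slowest_stage(query_plan)
print(f"{bottleneck['name']}: {bottleneck['computeMsAvg']} ms avg compute")
```

In a real investigation the same per-stage view is available graphically in the BigQuery console's execution details, so code like this is only needed for programmatic analysis.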
Your company wants to implement a Retrieval-Augmented Generation (RAG) system to allow employees to query an extensive knowledge base of internal documents, such as policy manuals and project reports. You need to prepare this unstructured text for embedding to be used in the RAG system. What should you do to ensure the system can retrieve the most relevant information?
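As background on the preparation step this question refers to: RAG pipelines typically split long documents into overlapping chunks before computing embeddings, so that retrieval returns passages small enough to be specific and relevant. A minimal, dependency-free sketch of that chunking step (the chunk size and overlap values are illustrative, not prescribed by the question):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for embedding.

    The overlap keeps sentences that straddle a chunk boundary fully
    contained in at least one chunk, so they remain retrievable.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

# Each chunk would then be embedded and stored in a vector index.
chunks = chunk_text("policy manual text " * 100)
print(len(chunks), "chunks")
```

Production systems usually chunk on token or sentence boundaries rather than raw characters, but the overlapping-window idea is the same.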
You are using BigQuery with a regional dataset that includes a table with the daily sales volumes. This table is updated multiple times per day. You need to protect your sales table in case of regional failures with a recovery point objective (RPO) of less than 24 hours, while keeping costs to a minimum. What should you do?
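For the RPO arithmetic behind this question: with any scheduled copy or snapshot mechanism, the worst-case data loss after a regional failure is roughly one full copy interval (the time since the last successful copy), so the cadence must be shorter than the RPO. A tiny sketch of that check, with illustrative interval values:

```python
def meets_rpo(copy_interval_hours: float, rpo_hours: float = 24) -> bool:
    """Worst-case data loss ~= up to one full copy interval;
    it must stay strictly below the RPO."""
    return copy_interval_hours < rpo_hours

print(meets_rpo(24))  # a copy exactly every 24 h: worst case equals the RPO, not below it
print(meets_rpo(12))  # a 12 h cadence keeps worst-case loss under 24 h
```

This is why a strictly-less-than-24-hour RPO rules out a once-daily copy and points toward a more frequent scheduled copy of the table.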