
Raja's Data Engineering

Databricks | Spark: Learning Series

619,715 views
109 items
Last updated on Apr 1, 2024
public playlist
01. Databricks: Spark Architecture & Internal Working Mechanism
41:34
02. Databricks | PySpark: RDD, Dataframe and Dataset
12:41
03. Databricks | PySpark: Transformation and Action
16:15
04. On-Heap vs Off-Heap | Databricks | Spark | Interview Question | Performance Tuning
11:56
05. Databricks | Pyspark: Cluster Deployment
15:08
06. Databricks | Pyspark| Spark Reader: Read CSV File
17:06
07. Databricks | Pyspark: Filter Condition
14:28
08. Databricks | Pyspark: Add, Rename and Drop Columns
12:03
09. Databricks | PySpark Join Types
14:28
10. Databricks | Pyspark: Utility Commands - DBUtils
28:43
11. Databricks | Pyspark: Explode Function
15:24
12. Databricks | Pyspark: Case Function (When.Otherwise)
13:34
13. Databricks | Pyspark: Union & UnionAll
10:25
14. Databricks | Pyspark: Pivot & Unpivot
11:33
15. Databricks| Spark | Pyspark | Read Json| Flatten Json
9:35
16. Databricks | Spark | Pyspark | Bad Records Handling | Permissive;DropMalformed;FailFast
7:24
17. Databricks & Pyspark: Azure Data Lake Storage Integration with Databricks
14:43
18. Databricks & Pyspark: Ingest Data from Azure SQL Database
12:08
19. Databricks & Pyspark: Real Time ETL Pipeline Azure SQL to ADLS
17:04
20. Databricks & Pyspark: Azure Key Vault Integration
9:32
21. Databricks| Spark Streaming
18:12
22. Databricks| Spark | Performance Optimization | Repartition vs Coalesce
21:11
23. Databricks | Spark | Cache vs Persist | Interview Question | Performance Tuning
18:56
24. Databricks| Spark | Interview Questions| Catalyst Optimizer
19:42
25. Databricks | Spark | Broadcast Variable| Interview Question | Performance Tuning
13:33
26. Databricks | Spark | Adaptive Query Execution| Interview Question | Performance Tuning
17:14
31. Databricks Pyspark: Handling Null - Part1
10:05
32. Databricks| Pyspark| Handling Null Part 2
14:17
33. Databricks | Spark | Pyspark | UDF
10:08
34. Databricks - Spark: Data Skew Optimization
15:03
35. Databricks & Spark: Interview Question - Shuffle Partition
5:52
36. Databricks: Autoscaling | Optimized Autoscaling
6:00
37. Databricks | Pyspark: Dataframe Checkpoint
8:14
38. Databricks | Pyspark | Interview Question | Compression Methods: Snappy vs Gzip
10:30
39. Databricks | Spark | Pyspark Functions| Split
10:41
40. Databricks | Spark | Pyspark Functions| Arrays_zip
17:14
41. Databricks | Spark | Pyspark Functions| Part 2 : Array_Intersect
5:11
42. Databricks | Spark | Pyspark Functions| Part 3 : Array_Except
4:48
43. Databricks | Spark | Pyspark Functions| Part 4 : Array_Sort
4:12
44. Databricks | Spark | Python Functions| Join
7:24
45. Databricks | Spark | Pyspark | PartitionBy
13:09
46. Databricks | Spark | Pyspark | Number of Records per Partition in Dataframe
5:53
47. Databricks | Spark | Pyspark | Null Count of Each Column in Dataframe
2:59
48. Databricks - Pyspark: Find Top or Bottom N Rows per Group
9:08
49. Databricks & Spark: Interview Question(Scenario Based) - How many spark jobs get created?
6:01
50. Databricks | Pyspark: Greatest vs Least vs Max vs Min
6:56
51. Databricks | Pyspark | Delta Lake: Introduction to Delta Lake
10:27
52. Databricks| Pyspark| Delta Lake Architecture: Internal Working Mechanism
30:13
53. Databricks| Pyspark| Delta Lake: Solution Architecture
7:53
54. Databricks | Delta Lake| Pyspark: Create Delta Table Using Various Methods
11:56
55. Databricks| Pyspark| Delta Lake: Delta Table Instance
11:29
56. Databricks| Pyspark | Delta Lake: Different Approaches to Insert Data Into Delta Table
10:47
57. Databricks| Pyspark| Delta Lake: Different Approaches to Delete Data from Delta Table
8:44
58. Databricks | Pyspark | Delta Lake : Update Delta Table
7:49
59. Databricks Pyspark: Slowly Changing Dimension | SCD Type1 | Merge using Pyspark and Spark SQL
11:30
60. Databricks & Pyspark: Delta Lake Audit Log Table with Operation Metrics
17:12
61. Databricks | Pyspark | Delta Lake : Slowly Changing Dimension (SCD Type2)
20:03
62. Databricks | Pyspark | Delta Lake: Time Travel
8:47
63. Databricks | Pyspark| Delta Lake: Restore Command
7:28
64. Databricks | Pyspark | Delta Lake: Optimize Command - File Compaction
13:16
65. Databricks | Pyspark | Delta Lake: Vacuum Command
15:32
66. Databricks | Pyspark | Delta: Z-Order Command
14:16
67. Databricks | Pyspark | Delta: Schema Evolution - MergeSchema
7:53
68. Databricks | Pyspark | Dataframe InsertInto Delta Table
8:09
69. Databricks | Spark | Pyspark | Data Skewness| Interview Question: SPARK_PARTITION_ID
7:02
70. Databricks| Pyspark| Input_File_Name: Identify Input File Name of Corrupt Record
10:47
71. Databricks | Pyspark | Window Functions: Lead and Lag
15:04
72. Databricks | Pyspark | Interview Question: Explain Plan
27:27
73. Databricks | Pyspark | UDF to Check if Folder Exists
6:45
74. Databricks | Pyspark | Interview Question: Sort-Merge Join (SMJ)
16:46
75. Databricks | Pyspark | Performance Optimization - Bucketing
22:03
76. Databricks | Pyspark: Interview Question | Scenario Based | Max Over () Get Max value of Duplicate Data
8:27
77. Databricks | Pyspark | Create_map(): Convert Dataframe Columns to Dictionary (Map Type)
8:34
78. Databricks | Pyspark | Performance Optimization: Delta Cache
7:47
79. Databricks | Pyspark | Split Array Elements into Separate Columns
9:56
80. Databricks | Pyspark | Tips: Write Dataframe into Single File with Specific File Name
12:09
81. Databricks | Pyspark | Workspace Object Access Control
8:52
82. Databricks | Pyspark | Databricks Secret Scopes: Azure Key Vault Backed Secrets
19:12
83. Databricks | Pyspark | Databricks Workflows: Job Scheduling
17:17
84. Databricks | Pyspark | Azure Data Factory + Azure Databricks: Execute Notebook Via ADF
13:23
85. Databricks | Pyspark | Notebook Activity in Azure Data Factory with Input Parameter
10:27
86. Databricks | Pyspark | Notebook Activity in Azure Data Factory with Output Parameter
8:35
87. Databricks | Pyspark | Real Time Project: ETL Pipeline Integrating ADF, ASQL, ADLS, Key Vault
22:01
88. Databricks | Pyspark | Notebook Scheduling through Schedule Based Trigger using Azure Data Factory
13:11
89. Databricks | Pyspark | Notebook Scheduling through Event Based Trigger using Azure Data Factory
15:35
90. Databricks | Pyspark | Interview Question: Read Excel File with Multiple Sheets
10:13
91. Databricks | Pyspark | Interview Question | Handling Duplicate Data: DropDuplicates vs Distinct
11:41
92. Databricks | Pyspark | Interview Question | Performance Optimization: Select vs WithColumn
11:33
93. Databricks | Pyspark | Interview Question | Schema Definition: Struct Type vs Struct Field
15:40
94. Databricks | Pyspark | Interview Question | Schema Definition: Struct Type vs Map Type
13:22
95. Databricks | Pyspark | Schema | Different Methods of Schema Definition
15:32
96. Databricks | Pyspark | Real Time Scenario | Schema Comparison
12:34
97. Databricks | Pyspark | Data Security: Enforcing Column Level Encryption
11:48
98. Databricks | Pyspark | Interview Question: Pyspark VS Pandas
9:09
99. Databricks | Pyspark | Real Time Use Case: Generate Test Data - Array_Repeat()
13:52
100. Databricks | Pyspark | Spark Architecture: Internals of Partition Creation Demystified
55:50
106. Databricks | Pyspark | Automation | Real Time Project: DataType Issue When Writing to Azure Synapse/SQL
14:06
107. Databricks | Pyspark| Transformation: Subtract vs ExceptAll
8:37
108. Databricks | Pyspark| Window Function: First and Last
12:27
112. Databricks | Pyspark| Spark Reader: Skip First N Records While Reading CSV File
6:31