How can a Spark UDF scale up full-pass transformations such as a global average or standard deviation?

When you partition the table, each partition holds only a subset of the training examples, so a UDF can only apply transformations that work on each partition individually. Is that right?



Course: https://www.educative.io/collection/10370001/6068402050301952
Lesson: https://www.educative.io/collection/page/10370001/6068402050301952/6473937794891776

Hi Y_C!
If I have understood your question correctly, then yes: we are applying the transformations to each group of data that we have partitioned from the original data.

Hope this helps. If you still have any queries, please let us know.
Thank you.

What I mean is that I can’t do transformations like normalising the data, because for that you need the mean and standard deviation of the entire dataset, not just of a subset of it.
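
One common way around this is to make two passes: first have every partition report partial statistics (count, sum, sum of squares) and merge them into a global mean and standard deviation, then hand those two numbers to the per-partition transformation. The sketch below shows the idea in plain Python rather than Spark itself; the data and all function names are hypothetical and not taken from the course. In Spark, the first pass would probably be an aggregation along the lines of `df.agg(F.mean("x"), F.stddev_pop("x"))` (assuming a column named `x`), and the second pass a column expression or UDF that uses the two resulting values.

```python
import math

# Hypothetical data: one numeric column split across three partitions.
partitions = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]


def partial_stats(rows):
    """Pass 1, run on each partition: count, sum, and sum of squares."""
    return len(rows), sum(rows), sum(x * x for x in rows)


def merge(a, b):
    """Combine the partial results of two partitions."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def global_mean_std(parts):
    """Merge all partial results into a global mean and population std."""
    total = (0, 0.0, 0.0)
    for part in parts:
        total = merge(total, partial_stats(part))
    n, s, ss = total
    mean = s / n
    # Population variance: E[x^2] - (E[x])^2
    std = math.sqrt(ss / n - mean * mean)
    return mean, std


def normalise_partition(rows, mean, std):
    """Pass 2, run on each partition, using the global statistics."""
    return [(x - mean) / std for x in rows]


mean, std = global_mean_std(partitions)
normalised = [normalise_partition(p, mean, std) for p in partitions]
```

Each partition only ever sees its own rows, yet the values it produces are normalised against the whole dataset, because the mean and standard deviation were computed globally beforehand. Note that the sum-of-squares formula can lose precision when values are large relative to their spread, so a more careful implementation might merge per-partition means and variances instead.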