Here is a summary of the top machine learning algorithms for beginners in data science:

**Linear Regression**: a supervised learning algorithm for predicting a continuous variable. It models the linear relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data, most commonly by minimizing the sum of squared errors. The fitted equation, the best-fitting line through the data points, can then be used to predict the dependent variable for new values of the independent variables.

**Logistic Regression**: a supervised learning algorithm for binary classification, where the goal is to predict a binary outcome (such as yes/no, true/false, 0/1) from one or more independent variables. It models the log odds of the outcome as a linear function of the independent variables and applies the logistic function to turn that linear combination into a probability between 0 and 1. The estimated probability can then be thresholded (for example at 0.5) to predict the outcome.

**Decision Trees**: a supervised learning algorithm for both classification and regression tasks. It solves predictive modelling problems by creating a tree-like model of decisions and their possible consequences. The tree is built by recursively splitting the data into subsets based on the values of the input features, with the goal of producing subsets that are as homogeneous as possible with respect to the target variable. The terminal subsets are called “leaves”.
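
The recursive splitting described above for decision trees can be sketched for a single split on one feature. This is a minimal illustration, assuming Gini impurity as the homogeneity measure (the text does not name one). The function names `gini` and `best_split` and the toy data are made up for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split 'value <= threshold'."""
    best = (None, gini(labels))  # a split must beat leaving the node unsplit
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Toy data (hypothetical): one feature, two classes.
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
target = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(feature, target)
print(threshold, impurity)  # 6.5 0.0
```

A full tree builder would apply `best_split` to each resulting subset in turn, across all features, until the subsets are pure or a depth limit is reached.
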
The final tree makes predictions for new data points by following the splits until a leaf node is reached, where the prediction is the majority class (classification) or the mean target value (regression).

**Random Forest**: an ensemble learning method that combines multiple decision trees to make more robust and accurate predictions than a single tree, for both regression and classification problems. Each tree is built on a bootstrapped sample of the data, and only a random subset of features is considered for splitting at each node. The final prediction combines the predictions from all the trees (by averaging in regression, or by majority voting in classification). Because the individual trees are only weakly correlated with one another, combining them reduces variance, which lowers overfitting and improves generalization performance.

**Gradient Boosting**: an ensemble learning method that combines several weak models to create a strong predictive model. It is used for regression and classification problems and builds the model in a forward stage-wise fashion, where each new model tries to correct the mistakes of the ensemble built so far. The technique is called “Gradient Boosting” because it performs a form of gradient descent on the loss function, in the space of prediction functions rather than parameters. In each iteration, the algorithm fits a weak learner (e.g., a shallow decision tree) to the negative gradient of the loss function with respect to the current prediction. This quantity is often called the “pseudo-residual”, and for squared-error loss it equals the ordinary residual. The weak learner's predictions are then added to the ensemble, usually scaled by a learning rate, so that the sum of all the trees forms the final prediction.
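
The boosting loop described above can be sketched as follows. This is a rough illustration, not a production implementation. It assumes squared-error loss (so the negative gradient is simply the residual), one-split regression trees ("stumps") as weak learners, and a single input feature. The names `fit_stump` and `gradient_boost`, the learning rate, and the toy data are all assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stump) to the residuals by least squares."""
    best = None
    distinct = sorted(set(xs))  # needs at least two distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps under squared-error loss; returns a prediction function."""
    base = sum(ys) / len(ys)  # initial model: the mean of the targets
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Toy data (hypothetical): a noisy, roughly linear trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1]
model = gradient_boost(xs, ys)

mean_y = sum(ys) / len(ys)
baseline_mse = sum((mean_y - y) ** 2 for y in ys) / len(ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(baseline_mse, mse)  # the boosted model's training error is lower
```

For losses other than squared error, only the `residuals` line changes: it would compute the negative gradient of that loss instead.
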
The process is repeated until a pre-defined number of iterations is reached or the performance on a validation set stops improving.

**Support Vector Machines** (SVMs): a supervised learning algorithm used mainly for classification, with a variant for regression. For classification, the goal of an SVM is to find the hyperplane that best separates the data points into different classes. The hyperplane is chosen to maximize the margin, which is the distance between the hyperplane and the closest data points (known as support vectors). Only these closest points determine the position of the hyperplane, so they matter more than the rest of the training set. In the regression variant, the model instead fits a function that keeps most of the training points within a tolerance band around the prediction. SVMs can also handle data that is not linearly separable by using the kernel trick, which implicitly maps the input data into a higher-dimensional space where a linear boundary can separate the classes.

**k-Nearest Neighbors** (k-NN): a simple, instance-based supervised learning algorithm used for both classification and regression. The prediction for a new data point is based on the k nearest data points in the training set, where k is a user-defined parameter. In classification, the prediction is the most common class among those k neighbors. In regression, it is the average of their target values. The distance between data points is usually calculated using Euclidean distance, although other distance metrics can also be used.
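
The k-NN classification rule described above fits in a few lines. This is a minimal sketch assuming Euclidean distance and a brute-force scan of the training set. The function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Brute force: compute the Euclidean distance to every training point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(points, labels, (2, 2)))  # red
print(knn_predict(points, labels, (8, 7)))  # blue
```

For regression, the last line of `knn_predict` would return the average of the neighbors' target values instead of the most common label.
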
k-NN is a lazy learning algorithm: it has no real training phase and defers all computation until a prediction is requested. Training is therefore cheap, but every prediction has to compare the query against the stored training data, so prediction time and memory requirements can both be high if the training set is very large.

**Neural Networks**: a family of machine learning models, loosely inspired by the structure and function of the human brain, used for a wide variety of tasks such as classification, regression, and generation. A Neural Network consists of multiple interconnected nodes, known as artificial neurons, which are organized into layers. Each neuron receives inputs from other neurons, computes a weighted sum of them, applies an activation function, and passes the result on to neurons in the next layer. The connections between neurons are represented by weights that are adjusted during training to minimize the error between the predicted output and the actual target values. Neural Networks can model complex non-linear relationships between inputs and outputs, and have been very successful in tasks such as image and speech recognition, natural language processing, and playing games. Their design can vary greatly, including the number of layers, the type of activation functions, and the learning algorithm used for training.

**Convolutional Neural Networks** (CNNs): also called ConvNets, these are a type of Neural Network designed for image recognition and processing tasks such as image classification, object detection, and segmentation. A CNN consists of multiple layers, including convolutional layers, pooling layers, and fully connected layers. The convolutional layer is the core building block of a CNN: a filter slides over the input image, computing dot products between the filter and the overlapping regions of the image. These dot products form a feature map that encodes local patterns in the image.
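
The sliding-filter computation described above can be written out directly. This is a minimal sketch assuming no padding, a stride of 1, a single channel, and no kernel flipping (the convention deep learning libraries generally use). The function name `conv2d`, the image, and the filter values are made up for illustration.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution: no padding, stride 1, kernel not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the filter and the patch it currently covers.
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            ))
        feature_map.append(row)
    return feature_map

# Toy image (hypothetical): dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the filter straddles the edge. In a trained CNN the filter values are learned rather than chosen by hand.
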
Pooling layers reduce the spatial dimensions of the feature maps and make the network more tolerant of small translations in the input. Finally, the fully connected layers take the flattened feature maps and produce the final prediction, using a softmax activation function for classification problems or a linear activation function for regression problems. CNNs have been very successful in image recognition tasks because they automatically learn hierarchical representations of the input data, from edges and textures up to shapes and higher-level features. They are also parameter-efficient: each filter connects only to a local region and is reused at every position of the image, so a CNN needs far fewer parameters than a fully connected network on the same input.

**Recurrent Neural Networks** (RNNs): a type of Neural Network designed for processing sequential data such as time series, text, and speech. They are used for tasks such as language translation, speech recognition, and sentiment analysis. In an RNN, the hidden state of the network at one time step is passed as input to the next time step, allowing the network to maintain an internal summary of the sequence seen so far. This makes RNNs well suited to modelling sequences where the output at each time step depends on the inputs at earlier time steps. At each time step, the hidden state is updated by applying an activation function to a weighted combination of the current input and the previous hidden state. For tasks that need one output per sequence, the output is typically produced by applying a fully connected layer to the final hidden state. One limitation of traditional RNNs is that they have difficulty modelling long-term dependencies, because the gradient signals can either vanish or explode during training, which prevents the network from learning to use information from much earlier time steps.
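
The hidden-state recurrence described above for RNNs can be sketched as a forward pass. This is an illustration only, assuming the common update `h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)`. The names `matvec` and `rnn_forward`, the weight values, and the input sequence are hypothetical, and no training is shown.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple RNN over a sequence and return the hidden state at each step."""
    hidden = [0.0] * len(b_h)  # initial hidden state
    states = []
    for x in inputs:
        from_input = matvec(W_xh, x)        # contribution of the current input
        from_hidden = matvec(W_hh, hidden)  # contribution of the previous state
        hidden = [math.tanh(a + c + b)
                  for a, c, b in zip(from_input, from_hidden, b_h)]
        states.append(hidden)
    return states

# Hypothetical weights: 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]

# A single non-zero input followed by zeros.
sequence = [[1.0], [0.0], [0.0]]
states = rnn_forward(sequence, W_xh, W_hh, b_h)
for step, state in enumerate(states):
    print(step, state)
```

The hidden state stays non-zero after the input goes to zero, which is the network's memory of the first step. It also shrinks from step to step, a small-scale hint at why information from early time steps can fade.
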
To address this, variants of RNNs have been developed, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), which use gating mechanisms that let the network selectively retain or discard information in the hidden state.
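
The gating idea mentioned above can be illustrated with a single GRU step. This sketch uses scalars instead of vectors to keep it short, and follows one common formulation in which the update gate `z` weights the new candidate state. Some references swap the roles of `z` and `1 - z`. The function names, the parameter dictionary, and all values are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x, h, p):
    """One GRU update for a scalar input x and scalar hidden state h."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])  # reset gate
    candidate = math.tanh(p["w_h"] * x + p["u_h"] * (r * h) + p["b_h"])
    # z near 0 keeps the old state; z near 1 overwrites it with the candidate.
    return (1.0 - z) * h + z * candidate

# Hypothetical parameters; only the update-gate bias differs between the two runs.
base = {"w_z": 0.0, "u_z": 0.0, "w_r": 0.0, "u_r": 0.0, "b_r": 0.0,
        "w_h": 1.0, "u_h": 0.0, "b_h": 0.0}
keep = dict(base, b_z=-20.0)      # gate almost closed: retain the old state
overwrite = dict(base, b_z=20.0)  # gate almost open: take the new candidate

kept = gru_step(0.7, 0.5, keep)
replaced = gru_step(0.7, 0.5, overwrite)
print(kept)      # very close to the old state 0.5
print(replaced)  # very close to tanh(0.7)
```

In a trained network the gate values are computed from the data, so the model learns when to hold on to its state and when to replace it.
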

Here is a list of further reading on machine learning:

- “An Introduction to Statistical Learning” by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- “Pattern Recognition and Machine Learning” by Christopher Bishop
- “Machine Learning” by Tom Mitchell
- “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” by Aurélien Géron
- “The Hundred-Page Machine Learning Book” by Andriy Burkov
- “Data Science from Scratch” by Joel Grus
- “Python Machine Learning” by Sebastian Raschka
- “Applied Predictive Modeling” by Max Kuhn and Kjell Johnson
- “Introduction to Machine Learning with Python” by Andreas Müller and Sarah Guido

Note: These books cover the basics of machine learning, as well as more advanced topics, and are suitable for individuals with varying levels of experience and knowledge.