Midway Blog Post

Team Name: 42

Team Members: Cangcheng Tang, Guanzhong Chen, Shiyu Liu

Initial Blog Post: https://medium.com/@tangcc35/initial-35c49e9fc6f4

Our baseline model training aims to confirm that our data structure and initial model are compatible. A convolutional neural network (CNN) was chosen as the baseline model for this training, as mentioned in our first journal. After investigating public online notebooks, we first built a CNN consisting of 20 convolutional layers, with batch normalization and dropout after the max-pooling layers. The network takes a 64x64 image as input after cropping and resizing. The output is designed as three parallel layers with 168, 11, and 7 nodes, for the grapheme root, vowel diacritic, and consonant diacritic, respectively.
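For readers who want a concrete picture, here is a minimal Keras sketch of this kind of architecture. Only the 64x64 input and the 168/11/7 output heads come from the description above; the filter counts, dropout rates, dense size, and grayscale input channel are illustrative assumptions, not our exact configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_baseline(input_shape=(64, 64, 1)):
    # Illustrative convolutional trunk: Conv2D blocks with batch
    # normalization, max-pooling, and dropout (our actual model stacks
    # roughly 20 convolutional layers in a similar pattern).
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D()(x)
        x = layers.Dropout(0.25)(x)

    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)

    # Three parallel output layers: 168 grapheme roots,
    # 11 vowel diacritics, 7 consonant diacritics.
    root = layers.Dense(168, activation="softmax", name="root")(x)
    vowel = layers.Dense(11, activation="softmax", name="vowel")(x)
    consonant = layers.Dense(7, activation="softmax", name="consonant")(x)

    model = keras.Model(inputs, [root, vowel, consonant])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```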

We successfully fitted our first dataset to the CNN model, and the training results were relatively promising: within 30 epochs of training on only one of the training datasets, both training and validation accuracy were above 0.9. The accuracy for classifying both vowel diacritics and consonant diacritics reached 0.97. We see room for improvement in predicting the grapheme roots, whose accuracy was a relatively low 0.93. We then trained on all four datasets, which increased all three accuracy scores.

Finally, we set up TensorBoard to visualize the training history, as shown in Figure 1. The accuracy plot shows that both training and validation accuracy generally keep increasing across epochs, and the loss plot shows that the loss generally decreases, indicating that the current model has nearly no overfitting problem.

Figure 1. TensorBoard visualization of the training history. Nearly no overfitting can be observed for the current model.
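For reference, this logging only needs one Keras callback passed to model.fit; the log directory name below is just a placeholder.

```python
from tensorflow import keras

# Log training/validation accuracy and loss for TensorBoard
# ("logs/baseline" is a placeholder directory name).
tensorboard_cb = keras.callbacks.TensorBoard(log_dir="logs/baseline")

# model.fit(x_train, [y_root, y_vowel, y_consonant],
#           validation_split=0.1, epochs=30,
#           callbacks=[tensorboard_cb])
# Then view the curves with: tensorboard --logdir logs
```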

After building the baseline model, we tried three approaches to improve it. Some of these approaches worked well in this context; others did not.

1. More Conv

The first modification we tried was to the convolutional layers. We added more Conv2D layers with fewer filters at the beginning of the network, hoping to improve the model's interpretation of the characters. However, after 30 epochs, the accuracy on all three targets saw no increase.
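For concreteness, the change looked roughly like the sketch below; the exact filter counts and kernel sizes are illustrative, not our precise settings.

```python
from tensorflow.keras import layers

def early_conv_block(x):
    # Extra Conv2D layers with fewer filters placed at the start of
    # the network (filter counts here are illustrative).
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    return x
```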

2. Manipulating dense layers after flatten

Originally, we only had two dense layers after flattening the output of the convolutional layers. We view these layers as translating the features the Conv2D layers extracted into information about the Bengali characters. To strengthen this mapping, we tried different combinations of layer counts and neuron counts. Although accuracy increased faster in the first few epochs, the final results stayed roughly the same as before.

3. Increase dense layers in separate predictions

We also increased the number and size of the dense layers used to predict each target, aiming to make each prediction head more specific to its target. After several experiments, we found the best combination: two 256-unit dense layers in front of the Root prediction, and 256- and 128-unit dense layers in front of both the Vowel and Consonant predictions. This resulted in a 1.4% increase in Root accuracy and a 1% increase in Vowel and Consonant accuracy.
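A sketch of this head structure, assuming x is the shared flattened output feeding the three prediction heads:

```python
from tensorflow.keras import layers

def prediction_heads(x):
    # Root head: two 256-unit dense layers before the 168-way softmax.
    r = layers.Dense(256, activation="relu")(x)
    r = layers.Dense(256, activation="relu")(r)
    root = layers.Dense(168, activation="softmax", name="root")(r)

    # Vowel and consonant heads: a 256-unit then a 128-unit dense layer.
    v = layers.Dense(256, activation="relu")(x)
    v = layers.Dense(128, activation="relu")(v)
    vowel = layers.Dense(11, activation="softmax", name="vowel")(v)

    c = layers.Dense(256, activation="relu")(x)
    c = layers.Dense(128, activation="relu")(c)
    consonant = layers.Dense(7, activation="softmax", name="consonant")(c)

    return root, vowel, consonant
```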

Searching other people’s work and architectures may also be helpful for model training. We looked into some other participants’ work that achieves good accuracy; their network structures are posted in public notebooks, and several of them are interesting. One uses an SEResNeXt model for training: the author applies transfer learning by starting from a pretrained model and fine-tuning its weights, then adds leaky ReLU activations combined with pooling and several linear layers to complete the network. Another interesting one we observed uses a very deep stack of nested convolutional layers. Basically, it applies Conv2D repeatedly with ‘SAME’ padding, ReLU activation, and some kernel size; after about 4–5 Conv2D layers, it applies batch normalization combined with MaxPool2D and dropout, and the author repeats this process to generate multiple outputs.
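The repeated block pattern from that second notebook looks roughly like the following; this is our paraphrase of the idea, not the author's exact code, and the filter counts, dropout rate, and block depth are illustrative.

```python
from tensorflow.keras import layers

def conv_block(x, filters, kernel_size=3, n_convs=4):
    # Repeated Conv2D layers with 'same' padding and ReLU activation,
    # followed by batch normalization, max-pooling, and dropout.
    for _ in range(n_convs):
        x = layers.Conv2D(filters, kernel_size, padding="same",
                          activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.3)(x)
    return x
```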

During our current training process, we also include learning rate scheduling to control the learning rate while training on the root, vowel, and consonant targets. In Keras, this is the keras.callbacks.ReduceLROnPlateau callback. We set the patience to 3 so that the learning rate is not reduced too early, and we set the minimum learning rate to a very small number so that it does not drop to 0, which would stop the network from learning.
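A minimal sketch of how such a callback can be configured; the monitored metric, reduction factor, and exact minimum learning rate here are illustrative assumptions rather than our precise settings.

```python
from tensorflow import keras

# Reduce the learning rate when the monitored metric plateaus.
# patience=3 waits three epochs without improvement before reducing;
# min_lr keeps the learning rate from collapsing to 0.
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.5,
    patience=3,
    min_lr=1e-6,
)

# model.fit(..., callbacks=[reduce_lr, tensorboard_cb])
```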

For hyperparameter selection, we tried different combinations of hyperparameters, including the number of neurons in the dense layers. Across all trials, the best combination was the one described above: two 256-unit dense layers in front of the Root prediction, and 256- and 128-unit dense layers in front of both Vowel and Consonant, which led to the improved results stated earlier. We also tried different filter sizes as hyperparameters; however, we believe they do not have as large an impact on the resulting accuracy as the number of neurons.

Our goal for the next step is to further improve the model. We plan to try transfer learning to fine-tune the model parameters, and we also look forward to including bagging in this project. Hopefully, this will give us higher accuracy.
