Thanks for the great comment. So, pretty much, something in backprop is not working correctly on my net. — Nate Cook3
Probably, yeah. Perhaps I am not that much of a help, as I have been doing neural network work for less than a year. To be honest, your code looks quite confusing to me. Code quality is very important when you implement solutions to complex problems; otherwise you end up with several bugs hidden in a few lines of code, and debugging becomes a catastrophe. I would suggest you refactor as much as possible before you start debugging the code.
Trust me, this eliminates bugs orders of magnitude faster.
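If the suspicion is that backprop is broken, a standard sanity check is to compare the analytic gradient with a finite-difference estimate. Below is a minimal, generic sketch of that check; the quadratic "loss" and its gradient are hypothetical stand-ins, not the poster's network:

```python
import numpy as np

def numerical_grad(f, w, eps=1e-6):
    """Central-difference estimate of df/dw, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        g.flat[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return g

# Hypothetical loss f(w) = ||w||^2, whose true gradient is 2w.
f = lambda w: float(np.sum(w ** 2))
analytic_grad = lambda w: 2 * w

w = np.random.randn(5)
gap = np.max(np.abs(analytic_grad(w) - numerical_grad(f, w)))
print(gap)  # should be tiny (~1e-9); a large gap signals a backprop bug
```

The same comparison, run layer by layer, usually pinpoints which part of a hand-written backward pass is wrong.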
My network is fed by word embeddings and position embeddings only, so I'm wondering whether this behaviour can be explained by (i) a network architecture that is too complex for the task, (ii) a network architecture that is too simple to model the complexity of the task, or (iii) inputs that are not informative enough to discriminate the classes, so that the network learns with difficulty and thus slowly.
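For context, a model of the kind described, fed by word embeddings plus position embeddings only, can be sketched in Keras as below. Every size and name here is a hypothetical stand-in, since the question gives none of them:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical sizes; the question does not state the real ones.
VOCAB, MAXLEN, EMB_DIM, N_CLASSES = 10_000, 128, 64, 5

class TokenAndPositionEmbedding(layers.Layer):
    """Sum of a learned word embedding and a learned position embedding."""
    def __init__(self, maxlen, vocab, dim):
        super().__init__()
        self.tok = layers.Embedding(vocab, dim)
        self.pos = layers.Embedding(maxlen, dim)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[-1], delta=1)
        return self.tok(x) + self.pos(positions)

inputs = layers.Input(shape=(MAXLEN,), dtype="int32")  # token ids
x = TokenAndPositionEmbedding(MAXLEN, VOCAB, EMB_DIM)(inputs)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])
```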
Generally speaking, larger networks need more time to converge. However, the model you describe is pretty small, so it shouldn't take too long.

A side question: suppose the epochs are enough. When I need to compare the performance of many models, or to compare different hyper-parameter combinations, which score should I pick and compare? The "last best" f1 score seen during training, or the last reported score, i.e. the f1 at the final epoch?

If you want to compare performance, you should compare the best f1 for each model. If you want to compare convergence speed, then you could compare the f1 at X epochs.
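To make "best vs. last" concrete, here is a hypothetical sketch that records one validation f1 per epoch; a toy scikit-learn classifier stands in for the real network, and every number is made up:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy data and model; these stand in for the real network and dataset.
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = SGDClassifier(loss="log_loss", random_state=0)
classes = np.unique(y)

best_f1, history = 0.0, []
for epoch in range(50):
    clf.partial_fit(X_tr, y_tr, classes=classes)  # one pass = one "epoch"
    f1 = f1_score(y_val, clf.predict(X_val), average="macro")
    history.append(f1)
    best_f1 = max(best_f1, f1)                    # the "last best" score

last_f1 = history[-1]   # score reported at the final epoch
f1_at_X = history[9]    # convergence-speed comparison at X = 10 epochs
# Rank models by best_f1; use f1_at_X to compare how fast they get there.
```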
As I wrote in the comment, you may try to use Backtracking gradient descent, which automatically chooses learning rates for you, and for which convergence can be rigorously proven under the weakest assumptions known so far. For details, see my answer at this link: Does gradient descent always converge to an optimum?
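For reference, here is a minimal, generic sketch of gradient descent with a backtracking (Armijo) line search. It illustrates the idea of automatically chosen step sizes and is not claimed to be the exact scheme from the linked answer:

```python
import numpy as np

def backtracking_gd(f, grad, x0, alpha0=1.0, beta=0.5, c=1e-4,
                    tol=1e-8, max_iter=1000):
    """Gradient descent with a backtracking (Armijo) line search.

    Each step starts from the trial rate alpha0 and shrinks it by beta
    until the sufficient-decrease condition
        f(x - alpha * g) <= f(x) - c * alpha * ||g||^2
    holds, so no learning rate has to be tuned by hand.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # gradient small enough: done
            break
        alpha = alpha0
        while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
            alpha *= beta             # shrink until sufficient decrease
        x = x - alpha * g
    return x

# Demo on a simple quadratic, f(x) = ||x||^2, minimized at the origin.
f = lambda x: float(x @ x)
grad = lambda x: 2.0 * x
print(backtracking_gd(f, grad, x0=[3.0, -4.0]))  # ~ [0. 0.]
```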
The above answer is quite self-explanatory. One small thing I would like to add: too many epochs may lead to overfitting. The model will keep learning for as many epochs as we allow it to, so there must be a limit on the number of epochs.
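In practice that limit is usually enforced with early stopping rather than a hand-picked epoch count. A minimal Keras sketch, assuming a compiled model and validation arrays already exist (all variable names hypothetical):

```python
# Assumes a compiled Keras `model` plus numpy arrays X_train, y_train,
# X_val, y_val already exist; all of these names are hypothetical.
from tensorflow.keras.callbacks import EarlyStopping

stopper = EarlyStopping(
    monitor="val_loss",         # watch validation loss, not training loss
    patience=5,                 # tolerate 5 epochs without improvement
    restore_best_weights=True,  # roll back to the best epoch seen
)

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=500,                 # upper bound; early stopping ends sooner
    callbacks=[stopper],
)
```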
Why does my network need so many epochs to learn? Asked 2 years, 5 months ago. Active 2 years, 5 months ago. Viewed 10k times. This seems not to be the case, so I'm asking more experienced neural network users for advice: why is the network still learning after so many epochs, and so slowly?
Do you have any suggestions? Let me know if something is unclear! You can try to use Backtracking gradient descent, which is less complicated and is also adaptive.