A1:
There are several issues:
First, you have to compute your proposed anchors to fit the larger final grid and input resolution, so that all of your objects are still visible to the network. Second, will you have enough memory to store the corresponding network weights and activations during training and testing?
I'd say that if you can find a way to cut the images in 4 without impeding your goal (boxes cut in half and annoyances like that), definitely go for it.
A2:
Be careful not to crush your small objects when you resize. I'm using 1920x1080 images that are resized to 832x416 to get a 26x13 grid; I guess 832x832 (so a 26x26 grid) would be good for you.
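To get a feel for how much a resize "crushes" an object, here is a rough sketch in Python. It assumes a plain stretch resize (no letterboxing or padding), and the function name `resized_object_size` and the 30x30 example object are made up for illustration, not taken from any library.

```python
def resized_object_size(obj_w, obj_h, src_size, dst_size):
    """Return the (width, height) in pixels of an object after the image
    is stretched from src_size to dst_size.

    Assumption: plain stretch resize, each axis scaled independently.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    return obj_w * dst_w / src_w, obj_h * dst_h / src_h


# Hypothetical 30x30 px object in a 1920x1080 frame resized to 832x416.
w, h = resized_object_size(30, 30, (1920, 1080), (832, 416))
print(f"{w:.1f} x {h:.1f} px after resize")
```

With a stride of 32 pixels per grid cell, an object that ends up this small covers well under half a cell in each direction, which is the kind of case to watch out for.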
A3:
The grid size is "set" by your input size. That is because the successive conv/pooling layers reduce the feature map to 1/32 of the input size in each dimension.
It's also why, when computing your box centroids for the anchors in the region layer, you don't keep them as a fraction of the original image size but rather as their size on the cell grid.
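The arithmetic can be sketched like this in Python. It is only an illustration under stated assumptions: a total downsampling factor of 32, input dimensions that are multiples of 32, and anchor centroids given as (width, height) fractions of the image. The helper names `grid_size` and `anchors_to_grid_units` are hypothetical, not part of darknet.

```python
def grid_size(input_w, input_h, stride=32):
    """Grid dimensions for a given network input size.

    Assumption: the network downsamples by `stride` in total, and the
    input dimensions are exact multiples of it.
    """
    if input_w % stride or input_h % stride:
        raise ValueError("input size must be a multiple of the stride")
    return input_w // stride, input_h // stride


def anchors_to_grid_units(anchors_frac, grid_w, grid_h):
    """Convert (width, height) anchors given as fractions of the image
    into sizes measured in grid cells."""
    return [(w * grid_w, h * grid_h) for w, h in anchors_frac]


gw, gh = grid_size(832, 416)      # 26 x 13 grid
print(gw, gh)

# Hypothetical centroid: a box half as wide and half as tall as the image.
print(anchors_to_grid_units([(0.5, 0.5)], gw, gh))
```

The same fractional centroid gives different anchor values for 832x416 and 832x832 inputs, which is why the anchors need to be recomputed when the input size changes.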