I have developed a preliminary machine learning model that, trained on only a handful of features, predicts the share of female graduates in STEM disciplines, primarily ICT.
I do not claim this is a realistic model in any way. Firstly, I am not competent to do this job rigorously and, secondly, relevant data are still too scarce.
However, the procedure I have followed may provide a few hints on which descriptors are relevant and how to identify them. For that purpose, and that purpose only, I have made available a bargain-basement report with a provocative name. No offence intended.
I have also made the scripts and most of the data available on GitHub (some files seem to be too big to upload).