Showing posts with label Emory BMI. Show all posts

Friday, November 11, 2022

5 years...

Emory 5 years, November 19, 2022
So I complete 5 years with the Department of Biomedical Informatics, Emory University this month. However, that counts the 7 months I spent here in 2016, March - October. If not for that, my continuous 5 years would be on June 19, 2023. Either way, I am reaching the 5 years soon. This is very long, especially for me, as I consider myself nomadic with frequent migrations. This period also includes my 3.5 years of postdoc life since I defended my PhD thesis in July 2019 (ULisboa/Portugal) and August 2019 (UCLouvain/Belgium). The pandemic, travel restrictions, pandemic-related challenges, and other factors made the time go faster, I guess? The years 2020 - 2022 feel like a big monolith, rather than 3 whole years. This is not to say all three years were the same. But they kinda went fast. Really fast. Despite the 5 long years (or 4.5 years, if I do not include the 2016 stint), I worked with different PIs over my time here. So it was not like I was working on the same projects or topic/domain the whole time.

In addition to this Emory 5 years, this year feels special to me for two more reasons along memory lane. The first is that it is 10 years since I left Sri Lanka for grad school. The second is that it is also 3 decades since I turned 5. I started counting decades from 15, considering my earliest memories are from when I was 5 years old. So, this year, as I am 35, gives me 3 decades of memories (1992 - 2002, 2002 - 2012, 2012 - 2022). Surely, the last decade (2012 - 2022) was the most eventful and remarkable. I hope the next decade, 2022 - 2032, will bring me interesting memories too!

I was interacting with my mentees who are applying for grad school this year. That made me recall my time applying for grad school in autumn 2011 and my first semester at ULisboa/Portugal in autumn 2012. We were a cool bunch in the Erasmus Mundus EMDC MSc 2012 - 2014 group. Our EMDC batch was the middle one of the 5-year program. We had 2 years of seniors before us, and 2 years of juniors after us. That left me in a great spot to observe the past and future of the EMDC master's program while I was in it. (Same situation with my Erasmus Mundus Ph.D., EMJD-DC, as well.) Some of us did a PhD. Others joined the industry after the MSc. Among those who left academia, some started companies. Most others joined big companies in Europe and elsewhere. A few others joined some crazy startups. Some of us left Europe and ended up in third countries (i.e., countries other than our home country and countries in the EU). Some stayed in Europe. Others went back to their home country. Among those who did a PhD, some did an Erasmus Mundus PhD (like me), while others did some other PhD program. Again, after the PhD, some joined the industry while the others continued with academia. Among those who continued with academia, the lucky/smart ones became tenure-track professors right away, while the others, like me, became postdocs. Now, I could reach out to everyone to get the exact numbers for these categories. But I think that is not important, as this is just for my personal blog after all.
Neurasmus page showing students and alumni

EMDC first years were either in Portugal or Spain, and we all spent the 3rd semester at KTH/Sweden. For the fourth semester, some of us came back to our host countries (Portugal or Spain), some stayed in Sweden (mostly just with KTH, while some went on to do an industrial internship at Spotify), and some did an internship in a third European country (such as France, Germany, or Switzerland). I came back to ULisboa/Portugal, making it ULisboa/Portugal -> KTH/Sweden -> ULisboa/Portugal for my MSc. That also means I was one of the few with the least diverse experience during my MSc, as I spent time in just two countries whereas many of us were in three countries. I compensated for this lack of migrations during my Ph.D., with my 3 international research internships (Croatia, USA, and Saudi Arabia) that spanned continents, in addition to my two universities (ULisboa/Portugal and UCLouvain/Belgium). I started my affiliation with ULisboa in August 2012 and spent 7 years doing my MSc and Ph.D., making it the longest university of my life so far, even longer than my 5 years (so far) at Emory University. #WeAreTécnico!

Today a Twitter friend introduced me to Neurasmus, an EU joint-degree MSc program influenced by Erasmus Mundus, but with a neuroscience focus. They also have a nice web page that neatly documents all the students and alumni of the programs in a single place! I wish our Erasmus Mundus programs had something similar. We actually have one for our Ph.D. (EMJD-DC), although not as fancy. But we do not have one for our MSc (EMDC). I remember some of us from the EMDC 2012 intake made a pact to visit Lisboa in August/September 2022 (marking the 10 years). But a pandemic and 10 years have changed many things.
https://twitter.com/pradeeban/status/1580967108321587203?s=20&t=RegDHN3nybldtJxjMNPwzQ

Sunday, April 10, 2022

Google Summer of Code (GSoC) 2022

This year, Google Summer of Code (GSoC) is open to anyone who is a newbie to open source. Not just students. As such, we replace the term "Student" with "Contributor" this year.

Given below is an introductory presentation to GSoC 2022. There are some more important changes to GSoC this year compared to the past years. Specifically, the workload has been made flexible this year. Some projects are medium-size, requiring a half-time effort (18 hrs/week, 175 hours in total) as in 2021. Other projects are large (full-time, 35 hrs/week, 350 hours in total). Make sure to find the project that fits your availability. Of course, large projects yield double the stipend of medium-size projects, as your effort will be double too. The additional flexibility also includes the option to complete GSoC up to 2 months later, with the mentors' prior approval.
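
As a quick sanity check on the workload figures above, here is a minimal Python sketch that derives how many weeks each project size implies at the quoted weekly pace. The dictionary and function names are just my own illustration (they are not from any GSoC tool), and the only inputs are the hours quoted in this post.

```python
# Project sizes as quoted in this post; the names are illustrative only.
PROJECT_SIZES = {
    "medium": {"hours_per_week": 18, "total_hours": 175},
    "large": {"hours_per_week": 35, "total_hours": 350},
}


def implied_weeks(size: str) -> float:
    """Weeks needed to reach the total hours at the quoted weekly pace."""
    project = PROJECT_SIZES[size]
    return project["total_hours"] / project["hours_per_week"]


if __name__ == "__main__":
    for size in PROJECT_SIZES:
        print(f"{size}: about {implied_weeks(size):.1f} weeks")
```

Both sizes work out to roughly ten weeks, so the difference between them is the weekly intensity rather than the duration.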

In 2020 and before, the potential to commit full-time to GSoC was a major deciding factor. Mentors would avoid selecting candidates who already had an ongoing internship or a job unless they still exhibited the potential to commit the same amount of time and effort. This year, just like 2021, the half-time (medium-size project) contributors can have other internships in parallel.

Mentors and contributors can even schedule their work hours the way they see fit.

Good luck!

 

Thursday, April 15, 2021

Viseu - Virtual Internet Services at the Edge

Indeed, I have named yet another project after a Portuguese town - a tradition I started during my PhD days with Sendim, Óbidos, and Évora. This time, I have Viseu. I am loving my footnotes for each of these works.


Image

Image Image

Image

Tuesday, January 26, 2021

Google Summer of Code (GSoC) 2021

There are a few specific changes to GSoC this year compared to the past years. Specifically, the workload and the student stipend have been halved in 2021. This change is highlighted as a direct outcome of the COVID-19 pandemic and the challenges of committing full-time to GSoC during these unpredictable days. As such, we do not know yet whether these changes will remain in 2022 and later, or whether GSoC will revert to its full-time commitment (35 hrs/week) in the upcoming years.

In the past years, the potential to commit full-time to GSoC was a major deciding factor. We would actively avoid selecting candidates who already had an ongoing internship or a job unless they still exhibited the potential to commit the same amount of time and effort. However, since GSoC is officially part-time this year, I see that the ability to commit full-time to GSoC is no longer a deciding factor. Mentors and students can even schedule their work hours the way they see fit. For instance, they can work full-time for specific days or weeks, just making sure the deadlines are met.

Given below is an introductory presentation to GSoC 2021. Good luck!

Friday, March 13, 2020

Coronavirus pandemic and working from home

The novel coronavirus has turned this year upside down with a pandemic. Starting the coming Monday, we have all decided to work from home to minimize human interactions. We are not sure when we will go back to working from the office. I have rarely worked from home except during my MSc/Ph.D. days. So this period brings back memories.

Although I have worked from home occasionally during the past ~2 years in Atlanta, that was often only a day every few months. Since this is going to be a continuous, long-term work-from-home period, I decided to set up some personal policies to ensure I work a sufficient amount of time, while also not overworking. With deadline-induced panic, during my Ph.D. days, I overworked at times, once for 60 hours non-stop with no sleep in between. But then I also had more flexibility to take days off following such stints. It was well-balanced. But now, we are preparing for an outbreak. I will keep these personal work-hour policies updated with amendments if the work-from-home drags on for a long time.

Personal work hour policies during the work-from-home era of the Coronavirus pandemic

I made these policies to emulate my work habits as if I am working from my actual office.

1. Work from a specific "office" area at home
My "home office."
Today I brought my office laptop home. I have a home laptop configured with all the stuff I need to do my office work. But, since this is going to be a long time, I will work only on my office laptop during work hours. I have also set up our dining table as my office table. My other laptop table remains in my room with my personal laptop on it.

2. Use the office laptop for office work only.
I will use it for work only, and everything else on my personal laptop. Conversely, I won't use my own laptop to do office work during these days.

3. Wake up and work at the same time as usual
Every day I arrive at work at 7:25 a.m. and leave at 4:45 p.m. I will continue the same work hours during these days. Similarly, during the weekdays, I will continue to wake up at the same time (6 a.m.).

4. No music
I like to listen to music. Mainly Romanian, Chinese, and Indian music. But I never listen to music at work (with one exception: during my time in Saudi Arabia, I used music and earphones to combat the background noise). I like quiet environments to work in. My apartment is quiet, especially given that I am currently home alone. I will keep it that way by not introducing any music.

5. No Twitter
Twitter is the only active social media I currently use. I will not use Twitter except during lunchtime while I am working at home. Although I still check Twitter once in a while at work from my office, I will avoid that when I work from home to avoid getting dragged into a long conversation. These days, with the pandemic and panic going on, it is easy to lose track of time on Twitter.

6. Wear proper clothes 
When I am working from home, I will wear regular clothes, instead of working in underwear.

7. No alcohol
I always have some alcohol in my fridge. I am not addicted. I drink when I want to. I am very alcohol-tolerant. But as a practice, I won't drink during my work hours. 

8. Follow the lunch timing
Usually, I cook during the evenings after work for my dinner and lunch the next day. I pack the lunch for the next day in a lunch box so that I can use the microwave to heat it. I will continue this practice, rather than trying to cook during lunch hours.

9. No personal calls
Unless it is a quick one. Typically, I do not make calls from the office since there is no place to make calls from. That means a lack of privacy, and if I did make a call, it would disturb others.

10. Make random meaningless coffee-time video calls to my colleagues
This one actually depends on my colleagues. If they do not agree, this won't work. :P I like people - I have been an extrovert since 2012. I do not want to be away from humanity for too long. At work, we often have our small fun activities, like making poems and talking nonsense. It is nice to continue the tradition. These calls are not the same as the progress meetings or weekly standups. Those meetings have a purpose.

11. Go for coffee with colleagues
In case of a severe outbreak, this would defeat the purpose of working from home. But I think at least for now, a coffee walk with my colleagues who live close by is not a big problem (as long as they are up for that). At work, we often go for coffee at least once a week.

12. Track how everything goes
I just want to work as usual. Not to work slower or not to work to death. I still estimate we have 2 more weeks of regular life. But if things get worse by then, we need to be ready for a bumpy ride.


This year so far has been horrible. Previously, 2016 and 2007 were the only two years I felt hopeless. This pandemic and some other problems made my 2020 go bad. But today, I met my mentor, and her positive words made me optimistic again. I believe we all will overcome this pandemic together. Whatever doesn't wipe humanity off the earth will make us stronger.

Monday, January 20, 2020

The Birkman Method

We recently had a Birkman session at the university. This is the summary map I got. So I am usually an extrovert - which I think is correct.

Image
 The detailed components go like this.
Image

Then my interests are as below.

Image
 Finally, my top career areas to explore:

Image

That's it folks. Birkman uncle told me I should manage a Sri Lanka restaurant. lol. Restaurante Sri Lanka em Parque das Nações (Ano 2048) 😇

Saturday, November 2, 2019

Universities as mentoring organizations

A fraction of the session participants.
(Photo by akram9)
We had several exciting unconference sessions and talks at the Google Summer of Code Mentor Summit 2019. I proposed and coordinated the unconference session titled "Universities as mentoring organizations," on Sunday, the 20th of October, 9:00 - 10:00 a.m., in the room "Studio-2" of the Marriott Munich. We had around 25 active participants from multiple organizations, representing several universities from countries including the USA, Russia, Germany, and Brazil. We also had mentors from umbrella organizations such as cBioPortal, which are based mainly on universities and research labs. Later on the same day, I coordinated a session on "The great proposals" at "Studio-3" from 2:30 - 3:30 p.m.
 
The session notes are recorded in the GSoC notes. Find all the sessions with their notes linked here. In this post, I elaborate on the session "Universities as mentoring organizations" again for a wider audience. These notes are from the thoughts of the mentors from the communities involved in the discussion, and they will reflect the communities involved. Discussions are grouped under topics, rather than by the person who discussed it in detail.


Administrative Challenges from the Universities
The discussions started interactively with self-introductions from the participating mentors and their mentoring organizations. Tiago and Frederico represented their mentoring organization, which is a university in Brazil. They noted the challenges in convincing the university administration to join GSoC and then to accept payments from Google. Their organization was the first Brazilian university to become a GSoC mentoring organization. The university administration was expecting documents from Google in Brazilian Portuguese to consider them official. The university expected that the supporting documents could come from one of Google's Brazil offices rather than from Mountain View, to meet the language requirement. This expectation created some translation requirements, which caused some additional burden on the GSoC organization administrators. We hope that such issues will be sorted out by the universities, potentially with some help from Google, so that there will be more participation from international (i.e., non-English-speaking) universities as mentoring organizations.

GSoC Students Help Ph.D. Research
There were several observations on how GSoC students help with the implementation of research ideas. Several Ph.D. implementation works remain closed because the implementations carry significant technical debt. Support from GSoC helps make the code more readable and reusable, and thus supports open science and open-source contributions from the research universities. Hence GSoC students help open-source the host universities' research. Stephanie from UC Santa Cruz stressed how GSoC helped their Ph.D. and postdoc researchers. She mentioned how the undergrad GSoC students worked with their researchers, and this was mutually beneficial. No one felt mentoring was a burden. Instead, they saw it as a way to build their communities. She highlighted that promoting GSoC across the departments would be easy by stressing how the GSoC students help with implementation over the summer, paid by Google. There were observations on developing projects that are helpful to Ph.D. research work. For instance, there are projects that need to get done, but the mentor (Ph.D. student) doesn't want to do them themselves. GSoC students can be more motivated to do such implementation work, even with little scientific or research impact.

Motivating students into GSoC
Philipp and Karlheinz represented the University of Munich. They highlighted that their Ph.D. students work as GSoC mentors. However, they also stressed that it is hard to motivate students from their university to join GSoC as students due to summer holidays overlapping with the GSoC timeframe. Students do go off on a well-deserved vacation rather than taking a summer internship. Furthermore, there were observations that there are several university open source projects that are not connected to or well-received by general open-source software communities. GSoC is indeed bridging this gap, making the code quality better. I (Pradeeban from EmoryBMI - Emory University, Department of Biomedical Informatics) mentioned how we motivate our students to join GSoC while discouraging them from joining our university/department as their mentoring organization. This is mainly because we want them to build new collaborations. Collaborations inside the university or department do not require a GSoC.

GSoC in professional life
The impact of GSoC is long-lasting. Some of us found our postdoc advisors and employers through GSoC. So GSoC works as a recruiting platform for mentors as well. Furthermore, it fosters international collaborations between universities. We had research papers as outcomes of GSoC. Nikita from a Russian university stressed the importance of GSoC and similar programs in the students' careers. Sebastian Diaz from Harvard highlighted the misalignment of university projects and open source. While challenging, programs such as GSoC help fill this gap. Tobias highlighted that their target students mostly include Ph.D. students, as they are more suited to work on the proposed projects, which involve a significant research component. GSoC also helps them get more funding for full-time developers, as it is considered a full-time job for the students.

Best Students Join the Universities
Aadi, a high school student from India, observed how the best students like to work with the universities that function as GSoC mentoring organizations, as this helps them with their future graduate studies and research. The observation is that the mentors from such organizations are professors and experienced researchers from the universities. This is also a mutually beneficial relationship - universities as mentoring organizations get to have the best students. Indeed, a win-win. I joined OMII-UK as a student in 2010 and Emory BMI in 2014 and 2015 (then as a mentor from 2016 onward, as an employee in 2018, and as a postdoc in 2019). My interest in OMII-UK was driven by the fact that the EPCC research team from Edinburgh University was part of OMII-UK, and I worked with them on their OGSA-DAI platform for GSoC 2010.

Why not enough open-source from universities?
Arav Singhal, a student from Rice University, stressed he prefers more open-source development in his university. We agreed that a large number of research teams keep their source closed until their paper is published. Even after that, when the code is made public, most of the time, the code is not reusable, as it is usually developed as a prototype with little attention to code readability and engineering best practices. GSoC helps fix this by building up coding skill sets in the students.

Small open source presence in a country
Deniz is from Turkey. He was a GCI student in ScoReLab (an open-source community that originated at the University of Colombo, Sri Lanka). He highlighted how Turkey has a small open-source presence. The lack of local open-source communities makes motivating students to join GSoC harder. The challenge starts with introducing open source, and then GSoC, to the potential students. It is essential to create larger local open source communities to build diverse and expanding FOSS contributions.

Universities/Entities collaborating as a single mentoring organization
Werner highlighted the collaboration between multiple universities as a mentoring organization. He observes that sometimes, organizations do not make it explicit enough that the organization is a research department or a collaboration of such entities. Ino from cBioPortal highlighted how hospitals and research institutes collaborate under cBioPortal for the GSoC. He also noted the positive outcomes, including peer-reviewed publications. Mentioning GSoC in CV - both as a student and a mentor - is rewarding. There were suggestions on including open source software development in university curriculums.


Starting a new GSoC organization
Akram, a researcher from the University of Tennessee (Dept. of Bioinformatics), helped his department apply and get selected as a new (i.e., first-time) GSoC mentoring organization when he moved to his current university as a postdoc from another university. He observes that GSoC has indeed resulted in peer-reviewed journal publications. He stressed the importance of open source for universities. There seems to be a common observation that receiving funding from Google (or any similar company) becomes harder due to university regulations. We discussed how to start as a mentoring organization. We need to have precise project ideas. Of course, having established open source projects would be a big plus. However, there are also concerns about having mentors and retaining them. How do we grow our mentoring community beyond the walls of the department/university? GSoC students-turning-mentors can be a solution. We also need to be clear on the bioinformatics side on what can be open source, as we deal with sensitive health data.

Recruiting Mentors
One challenge to address is how to leverage more departments/mentors from the university. Some of us view it as building the community. This requires a significant effort from the organization administrator to convince fellow faculty, postdocs, researchers, and staff to be GSoC mentors. Sometimes, it is faster just to do the development ourselves, rather than mentoring a student to do it. However, such mentoring can be a rewarding experience for early career researchers. We all agreed that GSoC was a productive use of our time and not overhead. We also need a "mentor pool." Projects should have back-up mentors and active co-mentors. This helps with avoiding mentor burn-out, ensuring successful completion of GSoC.

GSoC as a funding mechanism
Some view GSoC projects as a way to fund implementation activities that other funding entities won’t financially support. For instance, development is often seen as not novel. Therefore, complete implementation and maintenance efforts do not receive sufficient funding from the funding agents. However, such maintenance and incremental developments are crucial for the usability of the project. GSoC helps improve the usability and maintenance of the code.


Did we miss anything? Also, did I fail to include any crucial aspects discussed in the session? Please share your thoughts on this topic as comments.

Saturday, October 26, 2019

Google Summer of Code Mentor Summit 2019

The mentor summit in full swing
This is the first time the Google Summer of Code (GSoC) mentor summit was held outside CA/USA, and the first time in Europe. The mentor summit was in Munich this year, and I am back at the summit after 8 years. Last time, I represented AbiWord in 2011 at the Googleplex (the Google HQ in Mountain View, CA, USA). This time, I represented the Department of Biomedical Informatics, Emory University, as both the primary organization administrator and a mentor. We traveled from Atlanta to Munich for the mentor summit. I have been a happy participant of GSoC for 10 years, in various capacities - student, mentor, and organization administrator. I have been with AbiWord, OMII-UK, and Emory BMI between 2009 - 2019.

As the mentor summit was in Munich, Google had organized it to be longer than usual. Usually, the mentor summits were just 2 days. This time, it was 3 days. The first day was a free day for social activity: a visit to Nymphenburg Palace, followed by a scavenger hunt. The remaining 2 days were the actual mentor summit. The format has not changed at all, after almost a decade. It was an unconference, where we propose our own topics on a post-it note among the available 8+ slots per session. Then folks join a session that they are interested in (or propose a session of their own). This time I coordinated 2 sessions. The first one was on "Universities as mentoring organizations" and the other one was on "The great proposals". The first one had active participation from around 25 mentors, with most of them representing a university or an umbrella organization formed by universities. The second one was a session I coordinated as the last session of the summit, mostly because the only other session was "Fail your students," and I did not want to end the mentor summit on a negative note.

I enjoyed coordinating the 2 sessions as well as attending some exciting sessions in all the slots. Of course, I missed several sessions that I would have liked to participate in, as I can be physically present in only one session in any given timeslot. I met several mentors from multiple mentoring organizations. We also had a few free days in Europe since the mentor summit was just 3 days, and we were in Europe for a week. We used those days to explore Germany, Austria, and the Baltic nations. Quite a packed trip, indeed.

It was a rewarding experience to attend the mentor summit and proactively contribute to the summit. I made connections with many of the mentor summit participants. Please keep in touch, if you are one of those who attended the mentor summit. :)

Sunday, March 10, 2019

[GSoC] What if the project idea that I am working on doesn't get into the final list?

I have received this question in the past and recently. I decided to post my answer as a blog post for future reference.

A student asks, "There are like 11 project ideas for your organization. Based on my observation, I see that you have received only 4 - 6 final projects accepted, despite having 10+ project ideas. Now I work hard for a project idea, and what if you guys decide not to select the project idea because that idea is not important to your organization? Will my hard work go wasted? So is this pure luck?"

I decided to leave a detailed answer as the primary organization administrator for GSoC/Emory BMI.

tl;dr: Work on your project idea and be the best. Avoid worrying about the selection process.

The long answer: One thing I can assure you is that most GSoC organizations select students. Not project ideas. As far as I know, GSoC organizations (including us) don't list project ideas that are not a priority for them. That means all the ideas have near-equal priority (of course, there will be minor variations among the project priorities).

In the worst case, which I have experienced once in the past, when 2 of the best students applied to the same project, the mentors tried to encourage the 2nd best student to apply for another project in the same organization. Every organization is different, and every mentor is different. So don't quote me on this with another organization.

Wearing my ex-GSoC student hat, my advice to current potential students is to try to be the best among the organization's potential applicants. Remember you can also apply for up to 3 projects if you are worried a lot. I have been a GSoC student 4 times successfully (now you can be a student only twice, but in those days, there was no such limitation). 2 out of 4 times, I applied to more than one project. During my first GSoC as a student (2009), I applied for 2 projects in the same organization. The mentors chose the one that was the most relevant for them among the 2 I applied for. In 2014, I applied to 3 different organizations and 2 chose my proposal (and of course in that case, mentors try to contact you to find your first option, or sometimes agree among themselves which organization should accept you. You can of course work on only one GSoC project per year. No exception).

Now, my only worry as an organization administrator is that Google may not give the maximum number of slots we (open source organizations) want. Google has a limited budget and a number of students in mind. It is around 1000 students. Therefore, many organizations do not get the number of students they want. Of course, we can always ask for one more slot than they offered. But still, there is no guarantee. I ***guess*** we will get 4 - 5 slots this year. But that is just a guess. But I strongly believe we will get at least 2 slots. It is definitely not possible to get 11 slots. That means some of our ideas won't have students working on them as a result. But that is life. ;)

To answer your comments on the discussions in Quora on GSoC: Take them with a grain of salt. Many of them are i) subjective, ii) plainly wrong despite their high up-votes, iii) from someone who has never been a student or a mentor, iv) based on limited experience, or v) outdated, as GSoC has changed its rules during the past 15 years.

Now, stop worrying and focus on working on the project ideas. Optimism and hard work go a long way! Make sure to do your research and show your talents to the mentor. We can only choose one student for a project idea, and naturally, that goes to the best candidate.

Sunday, March 3, 2019

Google Summer of Code: The best of both worlds - II

Google Summer of Code (GSoC) is an exciting program dedicated to open source projects, funded by Google. I have been a student 4 times - 2009 with AbiWord, 2010 with OMII-UK, 2014 and 2015 with Emory BMI. I have been a mentor 4 times too - 2011, 2012, and 2013 with AbiWord and 2016 with Emory BMI. After a short break, now I am happy to be the primary organization administrator for Emory BMI - my first time as an organization administrator, my 9th year of involvement with GSoC. In this post, I plan to share a few suggestions to the potential students.

While I have written several posts in the past on this topic, several things have changed recently. Most importantly, now the students can do a maximum of 2 GSoCs in their lifetime as a student. There was no such limit in those days, and some of us ended up with 4 or more years as GSoC students. This change means, if you have been a GSoC student twice before (for example, GSoC 2016 and GSoC 2018), you cannot be a student anymore. That means I can never be a GSoC student again. :P Second, previously there were only 2 evaluations. Now there are 3. In 2016 and before, everyone received the same stipend. Now, it depends on the country of your university (most likely this is your resident country). Also, you can apply to only up to 3 projects now. Previously this upper limit was higher (it was reduced from 20 to 5, and eventually became 3). The reduced limit aims to favor quality instead of quantity.

The beauty of open source projects is that they give you a chance to work on something you are interested in, in your own way. But not many of us get the time to dedicate to an open source project, as we are busy with our regular work, studies, and related activities. Now let's come to the topic.

Why GSoC?

GSoC is an annual program from Google for university students aged 18 or older. Each student codes for her preferred open source organization for 3 months. Google coordinates the program and pays the successful students, even though open source organizations are run mostly by volunteer developers. The exact amount you get depends on the country of your university, varying between 3000 $ and 6600 $. For example, if you study in Portugal, the total is 4200 $. If you study in Sri Lanka, it is 3000 $. Hence you can focus entirely on the program during the 3 months. You will also receive a certificate, an awesome t-shirt, and some small gifts!

3 milestones. 
    • First Evaluation (paid ~July 1): 30%
    • Second Evaluation (paid ~July 29): 30%
    • Final Evaluation (paid ~September 5): 40%
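To see what the 30/30/40 split means in practice, here is a minimal Python sketch. The percentages and the two totals (4200 $ for Portugal, 3000 $ for Sri Lanka) come from this post; the function name `split_stipend` is only my own illustration, not anything provided by Google.

```python
# Share of the total stipend paid after each evaluation, in percent
# (values taken from the milestone list in this post).
PAYMENT_SHARES = [("first", 30), ("second", 30), ("final", 40)]


def split_stipend(total_usd):
    """Return the amount paid after each evaluation for a total stipend."""
    return {name: total_usd * percent // 100 for name, percent in PAYMENT_SHARES}


if __name__ == "__main__":
    # Totals quoted in this post for two example countries.
    for country, total in [("Portugal", 4200), ("Sri Lanka", 3000)]:
        print(country, split_stipend(total))
```

For a 4200 $ total, this gives 1260 $, 1260 $, and 1680 $ for the three evaluations.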

      Some statistics of 2019
      • 206 Organizations
      • Expecting more than 2000 mentors and co-mentors.
      • Expecting more than 1000 successful students.

      The success rate is pretty high!
      Historically, 1 in 4 students gets accepted to GSoC. That means, for every 4 complete project proposals, 1 gets selected. Once a student is selected, the chance of her completing the project successfully is much higher. In past years, the success rate was around 85 - 90% for accepted students. The high success rate is because the mentors and the organization are there to provide her with assistance and guidance whenever needed.

      The passion for open source and the desire to be an outstanding student are considered to be the primary reasons for a student to participate in the Google Summer of Code. Not to mention earning money over the summer.

      A computer with an Internet connection, knowledge and experience in the domain, and motivation are what you need to participate. Of course, you should be interested in contributing to the particular open source organization.

      Don't forget to check the timeline and adhere to it strictly.

      Before you begin..
      • Google Summer of Code is all about being Open Source.
        • Get your basics and motives right.
      • Netiquettes.
      • Sign up to the mailing lists (if any).
      • Join the relevant Slack/IRC channels (if any).

      Technologies
      • Version Control Systems
        • These days, it is mostly Git. But this depends on your organization.
      • Build Tools
        • Mostly Maven.
      • IDEs (Integrated Development Environments)
        • IntelliJ IDEA, Eclipse, Microsoft Visual Studio, ..
        • Of course, specific to the programming language
      • Issue Tracker
        • GitHub, Jira, Bugzilla, ..

      Communicating with the team..
      .. and the mentor, over the Internet..
      • Slack or Internet Relay Chat (IRC)
      • Mailing Lists
        • Dev, User, Commit lists, sub-groups, ..
      • Issue Tracker
      • Forums and wiki
      • Blogs
      • Only if the mentor proposes - Skype, Personal Mails, gtalk, conference calls, ..

      Network Etiquettes
      • Be Specific and clear.
      • Research (google.. ;) or go through the previous Slack/mailing list messages) before asking.
      • Be helpful to others.
      • Be ethical; respect.
      • NO CAPS! (UNLESS YOU ARE SHOUTING!)
      • Don't take messages personally.
      • Dn't snd ur sms msgs to thrds or lsts.

      Proper Addressing..over the lists/irc/..
      • Address the devs and users properly.
        • First Name or Preferred calling name.
        • No excuse for not knowing the mentors' names. Mentors' names are listed under the project descriptions.
        • NO Sir, Madam, bro, sis, pal..
          • Even if you know them, personally.
          • Especially, don't say "Dear Sir/Madam". It can be perceived as too pretentious, lazy, or impersonal.
        • No Mr., Dr., or Prof. either.
      • Be gender neutral.
        • “Folks” over “Guys and Girls”.
      • Not too personal.
        • “Hi” is preferred more than “Dear”.

      Mailing lists
      • Post only to the relevant list.
      • Check the mail archives first.
        • To avoid getting RTFW/RTFM responses.
      • No [URGENT]/[IMPORTANT] tags.
      • No unnecessary attachments.
      • No Cross Posting.
        • Stick to the proper mailing list only.
      • Don't hijack threads.
      • Don't post off-topic.

      IRC/Slack Etiquettes
      • Be a reader first when you enter a Slack channel. Check the previous discussions before asking a question. Most likely your question is already answered. I know, "No question is a bad question". But remember, you are being evaluated for GSoC slots (Even if we want to mentor all of you, we cannot. GSoC gives only a certain number of slots to each organization, and mentors have limited time too). You don't want to come across as someone who is lazy.
      • Refer to others using their irc/slack nick.
        • Whenever my IRC nick is mentioned, I get a pop-up message from my IRC client, such as Pidgin.
        • Don't use @channel in your message at all. This will send a notification to everyone. You don't want to send a message "@Channel, I have submitted my proposal".
      • Don't expect immediate replies; wait.
      • Avoid these common mistakes.

      Find a mentoring organization..
      • More than 200!
      • Find the organization you like the most.

      Find THE right project..    
      Once you have found the right organization(s) that match your interest and expertise, you have to go through the ideas list (e.g.: the project ideas of Emory BMI). You can apply to up to 3 projects in total - either from the same organization or from 2 or 3 organizations.


      Get to know more about the projects
      • Talk to the mentor(s) assigned by the organization for each project idea.
        • Use the recommended channels. 
        • Avoid the temptation to send private emails, LinkedIn messages, private Slack messages, etc. These are not going to help you. Use the appropriate public channels (unless advised by your mentor otherwise).
      • Mailing lists and archives.
      • Issue Tracker
        • Open issues or tickets
          • New features/enhancements (RFE)
          • Bugs (easy/difficult and normal/critical)

      What makes you special?
      • Experience
        • Being a great user doesn't mean that you can be a good developer.
        • Demonstrate your ability to perform the task with prototypes and bug fixes.
        • Possess the required skills.
          • If you have no clue about C, you are unlikely (although I sound negative saying this) to get selected for a project that requires expertise in C. You will have better luck applying for a project whose development language and tools you already know.
      • Complete the prerequisites for the project and additional requirements related to the project proposal from the mentors.
        • The prerequisites such as a screenshot/video of a working deployment/demo/prototype, pull requests, or some prototype code are there for some reason. Don't ignore them if you want to be considered seriously.
      • Your interests and motivation
        • Pick something you enjoy doing.
        • Being a great developer doesn't mean that you can be a good contributor.
        • What makes you the right person?
      • Willingness to contribute to the community beyond the time frame of GSoC. 
        • Usually, Open Source organizations want committers and longtime volunteers - Not just students!
      Experience
      • Languages
        • Java, C++, C, ..
        • Not much time to learn a new language (?)
      • Prove it with pull requests for existing bugs or feature requests in the code base!
        • Submit well-tested and complete pull requests. Don't send half-baked pull requests that won't even compile or that break existing functionality. Remember, you are consuming the time of a mentor who might be answering up to 100 students (depending on the popularity of their project ideas). Also, for most of us mentors, GSoC is just a part-time activity or a hobby. It is not our full-time job (in contrast, the selected students are expected to consider GSoC as a full-time job during the 3-month coding period).
        • Assist other students!!!
        • Project expertise
          • Bug reports and fixes.
          • Go through the archives, wikis, and web sites.

      Opportunities..
      • A project that matches your previous work experience.
        • Choose the right project.
      • Timezone Difference 
        • Use it effectively - Most students prefer to work at night too, as they may have lectures in the mornings.
      • Multiple Applications (3!) - If you apply for multiple projects, make sure all of them are of high quality. Remember the equation of quality vs. quantity.
      • Communicate early and often.
      • Be heard, visible, responsive, and quick!
        • Ask questions, and more importantly answer others' questions.

      Apply

      Register as a student for GSoC, as the first step of the application procedure. Make sure to meet the deadline and submit both the application and the student proof on time. The student proof, such as a transcript or student ID indicating that you are currently an active student, is as important as the GSoC proposal, as far as Google is concerned. No excuses and no exceptions. If you miss the deadlines set by Google, your mentoring organization cannot help you, even if they want to. Share your draft proposal with mentors early on, and iterate over it with the mentors' feedback.

      Apply on Google's site as early as possible, as you can edit your proposal later, until the last minute. Check often for the mentors' comments and attend to them. Only the organization mentors can see your proposal unless you decide to make it public. Please don't wait until the last minute to submit the proposal and the other necessary documents. There will definitely be a power cut or an Internet outage right at the deadline. I have warned you. ;) Submit early and continue editing. That's the approach.

      How to impress the mentors/developers?
      • Stick to the organization's template.
      • Abstract.
      • Introduce yourself properly.
        • Focus on the relevant facts.
        • Why do you fit? Your skill sets.
        • List of the pull requests or patches (if any) you have submitted. Make sure to identify the working/accepted ones.
      • Project Goals
        • Prove that you have understood them correctly.
      • Deliverables
        • Code, documentation, test cases, binary releases, ..
      • Description - can also be given along with the timeline
        • Benefits to the organization and other projects  
      • Timeline
      • Links - References and additional details.

      After the submission..
      • Don't go invisible!
        • Evaluation is still going on.. ;)
      • You may be asked to provide additional information.
        • Pull requests, prototypes, demos, patches, ..
        • Screenshots.
        • A Skype Interview request from the mentors!!
      • Start coding on your project if you have time. Of course, it will help in the selection process, which is still going on after you have submitted.
      • Be motivated.

      Got Selected? Community Bonding Period!!!

      Don't panic. You have 3 more weeks, just to mingle with the developers and the code base. Mentors are there to help you! Keep in touch with the developers and users. Learn the project by going through the code base and documentation such as coding styles and coding guidelines. This will help you understand the project idea better. Come up with a design and start with simple hacks.

      Coding

      Finally comes the coding - the easiest task of all. Commit often, if you are given committership. When committing or sending pull requests or patches, make sure to include meaningful commit messages. Get feedback from the mentor(s) on your commits or patches frequently. Keep the community updated. Committing or sending patches daily would be a good approach.

      Plan for the 3 evaluations early, with the mentor. This will help you reach the target successfully. You might also need to revisit the project goals at the milestones, if required.



      Conclude/Continue

      Any coding or related work done on your project after the GSoC deadline will not be considered part of your summer of code, and will be considered volunteer work on the project. Try to stick around in your project community after the successful completion of your GSoC. You can aim to be a committer, long-term contributor, or even a mentor for GSoC next year.

      Monday, February 25, 2019

      Emory BMI and Google Summer of Code (GSoC) 2019

      Google has announced the list of accepted organizations for GSoC 2019. We (Emory BMI) have been selected once more as a mentoring organization, after our successful summers in 2016 and earlier. :) Our project ideas - https://github.com/sharmalab/Emory-BMI-GSoC-2019 Our ideas are on the topics of big data, data visualization, and machine learning, in the context of biomedical research.

      The stipend varies by the country of the student's university. For example, for Portuguese universities, it is 4200 $ in total (for 3 months of coding - May 27, 2019 - August 19, 2019) for each successful student. Students can apply for projects from more than 200 accepted mentoring organizations (including Emory BMI).

      Anyone can apply as long as they have not already been a GSoC student twice (new regulations!) and are currently a student aged 18 or older at a university/college.

      P.S: This will be my 9th year with GSoC and 5th time as a mentor. However, this is my first time as the primary organization administrator. :) I started my first GSoC as a student in 2009. It has been 10 years already.

      Friday, September 1, 2017

      On-Demand Service-Based Big Data Integration: Optimized for Research Collaboration

      Today I presented my paper "Obidos" at the VLDB DMAH workshop in Munich. The abstract and the presentation of the paper are given below:

      Abstract: Biomedical research requires distributed access, analysis, and sharing of data from various disperse sources in the Internet scale. Due to the volume and variety of big data, materialized data integration is often infeasible or too expensive including the costs of bandwidth, storage, maintenance, and management. Óbidos (On-demand Big Data Integration, Distribution, and Orchestration System) provides a novel on-demand integration approach for heterogeneous distributed data. Instead of integrating data from the data sources to build a complete data warehouse as the initial step, Óbidos employs a hybrid approach of virtual and materialized data integrations. By allocating unique identifiers as pointers to virtually integrated data sets, Óbidos supports efficient data sharing among data consumers. We design Óbidos as a generic service-based data integration system, and implement and evaluate a prototype for multimodal medical data.

      Please find the full text of the paper here and the presentation below:
      I mostly worked on this paper while I was doing my internship at Emory University. This is also my first paper to get accepted from UCLouvain/Belgium, under the supervision of Prof. Van Roy. In this presentation, I have also included "A tale of Ana, Abdul, Viktoria, Pereira, Chen, and Raj", a subtle message I wanted to include in my presentation for quite some time.

      Friday, August 12, 2016

      [GSoC 2016] MediCurator : Near Duplicate Detection for Medical Data Warehouse Construction

      This summer, at the Department of Biomedical Informatics, Emory University (Emory BMI), we have another set of intelligent students working on interesting projects. I have been mentoring Yiru Chen (Irene) from Peking University, on the project "MediCurator: Near Duplicate Detection for Medical Data Warehouse Construction" for the past couple of months. Currently we have reached the final stages of the project, as the student evaluation period starts on the 15th of August. This post is a summary of this successful GSoC, as well as a history behind the near duplicate detection efforts.

      The early history of MediCurator
      MediCurator was a research prototype that I initially developed based on my paper ∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data (CoopIS'15) as part of my data quality research, along with my GSoC 2015 work on data integration. The early results were presented as a poster at AMIA 2016 in San Francisco.


      MediCurator and Infinispan
      Now we have a more complete implementation of MediCurator and a use case for medical data, thanks to the support provided by GSoC. For her implementation, Irene did some benchmarks before choosing to go with Infinispan's latest distributed streams for the distributed execution. (You may find some interesting discussion on the Infinispan distributed streams here.)

      MediCurator Usecase
      MediCurator is a data quality platform for the ETL workflows in data warehouse construction. It optimizes bandwidth usage by avoiding duplicate downloads, and optimizes storage by eliminating the near duplicates in the warehouse, thus increasing the data quality. When data is downloaded, the source locations are tracked, and when data is updated in the source at a later time, the subsequent download process will download only the new data.
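      The download-tracking idea can be sketched in a few lines of Python. This is only an illustration and not MediCurator's actual code: the class `DownloadTracker`, the method `select_new`, and the use of a last-modified marker per item are all my assumptions for the example.

```python
class DownloadTracker:
    """Remembers which version of each source item was already downloaded."""

    def __init__(self):
        # item id -> last-modified marker seen at the previous download
        self._seen = {}

    def select_new(self, source_listing):
        """Return the ids of items that are new or changed since the last call.

        source_listing: dict mapping item id -> last-modified marker
        (hypothetical layout; a real source might expose checksums instead).
        """
        to_download = [
            item_id
            for item_id, modified in source_listing.items()
            if self._seen.get(item_id) != modified
        ]
        # Record what is about to be downloaded so the next run skips it.
        for item_id in to_download:
            self._seen[item_id] = source_listing[item_id]
        return to_download


if __name__ == "__main__":
    tracker = DownloadTracker()
    print(tracker.select_new({"img-1": 1, "img-2": 1}))            # first run
    print(tracker.select_new({"img-1": 1, "img-2": 2, "img-3": 1}))  # later run
```

      On the second call, only the updated and the newly added items are selected, which is where the bandwidth saving comes from.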

      Similarly, data is deduplicated at the data warehouse, as near duplicates could be present there since data is integrated from multiple data sources. Here the data pairs are evaluated for near duplicates in a distributed manner, with duplicate pairs stored separately, while the clean data stays in the warehouse. The duplicate detection workflow also considers the corrupted data/metadata, and synchronizes/downloads the clean data from the source.
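      Here is a minimal, single-process sketch of the pairwise evaluation, assuming textual metadata and a Jaccard similarity on word sets. The similarity measure, the 0.8 threshold, and the function names are my choices for illustration; MediCurator itself runs the comparison in a distributed manner and this is not its actual algorithm.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_near_duplicates(records, threshold=0.8):
    """Split records into clean ids and near-duplicate pairs.

    records: dict of record id -> metadata string (hypothetical layout).
    Returns (clean_ids, duplicate_pairs); in each pair, the first id is kept.
    """
    tokens = {rid: set(text.lower().split()) for rid, text in records.items()}
    duplicate_pairs = []
    dropped = set()
    for left, right in combinations(records, 2):
        if left in dropped or right in dropped:
            continue
        if jaccard(tokens[left], tokens[right]) >= threshold:
            # Store the pair separately; only the first record stays clean.
            duplicate_pairs.append((left, right))
            dropped.add(right)
    clean_ids = [rid for rid in records if rid not in dropped]
    return clean_ids, duplicate_pairs
```

      The duplicate pairs are returned separately from the clean ids, mirroring the separation between the duplicate store and the warehouse described here.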

      This is useful for medical images due to the large scale of the data, often binary in nature along with textual metadata. Efficiency of MediCurator is ensured through its in-memory data grid-based architecture. MediCurator fits well with the landscape of distributed data integration and federation platforms developed at Emory BMI.

      More Details on GSoC 2016
      Irene developed the entire code base from scratch as an open source project. MediCurator also has ReadTheDocs*-based documentation, which gives a more detailed description of the project. In addition, you may find a summary of the weekly progress on Irene's blog. MediCurator's scope remained dynamic throughout the project. In addition to near duplicate detection, MediCurator supports download tracking and detecting duplicates across datasets, online and offline. Most of the code was developed exclusively with The Cancer Imaging Archive (TCIA) as the core data source and DICOM as the default data format, while maintaining relevant interfaces and APIs for extension to other data sources and data types.

      Future Work
      The summer was productive. It included both research and implementations. The GSoC time is limited to 4 months (including the community bonding period), and we are reaching a successful end to yet another Google Summer of Code. Nevertheless, we hope to work on a research publication with combined results on MediCurator, along with the previous ∂u∂u** and SDN-based Mayan (presented at ICWS 2016) approaches in November. This will be our first publication with Irene on her findings and implementations, with further evaluations on the clusters in INESC-ID Lisboa. More updates on this later (possibly after publishing the paper ;)).

      Concluding Remarks
      This is my 4th time in the Google Summer of Code as a mentor, and 3rd time as the primary mentor for a project. Previously I mentored 2 successful students in 2011 and 2012 for AbiWord. I wish every student success as they reach the end of their summer of code.

      * I recommend ReadTheDocs. You should give it a try!
      ** You may find the paper on ∂u∂u interesting, if you are into data quality or distributed near duplicate detection.

      Wednesday, March 2, 2016

      Google Summer of Code 2016 with Biomedical Informatics, Emory University

      Biomedical Informatics, Emory University (Emory BMI) is once more in Google Summer of Code! Please find the project ideas from Emory University, here. We are using a Slack team (gsoc2016-bmi.slack.com) for communications among students and mentors.

      My journey with GSoC since 2009 has been pretty interesting, from student to mentor to student to mentor. 2009 (Student/AbiWord) -> 2010 (Student/OMII-UK) -> 2011 (Mentor/AbiWord) -> 2012 (Mentor/AbiWord) -> 2013 (Mentor/AbiWord) -> 2014 (Student/EmoryBMI) -> 2015 (Student/EmoryBMI) -> 2016 (Mentor/EmoryBMI).

      I have always enjoyed being a student as well as a mentor. However, I may not be able to become a student once more in the future, due to the new eligibility requirement introduced in the GSoC this year: veterans who have participated more than twice cannot reapply as a student.

      What are the eligibility requirements for participation?

      • You must be at least 18 years of age
      • You must be a full or part-time student at an accredited university (or have been accepted as of April 22, 2016)
      • You must be eligible to work in the country you will reside in during the program
      • You have not already participated as a Student in GSoC more than twice
      • You must reside in a country that is not currently embargoed by the United States. See Program Rules for more information.

      Wednesday, August 12, 2015

      [KDD 2015] MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives


      One of my recent papers, "MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives" was presented at a KDD workshop this week in Sydney.

      I could not present the paper myself at KDD, as my Australian visa was delayed. Luckily, my friend Denis who lives in Sydney helped me by presenting the paper. Much thanks to him!


      Abstract
      With the growing adaptation of pervasive computing into medical domain and increasingly open access to data, metadata stored in medical image archives and legacy data stores is shared and synchronized across multiple devices of data consumers. While many medical image sources provide APIs for public access, an architecture that orchestrates an effective sharing and synchronization of metadata across multiple users, from different storage media and data sources, is still lacking. This paper presents MEDIator, a data sharing and synchronization middleware platform for heterogeneous medical image archives. MEDIator allows sharing pointers to medical data efficiently, while letting the consumers manipulate the pointers without modifying the raw medical data. MEDIator has been implemented for multiple data sources, including Amazon S3, The Cancer Imaging Archive (TCIA), caMicroscope, and metadata from CSV files for cancer images.

      Monday, April 21, 2014

      [GSoC 2014] Data Replication / Synchronization Tools


      Project Mentor: Ashish Sharma
      Short description: Consumers download the data by searching the image repository using the browser. The information that the consumer is interested in gets updated whenever the data producers update or add patient information. The current download tool lacks the ability to track the relevant updates for the consumer. A pub-sub solution based on Apache CXF, utilizing the JAX-RS REST API of CXF, will assist automated downloads to the consumers.
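      To illustrate only the publish/subscribe pattern mentioned in the description, here is a small in-memory Python sketch. It uses neither Apache CXF nor JAX-RS, and the class `UpdateBroker` and its methods are hypothetical names I made up for this example.

```python
from collections import defaultdict


class UpdateBroker:
    """In-memory publish/subscribe hub keyed by search query (hypothetical)."""

    def __init__(self):
        # query -> list of callbacks registered by interested consumers
        self._subscribers = defaultdict(list)

    def subscribe(self, query, callback):
        """Register a consumer callback for updates matching a query."""
        self._subscribers[query].append(callback)

    def publish(self, query, update):
        """Notify every consumer subscribed to this query.

        Returns the number of consumers that were notified.
        """
        callbacks = self._subscribers.get(query, [])
        for callback in callbacks:
            callback(update)
        return len(callbacks)


if __name__ == "__main__":
    broker = UpdateBroker()
    inbox = []
    broker.subscribe("lung", inbox.append)      # consumer side
    broker.publish("lung", "new image added")   # producer side
    print(inbox)
```

      The consumer registers its interest once, and every later update by a producer reaches it without the consumer having to search the repository again.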


      I am happy to be back wearing the student hat in the GSoC after a long time, with the transition of Student (AbiWord) -> Student (OMII-UK) -> Mentor (AbiWord) -> Mentor (AbiWord) -> Mentor (AbiWord) -> Student (BMI-CCI). Interestingly, this is the 10th GSoC, and the 6th Google Summer of Code that I am involved in. This is also my 3rd time as a student, and the first time ending up in the de-duplication, with Freenet and Emory BMI, both of which are awesome project communities. My thanks go to the mentors and developers from both organizations. I hope I will be able to work with Freenet in the upcoming years, as I could work with only one organization in a summer. I have already started playing with Freenet, as a side effect of the GSoC. :-)

      I will probably keep this blog updated with information related to the GSoC 2014. My sincere thanks go to my supervisor Prof. Luis Veiga for his continuous motivation, and also for encouraging me in GSoC. My thanks also go to the OGSA-DAI mentors and developers from the EPCC at the Edinburgh University, which was the first university-affiliated lab that I worked with for a GSoC. I loved that experience. I loved being with AbiWord as a student and as a mentor. This year, AbiWord chose not to participate in the GSoC, after its successful participation from 2006 to 2013.