Sunday, March 27, 2016

Plagiarizers, Eastern Cultures, and GSoC.

Every time plagiarism is detected or some copy-pasta is found, there is always someone who defends the offender with "It is common in Eastern cultures. It is not considered a crime there. Plagiarism is not well-defined in those countries." Those who defend these offenders themselves come from Asian countries, and while they are genuinely attempting to help a student from their region, they are doing more harm to the system than helping to fix it. As usual, the plagiarism issue has popped up in GSoC too, with people attempting to forgive the offenders in the name of culture.

Even in Eastern cultures, there is a clear difference between copying from another student and copying from elsewhere (such as the Internet, or a senior's notebook). While some students (the not-so-smart ones, obviously) tend to assume it is ok to copy text segments from the Internet, everyone (even the stupidest kid in middle school) knows that copying from another student is a serious offence. It is as simple as that: from primary school onwards, you know that you may not just copy your friend's homework.

On the other hand, this year, I found at least 3 project proposals that were copied from the Internet, word for word. 2 of them did not even modify a single word from a white paper. There were some obviously vague and generic proposals (the student may have sent the same to multiple related organizations), and some proposals in the form of just a CV (sometimes with text copied from the project idea list), even from students of a higher academic level (such as those enrolled in a PhD at a well-known university).
Probably these students should understand that mere copy-pasting or similar vague proposals are not going to help them get selected. The mentors have big eyes. They are not going to get you in, just because you have some text entered in your proposal.

Regardless of whether this was due to plain ignorance or an attempt to cheat, these students just do not fit GSoC, and they can probably re-apply next year. Someone who cannot even put together a simple proposal is not going to be a good student anyway. On the other hand, there are many strong proposals from the Eastern cultures. Defending, tolerating, or forgiving plagiarizers in the name of culture is counter-productive. I don't really think this has anything to do with culture at all.

I myself am from a country of Eastern culture, and I was well aware from a very young age that plagiarism is not right. I was taught so in school. Similarly, when I was mentoring students at the University of Moratuwa in Sri Lanka, I always warned them not to plagiarize work from the Internet. Even before I warned them, they were already aware that it is not ok to copy other students' work. So if you think you may seek forgiveness for the offenders in the name of culture, you are harming the system rather than helping to fix it.

I am not really sure whether it is important to spend too much time discussing the bad proposals, as we are just seeking the best ones (not ordering the proposals from best to worst).

Saturday, March 26, 2016

Data Café — A Platform For Creating Biomedical Data Lakes

I presented my GSoC 2015 work at AMIA 2016, as a podium abstract, along with a poster presentation. Given below is my presentation.

AMIA Joint Summits 2016 and MediCurator

I presented two of my papers at AMIA 2016 Joint Summits on Translational Science in San Francisco this week. One of them was a podium abstract, while the other was a poster. This was the first time I presented a poster, and I enjoyed it. You may find the slides below.

Near Duplicate Detection for Medical Imaging Data Warehouse Construction from Kathiravelu Pradeeban

Abstract: Medical data warehouses and image archives are constructed by integrating multiple private and public data sources. Finding almost identical entries is crucial for warehouse construction. Existing solutions tend to be too specific, such as Master Patient Index (MPI) for patient records. Multiple dimensions and attributes including medications, clinical, and pathological data should be considered for a complete duplicate detection and elimination. This paper describes MediCurator, a generic near duplicate detection platform for medical data warehouses. 

CURRENT CONFERENCE AND TRACK: CRI: Clinical and research data collection, curation, preservation, or sharing

I enjoyed the conference. However, San Francisco was unpleasantly dirty, especially the neighbourhood close to the conference location (Parc 55 Hilton).

Friday, March 18, 2016

[GSoC] Ensuring your emails get a reply.

While the mentors try their best to answer all the queries from the students, sometimes, due to the volume of queries, they may miss a few.
To ensure that your query gets a reply, make sure to:
1. Give enough details in the question for the mentors/developers to reply.
2. Indicate (through action rather than just words) that you are a strong candidate willing to learn and join the team.
3. Avoid asking questions that are well-covered in locations easy to find (such as FAQ).

While we tend to encourage all questions, vague questions may signal to the mentors that you are not serious about your project, and probably too lazy to find the information yourself. I have received questions such as "Please let me know how to start". While this sounds like a valid question, it is probably already covered in the project descriptions and discussed in the mailing lists, IRC, etc. So you should rather inform the mentors what you have done, and how you expect them to help.
4. Avoid repeating the same question.
If you decide to contact them through private email, make sure to improve your question (as suggested in step #2). If your question is of the same nature as "Please help me start" with no reasonable input, it may still be ignored. While this sounds like arrogance, not every mentor has enough cycles to devote during the application period. Also, a few of the mentors receive emails from around 50 - 100 students. Not all of the students are going to end up in GSoC (some do not even apply to the project after querying about it) and a lot of them just do not make any decent attempt by themselves at all.

5. Avoid vague emails.

I received a private email:

"I am a passionate programmer experienced in Python and Java. Love to work with you for GSoC. Expecting your reply soon."

I have no clue what reply the person is expecting. Did they at least read the project ideas page and understand the correct communication channels? If so, they would not even have sent me a private email in the first place. Be clear. Don't send a random email just because you think it is good to introduce yourself personally. If you must, don't end your email with statements such as "Looking forward to hearing from you as soon as possible." There is nothing in these kinds of emails that deserves a reply, other than a link to the project ideas page.

Wednesday, March 16, 2016

Applying to a project as a "team" in GSoC

Students, and even mentors, often ask whether students can participate in GSoC as a team. The short answer is no, according to the GSoC policies.
It is ok (and encouraged) that the students communicate and collaborate with the team/organization/community (including the other GSoC students, once they are accepted).

However, it is not ok for two students (presumably classmates or friends) to apply together for a single project (or even two separate projects, where they explicitly indicate their interest in collaborating and working jointly as a team with another student).

"We have two students that want to work on the same project together."

Here these two students want to work on the same project together. This means they consider themselves a team, rather than considering the organization itself as the team. They are going to help each other and work together from the application period, which, if permitted, would of course be unfair to the other students. This is never permitted in GSoC. Students apply individually and code individually, while being part of the bigger community (i.e. the mentoring organization, or a project collaborating with the mentoring organization if the mentoring organization is an umbrella organization such as Apache).

Allowing such teams or pairs will ruin the fairness. It also rests on a dangerous assumption by the students that both of them will successfully get accepted into GSoC and complete their projects on time. Once the GSoC timeline is over, of course, you may encourage the students to continue development as a pair or team as volunteers, as they will not be bound by the rules of GSoC anymore.

Saturday, March 5, 2016

Joining your "company" as a student in GSoC

This is another question commonly asked by potential GSoC students. Basically, they are currently interns at a company, while still being students at an accredited university. Good for them that their company is accepted as an open source mentoring organization in GSoC. Now they can choose to apply for GSoC as a student.

If they are still interns during GSoC, obviously they will be ineligible to participate in GSoC, as it requires full-time commitment. However, if they have just concluded their internship, they may be able to apply, depending on when they completed their previous employment/internship with that company with respect to the GSoC timeline. This is something the GSoC administrators should confirm.

Given below is an excerpt from a long reply I sent to a student who asked a similar question on the GSoC public mailing list, regarding the potential to join his employer once more through GSoC as an intern:

Even if the GSoC rules do not prevent you from applying, there are many factors that you and your company should consider. You are currently an intern there, so you may instead communicate with the company and let them know of your interest in continuing the internship with them outside the GSoC frame.

I do not know what the mentors of your company would think. However, if I were a mentor from such a company, I would avoid finding a local intern once more through GSoC - it is like losing a great opportunity - a slot to find more long-term contributors. It also undermines the vision of GSoC to enable remote collaboration globally among the students and the open source organizations.

Now, as a student, you should also consider the positives and negatives of choosing the same company. It certainly won't look very good on your CV if your GSoC was with the same company, compared to working remotely for an organization (that you probably found through GSoC). You will also lose the great opportunity of working with other organizations and open source cultures. Moreover, you may also lose the opportunity to have a taste of GSoC (remote meetings, sending patches, working remotely with volunteers and professors, finding further opportunities, helping an open source organization and becoming a committer).

So if you think the company would be interested in hiring you again, you should rather communicate your interest to the company in extending your internship - as you are quite interested in their projects and technology. As far as GSoC is concerned, I would encourage you to find other organizations - there are 180 - all of them are fantastic, and may offer projects similar to your current company's technologies.

Wednesday, March 2, 2016

Google Summer of Code 2016 with Biomedical Informatics, Emory University

Biomedical Informatics, Emory University (Emory BMI) is once more in Google Summer of Code! Please find the project ideas from Emory University here. We are using a Slack team for communications among students and mentors.

My journey with GSoC since 2009 has been pretty interesting, from student to mentor to student to mentor. 2009 (Student/AbiWord) -> 2010 (Student/OMII-UK) -> 2011 (Mentor/AbiWord) -> 2012 (Mentor/AbiWord) -> 2013 (Mentor/AbiWord) -> 2014 (Student/EmoryBMI) -> 2015 (Student/EmoryBMI) -> 2016 (Mentor/EmoryBMI).

I have always enjoyed being a student as well as a mentor. However, I may not be able to become a student once more in the future, due to the new eligibility requirement introduced in GSoC this year: veterans who have participated more than twice cannot reapply as students.

What are the eligibility requirements for participation?

  • You must be at least 18 years of age
  • You must be a full or part-time student at an accredited university (or have been accepted as of April 22, 2016)
  • You must be eligible to work in the country you will reside in during the program
  • You have not already participated as a Student in GSoC more than twice
  • You must reside in a country that is not currently embargoed by the United States. See Program Rules for more information.