Abstract: The emergence of social media enables people to interact with others on the web in ways that are media-rich ("updates" or "posts" can be text, photo, audio, video, etc.), time-shifted (correspondence need not happen at once or within a predefined time frame), and social in nature. By using social media, citizen science projects can potentially engage many participants to contribute observations covering a large geographic region over a long time period. This is an improvement over, for example, traditional biodiversity surveys, which typically involve relatively few people working in confined regions and periods.
Because social media is not designed for scientific data collection and analysis, it is difficult to transform the unstructured information items often found there (e.g., free-form text and unidentified images) into structured data records for scientific tasks. To help bridge this gap, we propose an approach comprising three steps: (1) Information Extraction, (2) Information Formalization, and (3) Information Reuse. We apply this approach to posts and comments from two Facebook interest groups on species observations. Our study demonstrates that, with principled methods and proper tools, crowdsourced social media content such as that from Facebook interest groups can be used for collaborative species identification and occurrence recording.
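To make the three steps concrete, the following is a minimal illustrative sketch, not the method used in this study. It assumes a post is plain text that may contain a Latin binomial and an ISO-formatted date. The names `OccurrenceRecord`, `extract`, `formalize`, and `reuse`, the record fields, and the regular expressions are all hypothetical and chosen only for illustration; a real pipeline would need far more robust name and date recognition.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Assumption: a naive pattern for a Latin binomial (capitalized genus,
# lowercase epithet). It will produce false positives on ordinary prose.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{3,})\b")
# Assumption: dates appear in ISO form (YYYY-MM-DD).
ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


@dataclass
class OccurrenceRecord:
    """Hypothetical structured record for one species observation."""
    scientific_name: str
    event_date: Optional[str]
    source_post_id: str


def extract(post_text: str) -> dict:
    """Step 1 (Information Extraction): pull candidate values from free text."""
    name = BINOMIAL.search(post_text)
    date = ISO_DATE.search(post_text)
    return {
        "name": name.group(1) if name else None,
        "date": date.group(1) if date else None,
    }


def formalize(post_id: str, extracted: dict) -> Optional[OccurrenceRecord]:
    """Step 2 (Information Formalization): map extracted values to a record.

    Returns None when no species name was found, since the record
    would not be usable as an occurrence.
    """
    if extracted["name"] is None:
        return None
    return OccurrenceRecord(
        scientific_name=extracted["name"],
        event_date=extracted["date"],
        source_post_id=post_id,
    )


def reuse(records: list) -> str:
    """Step 3 (Information Reuse): serialize records as CSV for other tools."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["scientific_name", "event_date", "source_post_id"]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


if __name__ == "__main__":
    post = "Spotted Pieris rapae in the garden on 2013-05-04."
    record = formalize("post-1", extract(post))
    print(reuse([record] if record else []))
```

Keeping the steps as separate functions means each one can be replaced independently, for example swapping the regular expressions for community-supplied identifications taken from comments, without changing how records are formalized or exported.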