Merge of data from different systems using Fuzzy Matching

Fuzzy matching was used to combine the data from the two different sources. Several fuzzy string-matching algorithms were tested, for example Jaro-Winkler distance, Levenshtein distance, Soundex and cosine similarity. Open-source implementations of these algorithms are readily available and can be very efficient; which one fits best depends on the use case.

What Data Scientists should know about Data Security

Data scientists, data analysts and data engineers rarely work only with open data; most of the time they work with internal company data that is critical to business success. That is all the more reason why these experts in data storage and analysis should always think about data security and follow certain rules and principles. Besides the technical security of the data, legal aspects also play a role.

Continue ReadingWhat Data Scientists should know about Data Security