{"id":7615,"date":"2023-08-09T13:38:36","date_gmt":"2023-08-09T18:38:36","guid":{"rendered":"https:\/\/www.coginov.com\/how-does-an-experienced-linguist-optimize-the-machine-learning-of-a-system-that-identifies-personal-and-sensitive-data-in-an-organizations-documents\/"},"modified":"2024-01-09T13:14:07","modified_gmt":"2024-01-09T18:14:07","slug":"how-does-an-experienced-linguist-optimize-the-machine-learning-of-a-system-that-identifies-personal-and-sensitive-data-in-an-organizations-documents","status":"publish","type":"post","link":"https:\/\/www.coginov.com\/en\/how-does-an-experienced-linguist-optimize-the-machine-learning-of-a-system-that-identifies-personal-and-sensitive-data-in-an-organizations-documents\/","title":{"rendered":"How does an experienced linguist optimize the machine learning of a system that identifies personal and sensitive data in an organization&#8217;s documents?"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"7615\" class=\"elementor elementor-7615 elementor-7438\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-565259c elementor-section-boxed elementor-section-height-default elementor-section-height-default marketum_parallax_no\" data-id=\"565259c\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-be24d39\" data-id=\"be24d39\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-065803d elementor-widget elementor-widget-marketum_heading\" data-id=\"065803d\" data-element_type=\"widget\" data-widget_type=\"marketum_heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\n        <div 
class=\"marketum_heading_widget\">\n                                <h2 class=\"marketum_heading\">\n                        How does an experienced linguist optimize the machine learning of a system that identifies personal and sensitive data in an organization's documents?                     <\/h2>\n                            <\/div>\n        \t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9ea6664 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"9ea6664\" data-element_type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e32940f elementor-widget elementor-widget-text-editor\" data-id=\"e32940f\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><strong><span class=\"EOP SCXW209220973 BCX0\" data-ccp-props=\"{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\"><span class=\"EOP SCXW15649404 BCX0\" data-ccp-props=\"{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:240,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">By Marie Bourdon, Senior NLP Linguist, Head of Semantic Projects at Coginov <\/span><\/span><\/strong><\/p><p>An experienced linguist can play a crucial role in optimizing the machine learning of a system that identifies personal and sensitive data in an organization&#8217;s documents. 
Here&#8217;s how their expertise can contribute to the optimization process:<\/p><ol><li><span data-contrast=\"none\"><strong>Data Annotation and Training Set Creation<\/strong>: Linguists can assist in the annotation and labeling of training data for machine learning models. They can identify and tag personal and sensitive data elements within the documents, such as names, addresses, social security numbers, financial information, and medical records. Linguists understand the context and nuances of different languages and can accurately identify such data, ensuring high-quality training sets for the machine learning model.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li><li><span data-contrast=\"none\"><strong>Language-Specific Rules and Patterns<\/strong>: Linguists can develop language-specific rules and patterns to improve the accuracy of the system in identifying personal and sensitive data. They have a deep understanding of grammar, syntax, and linguistic structures, enabling them to identify unique patterns and linguistic cues that indicate the presence of sensitive information. Linguists can create rule-based systems or regular expressions that capture these patterns, enhancing the system&#8217;s performance in different languages.<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li><li><span data-contrast=\"auto\"><strong>Addressing Language Ambiguities<\/strong>: Languages often have ambiguities and multiple meanings for certain terms or phrases. Linguists can address these ambiguities by creating context-specific rules or utilizing language models that consider the surrounding text to determine the correct interpretation. 
This ensures that the system accurately identifies personal and sensitive data in various linguistic contexts, reducing false positives or false negatives.<\/span><\/li><li><span data-contrast=\"auto\"><strong>Fine-tuning and Model Optimization<\/strong>: Linguists can contribute to the fine-tuning and optimization of machine learning models for identifying personal and sensitive data. They can analyze model outputs, review false positives and false negatives, and provide insights into the linguistic factors that may have influenced the model&#8217;s performance. Based on their analysis, linguists can recommend adjustments to the model architecture, feature engineering, or training methodologies to improve accuracy and precision.<\/span><\/li><li><span data-contrast=\"auto\"><strong>Multilingual Support<\/strong>: Organizations dealing with documents in multiple languages require a system that can handle diverse linguistic contexts. Linguists can support the development of multilingual models by providing expertise on language-specific nuances, cultural considerations, and variations in personal and sensitive data across different languages. They can contribute to training data collection, annotation guidelines, and linguistic resources to ensure the system performs effectively across various languages.<\/span><\/li><li><span data-contrast=\"auto\"><strong>Domain-Specific Language Knowledge<\/strong>: Linguists with domain-specific knowledge can enhance the performance of the system by incorporating industry-specific terms, jargon, or abbreviations into the training data and rules. They can identify specific terminology related to the organization&#8217;s industry or domain that may contain personal or sensitive information. 
By incorporating this expertise, the system can effectively identify such information within the relevant context, improving its accuracy and relevance for the organization&#8217;s specific needs.<\/span><\/li><li><span data-contrast=\"auto\"><strong>Evaluation and Error Analysis<\/strong>: Linguists can assist in evaluating the system&#8217;s performance and conducting error analysis. They can analyze system outputs, identify patterns of misclassifications or false positives, and provide insights into the linguistic factors contributing to these errors. This analysis can guide further iterations of the system, driving continuous improvement and ensuring the system evolves to handle new linguistic challenges.<br \/><br \/><\/span><\/li><\/ol><p>By leveraging their linguistic expertise, an experienced linguist can optimize the machine learning of a system that identifies personal and sensitive data in an organization&#8217;s documents. Their contributions in data annotation, language-specific rules, addressing language ambiguities, model optimization, multilingual support, domain-specific language knowledge, and error analysis lead to a more accurate and effective system, reducing risks associated with the mishandling of personal and sensitive information.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7cd5e20 elementor-widget elementor-widget-marketum_blockquote\" data-id=\"7cd5e20\" data-element_type=\"widget\" data-widget_type=\"marketum_blockquote.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\n        <div class=\"marketum_blockquote_widget\">\n            <div class=\"marketum_blockquote marketum_view_type_3\">\n                                        <div class=\"marketum_blockquote_marker_container\">\n                                                <div class=\"marketum_blockquote_marker marketum_blockquote_marker_1\"><\/div>\n                        <div 
class=\"marketum_blockquote_marker marketum_blockquote_marker_2\"><\/div>\n                                                <\/div>\n                        COGINOV <p>\n\nWe create innovative solutions. <p>\n\nCOGINOV is recognized as a world leader in semantic technologies and information management. We are a Canadian software company offering our customers innovative solutions for managing structured and unstructured information. Our head office is based in Montreal.                    <p class=\"marketum_blockquote_author\"><span><\/span><\/p>\n                                <\/div>\n        <\/div>\n        \t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0268c73 elementor-widget elementor-widget-text-editor\" data-id=\"0268c73\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><span class=\"EOP SCXW95493622 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">Coginov&#8217;s Qore platform technology enhances the information value chain, transforming unstructured content into highly contextualized, accessible and valuable information. 
Coginov&#8217;s solutions enable you to capture, analyze, engage, automate and manage your information assets, with unrivalled accuracy and efficiency.<\/span><\/p><p><span class=\"TextRun SCXW238616509 BCX0\" lang=\"FR-CA\" xml:lang=\"FR-CA\" data-contrast=\"none\"><span class=\"NormalTextRun SCXW238616509 BCX0\">Discover our solutions\u00a0<\/span><a href=\"https:\/\/www.coginov.com\/en\/qoreaudit\/\"><span class=\"NormalTextRun SpellingErrorV2Themed SCXW238616509 BCX0\">QoreAudit<\/span><\/a><span class=\"NormalTextRun SCXW238616509 BCX0\">,\u00a0<\/span><a href=\"https:\/\/www.coginov.com\/en\/qore-ultima-2\/\"><span class=\"NormalTextRun SpellingErrorV2Themed SCXW238616509 BCX0\">QoreUltima<\/span><\/a><span class=\"NormalTextRun SCXW238616509 BCX0\">\u00a0and\u00a0<\/span><a href=\"https:\/\/www.coginov.com\/en\/qoremail\/\"><span class=\"NormalTextRun SpellingErrorV2Themed SCXW238616509 BCX0\">QoreMail<\/span><\/a><\/span><span class=\"EOP SCXW238616509 BCX0\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t\n    <div class=\"xs_social_share_widget xs_share_url after_content \t\tmain_content  wslu-style-1 wslu-share-box-shaped wslu-fill-colored wslu-none wslu-share-horizontal wslu-theme-font-no wslu-main_content\">\n\n\t\t\n        <ul>\n\t\t\t        <\/ul>\n    <\/div> \n","protected":false},"excerpt":{"rendered":"<p>An experienced linguist can play a crucial role in optimizing the machine learning of a system that identifies personal and sensitive data in an organization&#8217;s documents. 
<\/p>\n","protected":false},"author":11,"featured_media":7478,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"postBodyCss":"","postBodyMargin":[],"postBodyPadding":[],"postBodyBackground":{"backgroundType":"classic","gradient":""},"footnotes":""},"categories":[90],"tags":[128],"class_list":["post-7615","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cartographie-des-donnees","tag-data-mapping"],"_links":{"self":[{"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/posts\/7615","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/comments?post=7615"}],"version-history":[{"count":7,"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/posts\/7615\/revisions"}],"predecessor-version":[{"id":7772,"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/posts\/7615\/revisions\/7772"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/media\/7478"}],"wp:attachment":[{"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/media?parent=7615"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/categories?post=7615"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.coginov.com\/en\/wp-json\/wp\/v2\/tags?post=7615"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}